Kenneth Steimel

Student of Computational Linguistics and High Performance Computing

Building an AMD Deep Learning Machine - Software Stack

Operating System

I used Ubuntu 19.04, partly because I wanted to try out the April release of Ubuntu and partly because I knew that the newer kernels were more compatible with both Vega and the Ryzen CPU. The amdgpu driver is merged into the kernel after 4.19, which reduces installation headaches.

A note about Docker

If you are not a fan of Docker, for security or any other reason, I don’t advise using Ubuntu 19.04. This release ships only Python 3.7 and, at the moment, there are a few issues with running ROCm 2.3 on Python 3.7. These issues do not seem to occur with Python 3.5 or 3.6, or with older versions of ROCm. However, the performance improvement in the newer version of ROCm is substantial, so rather than falling back to an older ROCm I would pick a version of Ubuntu where you can downgrade Python.
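A quick way to see which side of this line a machine falls on is to check the interpreter version before installing anything. The sketch below is only an illustration: the function name python_ok_for_rocm is made up for this post, and the accepted versions simply mirror the 3.5 and 3.6 observation above rather than any official compatibility list.

```shell
#!/bin/sh
# Hypothetical helper: succeeds only for the Python versions this post
# reports as working with ROCm 2.3 outside Docker (3.5 and 3.6).
python_ok_for_rocm() {
    case "$1" in
        3.5|3.5.*|3.6|3.6.*) return 0 ;;
        *) return 1 ;;
    esac
}

# Only run the check if python3 is actually on the PATH.
if command -v python3 >/dev/null 2>&1; then
    ver=$(python3 -c 'import platform; print(platform.python_version())')
    if python_ok_for_rocm "$ver"; then
        echo "Python $ver: reported as working with ROCm 2.3"
    else
        echo "Python $ver: not one of the versions reported as working (3.5, 3.6)"
    fi
fi
```

On a stock Ubuntu 19.04 install this should take the second branch, which is the case where the Docker route described below is the easier path.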

Initial Software Stack

First, add the ROCm Debian repository:

wget -qO - http://repo.radeon.com/rocm/apt/debian/rocm.gpg.key | sudo apt-key add -
echo 'deb [arch=amd64] http://repo.radeon.com/rocm/apt/debian/ xenial main' | sudo tee /etc/apt/sources.list.d/rocm.list

Then install the appropriate packages:

sudo apt update
sudo apt install rocm-libs miopen-hip cxlactivitylogger
sudo apt install rocm-dev
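To confirm that the packages actually landed, something like the following can be run afterwards. This is a minimal sketch, not part of the original setup: the function name check_rocm_packages is hypothetical, and it relies only on dpkg-query, which is present on any Debian or Ubuntu system.

```shell
#!/bin/sh
# Hypothetical helper: report which of the given packages are fully installed.
# Returns 0 if all are installed, 1 if any is missing.
check_rocm_packages() {
    missing=0
    for pkg in "$@"; do
        # dpkg-query prints "install ok installed" for a fully installed package.
        if dpkg-query -W -f='${Status}' "$pkg" 2>/dev/null \
                | grep -q 'install ok installed'; then
            echo "installed: $pkg"
        else
            echo "missing: $pkg"
            missing=1
        fi
    done
    return "$missing"
}

# The package names are the ones installed in this post.
check_rocm_packages rocm-libs miopen-hip cxlactivitylogger rocm-dev || true
```

Any line starting with "missing:" means the corresponding apt install step did not complete.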

Because we are using the amdgpu drivers in kernel 5.0 that ships with Ubuntu 19.04, we need to add the following udev rule.

echo 'SUBSYSTEM=="kfd", KERNEL=="kfd", TAG+="uaccess", GROUP="video"' | sudo tee /etc/udev/rules.d/70-kfd.rules
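The rule only helps if it has been loaded and your user can actually open /dev/kfd. The sketch below is an assumption-laden check rather than an official procedure: check_kfd_access is a name invented for this post, and the commented sudo commands are the usual way to reload udev rules and join the video group, which you may or may not need depending on your setup.

```shell
#!/bin/sh
# Commands that need root, shown as comments so this block is safe to run:
#   sudo udevadm control --reload-rules && sudo udevadm trigger
#   sudo usermod -a -G video "$LOGNAME"   # takes effect after logging in again

# Hypothetical helper: report whether the current user can open the device node.
# Returns 0 if accessible, 1 if the node is missing, 2 if permission is lacking.
check_kfd_access() {
    dev="${1:-/dev/kfd}"
    if [ ! -e "$dev" ]; then
        echo "missing: $dev does not exist (is the amdgpu driver loaded?)"
        return 1
    fi
    if [ -r "$dev" ] && [ -w "$dev" ]; then
        echo "ok: $dev is readable and writable by $(id -un)"
        return 0
    fi
    echo "denied: $dev exists but $(id -un) cannot open it; check the video group"
    return 2
}

check_kfd_access || true
```

If the result is "denied", the group membership from the rule has most likely not been picked up yet, and logging out and back in is the first thing to try.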

Docker install

I added the following alias to my ~/.bashrc file to allow for quick launching of the container:

alias drun='sudo docker run -it --network=host \
--device=/dev/kfd \
--device=/dev/dri \
--group-add video \
--cap-add=SYS_PTRACE \
--security-opt \
seccomp=unconfined \
-v $HOME/dockerx:/dockerx'

To launch the container, simply run drun rocm/tensorflow. The first time you run this, Docker pulls the image from Docker Hub. After that, it uses the cached image.
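Bash does not expand aliases in non-interactive scripts by default, so if you want the same launcher inside a script, a shell function is one alternative. The following is a sketch under a few assumptions: the name rocm_run and the DRUN_DRY_RUN switch are made up for this post, and the flags are copied from the alias above. It also creates ~/dockerx up front, because Docker otherwise creates a missing bind-mount directory owned by root.

```shell
#!/bin/sh
# Hypothetical function variant of the drun alias (same docker flags).
rocm_run() {
    # Create the shared directory first so it belongs to the user, not root.
    mkdir -p "$HOME/dockerx"
    # DRUN_DRY_RUN is an illustrative switch: when set to a non-empty value,
    # the command is printed instead of executed.
    ${DRUN_DRY_RUN:+echo} sudo docker run -it --network=host \
        --device=/dev/kfd \
        --device=/dev/dri \
        --group-add video \
        --cap-add=SYS_PTRACE \
        --security-opt seccomp=unconfined \
        -v "$HOME/dockerx:/dockerx" \
        "$@"
}
```

Running rocm_run rocm/tensorflow then behaves like the alias, and DRUN_DRY_RUN=1 rocm_run rocm/tensorflow prints the full docker command so you can inspect it first.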

Shot of full build