Installing the NVIDIA driver, CUDA, cuDNN, and TensorFlow on a GPU server

System version compatibility requirements

CentOS 7.2: CUDA 9.0, cuDNN 7.4
CentOS 7.5: CUDA 9.2, cuDNN 7.4

Install gcc and the kernel headers

yum -y install gcc gcc-c++ kernel-devel

Package manager overview:
https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html#package-manager-overview

1. Install the GPU driver

View NVIDIA GPU information (nvidia-smi is only available once a driver is installed):

# nvidia-smi

2. Install nvidia-detect

2.1. Add the ELRepo repository

# rpm --import https://www.elrepo.org/RPM-GPG-KEY-elrepo.org

# rpm -Uvh https://www.elrepo.org/elrepo-release-7.0-3.el7.elrepo.noarch.rpm

2.2. Install the nvidia-detect tool

yum install nvidia-detect

2.3. Run nvidia-detect to find the required driver

# nvidia-detect -v
Probing for supported NVIDIA devices...
[10de:15f8] NVIDIA Corporation Device 15f8
This device requires the current 410.78 NVIDIA driver kmod-nvidia
[10de:15f8] NVIDIA Corporation Device 15f8
This device requires the current 410.78 NVIDIA driver kmod-nvidia
[102b:0538] Matrox Electronics Systems Ltd. Device 0538
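The driver version can also be pulled out of that output with a small filter. The following is a minimal sketch, not part of nvidia-detect itself: the function name required_driver_versions is hypothetical, and it assumes the "This device requires the current ... NVIDIA driver" wording shown above. Any other wording is simply ignored.

```shell
# Hypothetical helper: reads nvidia-detect output on stdin and prints each
# distinct driver version it recommends, one per line.
required_driver_versions() {
    # Matches lines such as:
    #   This device requires the current 410.78 NVIDIA driver kmod-nvidia
    sed -n 's/^This device requires the current \([0-9][0-9.]*\) NVIDIA driver.*/\1/p' | sort -u
}
```

It is meant to be used as: nvidia-detect -v | required_driver_versions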

2.4. Edit the GRUB defaults file

vim /etc/default/grub

Append the following to the value of GRUB_CMDLINE_LINUX:

rd.driver.blacklist=nouveau nouveau.modeset=0

The modified file looks like this:

GRUB_TIMEOUT=5
GRUB_DISTRIBUTOR="$(sed 's, release .*$,,g' /etc/system-release)"
GRUB_DEFAULT=saved
GRUB_DISABLE_SUBMENU=true
GRUB_TERMINAL_OUTPUT="console"
GRUB_CMDLINE_LINUX="crashkernel=auto rd.lvm.lv=centos/root rd.lvm.lv=centos/swap rd.driver.blacklist=nouveau nouveau.modeset=0 rhgb quiet"
GRUB_DISABLE_RECOVERY="true"

Then regenerate the GRUB configuration:

grub2-mkconfig -o /boot/grub2/grub.cfg
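Before rebooting, it can help to confirm that both kernel parameters really ended up on the GRUB_CMDLINE_LINUX line. This is a rough sketch under that assumption. The function name grub_disables_nouveau is hypothetical, and it only inspects the defaults file, not the generated grub.cfg.

```shell
# Hypothetical helper: succeeds only if the GRUB_CMDLINE_LINUX line of the
# given file contains both nouveau-related kernel parameters.
grub_disables_nouveau() {
    local file="$1" line
    # The trailing '=' keeps GRUB_CMDLINE_LINUX_DEFAULT from matching.
    line=$(grep '^GRUB_CMDLINE_LINUX=' "$file") || return 1
    case "$line" in
        *rd.driver.blacklist=nouveau*) ;;
        *) return 1 ;;
    esac
    case "$line" in
        *nouveau.modeset=0*) ;;
        *) return 1 ;;
    esac
}
```

For example: grub_disables_nouveau /etc/default/grub && echo "parameters present"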

2.5. Create blacklist

vim /etc/modprobe.d/blacklist.conf

Add the following line:

blacklist nouveau
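If you prefer not to edit the file by hand, the line can be appended from the shell. The sketch below is one possible way to do it. The function name ensure_line is hypothetical, and it appends the line only when an identical line is not already present, so running it twice does not duplicate the entry.

```shell
# Hypothetical helper: append a line to a file unless the exact line exists.
ensure_line() {
    local line="$1" file="$2"
    # -q quiet, -x whole-line match, -F fixed string (no regex)
    grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}
```

As root, it would be called as: ensure_line 'blacklist nouveau' /etc/modprobe.d/blacklist.conf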

2.6. Rebuild the initramfs (the first command keeps a backup of the current image)

mv /boot/initramfs-$(uname -r).img /boot/initramfs-$(uname -r)-nouveau.img
dracut /boot/initramfs-$(uname -r).img $(uname -r)

2.7. Restart

reboot

2.8. Confirm that nouveau is disabled

lsmod | grep nouveau

If there is no output, the disablement is successful.
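The same check can be scripted so that it yields an exit status instead of relying on eyeballing empty output. This is a small sketch. The function name nouveau_loaded is hypothetical, and it compares only the module-name column of lsmod, so a line that merely mentions nouveau in another column does not count.

```shell
# Hypothetical helper: reads lsmod output on stdin and succeeds (exit 0)
# only if a module named exactly "nouveau" is listed.
nouveau_loaded() {
    # lsmod prints the module name in the first column.
    awk '$1 == "nouveau" { found = 1 } END { exit !found }'
}
```

For example: lsmod | nouveau_loaded && echo "nouveau is still loaded"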

3. Install CUDA

CUDA download address:

https://developer.nvidia.com/cuda-toolkit

# sh cuda_9.0.176_384.81_linux.run

If the installer reports that you appear to be running an X server and asks you to exit X before installing, run init 3 to switch to command-line mode, make sure the X server is stopped, and then run the installation command again.

===========
= Summary =
===========
Driver: Installed
Toolkit: Installed in /usr/local/cuda-9.0
Samples: Installed in /root, but missing recommended libraries

Please make sure that
 - PATH includes /usr/local/cuda-9.0/bin
 - LD_LIBRARY_PATH includes /usr/local/cuda-9.0/lib64, or, add /usr/local/cuda-9.0/lib64 to /etc/ld.so.conf and run ldconfig as root

To uninstall the CUDA Toolkit, run the uninstall script in /usr/local/cuda-9.0/bin
To uninstall the NVIDIA Driver, run nvidia-uninstall

Please see CUDA_Installation_Guide_Linux.pdf in /usr/local/cuda-9.0/doc/pdf for detailed information on setting up CUDA.

Logfile is /tmp/cuda_install_7874.log
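The summary asks for PATH and LD_LIBRARY_PATH to include the CUDA directories, and later steps such as nvcc -V depend on this. One way to set them is sketched below. The helper name prepend_path is hypothetical, the prefix /usr/local/cuda-9.0 is taken from the summary above, and placing the snippet in a file such as /etc/profile.d/cuda.sh so it applies to every login shell is an assumption about your setup, not something the installer does.

```shell
# Hypothetical helper: print LIST with DIR prepended, unless DIR is already
# one of the colon-separated entries.
prepend_path() {
    local dir="$1" list="$2"
    case ":$list:" in
        *":$dir:"*) printf '%s\n' "$list" ;;
        *) printf '%s\n' "$dir${list:+:$list}" ;;
    esac
}

CUDA_HOME=/usr/local/cuda-9.0   # install prefix reported in the summary
PATH=$(prepend_path "$CUDA_HOME/bin" "$PATH")
LD_LIBRARY_PATH=$(prepend_path "$CUDA_HOME/lib64" "${LD_LIBRARY_PATH:-}")
export CUDA_HOME PATH LD_LIBRARY_PATH
```

Because the helper skips directories that are already listed, sourcing the file more than once does not keep growing the variables.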

Verify that CUDA 9.0 installed successfully by running:

nvcc -V

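This prints the CUDA compiler version information (it requires /usr/local/cuda-9.0/bin to be on PATH, as noted in the installer summary).

Then build and run one of the samples that ships with CUDA: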

cd /usr/local/cuda-9.0/samples/1_Utilities/deviceQuery
make
./deviceQuery

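If everything works, the output ends like this (this capture comes from a CUDA 10.0 installation, so the reported versions are 10.0 rather than 9.0):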

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 10.0, CUDA Runtime Version = 10.0, NumDevs = 2
Result = PASS

Uninstalling

To uninstall the CUDA Toolkit, run the uninstall script in /usr/local/cuda-9.0/bin
To uninstall the NVIDIA Driver, run nvidia-uninstall

4. Install cuDNN v7

https://docs.nvidia.com/deeplearning/sdk/cudnn-install/index.html

After the download completes, extract the archive and copy its files into the CUDA directory by running the following commands in order:

tar -xzvf cudnn-9.0-linux-x64-v7.4.1.5.tgz
sudo cp cuda/include/cudnn.h /usr/local/cuda/include
sudo cp cuda/lib64/libcudnn* /usr/local/cuda/lib64
sudo chmod a+r /usr/local/cuda/include/cudnn.h /usr/local/cuda/lib64/libcudnn*
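A quick way to confirm which cuDNN version was copied is to read the version macros from the header. The sketch below assumes the cuDNN 7 layout, where cudnn.h defines CUDNN_MAJOR, CUDNN_MINOR and CUDNN_PATCHLEVEL. The function name cudnn_version is hypothetical.

```shell
# Hypothetical helper: print MAJOR.MINOR.PATCH from a cuDNN 7 style header.
# Fails (exit 1) if any of the three macros is missing.
cudnn_version() {
    local header="$1"
    awk '
        $1 == "#define" && $2 == "CUDNN_MAJOR"      { major = $3 }
        $1 == "#define" && $2 == "CUDNN_MINOR"      { minor = $3 }
        $1 == "#define" && $2 == "CUDNN_PATCHLEVEL" { patch = $3 }
        END {
            if (major == "" || minor == "" || patch == "") exit 1
            print major "." minor "." patch
        }
    ' "$header"
}
```

Called as cudnn_version /usr/local/cuda/include/cudnn.h, it should print something like 7.4.1 for the archive used above.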

Run a small demo to verify the installation.

If the cuDNN examples and user guide package is installed, the mnistCUDNN sample can be found under /usr/src/cudnn_samples_v7.
Copy it to a writable directory, such as your home directory:

$ cp -r /usr/src/cudnn_samples_v7/ $HOME

Change into the mnistCUDNN directory:

$ cd $HOME/cudnn_samples_v7/mnistCUDNN

Compile:

$ make clean && make

Run:

$ ./mnistCUDNN

If the installation succeeded, the output ends with:

Test passed!

Alternatively, if you use Caffe, running cmake in your caffe/build directory is another quick way to check whether the installation works.

5. Install the GPU version of TensorFlow (configure the pip mirror first)

$ sudo pip install tensorflow-gpu

As root, create a .pip directory in root's home directory and create the file pip.conf inside it (/root/.pip/pip.conf) with the following content. The Tsinghua mirror used here is fast:

[global]
index-url=https://pypi.tuna.tsinghua.edu.cn/simple
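The directory and file can also be created in one step from the shell. This is a minimal sketch. The function name write_pip_conf is hypothetical, it overwrites any existing pip.conf in that location, and it defaults to the current user's home directory.

```shell
# Hypothetical helper: write the mirror configuration to <home>/.pip/pip.conf.
# Overwrites an existing file at that path.
write_pip_conf() {
    local home_dir="${1:-$HOME}"
    mkdir -p "$home_dir/.pip"
    cat > "$home_dir/.pip/pip.conf" <<'EOF'
[global]
index-url=https://pypi.tuna.tsinghua.edu.cn/simple
EOF
}
```

Run as root, write_pip_conf with no argument writes /root/.pip/pip.conf.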

The configuration is now complete and needs no further steps; any package can be installed directly with pip install. For comparison: [screenshot taken immediately after running pip install tensorflow].

6. Test TensorFlow
After working through all the obstacles above, we have finally reached the test step. Exciting, isn't it?

[root@gpuserver ~]# python
Python 2.7.5 (default, Nov 20 2015, 02:00:19)
[GCC 4.8.5 20150623 (Red Hat 4.8.5-4)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
>>> hello = tf.constant('Hello, TensorFlow!')
>>> sess = tf.Session()
2018-12-12 17:10:51.572488: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 AVX512F FMA
>>> sess = tf.Session()
>>> print(sess.run(hello))
Hello, TensorFlow!
>>>

If the small example above runs correctly, congratulations: the GPU version of TensorFlow is installed. What are you waiting for? Go build something!

Installing pip on CentOS 7.2

yum install -y epel-release
yum install -y python-pip

Install kernel-devel

yum -y install kernel-devel

Configuring CentOS 7.2 to boot into the graphical interface

# systemctl get-default
multi-user.target
# systemctl set-default graphical.target

Appendix:
1. CUDA installation log

Installing the NVIDIA display driver...
Installing the CUDA Toolkit in /usr/local/cuda-10.0 ...
Missing recommended library: libGLU.so
Missing recommended library: libX11.so
Missing recommended library: libXi.so
Missing recommended library: libXmu.so

Installing the CUDA Samples in /root ...
Copying samples to /root/NVIDIA_CUDA-10.0_Samples now...
Finished copying samples.

===========
= Summary =
===========

Driver: Installed
Toolkit: Installed in /usr/local/cuda-10.0
Samples: Installed in /root, but missing recommended libraries

Please make sure that
 - PATH includes /usr/local/cuda-10.0/bin
 - LD_LIBRARY_PATH includes /usr/local/cuda-10.0/lib64, or, add /usr/local/cuda-10.0/lib64 to /etc/ld.so.conf and run ldconfig as root

To uninstall the CUDA Toolkit, run the uninstall script in /usr/local/cuda-10.0/bin
To uninstall the NVIDIA Driver, run nvidia-uninstall

Please see CUDA_Installation_Guide_Linux.pdf in /usr/local/cuda-10.0/doc/pdf for detailed information on setting up CUDA.

Logfile is /tmp/cuda_install_16878.log