Install CUDA, cuDNN, and CuPy on a Linux server, and reinstall CUDA, cuDNN, and related packages

Updated on December 7, 2023

Install the CUDA Toolkit, the cuDNN library, and the CuPy package on the server

This article is also a good reference:

https://blog.csdn.net/qq_33200967/article/details/80689543

(To uninstall the existing graphics driver and CUDA first, see the next section.)

Reference article:

https://blog.csdn.net/weixin_43677710/article/details/131813795

Before installation, run nvidia-smi to see the highest CUDA version the installed driver supports, and then decide which version of the CUDA Toolkit to install. The toolkit version must not be higher than the highest version the graphics driver supports.
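The version check can be sketched as a pair of small shell helpers. This is only a minimal sketch: the function names max_cuda_from_smi and version_le are hypothetical, and it assumes that the nvidia-smi header contains a "CUDA Version: X.Y" field and that GNU sort (with the -V option) is available.

```shell
# Hypothetical helper: read nvidia-smi output on stdin and print the
# highest CUDA version the installed driver supports (for example 12.1).
max_cuda_from_smi() {
  sed -n 's/.*CUDA Version: *\([0-9][0-9.]*\).*/\1/p' | head -n 1
}

# Hypothetical helper: succeed if version $1 <= version $2.
# Relies on GNU sort -V (version sort).
version_le() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# Usage (on a machine with the driver installed):
#   supported=$(nvidia-smi | max_cuda_from_smi)
#   version_le 11.7 "$supported" && echo "CUDA Toolkit 11.7 fits this driver"
```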

Download the toolkit from https://developer.nvidia.com/cuda-downloads. If you don't want to download the installer to your local machine and then upload it to the server, you can fetch it directly on the server with wget. If the download cannot connect (for example, because of proxy settings for external network access), try adding --no-proxy after wget.

Next, check which version of the cuDNN library matches your CUDA version: https://developer.nvidia.com/rdp/cudnn-archive

After installing the CUDA Toolkit, run nvcc --version. If it prints version information, the installation is complete. If not, add the CUDA directories to the environment variables in ~/.bashrc.

For example (adjust the version number to match your installation):

export PATH=$PATH:/usr/local/cuda-11.7/bin

export LD_LIBRARY_PATH=/usr/local/cuda-11.7/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
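The ${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}} form commonly used in this export appends the old value, with a colon separator, only when the variable is already set and non-empty, so an empty variable does not leave a stray trailing colon. A minimal sketch of that behaviour follows; the function name prepend_lib_dir is hypothetical and exists only for illustration.

```shell
# Hypothetical helper: print directory $1 followed by the existing
# search path $2, joined with ":" only if $2 is non-empty.
prepend_lib_dir() {
  existing=$2
  printf '%s\n' "$1${existing:+:${existing}}"
}

# Usage:
#   export LD_LIBRARY_PATH=$(prepend_lib_dir /usr/local/cuda-11.7/lib64 "$LD_LIBRARY_PATH")
```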

The cuDNN package is a tar archive, so extract it first:

tar -xf cudnn-linux-x86_64-8.8.0.121_cuda11-archive.tar.xz

After extracting, copy the cuDNN files into the CUDA installation directory, which is typically /usr/local/cuda/ (or /usr/local/cuda-<version>/). Replace <extracted-dir> with the directory created in the previous step. In the 8.x tar.xz archives the libraries are under lib/; if your package has lib64/ instead, use that.

sudo cp <extracted-dir>/include/cudnn*.h /usr/local/cuda/include
sudo cp -P <extracted-dir>/lib/libcudnn* /usr/local/cuda/lib64

Set file permissions (optional): use chmod to make the copied files readable by all users. For example:

sudo chmod a+r /usr/local/cuda/include/cudnn*.h
sudo chmod a+r /usr/local/cuda/lib64/libcudnn*
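The copy and permission steps can be wrapped in one function. This is a sketch under stated assumptions, not the author's script: the name copy_cudnn is hypothetical, it expects the destination to already contain include/ and lib64/ directories, and it accepts either a lib/ or a lib64/ directory in the extracted package.

```shell
# Hypothetical helper: copy cuDNN headers and libraries from the
# extracted archive directory $1 into the CUDA directory $2.
# Run it with sufficient privileges (for example as root) for system paths.
copy_cudnn() {
  src=$1
  dst=$2
  # Assumption: libraries live under lib/ or lib64/ depending on the package.
  libdir=lib
  if [ -d "$src/lib64" ]; then libdir=lib64; fi
  cp "$src"/include/cudnn*.h "$dst/include/" || return 1
  # -P copies symbolic links as links instead of following them.
  cp -P "$src/$libdir"/libcudnn* "$dst/lib64/" || return 1
  # Make the copied files readable by all users.
  chmod a+r "$dst"/include/cudnn*.h "$dst"/lib64/libcudnn*
}

# Usage:
#   copy_cudnn cudnn-linux-x86_64-8.8.0.121_cuda11-archive /usr/local/cuda
```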

Then find the CuPy package that matches your CUDA version on PyPI:

https://pypi.org/search/?q=cupy&page=1

The default command pip install cupy can appear to hang for a long time, because it builds CuPy from source. Installing the prebuilt package that matches your CUDA version avoids this.
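As an illustration of picking the matching package, the sketch below maps a CUDA version to a prebuilt CuPy package name. The function name cupy_package_for is hypothetical, and the mapping covers only CUDA 11.x and 12.x with the package names cupy-cuda11x and cupy-cuda12x; confirm the current names on PyPI before relying on it.

```shell
# Hypothetical helper: map a CUDA version (for example 11.7) to the name
# of the matching prebuilt CuPy package on PyPI.
cupy_package_for() {
  case "${1%%.*}" in
    11) echo "cupy-cuda11x" ;;
    12) echo "cupy-cuda12x" ;;
    *)  echo "no mapping for CUDA $1; check PyPI" >&2; return 1 ;;
  esac
}

# Usage:
#   pip install "$(cupy_package_for 11.7)"
```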

Updating the graphics driver and uninstalling the existing CUDA

1. First, uninstall the older driver.

Enter in the terminal: sudo apt-get purge 'nvidia*'

Then enter: nvidia-smi

If no driver information is displayed (or the command is no longer found), the uninstallation succeeded. Go to the next step.

2. Download the driver for your graphics card model from https://www.nvidia.cn/Download/Find.aspx?lang=cn&QNF=1

Stop the X server before running the installer, otherwise the installation may fail:
sudo systemctl stop gdm.service

Then, in the download directory, run:

sudo sh NVIDIA-Linux-XX.run

After the installation is complete, start the X server again:
sudo systemctl start gdm.service

I encountered this error:

An NVIDIA kernel module 'nvidia-uvm' appears to already be loaded in your kernel. This may be because it is in use

First, stop the services that may be using the GPU and unload the module:

sudo service lightdm stop
sudo stop nvidia-digits-server
sudo service docker stop
sudo rmmod nvidia-uvm

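Then find any processes that still have the NVIDIA devices open and kill them:

sudo lsof -n -w /dev/nvidia*
sudo kill -9 PID

Replace PID with a process ID reported by lsof, and repeat until all of the listed processes have been killed.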
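When lsof lists many processes, the PIDs can be collected automatically. The sketch below is illustrative only: the function name pids_from_lsof is hypothetical, and it assumes the default lsof layout, where the first line is a header and the PID is the second column. The xargs -r option in the usage line is a GNU extension.

```shell
# Hypothetical helper: read lsof output on stdin and print each PID once.
# Assumes the default lsof layout: a header line, then PID in column 2.
pids_from_lsof() {
  awk 'NR > 1 { print $2 }' | sort -un
}

# Usage:
#   sudo lsof -n -w /dev/nvidia* | pids_from_lsof | xargs -r sudo kill -9
```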

Then run the CUDA installer:

sudo sh cuda_12.1.1_530.30.02_linux.run

The installer now warns that a driver installed through the package manager is still present:

Existing package manager installation of the driver found. It is strongly recommended that you remove this before continuing

To remove it, first run:

sudo apt-get --purge remove "*cuda*" "*cublas*" "*cufft*" "*cufile*" "*curand*" \
 "*cusolver*" "*cusparse*" "*gds-tools*" "*npp*" "*nvjpeg*" "nsight*" "*nvvm*"

Then:

sudo apt-get autoremove

sudo apt-get autoremove removes packages that were installed automatically as dependencies and are no longer needed, on Ubuntu and other Debian-based systems.

When you install a package with apt-get, its dependencies are installed along with it. Those dependencies can be left behind when you later uninstall the package, or when an update makes an older dependency unnecessary. autoremove finds these leftover packages and deletes them, which frees disk space and keeps the system clean.

Be careful with autoremove: review the list of packages it proposes to remove before confirming, so that you do not accidentally remove something you still need.
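One way to review what autoremove would delete before running it for real is apt-get's simulation mode (-s), which prints the planned actions without changing the system. The sketch below extracts the package names from that output; the function name packages_to_remove is hypothetical, and it assumes the simulated removal lines have the form "Remv <package> [<version>]".

```shell
# Hypothetical helper: read simulated apt-get output on stdin and print
# the names of the packages that would be removed.
# Assumes removal lines look like: "Remv <package> [<version>]".
packages_to_remove() {
  awk '$1 == "Remv" { print $2 }'
}

# Usage:
#   apt-get -s autoremove | packages_to_remove
```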

Uncheck this option during installation

After installation, the installer prints the following reminder to set the environment variables:

Please make sure that

  • PATH includes /usr/local/cuda-12.1/bin
  • LD_LIBRARY_PATH includes /usr/local/cuda-12.1/lib64, or, add /usr/local/cuda-12.1/lib64 to /etc/ld.so.conf and run ldconfig as root

export PATH=$PATH:/usr/local/cuda-12.1/bin

export LD_LIBRARY_PATH=/usr/local/cuda-12.1/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}

Then run source ~/.bashrc so that the changes take effect.
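To confirm that the new settings took effect, you can check whether the CUDA directories are present in the variables. A minimal sketch follows; the helper name path_contains is hypothetical, and the cuda-12.1 paths are the ones named in the installer message above.

```shell
# Hypothetical helper: succeed if the colon-separated list $1
# contains the directory $2 as an exact entry.
path_contains() {
  case ":$1:" in
    *":$2:"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage:
#   path_contains "$PATH" /usr/local/cuda-12.1/bin && echo "PATH is set"
#   path_contains "$LD_LIBRARY_PATH" /usr/local/cuda-12.1/lib64 && echo "LD_LIBRARY_PATH is set"
```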

——————-Dividing line————————-
The following is the version I wrote before. It would be a pity to delete it. Let’s keep it.

The following is a simplified version for reference.
URLs needed:
https://developer.nvidia.com/cuda-downloads
https://developer.nvidia.com/rdp/cudnn-archive

Before installation, run nvidia-smi to see the highest CUDA version the driver supports, then decide which version of the CUDA Toolkit to install, and then check which version of the cuDNN library matches it.

After installing the CUDA Toolkit, run nvcc --version. If it prints version information, the installation is complete. If not, add the CUDA directories to the environment variables in ~/.bashrc.

When installing the cuDNN library, extract the archive first:

tar -xf cudnn-linux-x86_64-8.8.0.121_cuda11-archive.tar.xz

After extracting, copy the cuDNN files into the CUDA installation directory, which is typically /usr/local/cuda/. Replace <extracted-dir> with the directory created in the previous step. In the 8.x tar.xz archives the libraries are under lib/; if your package has lib64/ instead, use that.

sudo cp <extracted-dir>/include/cudnn*.h /usr/local/cuda/include
sudo cp -P <extracted-dir>/lib/libcudnn* /usr/local/cuda/lib64

Set file permissions (optional): use chmod to make the copied files readable by all users. For example:

sudo chmod a+r /usr/local/cuda/include/cudnn*.h
sudo chmod a+r /usr/local/cuda/lib64/libcudnn*

The default command pip install cupy gets stuck for a long time and may never finish, so skip that method.
Instead, find the CuPy package that matches your CUDA version, and its install command, on PyPI:
https://pypi.org/search/?q=cupy&page=1