Software testing/test development丨Ubuntu Server environment preparation


Premise

The machine used here has an i5 CPU and an RTX 4090 graphics card, with Ubuntu 22.04.3 LTS Server installed. The following installation steps are based on this system and configuration.

System preparation

Check whether gcc is installed

Run gcc -v on the command line.
If it prints version information as shown in the figure, the software required for the environment is already installed.

If the output is the following text instead, gcc is missing from the environment:
Command 'gcc' not found, but can be installed with: apt install gcc
Installation command: sudo apt-get install build-essential
If the download is very slow, you can switch the apt source to a domestic mirror. The Tsinghua mirror is very fast (see the Tsinghua source configuration tutorial).
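
The gcc check above can also be scripted. The following is a minimal sketch rather than part of the original steps. The helper name have_cmd is hypothetical, and it relies only on the shell built-in command -v.

```shell
# Hypothetical helper: succeed if the named command is found on PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

if have_cmd gcc; then
    # Print only the first line of the version banner.
    gcc --version | head -n 1
else
    echo "gcc not found; install it with: sudo apt-get install build-essential"
fi
```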

Disable Ubuntu's original graphics driver (to avoid conflicts with the Nvidia graphics driver)

Create or edit the /etc/modprobe.d/blacklist-nouveau.conf file and add the following code to it.

blacklist nouveau
blacklist lbm-nouveau
options nouveau modeset=0
alias nouveau off
alias lbm-nouveau off
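
To create the file without opening an editor, the same lines can be written from a here-document. This is a sketch that assumes overwriting any existing blacklist-nouveau.conf is acceptable. The helper name nouveau_blacklist is hypothetical.

```shell
# Hypothetical helper: print the blacklist entries listed above.
nouveau_blacklist() {
    cat <<'EOF'
blacklist nouveau
blacklist lbm-nouveau
options nouveau modeset=0
alias nouveau off
alias lbm-nouveau off
EOF
}

# Preview the content first.
nouveau_blacklist

# To write the file for real (overwrites the target), run:
#   nouveau_blacklist | sudo tee /etc/modprobe.d/blacklist-nouveau.conf >/dev/null
```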

After saving, run the following command to also disable kernel mode setting for nouveau, so that the system's own graphics driver does not conflict with Nvidia's graphics driver.

echo options nouveau modeset=0 | sudo tee -a /etc/modprobe.d/nouveau-kms.conf

Rebuild the initramfs, reboot, and verify that nouveau is disabled

sudo update-initramfs -u
sudo reboot

After restarting, execute the command lsmod | grep nouveau. If there is no screen output, it means nouveau is successfully disabled.
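
The check above can be wrapped in a small script that reports the result in words. This is a sketch only. The helper name nouveau_loaded is hypothetical, and it matches the module-name column of lsmod exactly instead of using a substring search.

```shell
# Hypothetical helper: read lsmod output on stdin and succeed if a module
# named exactly "nouveau" is listed.
nouveau_loaded() {
    awk '$1 == "nouveau" { found = 1 } END { exit !found }'
}

# Only run the live check where lsmod is available.
if command -v lsmod >/dev/null 2>&1; then
    if lsmod | nouveau_loaded; then
        echo "nouveau is still loaded"
    else
        echo "nouveau is disabled"
    fi
fi
```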

Install graphics card driver

Find the appropriate driver version

Run ubuntu-drivers devices and choose a version from the matching drivers that the command lists, as shown below

Since the server version system is installed on this machine, the server version driver is used.
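
To pick out the server variants from a long listing, the output can be filtered. The sketch below assumes the listing uses lines of the form "driver : <package> - ...". The sample lines are illustrative, not captured from the machine described here, and server_drivers is a hypothetical helper name.

```shell
# Hypothetical helper: read `ubuntu-drivers devices` output on stdin and
# print the package names that end in "-server".
server_drivers() {
    awk '$1 == "driver" && $3 ~ /-server$/ { print $3 }'
}

# Illustrative sample lines (assumed layout), used so the sketch runs anywhere.
sample='driver   : nvidia-driver-535-server - distro non-free
driver   : nvidia-driver-535 - distro non-free recommended
driver   : xserver-xorg-video-nouveau - distro free builtin'

printf '%s\n' "$sample" | server_drivers

# On the real machine: ubuntu-drivers devices | server_drivers
```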

Install driver

Run sudo apt install nvidia-driver-535-server to install the driver.
After the installation completes, run nvidia-smi. If it displays the GPU status table normally, as shown below, the installation succeeded.

Install CUDA

Find the appropriate version (installing the latest CUDA version is not recommended; pay attention to PyTorch support)

First, find the highest compatible CUDA version from the nvidia-smi output in the previous step. Then check the CUDA support information on the PyTorch official website. In this case the CUDA version chosen is 11.8. At the time of writing, PyTorch mainly supports CUDA 11.7 and 11.8, and support for the higher version 12.1 is a preview (nightly) build. For stability, prefer 11.8, or pick a suitable version based on your graphics card model and the PyTorch official website.
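
The comparison described above can be sketched in shell: read the "CUDA Version" value from the nvidia-smi header and check that the version you plan to install does not exceed it. The helper names smi_cuda_version and version_le are hypothetical, and the sample header line is illustrative rather than output from this machine.

```shell
# Hypothetical helper: read nvidia-smi output on stdin and print the value
# that follows "CUDA Version:" in the header.
smi_cuda_version() {
    sed -n 's/.*CUDA Version: *\([0-9][0-9.]*\).*/\1/p' | head -n 1
}

# Hypothetical helper: succeed if dotted version $1 <= $2 (first 3 fields).
version_le() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        split(a, x, "."); split(b, y, ".")
        for (i = 1; i <= 3; i++) {
            if ((x[i] + 0) < (y[i] + 0)) exit 0
            if ((x[i] + 0) > (y[i] + 0)) exit 1
        }
        exit 0
    }'
}

# Illustrative header line; on the real machine use: nvidia-smi | smi_cuda_version
sample='| NVIDIA-SMI 535.129.03   Driver Version: 535.129.03   CUDA Version: 12.2     |'
max=$(printf '%s\n' "$sample" | smi_cuda_version)
wanted=11.8

if version_le "$wanted" "$max"; then
    echo "CUDA $wanted is within the driver limit ($max)"
else
    echo "CUDA $wanted is newer than the driver supports ($max)"
fi
```

This only checks the driver side. Whether PyTorch supports the chosen version still has to be confirmed on the PyTorch website.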

Download CUDA

Find the corresponding version of CUDA at NVIDIA’s official developer download address. After selecting according to the system version, kernel and other information, you can directly get the installation file or installation command. In this example, runfile is used to install.

wget https://developer.download.nvidia.com/compute/cuda/11.8.0/local_installers/cuda_11.8.0_520.61.05_linux.run

Note: The CUDA installer is large and uses a lot of space under /tmp (about 5 GB or more) during installation, so the installation may fail with an error if /tmp is too small. To avoid this, you can add a new /tmp entry to /etc/fstab that temporarily mounts part of the memory at /tmp with a larger capacity.

# Find the original /tmp mounting settings and comment it out by adding # in front of it
# /dev/disk/by-uuid/fba367b1-46bd-4b28-b769-5db19dce6129 /tmp ext4 defaults 0 1
# Add a new line to mount /tmp into memory. Confirm the memory size in advance to avoid insufficient memory. The new /tmp size is set in size.
tmpfs /tmp tmpfs nodev,nosuid,size=10G 0 0

After modification, you need to restart the machine, and then use the df -h command to check whether the size of the /tmp directory has changed.
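
Before starting the installer, the free space on /tmp can be checked with a short script instead of reading the df -h table by eye. This is a sketch. The helper name avail_kib is hypothetical, and the 6 GiB threshold is an assumption derived from the "about 5 GB or more" estimate above.

```shell
# Hypothetical helper: print the space available on a path, in KiB.
# df -P forces one line per filesystem; -k reports 1024-byte units.
avail_kib() {
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

need_kib=$((6 * 1024 * 1024))   # assumed threshold: 6 GiB
have_kib=$(avail_kib /tmp)

if [ "$have_kib" -ge "$need_kib" ]; then
    echo "/tmp has enough space: ${have_kib} KiB available"
else
    echo "/tmp may be too small: ${have_kib} KiB available, ${need_kib} KiB wanted"
fi
```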

Install CUDA

After setting the size of the /tmp folder, switch to the folder where you downloaded the run file before, and execute the command to install CUDA

sudo sh cuda_11.8.0_520.61.05_linux.run

Because the graphics card driver was installed earlier, the installer shows an additional prompt that recommends uninstalling the existing driver before continuing. Just select Continue here.

The next step is a user agreement, enter accept.

Notice! The last step is to select the components to install. Because the existing driver was kept when the earlier prompt appeared, you must move to the Driver option in this step and press Enter to clear its X mark. This avoids problems caused by installing the driver twice.

You can then select Install for the final installation.
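
The interactive choices above (keep the existing driver, install the toolkit) can likely be expressed on the command line as well. NVIDIA's runfile installer documents the --silent and --toolkit options for this purpose, but confirm them with sh cuda_11.8.0_520.61.05_linux.run --help before relying on this sketch. The helper below only prints the command, it does not run it, and its name is hypothetical.

```shell
# Hypothetical helper: print (not run) a non-interactive install command that
# installs only the toolkit and leaves the already installed driver alone.
cuda_toolkit_only_cmd() {
    printf 'sudo sh %s --silent --toolkit\n' "$1"
}

cuda_toolkit_only_cmd cuda_11.8.0_520.61.05_linux.run
```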

Configuring environment variables

After the installation is complete, follow the installation prompts to configure the directory into the environment variable.

Edit /etc/profile, which applies to all users (sudo vim /etc/profile), and add the following statements. Adjust the paths to match those shown in the installer's prompt:

export PATH=/usr/local/cuda-11.8/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda-11.8/lib64:$LD_LIBRARY_PATH
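
Because /etc/profile can be sourced more than once, a guarded variant of the two lines above avoids adding the same directory repeatedly, and avoids a trailing colon when LD_LIBRARY_PATH was previously unset. This is an optional sketch, not what the installer prompts for. The helper name prepend_once is hypothetical.

```shell
# Hypothetical helper: print list $2 with directory $1 prepended,
# unless $1 is already one of its colon-separated entries.
prepend_once() {
    case ":$2:" in
        *":$1:"*) printf '%s\n' "$2" ;;
        *)
            if [ -n "$2" ]; then
                printf '%s:%s\n' "$1" "$2"
            else
                printf '%s\n' "$1"
            fi
            ;;
    esac
}

# Paths assume the default CUDA 11.8 install location used in this article.
PATH=$(prepend_once /usr/local/cuda-11.8/bin "$PATH")
LD_LIBRARY_PATH=$(prepend_once /usr/local/cuda-11.8/lib64 "${LD_LIBRARY_PATH:-}")
export PATH LD_LIBRARY_PATH
```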

Run source /etc/profile to load the settings directly, or restart the server for the configuration to take effect.
Run nvcc -V to verify the installation. Output like the following figure indicates that the configuration succeeded.
If you changed the /tmp mount during the preparation steps, remember to restore the original entry in /etc/fstab. The change takes effect after a restart.
At this point, the CUDA installation is complete.
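
As an optional final check, the release number printed by nvcc -V can be compared with the version you meant to install. This is a sketch. The helper name nvcc_release is hypothetical, and the sample line is illustrative of the "release X.Y" line in the nvcc banner rather than output captured from this machine.

```shell
# Hypothetical helper: read `nvcc -V` output on stdin and print the
# number that follows "release".
nvcc_release() {
    sed -n 's/.*release \([0-9][0-9.]*\),.*/\1/p' | head -n 1
}

# Illustrative sample line; on the real machine use: nvcc -V | nvcc_release
sample='Cuda compilation tools, release 11.8, V11.8.89'
expected=11.8
found=$(printf '%s\n' "$sample" | nvcc_release)

if [ "$found" = "$expected" ]; then
    echo "nvcc reports CUDA $found"
else
    echo "nvcc reports CUDA '$found', expected $expected"
fi
```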