PyTorch installation (GPU version, configured in an Anaconda virtual environment)

(2021.05.24) Specifying which graphics card PyTorch uses

https://zhuanlan.zhihu.com/p/166161217
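The linked article covers several approaches. As a minimal sketch of one common approach (not necessarily the one in the link), you can restrict which physical cards CUDA can see with the CUDA_VISIBLE_DEVICES environment variable, set before torch initializes CUDA. The helper name select_gpus is hypothetical.

```python
import os


def select_gpus(gpu_ids):
    """Expose only the given physical GPU ids to CUDA (hypothetical helper).

    Must run before torch initializes CUDA, otherwise it has no effect.
    """
    value = ",".join(str(i) for i in gpu_ids)
    os.environ["CUDA_VISIBLE_DEVICES"] = value
    return value


if __name__ == "__main__":
    select_gpus([1])  # only physical GPU 1 is visible from here on
    try:
        import torch  # imported after setting the variable on purpose
    except ImportError:
        torch = None
    if torch is not None and torch.cuda.is_available():
        # Visible GPUs are renumbered from 0, so physical GPU 1 is "cuda:0" here.
        device = torch.device("cuda:0")
        x = torch.ones(3, device=device)
        print(x.device, torch.cuda.device_count())
```

The renumbering is the usual source of confusion: after restricting to card 1, code must still address it as cuda:0.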

(2021.05.24) The relationship between PyTorch, CUDA, cudatoolkit and the graphics driver

(2021.05.24) Solving incompatibility between the CUDA version and the graphics card

https://www.jianshu.com/p/ac70300b598b told me that older PyTorch versions are built against CUDA 9.0, which cannot use the latest graphics cards.
https://blog.csdn.net/weixin_42069606/article/details/105198845 and https://blog.csdn.net/kellyroslyn/article/details/109668001 discuss the relationship between PyTorch, CUDA, the graphics driver, etc.

(2021.05.23) How do I install an old version of PyTorch?

See the official instructions: here. It seems that versions before 1.0.0 cannot be installed directly with conda install pytorch==x.x.x -c pytorch, but you can download the .whl file and install it locally, or install it directly from its URL.

Download page for offline PyTorch wheels: https://download.pytorch.org/whl/torch_stable.html

After going through the list of .whl files, I found two suitable candidates:
cu90/torch-0.3.0.post4-cp27-cp27m-linux_x86_64.whl
cu90/torch-0.3.0.post4-cp27-cp27mu-linux_x86_64.whl

Running pip install http://download.pytorch.org/whl/cu90/torch-0.3.0.post4-cp27-cp27m-linux_x86_64.whl reported the error "ERROR: torch-0.3.0.post4-cp27-cp27m-linux_x86_64.whl is not a supported wheel on this platform.", meaning this build does not match my system. So I installed the cp27mu version instead (cp27 stands for Python 2.7), and it worked!
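The m/mu difference comes from how the Python 2 interpreter was built: "narrow" (UCS2) builds use the ABI tag cp27m and "wide" (UCS4) builds use cp27mu, and pip refuses a wheel whose tag does not match. A small sketch, assuming Python 2.7 on Linux x86_64 as in the example above, that picks the tag from sys.maxunicode and rebuilds the wheel URL from the same pattern; both helper names are hypothetical. Run it with the Python 2.7 interpreter you plan to install into (on Python 3, sys.maxunicode is always 0x10FFFF, so the answer there is meaningless).

```python
import sys


def cp27_abi_tag(maxunicode):
    """Return the CPython 2.7 ABI tag for a build with the given sys.maxunicode.

    Wide (UCS4) builds have maxunicode == 0x10FFFF and use "mu";
    narrow (UCS2) builds have maxunicode == 0xFFFF and use "m".
    """
    return "cp27mu" if maxunicode > 0xFFFF else "cp27m"


def torch_wheel_url(cuda_tag, version, abi_tag):
    """Build a wheel URL following the pattern of the pip command above."""
    filename = "torch-{}-cp27-{}-linux_x86_64.whl".format(version, abi_tag)
    return "http://download.pytorch.org/whl/{}/{}".format(cuda_tag, filename)


if __name__ == "__main__":
    tag = cp27_abi_tag(sys.maxunicode)
    print(tag)
    print(torch_wheel_url("cu90", "0.3.0.post4", tag))
```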

1. The basic process of installing pytorch-gpu:

1. Create a virtual environment dedicated to pytorch, configure jupyter notebook, etc.

The principles, specific steps, and problems I ran into in this step are covered in the second half of the article.

2. Install the NVIDIA graphics driver

Running CUDA applications requires a GPU (hardware) and a GPU driver (software) that are compatible with the CUDA Toolkit, so this step is required.
First check your graphics card model under 'Computer > Properties > Device Manager > Display adapters'; mine is an NVIDIA GeForce 940MX. Then download the matching NVIDIA driver from https://www.nvidia.cn/Download/index.aspx?lang=cn. After installation, run nvidia-smi; if it prints the basic information about the graphics card, the installation succeeded.

3. Install CUDA

CUDA is NVIDIA's parallel computing architecture (a programming model plus instruction set architecture) that lets software use the GPU's parallel computing engines.
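After installing the toolkit, nvcc --version reports which release is installed. A small sketch for reading that number from a script, assuming nvcc is on PATH and prints a line like "Cuda compilation tools, release 10.2, V10.2.89" (the format this parser relies on); the function names are hypothetical.

```python
import re
import shutil
import subprocess


def parse_nvcc_release(text):
    """Extract (major, minor) from the 'release X.Y' part of nvcc output, or None."""
    match = re.search(r"release (\d+)\.(\d+)", text)
    return (int(match.group(1)), int(match.group(2))) if match else None


def installed_cuda_toolkit():
    """Return the toolkit version reported by nvcc on PATH, or None."""
    nvcc = shutil.which("nvcc")
    if nvcc is None:
        return None
    result = subprocess.run([nvcc, "--version"], capture_output=True, text=True)
    return parse_nvcc_release(result.stdout)


if __name__ == "__main__":
    print(installed_cuda_toolkit())
```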

4. Install cuDNN

NVIDIA cuDNN is a GPU-accelerated library for deep neural networks. It can be downloaded from https://developer.nvidia.com/rdp/cudnn-archive; you need to register before downloading.
Note that the CUDA version, the cuDNN version, and the CUDA version of the PyTorch build you choose must be consistent, and the CUDA version supported by the driver must be greater than or equal to it.

For these two steps, you can refer to the first half of this article.

This article sorts out the relationships between the graphics card, graphics driver, nvcc, CUDA driver, cudatoolkit, and cuDNN: https://cloud.tencent.com/developer/article/1536738

5. Install pytorch-gpu

Just select the matching version on the PyTorch official website. I used the conda command but removed the trailing -c pytorch so that conda would use the faster Tsinghua mirror (see Question 5 for why this later caused trouble).

6. Verification

import torch
torch.__version__          # '1.6.0'
torch.cuda.is_available()  # False
torch.version.cuda         # None at first; 10.2.89 after the GPU was configured
torch.__file__             # which directory torch is loaded from, e.g. in the virtual environment on the server: '/home/fanyuxuan/workspace/anaconda3/envs/LatticeLSTM/lib/python2.7/site-packages/torch/__init__.pyc'

2. Virtual environments

1. What a Python virtual environment is

People often talk about "conda virtual environments", but that is not quite accurate: it is a Python virtual environment, and conda is just one tool for managing it.

A Python virtual environment can be created and managed with the virtualenv or pipenv modules, or with conda. Each virtual environment has its own Python and its own set of packages, and these versions can differ between environments, which achieves "isolation". In essence, using a different virtual environment means Python runs with a different sys.path, so it reads from different folders and uses different versions of Python and packages.
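To see the "different sys.path" point in practice, here is a small sketch (describe_environment is a hypothetical name) that reports the interpreter and site-packages directories of whatever environment runs it. Running it after conda activate pytorch and again in base should show different paths. CONDA_DEFAULT_ENV is an environment variable conda sets on activation; it is absent otherwise.

```python
import os
import sys


def describe_environment():
    """Report the interpreter and package folders that the active
    environment contributes; these are what differ between environments."""
    return {
        "executable": sys.executable,
        "prefix": sys.prefix,
        "conda_env": os.environ.get("CONDA_DEFAULT_ENV"),  # None outside conda
        "site_packages": [p for p in sys.path if p.endswith("site-packages")],
    }


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(key, ":", value)
```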

I use conda to create and manage my environments. Common commands:

conda env list # View all virtual environments (and their folder paths)
conda list # View all packages installed in the current environment

conda install numpy # install package
conda uninstall numpy # Uninstall the package
conda update numpy # update package

conda create --name pytorch python=3.7 # Create a virtual environment (choose the Python version you need)
conda activate pytorch # Activate the virtual environment, close with conda deactivate
conda remove -n pytorch --all # Delete the virtual environment
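conda env list also accepts --json, which is handy if you want the environment list in a script. A sketch, assuming conda is on PATH and that its JSON output has an "envs" list of folder paths; env_labels is a hypothetical helper. It labels environments under an envs/ folder by name and keeps the full path for anything else (the root install, or environments created with --prefix).

```python
import json
import os
import shutil
import subprocess


def env_labels(conda_json):
    """Turn `conda env list --json` output into readable labels."""
    labels = []
    for path in json.loads(conda_json)["envs"]:
        parent = os.path.basename(os.path.dirname(path))
        labels.append(os.path.basename(path) if parent == "envs" else path)
    return labels


if __name__ == "__main__":
    conda = shutil.which("conda")
    if conda:
        out = subprocess.run([conda, "env", "list", "--json"],
                             capture_output=True, text=True, check=True).stdout
        print(env_labels(out))
```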

2. Fixing "the virtual environment does not appear in the New drop-down menu in Jupyter Notebook" (that is, how to add/switch virtual environments in Jupyter Notebook):

Short version:
① Activate the target virtual environment
② Install the ipykernel module if it is not installed yet: conda install ipykernel
(Some people add an extra step here to create a kernel file for the virtual environment: conda install -n pytorch ipykernel. It seemed to work for me without it.)
③ Restart Jupyter Notebook; the virtual environment should now appear under New and under Kernel > Change kernel
④ If it does not, manually add your environment to the IPython kernels:
python -m ipykernel install --user --name pytorch (virtual environment name) --display-name "Python (pytorch)" (display-name is the name shown in Jupyter Notebook)
See "Question 2" for the longer version, including how I got there and how it works.

3. Problems encountered and their solutions:

Question 1: Running jupyter notebook in the tensorflow virtual environment reports the error AttributeError: type object 'IOLoop' has no attribute 'initialized'

Solutions from other blogs:

conda install tornado=4.5

It worked. Tornado is a Python web framework that Jupyter Notebook depends on, and my version of Jupyter Notebook only works with Tornado 4.5.3 (or lower?). Jupyter installed through the Anaconda Navigator GUI pulls in the latest Tornado by default, so the versions of Jupyter Notebook and Tornado do not match.
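A quick way to see whether an environment is affected, as a sketch: print the installed Tornado version and check whether IOLoop still has the initialized attribute the error complains about. The threshold (anything newer than 4.x counts as "too new") is an assumption based on the tornado=4.5 fix above, and the helper name is hypothetical.

```python
def tornado_too_new(version, max_major=4):
    """True if the Tornado major version is above max_major (assumed limit)."""
    return int(version.split(".")[0]) > max_major


if __name__ == "__main__":
    try:
        import tornado
        from tornado.ioloop import IOLoop
    except ImportError:
        print("tornado is not installed")
    else:
        print("tornado", tornado.version,
              "too new:", tornado_too_new(tornado.version),
              "IOLoop.initialized exists:", hasattr(IOLoop, "initialized"))
```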

Question 2: How to add/switch virtual environment in jupyter notebook

After solving the previous problem, I opened Jupyter Notebook in the tensorflow virtual environment and found no option for choosing the virtual environment, so I decided to figure this problem out first.

1. First, I installed editors such as Jupyter Notebook in the virtual environment

As for why: it is like installing Python and numpy/pandas/.../pytorch in every virtual environment. You can search for it and install it from Anaconda Navigator, or use conda install on the command line.

Some people say it is enough to install Jupyter Notebook once in the main environment and then configure the kernels. Both approaches should work; I don't know the difference. For details, see "Some methods of running jupyter notebook in the anaconda virtual environment: detailed research", which is very thorough and covers several approaches, including installing Jupyter Notebook in each environment, installing it only once and modifying the kernel, and different ways of modifying the kernel: https://blog.csdn.net/w55100/article/details/88925697

2. After activating the corresponding virtual environment, I ran `conda install nb_conda` and `conda install ipykernel`

Some people say that installing (only) the nb_conda plugin is enough for the notebook to support virtual environments: after installing it, a Conda tab appears on the Jupyter page and the virtual environments appear directly under New. I don't know whether nb_conda and ipykernel are two independent, equally valid methods; either way, I installed both.

Supplement 1): What is a kernel, and how does it relate to the virtual environment?

A kernel provides programming language support in Jupyter, and IPython is the default kernel. In other words, a kernel is a language environment that Jupyter can run: besides Python, there are kernels for R, Julia, and so on. Different virtual environments use different Python interpreters, so they naturally need different kernels.

Jupyter maintains a list of kernels. Kernel information is stored in a kernel.json file (for example, which virtual environment the kernel maps to), and different virtual environments have different kernel.json files. Switching kernels in Jupyter means switching the language interpreter (that is, the Python execution environment) that the notebook runs on.

Supplement 2): Kernel related operations

jupyter kernelspec list # Check which kernels are currently available in jupyter
jupyter kernelspec remove xxx_kernel_name # delete the specified kernel

The ipykernel package registers the current Python environment as a kernel; the configuration file is probably at ~/anaconda3/envs/pytorch/share/jupyter/kernels/python3/kernel.json
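To see which virtual environment each kernel actually maps to, you can read the kernel specs (the contents of each kernel.json) from jupyter kernelspec list --json. A sketch, assuming that command's output has the shape {"kernelspecs": {name: {"resource_dir": ..., "spec": {...}}}}; kernel_interpreters is a hypothetical helper. For Python kernels, argv[0] in the spec is the interpreter the kernel launches (sometimes just "python" for the default kernel), which reveals the environment.

```python
import json
import shutil
import subprocess


def kernel_interpreters(kernelspec_json):
    """Map each kernel name to (display name, interpreter path from argv[0])."""
    specs = json.loads(kernelspec_json)["kernelspecs"]
    return {
        name: (entry["spec"]["display_name"], entry["spec"]["argv"][0])
        for name, entry in specs.items()
    }


if __name__ == "__main__":
    jupyter = shutil.which("jupyter")
    if jupyter:
        out = subprocess.run([jupyter, "kernelspec", "list", "--json"],
                             capture_output=True, text=True, check=True).stdout
        for name, (display, python) in kernel_interpreters(out).items():
            print(f"{name}: {display} -> {python}")
```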

The nb_conda package installed earlier is used for kernel management.

# Manually configure the kernel:
python -m ipykernel install --user --name pytorch (virtual environment name) --display-name "Python (pytorch)"
# Automatically configure the kernel: install ipykernel when creating the virtual environment so the kernel is associated with it automatically
conda create -n my_env python=3 ipykernel

3. The remaining steps are the same as in section 2 above

The kernel that Jupyter Notebook needs and the virtual environment created by conda are not fully linked: an environment may exist but not show up after starting the notebook, because the kernel.json file may be missing for that environment. In that case, run conda install -n xxx ipykernel to create the kernel file.

4. Note:

① Jupyter Notebook used to be called IPython Notebook, so in many places "ipython" refers to Jupyter
② Besides the steps above, you can add a conda virtual environment to Jupyter directly when creating it: conda create -n pytorch python=3.8 ipykernel. If you replace ipykernel with anaconda here, commonly used packages such as numpy and pandas are installed as well.

Question 3: After installing nb_conda, the Conda tab in Jupyter Notebook shows two default environments; the one named anaconda3 cannot be deleted, and the error EnvironmentLocationNotFound: Not a conda environment pops up

Find the envmanager.py file of the nb_conda library under the Anaconda installation path (on Windows: Anaconda3\Lib\site-packages\nb_conda\envmanager.py) and locate the code on lines 83-86:

return {
    "environments": [root_env] + [get_info(env)
                                  for env in info['envs']]
}

Change it to:

return {
    "environments": [root_env] + [get_info(env)
                                  for env in info['envs'] if env != root_env['dir']]
}
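Presumably the duplicate appears because conda's environment list also contains the root installation itself, so without the filter the root shows up twice: once as root_env and once as "anaconda3". To try the patched expression outside nb_conda, here it is wrapped in a function with sample data; the wrapper and the sample dictionaries are hypothetical, and only the 'dir' key is taken from the patch itself.

```python
def environments(root_env, envs, get_info):
    """The patched expression from envmanager.py, wrapped so it can be tested
    on sample data (hypothetical wrapper, not part of nb_conda's API)."""
    return {
        "environments": [root_env] + [get_info(env)
                                      for env in envs if env != root_env["dir"]]
    }


if __name__ == "__main__":
    root = {"name": "anaconda3", "dir": "C:\\Anaconda3"}
    info = lambda path: {"name": path.split("\\")[-1], "dir": path}
    print(environments(root, ["C:\\Anaconda3", "C:\\Anaconda3\\envs\\pytorch"], info))
```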

Question 4: After reinstalling Anaconda, starting Jupyter Notebook in base fails with ImportError: DLL load failed while importing error: The specified module could not be found.

This happens because D:\anaconda3\Library\bin was not added to the PATH environment variable. Note that D:\anaconda3\Scripts and D:\anaconda3 are also required. If you get an error saying the environment variable is too long (more than xxx bytes), it is because the PATH line contains too many entries in total; you can follow other blogs and move some of the PATH contents into a separate external environment variable.
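A small sketch for checking which of the three folders are missing, assuming the D:\anaconda3 install location from this example; missing_anaconda_dirs is a hypothetical helper. Comparison ignores case (Windows paths are case-insensitive) and trailing backslashes.

```python
import os


def missing_anaconda_dirs(path_value, root, sep=";"):
    """Return the Anaconda folders from Question 4 that are not on PATH."""
    required = [root, root + "\\Scripts", root + "\\Library\\bin"]
    present = {entry.rstrip("\\").lower() for entry in path_value.split(sep) if entry}
    return [d for d in required if d.rstrip("\\").lower() not in present]


if __name__ == "__main__":
    print(missing_anaconda_dirs(os.environ.get("PATH", ""), r"D:\anaconda3", os.pathsep))
```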

Remarks: the Python in my base environment is 3.8, in the tensorflow virtual environment 3.6, and in the pytorch virtual environment 3.7 (facepalm)

Question 5: CUDA is still unavailable after installation

Checks: ① the CUDA version PyTorch was built with ② the installed CUDA version ③ the cuDNN version ④ the NVIDIA driver version ⑤ the driver version each CUDA version requires
torch.__version__: '1.6.0', torch.cuda.is_available(): False, cudatoolkit: '10.2.89'
I installed with the command conda install pytorch torchvision cudatoolkit=10.2
Running nvidia-smi on the command line gives: NVIDIA-SMI 451.67, Driver Version: 451.67, CUDA Version: 11.0
The CUDA installer I downloaded is 10.2, and the cuDNN package I downloaded is also the one for 10.2.
Running nvcc --version gives: CUDA release 10.2.

I noticed that the NVIDIA GPU driver reports CUDA 11.0 while everything else is 10.2, and wondered whether that was a mismatch:

Some people cannot use the GPU because their driver version is too low, but mine is too high.
Each CUDA Toolkit requires a minimum CUDA driver version; see the table in Figure 1. Note the sentence "The CUDA Driver API is always backwards compatible.": a newer driver supports older toolkits, so a driver that is too new is definitely not the problem.

In the end, I found that the conda command (with cudatoolkit) plus the PyTorch downloaded from the Tsinghua mirror had given me the CPU version! What a trap!

By chance, I noticed in the output of conda list that the third column (the build string) of the pytorch entry starts with cpu_!
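This check is easy to automate. A sketch that finds the pytorch line in conda list output and looks at its build string; the rule "cpu in the build string means a CPU-only build" is a heuristic based on the cpu_ prefix observed here, not a guarantee, and both helper names are hypothetical.

```python
def pytorch_build(conda_list_text):
    """Return (version, build) for the pytorch entry in `conda list` output."""
    for line in conda_list_text.splitlines():
        fields = line.split()
        # Columns are: Name, Version, Build, Channel; comment lines start with '#'.
        if len(fields) >= 3 and fields[0] == "pytorch":
            return fields[1], fields[2]
    return None


def is_cpu_build(build):
    """Heuristic: CPU-only builds carry 'cpu' in the build string."""
    return "cpu" in build.lower()


if __name__ == "__main__":
    sample = "# Name  Version  Build  Channel\npytorch  1.6.0  cpu_py37hXXXX_0  defaults\n"
    version, build = pytorch_build(sample)
    print(version, build, "CPU-only:", is_cpu_build(build))
```

Feed it the real output of conda list (for example captured with subprocess) instead of the placeholder sample; checking torch.version.cuda for None inside Python gives the same answer from the other side.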

I had added the Tsinghua mirror when installing and used the version of the conda command without -c: conda install pytorch torchvision cudatoolkit=10.2. Searching around, I found other users reporting that the pytorch-gpu they downloaded from the Tsinghua mirror turned out to be the CPU version, so I switched to conda install pytorch torchvision cudatoolkit=10.2 -c pytorch to download from the official channel. You could also use pip, or find the GPU build's package on the Tsinghua mirror, download it, and install it locally.

That's it! I finally solved the environment configuration problem that had plagued me for almost a month (partly my own fault for not paying attention). There are still many details I don't understand, and I hope to keep updating this article. Good luck to everyone with your environments~