Fixing the NVIDIA "Failed to initialize NVML: Driver/library version mismatch" error

GPT has become quite popular recently, so I started experimenting with running projects on the GPU, such as:
https://github.com/OpenTalker/SadTalker
Today the program suddenly stopped working. After some investigation, the likely cause was a mismatch between the version of the loaded NVIDIA kernel module and the version of the driver packages installed on the system.

The solutions to this error are briefly summarized below.

Reproducing the problem


View system driver logs
cat /var/log/dpkg.log | grep nvidia
## Problem cause analysis
The version of the loaded NVIDIA kernel module does not match the version of the installed driver packages.
## Check the graphics card driver kernel version
cat /proc/driver/nvidia/version
NVRM version: NVIDIA UNIX x86_64 Kernel Module 470.223.02 Thu May 11 11:46:56 UTC 2023
GCC version: gcc version 11.4.0 (Ubuntu 11.4.0-1ubuntu1~22.04)
View installed drivers

dpkg --list | grep -i nvidia

Troubleshooting showed that the NVIDIA driver packages had been updated automatically, which caused the program to fail.
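To confirm the mismatch directly, one option is to compare the version of the kernel module that is currently loaded with the version of the module installed on disk. This is only a minimal sketch: it assumes /proc/driver/nvidia/version has the format shown above and that the module is named nvidia. The helper name nvrm_version is made up for this example.

```shell
#!/bin/sh
# Hypothetical helper: print the driver version found on the
# "NVRM version:" line read from standard input.
nvrm_version() {
    awk '/^NVRM version:/ {
        for (i = 1; i <= NF; i++)
            if ($i ~ /^[0-9]+\.[0-9]+(\.[0-9]+)?$/) { print $i; exit }
    }'
}

# Compare the loaded module with the one installed on disk.
# Assumption: the kernel module is called "nvidia".
if [ -r /proc/driver/nvidia/version ]; then
    loaded=$(nvrm_version < /proc/driver/nvidia/version)
    on_disk=$(modinfo -F version nvidia 2>/dev/null)
    echo "loaded kernel module: ${loaded:-unknown}"
    echo "module on disk:       ${on_disk:-unknown}"
    if [ -n "$loaded" ] && [ -n "$on_disk" ] && [ "$loaded" != "$on_disk" ]; then
        echo "versions differ"
    fi
fi
```

If the two versions differ, the driver was most likely updated on disk while the old module stayed loaded in the running kernel.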

The following describes how to roll back the driver version.
Solution
Uninstall the existing driver and reinstall it
sudo /usr/bin/nvidia-uninstall
sudo apt-get --purge remove 'nvidia-*'
sudo apt-get purge 'nvidia*'
sudo apt-get purge 'libnvidia*'
Repeat until the following command outputs nothing:
dpkg --list | grep -i nvidia
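The "outputs nothing" check can also be scripted. The sketch below filters `dpkg --list` output down to package status and name, so leftovers are easy to spot. The helper name nvidia_leftovers is hypothetical, and the filter simply assumes that every relevant package has "nvidia" in its name.

```shell
#!/bin/sh
# Hypothetical helper: read `dpkg --list` output on standard input and
# print the status and name of every package whose name mentions nvidia.
nvidia_leftovers() {
    awk '$1 ~ /^[a-z][a-zA-Z][a-zA-Z]?$/ && $2 ~ /nvidia/ { print $1, $2 }'
}

if command -v dpkg >/dev/null 2>&1; then
    leftovers=$(dpkg --list | nvidia_leftovers)
    if [ -z "$leftovers" ]; then
        echo "no NVIDIA packages left"
    else
        echo "$leftovers"
    fi
fi
```

A status of "rc" means the package was removed but its configuration files remain. Purging the package clears those as well.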

Reinstall
sudo chmod a+x NVIDIA-Linux-x86_64-470.199.02.run
sudo ./NVIDIA-Linux-x86_64-470.199.02.run --no-x-check --no-nouveau-check --no-opengl-files
     --no-opengl-files installs only the driver files, not the OpenGL files
     --no-x-check skips the check for a running X server during installation
     --no-nouveau-check skips the check for the nouveau driver during installation

Error handling:

1. Download the official driver
Remove any remaining NVIDIA driver packages:

sudo apt-get remove --purge 'nvidia*'

  2. Disable the integrated nouveau driver

Ubuntu ships with nouveau, an open-source third-party driver for NVIDIA cards. It has to be blocked before the official NVIDIA driver is installed.
To block it, add it to a blacklist file under /etc/modprobe.d/. A regular user cannot modify these files, so edit them with root privileges (for example with sudo).

Create the file /etc/modprobe.d/blacklist-nouveau.conf with the following contents:
blacklist nouveau
blacklist lbm-nouveau
options nouveau modeset=0
alias nouveau off
alias lbm-nouveau off
blacklist rivafb
blacklist vga16fb
blacklist nvidiafb
blacklist rivatv
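One possible way to create the file without opening an editor is to print the entries and pipe them through `sudo tee`. The sketch below covers only the nouveau-related entries from the listing above, and the helper name nouveau_blacklist is made up for this example. As written, it only previews the content; the commands that actually write the file are left as comments.

```shell
#!/bin/sh
# Hypothetical helper: print the nouveau blacklist entries to standard output.
nouveau_blacklist() {
    cat <<'EOF'
blacklist nouveau
blacklist lbm-nouveau
options nouveau modeset=0
alias nouveau off
alias lbm-nouveau off
EOF
}

# Preview what would be written. To install it (needs root):
#   nouveau_blacklist | sudo tee /etc/modprobe.d/blacklist-nouveau.conf
#   sudo update-initramfs -u
nouveau_blacklist
```

After a reboot, `lsmod | grep nouveau` should print nothing if the blacklist took effect.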

Uninstall residual files from previous installations
sudo apt-get remove --purge xserver-xorg-video-nouveau
sudo apt-get --purge remove 'nvidia-*'

3. Install dependencies
sudo apt update
sudo apt install dkms build-essential linux-headers-generic

4. Start installation
Install the driver
sudo chmod a+x NVIDIA-Linux-x86_64-xxx.run
sudo sh NVIDIA-Linux-x86_64-xxx.run
# --no-x-check: do not check for a running X server
# --no-nouveau-check: do not check for the nouveau driver
# --no-opengl-files: do not install the OpenGL files
5. After installation is complete
sudo update-initramfs -u   # update the initramfs
sudo reboot
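After the reboot, it is worth checking whether the original error is gone. The sketch below runs nvidia-smi and looks for the mismatch message from the title of this post. The helper name smi_status is hypothetical, and the check detects only this one error, not other driver problems.

```shell
#!/bin/sh
# Hypothetical helper: read nvidia-smi output on standard input and
# report whether it contains the version mismatch error.
smi_status() {
    if grep -q 'Driver/library version mismatch'; then
        echo "mismatch"
    else
        echo "no mismatch reported"
    fi
}

if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi 2>&1 | smi_status
fi
```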

Uninstall leftovers:

If kernel-specific NVIDIA module packages are left behind, they can be removed forcibly with dpkg. The steps below use two such packages as an example:

  1. Open a terminal as a user with administrator (sudo) rights.

  2. Run the following command to uninstall the linux-modules-nvidia-450-server-6.2.0-35-generic package:

    sudo dpkg --purge linux-modules-nvidia-450-server-6.2.0-35-generic

  3. Run the following command to uninstall the linux-objects-nvidia-450-server-6.2.0-35-generic package:

    sudo dpkg --purge linux-objects-nvidia-450-server-6.2.0-35-generic

    Note: Replace the driver and kernel version numbers in the package names with those actually installed on your system.

  4. When the commands finish, both packages should have been removed.

Final result:
Other commands:

Command for upgrading the graphics card driver
Enter the following command to view the driver version recommended by the system:
sudo ubuntu-drivers devices

autoremove command
In Ubuntu, you can use the autoremove command to automatically remove packages and dependencies that are no longer needed. These packages are usually no longer needed because you upgraded or removed other packages.

To use the autoremove command, follow these steps:

  1. Open a terminal.

  2. Log in to your system with administrator rights.

  3. Run the following command to use autoremove:

    sudo apt autoremove
    
  4. The command will scan your system for packages and dependencies that are no longer needed and prompt you to confirm whether to remove them. Please read the packages on the removal list carefully to make sure you no longer need them.

  5. If you confirm that you want to delete these packages, enter “Y” or “yes” and press Enter.

  6. The autoremove command will automatically remove these packages and dependencies that are no longer needed.

Be careful with autoremove: make sure you understand which packages are being removed and how that affects your system. It is a good idea to back up important data before running it.
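To review the removal list without changing anything, apt-get can simulate the operation. The sketch below pulls the package names out of the simulated output. The helper name would_remove is made up for this example, and it relies on the simulation marking removals with a leading "Remv".

```shell
#!/bin/sh
# Hypothetical helper: read simulated apt-get output on standard input and
# print the names of the packages that would be removed.
would_remove() {
    awk '$1 == "Remv" { print $2 }'
}

if command -v apt-get >/dev/null 2>&1; then
    apt-get --simulate autoremove | would_remove
fi
```

Nothing is removed by the simulation, so it can be run before the real command to check whether any NVIDIA packages are on the list.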

tips:
To avoid trouble, turn off automatic graphics card driver updates.

1. Disable automatic upgrades

Modify the configuration file /etc/apt/apt.conf.d/10periodic
# 0 is off, 1 is on; change all values to 0
sudo vi /etc/apt/apt.conf.d/10periodic
APT::Periodic::Update-Package-Lists "0";
APT::Periodic::Download-Upgradeable-Packages "0";
APT::Periodic::AutocleanInterval "0";
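A quick way to double-check the result is to list any periodic setting that is still switched on. This is a rough sketch: the helper name periodic_enabled is hypothetical, and it only inspects the single file it is given, while other files under /etc/apt/apt.conf.d/ may set the same options.

```shell
#!/bin/sh
# Hypothetical helper: print every APT::Periodic setting in the given file
# whose value is not "0". Prints nothing when all of them are disabled.
periodic_enabled() {
    awk -F'"' '/^[ \t]*APT::Periodic::/ && $2 != "0" { print $1 $2 }' "$1"
}

conf=/etc/apt/apt.conf.d/10periodic
if [ -r "$conf" ]; then
    periodic_enabled "$conf"
fi
```

To see the values apt actually uses across all configuration files, `apt-config dump | grep Periodic` can be used.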

Then run the following command to hold the kernel packages as well:
sudo apt-mark hold linux-image-generic linux-headers-generic

2. Use apt-mark hold

The apt-mark hold command locks a package at its current version, which prevents Ubuntu from updating it automatically. Here we use it to lock the graphics driver package. The steps are as follows:

Open a terminal and list the currently installed graphics card driver packages:
dpkg -l | grep -i nvidia
Lock the version of the package:
sudo apt-mark hold PACKAGE_NAME
Here PACKAGE_NAME stands for the name of the graphics card driver package to lock, for example nvidia-driver-450.

To unlock it later, use:
sudo apt-mark unhold PACKAGE_NAME
Note that a held package is also skipped by a manual apt-get upgrade. To install a newer version of the graphics driver, unhold the package first and then run apt-get update and apt-get upgrade manually.
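To verify that a hold is in place, the list printed by `apt-mark showhold` can be checked for the package name. The sketch below does this for nvidia-driver-450, the example name used above. The helper name is_held is made up for this example, and the package name should be replaced with the one on your system.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the package named in $1 appears in the
# list of held packages read from standard input (one name per line).
is_held() {
    grep -qxF -- "$1"
}

pkg=nvidia-driver-450   # example name from the text; use your own package
if command -v apt-mark >/dev/null 2>&1; then
    if apt-mark showhold | is_held "$pkg"; then
        echo "$pkg is held"
    else
        echo "$pkg is not held"
    fi
fi
```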