Deploying ChatGLM2-6B on Ubuntu 18.04 (later upgraded to 20.04) and fine-tuning based on P-Tuning v2

Ubuntu 18.04 supports CUDA 11 but not CUDA 12; Ubuntu 20.04 supports CUDA 12!

Download driver

NVIDIA graphics driver official download page
Download the corresponding driver and place it in a directory of your choice.

Before installing the NVIDIA graphics driver on a Linux system, it is recommended to disable nouveau, the open-source driver that ships with the system.

Disable nouveau
First, edit the blacklist configuration.

vim /etc/modprobe.d/blacklist.conf

Add the following two lines at the end of the file.

blacklist nouveau
options nouveau modeset=0

Then, enter the commands below to regenerate the initramfs and reboot.

update-initramfs -u
reboot

After restarting, enter the following command to verify that nouveau was disabled. If it was, the command produces no output.

lsmod | grep nouveau

Driver installation

First, use apt to remove any existing NVIDIA driver. The command is as follows.

apt-get purge "nvidia*"

If gcc is missing

Solution:

sudo apt install build-essential

Then run gcc -v to check that the installation succeeded.
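With the old driver removed and gcc available, run the installer downloaded earlier. The filename below is only an example; use the file you actually downloaded.

chmod +x NVIDIA-Linux-x86_64-535.54.03.run
sudo ./NVIDIA-Linux-x86_64-535.54.03.run
nvidia-smi

If nvidia-smi lists the GPU, the driver installed correctly.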

My system is Ubuntu 18.04 (and here lies the problem: Ubuntu 18 cannot install CUDA 12).

To install CUDA 12.1, the system must be upgraded to at least Ubuntu 20.04. After the in-place upgrade, apt-get upgrade was broken, so I ended up wiping and reinstalling the system anyway.

Check the version number of the Ubuntu operating system with the following command.

lsb_release -a

The output shows the system version, here Ubuntu 18.04.

Enter the following command in a terminal window to update the package source list.

sudo apt-get update

After the package list update completes, use the following command to install the available updates.

sudo apt-get upgrade

Restart

reboot

The following in-place release-upgrade commands are listed for reference only; as noted above, this path caused problems for me and I reinstalled instead.

sudo apt install update-manager-core

sudo apt dist-upgrade

sudo do-release-upgrade

Uninstalling CUDA 10.1 and above:

  1. cd /usr/local/cuda-xx.x/bin/

  2. sudo ./cuda-uninstaller

  3. sudo rm -rf /usr/local/cuda-xx.x


Download and install the corresponding version of CUDA from the official website
  1. Download the CUDA Toolkit version supported by your system. For the later torch installation, I chose CUDA 12.1 here. Official website link
  2. If a previous installation gets in the way, uninstall it first by running cuda-uninstaller from its bin folder (see the uninstall steps above).
  3. Select the required version, then download and install it with the commands shown on the download page (remember the directory the file is downloaded to; you will need it later).

sudo sh cuda_12.0**.run
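For CUDA 12.1.0, for example, the runfile commands shown on NVIDIA's download page looked like this at the time of writing. Always copy the exact URL from the download page itself, since the filename changes with every release:

wget https://developer.download.nvidia.com/compute/cuda/12.1.0/local_installers/cuda_12.1.0_530.30.02_linux.run
sudo sh cuda_12.1.0_530.30.02_linux.run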

Configure environment variables

Edit /etc/profile and add the following at the end

export CUDA_HOME=/usr/local/cuda-12.0
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/lib64
export PATH="$CUDA_HOME/bin:$PATH"

Then reload the profile to apply the changes:

source /etc/profile

Test whether the CUDA installation succeeded:

nvcc -V

When reinstalling CUDA 12.1, be careful not to select the driver component in the installer, because the driver was already installed earlier.

Download ChatGLM2-6B from GitHub
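Clone the repository and enter it:

git clone https://github.com/THUDM/ChatGLM2-6B
cd ChatGLM2-6B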
Install dependencies

pip install -r requirements.txt -i https://pypi.tuna.tsinghua.edu.cn/simple

Install Git LFS

1. curl -s https://packagecloud.io/install/repositories/github/git-lfs/script.deb.sh | sudo bash

2. sudo apt-get install git-lfs

3. Verify successful installation:

Run: git lfs install

If "Git LFS initialized." appears, the installation succeeded.

Download models from Hugging Face Hub

git clone https://huggingface.co/THUDM/chatglm2-6b
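If the clone finishes suspiciously fast, the large weight files may have come down only as small LFS pointer files. In that case, fetch the actual payloads explicitly:

cd chatglm2-6b
git lfs pull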

Model quantization
By default, the model is loaded at FP16 precision and needs roughly 13 GB of GPU memory. If your GPU memory is limited, you can try loading the model in quantized mode. The usage is as follows:

# Modify in web_demo.py as needed; currently only 4/8 bit quantization is supported
model = AutoModel.from_pretrained("THUDM/chatglm2-6b", trust_remote_code=True).quantize(4).cuda()

Here, "THUDM/chatglm2-6b" should be changed to the path of your local copy of the model.

Note: if the GPU has only 8 GB of memory, choose int4 quantization.

Install transformers, gradio, and mdtex2html in the same way, e.g.

pip install gradio -i https://pypi.tuna.tsinghua.edu.cn/simple

If no error is reported but nothing is output after you submit input, it may be a Gradio version issue.

This requires downgrading to gradio==3.39.0, for example:
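pip install gradio==3.39.0 -i https://pypi.tuna.tsinghua.edu.cn/simple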

Start web_demo.py
python web_demo.py

API Deployment
First you need to install additional dependencies

pip install fastapi uvicorn

Modify "THUDM/chatglm2-6b" in api.py to the local model path:

tokenizer = AutoTokenizer.from_pretrained("D:\ChatGLM2-6B", trust_remote_code=True)
model = AutoModel.from_pretrained("D:\ChatGLM2-6B", trust_remote_code=True).quantize(4).cuda()

Run api.py in the repository

python api.py
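The stock api.py listens on port 8000 by default. Assuming the default host and port, a quick test request looks like this:

curl -X POST "http://127.0.0.1:8000" \
     -H 'Content-Type: application/json' \
     -d '{"prompt": "Hello", "history": []}'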

Fine-tuning based on P-Tuning v2

Software dependencies

In addition to the dependencies of ChatGLM2-6B, the following packages are needed to run fine-tuning:

pip install rouge_chinese nltk jieba datasets -i https://pypi.tuna.tsinghua.edu.cn/simple

cd ptuning

Open train_chat.sh with vi and modify the model path, the dataset paths, and the output model path (a consolidated sketch of the script is shown after the parameter list below).

Parameter explanation:

PRE_SEQ_LEN=128: defines a variable named PRE_SEQ_LEN and sets it to 128. This is the soft-prompt (prefix) length used by P-Tuning v2 and is referenced later in the script.

LR=2e-2: defines a variable named LR and sets it to 2e-2, i.e. 0.02. This is the learning rate, referenced later in the script.

--train_file /root/train.json: path and file name of the training data file, "/root/train.json".

--validation_file /root/verify.json: path and file name of the validation data file, "/root/verify.json".

--prompt_column content: the column in the input data used as the prompt, here "content".

--response_column summary: the column in the input data used as the response, here "summary".

--overwrite_cache: overwrite the preprocessing cache if it exists.

--model_name_or_path THUDM/chatglm-6b: name or path of the base model, here "THUDM/chatglm-6b".

--output_dir output/adgen-chatglm-6b-pt: path and name of the output directory, "output/adgen-chatglm-6b-pt".

--overwrite_output_dir: overwrite the output directory if it exists.

--max_source_length 512: maximum length of the input sequence, 512 tokens.

--max_target_length 512: maximum length of the output sequence, 512 tokens.

--per_device_train_batch_size 1: training batch size of 1 per training device.

--per_device_eval_batch_size 1: evaluation batch size of 1 per evaluation device.

--gradient_accumulation_steps 16: accumulate gradients over 16 steps before each parameter update, giving an effective batch size of 16 (1 × 16) per device.

--predict_with_generate: use generate mode when producing predictions for evaluation.

--max_steps 3000: maximum number of training steps, 3000.

--logging_steps 10: log every 10 steps.

--save_steps 1000: save a checkpoint every 1000 steps.

--learning_rate $LR: learning rate, taken from the LR variable defined above.

--pre_seq_len $PRE_SEQ_LEN: soft-prompt length, taken from the PRE_SEQ_LEN variable defined above.

--quantization_bit 4: quantize the base model to 4 bits during fine-tuning.
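Putting the pieces together, train_chat.sh takes roughly the following shape. This is a sketch assembled from the parameters explained above; the --do_train flag and the launcher line follow the repository's example ptuning scripts, and all paths are the example values from above.

PRE_SEQ_LEN=128
LR=2e-2

CUDA_VISIBLE_DEVICES=0 python3 main.py \
    --do_train \
    --train_file /root/train.json \
    --validation_file /root/verify.json \
    --prompt_column content \
    --response_column summary \
    --overwrite_cache \
    --model_name_or_path THUDM/chatglm-6b \
    --output_dir output/adgen-chatglm-6b-pt \
    --overwrite_output_dir \
    --max_source_length 512 \
    --max_target_length 512 \
    --per_device_train_batch_size 1 \
    --per_device_eval_batch_size 1 \
    --gradient_accumulation_steps 16 \
    --predict_with_generate \
    --max_steps 3000 \
    --logging_steps 10 \
    --save_steps 1000 \
    --learning_rate $LR \
    --pre_seq_len $PRE_SEQ_LEN \
    --quantization_bit 4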

Execute training command

sh train_chat.sh
Run sh web_demo.sh in the ptuning folder to launch the fine-tuned model.

Pay attention to the base model path and the fine-tuned checkpoint path in web_demo.sh and web_demo.py.
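For reference, the repository's ptuning/web_demo.sh takes roughly this shape. The checkpoint path below is an example and should point at your own fine-tuning output directory:

PRE_SEQ_LEN=128

CUDA_VISIBLE_DEVICES=0 python3 web_demo.py \
    --model_name_or_path THUDM/chatglm2-6b \
    --ptuning_checkpoint output/adgen-chatglm-6b-pt/checkpoint-3000 \
    --pre_seq_len $PRE_SEQ_LEN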