Task 1 Deploy the ChatGLM3-6B large model and conduct dialogue testing

  • 0 Introduction:
  • 1 Trend Cloud project creation and environment configuration
    • 1.1 Create project:
    • 1.2 Configure the environment
      • 1.2.1 Enter the terminal
      • 1.2.2 Set mirror source
      • 1.2.3 Clone the project and install dependencies
  • 2 Modify the code: change the model path and the launch code
  • 3 Run the code
    • 3.1 Run the Gradio interface:
    • 3.2 Run the Streamlit interface:

0 Introduction:

This project is a collaboration between Datawhale and the Trend Cloud platform; this article is mainly about learning how to use large models.
The model used is ChatGLM3-6B, a new-generation conversational pre-trained model jointly released by Zhipu AI and the Tsinghua University KEG Laboratory.
Overall, the platform feels very easy to use and makes model deployment convenient.
Project address: https://github.com/THUDM/ChatGLM3
image.png

1 Trend Cloud project creation and environment configuration

1.1 Create project:

After creating an account, enter your own space and click Create Project in the upper right corner.
image.png
Name your project and select local code:
image.png
For the image, select pytorch2.0.1 with python3.9:
image.png
Select the pre-trained model: click Public and choose the platform-provided ChatGLM3-6B model (there is no need to upload it yourself).
image.png

After everything is selected, click Create in the lower right corner and choose not to upload code for now; we will clone it directly later.
Click Run Code:
image.png
Resource configuration: select B1.large; its 24 GB of video memory is enough to load the model. Leave the other settings at their defaults, then click Start in the lower right corner.
image.png

1.2 Configure the environment

1.2.1 Enter the terminal

After the two tools on the right have finished loading, click JupyterLab to enter the development environment.
image.png
After entering, an environment overview page appears. You can skim it to see that each kind of file has its own fixed location.
image.png

Then click the small plus sign to create a new terminal.
image.png
Click terminal to enter the terminal:
image.png

image.png

1.2.2 Set mirror source

First run tmux in the terminal to open a new session window; tmux keeps the session alive even if the terminal connection drops.

tmux

Upgrade apt and install unzip:

apt-get update && apt-get install unzip

image.png

Set the mirror source and upgrade pip:

git config --global url."https://gitclone.com/".insteadOf https://
pip config set global.index-url https://mirrors.ustc.edu.cn/pypi/web/simple
python3 -m pip install --upgrade pip

Note: if you run into connection errors, you can additionally trust the mirror host:

pip config set global.trusted-host mirrors.ustc.edu.cn

image.png

1.2.3 Clone the project and install dependencies

Clone the project and enter the project directory:

git clone https://github.com/THUDM/ChatGLM3.git
cd ChatGLM3

image.png
Ideally, you would create a virtual environment inside the project and install the dependencies there, rather than installing into the base environment.
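As a sketch of that suggestion (the environment name `chatglm-env` is just an example, not part of the original steps), a virtual environment could be set up like this:

```shell
# Create an isolated environment next to the project
python3 -m venv chatglm-env

# Activate it; pip now installs into chatglm-env instead of the base environment
source chatglm-env/bin/activate

# Install the project's dependencies inside the virtual environment
pip install -r requirements.txt
```

Run `deactivate` later to leave the virtual environment.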

Return to the terminal and install the dependencies.
First modify the requirements:
Double-click the requirements.txt file on the left and delete the torch entry, because the environment already ships with torch; this avoids wasting time re-downloading it.
image.png
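That edit can also be scripted; a minimal sketch (the `strip_torch` helper is my own illustration, and note it drops any requirement line beginning with `torch`, including torchvision/torchaudio):

```python
def strip_torch(requirement_lines):
    """Keep every requirement except those beginning with 'torch'.

    The environment already ships with torch, so re-downloading it
    would only waste time.
    """
    return [
        line for line in requirement_lines
        if not line.strip().lower().startswith("torch")
    ]

# Example: filter an in-memory requirements list
reqs = ["transformers>=4.30.2", "torch>=2.0", "gradio", "streamlit"]
print(strip_torch(reqs))  # → ['transformers>=4.30.2', 'gradio', 'streamlit']
```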

pip install -r requirements.txt

image.png

2 Modify the code: change the model path and the launch code

In web_demo2.py, **modify the model loading path** to ../../pretrain, as shown in the figure below:
image.png
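A common pattern for that path change (the `MODEL_PATH` environment-variable override is my own addition, not part of the original demo) looks like this:

```python
import os

# Default to ../../pretrain, the directory where the platform mounts the
# ChatGLM3-6B weights; allow overriding via an environment variable.
MODEL_PATH = os.environ.get("MODEL_PATH", "../../pretrain")

# The demo script then loads the tokenizer and model from this path, roughly:
#   tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH, trust_remote_code=True)
#   model = AutoModel.from_pretrained(MODEL_PATH, trust_remote_code=True).cuda()
print(MODEL_PATH)
```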

Modify the web_demo.py file, first modify the path code, and then modify the startup code.
image.png

Modify the startup code below to the following code:
demo.queue().launch(share=False, server_name="0.0.0.0", server_port=7000)
image.png
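Before launching on port 7000, it can help to confirm the port is actually free; a small stdlib sketch (the `port_is_free` helper is my own, not part of the demo):

```python
import socket

def port_is_free(port, host="0.0.0.0"):
    """Return True if nothing is currently bound to (host, port)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        try:
            s.bind((host, port))
            return True
        except OSError:
            return False

# Check the demo's port before launching Gradio on it
print(port_is_free(7000))
```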

At the same time, on the right side of the interface, add the external port: 7000
image.png

3 Run the code

3.1 Run the Gradio interface:

python web_demo.py

image.png
After loading, copy the external access link and open it in the browser:
image.png
direct.virtaicloud.com:43779

The page may not open properly in Chrome; you can switch to the IE browser:
image.png
The garbled characters at A and B here were caused by scrolling the mouse wheel up and down; they can be ignored and have no effect.
image.png

3.2 Run the Streamlit interface:

If you have already run the Gradio demo, you need to kill that process first, otherwise there will not be enough video memory.
Use **Ctrl + C** to kill the process.
After killing the process, the video memory is not released immediately; you can watch the GPU memory usage panel on the right to check when it has been freed.
image.png

web_demo2.py was already modified above, so it can be run directly with Streamlit:

streamlit run web_demo2.py

After Streamlit starts, the terminal prints two addresses. On the right-hand panel, add an external port matching the port shown in the terminal (Streamlit defaults to 8501).
image.png
image.png

Add port:
image.png
image.png

Wait for the model to finish loading, then copy the address into a browser and open it:
image.png

Open it in the browser, and you can start a conversation.
image.png