Orange Pi 5 deploys chatglm2-6b model

Environmental information:

Deployment hardware: Orange Pi 5, 16 GB RAM version

System version: Ubuntu 22.04.3 LTS

Reference documentation:

Use GPU acceleration to run LLMs on Orange Pi: https://zhuanlan.zhihu.com/p/650110025

Far ahead! I will take you step by step to deploy Tsinghua's AI language model, comparable to GPT, on the domestic Orange Pi. Can the Raspberry Pi do it?: https://zhuanlan.zhihu.com/p/663853222

1. Basic environment setup

(1) Replace the apt download sources
#Update the package index
apt update
#install vim
apt install vim -y

#Back up the original sources.list file
cp /etc/apt/sources.list /etc/apt/sources.list_bak
vim /etc/apt/sources.list
#Insert the following information
deb http://repo.huaweicloud.com/ubuntu-ports/ jammy main restricted universe multiverse
#deb-src http://repo.huaweicloud.com/ubuntu-ports/ jammy main restricted universe multiverse

deb http://repo.huaweicloud.com/ubuntu-ports/ jammy-security main restricted universe multiverse
#deb-src http://repo.huaweicloud.com/ubuntu-ports/ jammy-security main restricted universe multiverse

deb http://repo.huaweicloud.com/ubuntu-ports/ jammy-updates main restricted universe multiverse
#deb-src http://repo.huaweicloud.com/ubuntu-ports/ jammy-updates main restricted universe multiverse

deb http://repo.huaweicloud.com/ubuntu-ports/ jammy-backports main restricted universe multiverse
#deb-src http://repo.huaweicloud.com/ubuntu-ports/ jammy-backports main restricted universe multiverse

#update package
apt update && apt upgrade -y
#Install cmake
apt install cmake -y
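
To confirm the mirror switch actually took effect, an optional quick check is to list the repository URIs apt now uses and look for the new mirror:

#Should print lines containing repo.huaweicloud.com
apt-cache policy | grep huaweicloud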

2. Install miniforge

(1) Obtain the miniforge installation package

#Install wget
apt install wget -y

#Method 1: direct download (may be slow without a proxy)
wget https://github.com/conda-forge/miniforge/releases/download/23.3.1-1/Miniforge3-Linux-aarch64.sh
#Method 2: download on another machine first, then upload with scp or a desktop tool such as WinSCP
scp -P 22 username@ip:/path/on/host/Miniforge3-Linux-aarch64.sh ~/

(2) Install miniforge

cd ~/
sh Miniforge3-Linux-aarch64.sh

Installation process:

Press Enter to start the installer, press q to close the license viewer, then type yes to accept the license.

Miniforge3 will now be installed into this location:
/root/miniforge3

  - Press ENTER to confirm the location
  - Press CTRL-C to abort the installation
  - Or specify a different location below

To keep things simple, just press Enter. To install to a different directory, type that directory here instead.

After installation is complete:

#Activate conda’s python environment
source ~/.bashrc
conda -V
conda 23.3.1
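
Optional: the steps below install Python packages with pip3 --user for the root user. If you prefer to keep them isolated, one option is a dedicated conda environment (the name mlc and the Python version are arbitrary choices, not part of the original walkthrough):

#Create and activate an isolated environment (optional)
conda create -n mlc python=3.10 -y
conda activate mlc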

3. Download LLVM

(1) Download the compressed package

#Method 1: direct download (may be slow)
wget https://github.com/llvm/llvm-project/releases/download/llvmorg-17.0.2/clang+llvm-17.0.2-aarch64-linux-gnu.tar.xz
#Method 2: download on another machine first, then upload with scp or a desktop tool such as WinSCP
scp -P 22 username@ip:/path/on/host/clang+llvm-17.0.2-aarch64-linux-gnu.tar.xz ~/

#Decompress
tar -xvf clang+llvm-17.0.2-aarch64-linux-gnu.tar.xz
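
A quick sanity check that the toolchain unpacked correctly (the path assumes the archive was extracted under ~/ with its default directory name):

#Should print the LLVM version
~/clang+llvm-17.0.2-aarch64-linux-gnu/bin/llvm-config --version
17.0.2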

4. Download mlc-ai

(1) Obtain the project code

#Method 1: direct download (may be slow)
git clone --recursive https://github.com/mlc-ai/relax.git tvm_unity && cd tvm_unity/
#Method 2: as before, upload the archive directly
scp -P 22 username@ip:/path/on/host/tvm_unity.tar.gz ~/
tar -zxvf tvm_unity.tar.gz

(2) Compile

cd tvm_unity/
mkdir -p build && cd build
cp ../cmake/config.cmake .

vi config.cmake
set(CMAKE_BUILD_TYPE RelWithDebInfo) #Not in the file by default; add this line
set(USE_OPENCL ON) #Already in the file; change its value
set(HIDE_PRIVATE_SYMBOLS ON) #Not in the file by default; add this line
set(USE_LLVM /root/clang+llvm-17.0.2-aarch64-linux-gnu/bin/llvm-config) #Already in the file; change its value
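
If you prefer not to edit config.cmake interactively, appending the same overrides at the end of the file has the same effect, since a later set() in CMake overrides an earlier one:

#Append the build options non-interactively (same values as above)
cat >> config.cmake <<'EOF'
set(CMAKE_BUILD_TYPE RelWithDebInfo)
set(USE_OPENCL ON)
set(HIDE_PRIVATE_SYMBOLS ON)
set(USE_LLVM /root/clang+llvm-17.0.2-aarch64-linux-gnu/bin/llvm-config)
EOF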

#Install the build dependencies
apt install g++ zlib1g-dev -y
apt-get install libvcflib-tools

cmake ..
#Start compiling tvm
make -j8
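
If make finishes without errors, the TVM libraries should now be in the build directory (these file names are what a standard TVM build produces; this is just a sanity check):

#Verify the build artifacts
ls -lh libtvm.so libtvm_runtime.so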

(3) Install Python dependencies

cd ../python
apt install pip git -y
#Install dependencies
pip3 install --user .
#If the direct installation fails, you can point pip at a mirror index
pip3 install --user . -i http://pypi.douban.com/simple --trusted-host pypi.douban.com

(4) Add environment variables

vim /root/.bashrc
#Add the following environment variables
export PATH="$PATH:/root/.local/bin"
#Reload the environment so the change takes effect
source /root/.bashrc
#Test installation
tvmc
#Output results
usage: tvmc [--config CONFIG] [-v] [--version] [-h] {run,tune,compile} ...

TVM compiler driver

options:
  --config CONFIG configuration json file
  -v, --verbose increase verbosity
  --version print the version and exit
  -h, --help show this help message and exit.

commands:
  {run,tune,compile}
    run run a compiled module
    tune auto-tune a model
    compile compile a model.

TVMC - TVM driver command-line interface
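
As an extra check that the Python package itself is importable (not just the tvmc entry point), you can print the TVM version from Python:

#Should print the installed TVM version string
python3 -c "import tvm; print(tvm.__version__)"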

5. Install mlc-llm

(1) Download the source code

#Method 1: direct download (may be slow)
git clone --recursive https://github.com/mlc-ai/mlc-llm.git && cd mlc-llm
#Method 2: download on another machine first, then upload with scp or a desktop tool such as WinSCP
scp -P 22 username@ip:/path/on/host/mlc-llm.tar.gz ~/

#Decompress
tar -xvf mlc-llm.tar.gz
cd mlc-llm/

(2) Install Python dependencies

cd ~/mlc-llm/
pip3 install --user .
#If the direct installation fails, you can point pip at a mirror index
pip3 install --user . -i http://pypi.douban.com/simple --trusted-host pypi.douban.com
#Verify after installation is complete
python3 -m mlc_llm.build --help
#Output results
usage: build.py [-h] [--model MODEL] [--hf-path HF_PATH]
                [--quantization {autogptq_llama_q4f16_0,autogptq_llama_q4f16_1,q0f16,q0f32,q3f16_0,q3f16_1,q4f16_0,q4f16_1,q4f16_2,q4f16_ft,q4f32_0,q4f32_1,q8f16_ft,q8f16_1}] [--max-seq-len MAX_SEQ_LEN]
                [--target TARGET] [--reuse-lib REUSE_LIB] [--artifact-path ARTIFACT_PATH] [--use-cache USE_CACHE] [--convert-weight-only] [--build-model-only] [--debug-dump] [--debug-load-script]
                [--llvm-mingw LLVM_MINGW] [--cc-path CC_PATH] [--system-lib] [--sep-embed] [--use-safetensors] [--enable-batching] [--no-cutlass-attn] [--no-cutlass-norm] [--no-cublas] [--use-cuda-graph]
                [--num-shards NUM_SHARDS] [--use-flash-attn-mqa] [--pdb] [--use-vllm-attention]

6. Download chatglm2-6b model

(1) Download the model

#Create a directory to store the model (run this from the /root/mlc-llm directory)
mkdir -p dist/models && cd dist/models
#Method 1: clone from Hugging Face (requires a proxy to reach huggingface.co)
apt install git-lfs -y
git lfs install && git clone https://huggingface.co/THUDM/chatglm2-6b
#Method 2: upload a local copy
scp -P 22 username@ip:/path/on/host/chatglm2-6b.tar.gz ~/
tar -zxvf chatglm2-6b.tar.gz
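
The chatglm2-6b weights are large, so it is worth confirming that git-lfs actually pulled the binary weight files rather than leaving small pointer files behind (a rough check, run from dist/models):

#The directory should be several gigabytes and contain the .bin weight shards
du -sh chatglm2-6b/
ls -lh chatglm2-6b/*.bin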

7. Install OpenCL driver

(1) Install the driver

#Official step-by-step guide (accessing it may require a proxy):
https://llm.mlc.ai/docs/install/gpu.html#orange-pi-5-rk3588-based-sbc
#Steps
#Download and install the Ubuntu 22.04 image for your board
#Download and install libmali-g610.so
cd /usr/lib && sudo wget https://github.com/JeffyCN/mirrors/raw/libmali/lib/aarch64-linux-gnu/libmali-valhall-g610-g6p0-x11-wayland-gbm.so

#Check whether the file mali_csffw.bin exists under /lib/firmware; if not, download it with:
cd /lib/firmware && sudo wget https://github.com/JeffyCN/mirrors/raw/libmali/firmware/g610/mali_csffw.bin

#Download OpenCL ICD loader and manually add libmali to ICD
apt update
apt install mesa-opencl-icd
mkdir -p /etc/OpenCL/vendors
echo "/usr/lib/libmali-valhall-g610-g6p0-x11-wayland-gbm.so" | sudo tee /etc/OpenCL/vendors/mali.icd

#Download and install libOpenCL
apt install ocl-icd-opencl-dev

#Download and install dependencies for Mali OpenCL
apt install libxcb-dri2-0 libxcb-dri3-0 libwayland-client0 libwayland-server0 libx11-xcb1

#Download and install clinfo to check if OpenCL successfully installed
apt install clinfo

#To verify that the OpenCL runtime and Mali GPU driver are installed correctly, run clinfo on the command line and check that it reports the GPU. You should see information like the following:

clinfo
arm_release_ver: g13p0-01eac0, rk_so_ver: 3
Number of platforms 2
   Platform Name ARM Platform
   Platform Vendor ARM
   Platform Version OpenCL 2.1 v1.g6p0-01eac0.2819f9d4dbe0b5a2f89c835d8484f9cd
   Platform Profile FULL_PROFILE
   ...

#Note: the files downloaded above can also be fetched on another machine, copied onto the board (or into the container), and placed in the directories shown.
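
Besides clinfo, you can also check whether the TVM runtime built earlier sees an OpenCL device, using TVM's device API from Python (a convenience check, not part of the original steps):

#Should print True if the Mali GPU is visible through OpenCL
python3 -c "import tvm; print(tvm.opencl().exist)"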

8. Install Rust

(1) Installation

#Download and install Rust
#Method 1
apt install curl -y
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
#Method 2
apt install rustc cargo -y
#Add environment variables
vim /root/.bashrc
#Add content
export PATH="$PATH:/root/.cargo/bin"
source ~/.bashrc

rustc --version
#output
rustc 1.66.1 (90743e729 2023-01-10) (built from a source tarball)

9. Compile the model

(1) Compile the model

#Switch to the mlc-llm directory
cd /root/mlc-llm
#Adjust the --model value to match the directory name under dist/models (chatglm2-6b here)
python3 -m mlc_llm.build --model chatglm2-6b --target opencl --max-seq-len 32768 --quantization q8f16_1
#When compilation finishes, a .so library is generated
ll dist/chatglm2-6b-q8f16_1/chatglm2-6b-q8f16_1-opencl.so
-rwxr-xr-x 1 root root 3065776 Nov 10 15:27 dist/chatglm2-6b-q8f16_1/chatglm2-6b-q8f16_1-opencl.so*
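
In addition to the .so, the build step converts and quantizes the weights into dist/chatglm2-6b-q8f16_1/params (the config and cache files there are referenced again when the model is run below); a quick look:

#Expect mlc-chat-config.json, ndarray-cache.json and the params_shard_*.bin files
ls dist/chatglm2-6b-q8f16_1/params/ | head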

#Execute in the /root/mlc-llm directory
mkdir -p build && cd build
cmake .. && cmake --build . --parallel $(nproc) && cd ..

#After compilation is completed
ls -l ./build/
total 89208
-rw-r--r-- 1 root root 26782 Nov 10 15:31 CMakeCache.txt
drwxr-xr-x 11 root root 4096 Nov 10 15:34 CMakeFiles
-rw-r--r-- 1 root root 6529 Nov 10 15:31 cmake_install.cmake
-rw-r--r-- 1 root root 3095 Nov 10 15:31 CPackConfig.cmake
-rw-r--r-- 1 root root 3387 Nov 10 15:31 CPackSourceConfig.cmake
-rw-r--r-- 1 root root 16976604 Nov 10 15:34 libmlc_llm.a
-rwxr-xr-x 1 root root 35807824 Nov 10 15:34 libmlc_llm_module.so
-rwxr-xr-x 1 root root 35807824 Nov 10 15:34 libmlc_llm.so
-rw-r--r-- 1 root root 23948 Nov 10 15:31 Makefile
-rwxr-xr-x 1 root root 2659392 Nov 10 15:34 mlc_chat_cli
drwxr-xr-x 6 root root 4096 Nov 10 15:34 tokenizers
drwxr-xr-x 3 root root 4096 Nov 10 15:34 tvm
-rw-r--r-- 1 root root 1886 Nov 10 15:31 TVMBuildOptions.txt

./build/mlc_chat_cli --help
Usage: mlc_chat_cli [--help] [--version] [--model VAR] [--model-lib-path VAR] [--device VAR] [--evaluate] [--eval-prompt-len VAR] [--eval-gen-len VAR]

MLCChat CLI is the command line tool to run MLC-compiled LLMs out of the box.
Note: the --model argument is required. It can either be the model name with its quantization scheme or a full path to the model folder. In the former case, the provided name will be used to search for the model folder over possible paths. --model-lib-path argument is optional. If unspecified, the --model argument will be used to search for the library file over possible paths.

Optional arguments:
  -h, --help shows help message and exits
  -v, --version prints version information and exits
  --model [required] the model to use
  --model-lib-path [optional] the full path to the model library file to use
  --device [default: "auto"]
  --evaluate
  --eval-prompt-len [default: 128]
  --eval-gen-len [default: 1024]

#Run model
./build/mlc_chat_cli --model chatglm2-6b-q8f16_1 --device opencl
#output
Use MLC config: "/root/mlc-llm/dist/chatglm2-6b-q8f16_1/params/mlc-chat-config.json"
Use model weights: "/root/mlc-llm/dist/chatglm2-6b-q8f16_1/params/ndarray-cache.json"
Use model library: "/root/mlc-llm/dist/chatglm2-6b-q8f16_1/chatglm2-6b-q8f16_1-opencl.so"
You can use the following special commands:
  /help print the special commands
  /exit quit the cli
  /stats print out the latest stats (token/sec)
  /reset restart a fresh chat
  /reload [model] reload model `model` from disk, or reload the current model if `model` is not specified

Loading model...
arm_release_ver of this libmali is 'g6p0-01eac0', rk_so_ver is '7'.
Loading finished
Running system prompts...
System prompts finished
Q: who are you
Answer: I am an artificial intelligence assistant named ChatGLM2-6B, which is developed based on the language model jointly trained by Tsinghua University’s KEG Laboratory and Zhipu AI Company in 2023. My role is to provide appropriate responses and support to users' questions and requests.
Q: /stats
prefill: 0.7 tok/s, decode: 1.9 tok/s

10. Calling the model from code

(1) Test code

vi ~/.bashrc
#Add environment variables; replace $(pwd) with the directory that contains tvm_unity and mlc-llm (here /root)
export TVM_HOME=$(pwd)/tvm_unity
export MLC_LLM_HOME=$(pwd)/mlc-llm
export PYTHONPATH=$TVM_HOME/python:$MLC_LLM_HOME/python:${PYTHONPATH}
#Reload the environment so the change takes effect
source ~/.bashrc

#Create demo.py in the /root/mlc-llm directory (it must be run from there so the model under dist/ can be found) and insert the following content
from mlc_chat import ChatModule
from mlc_chat.callback import StreamToStdout

cm = ChatModule(model="chatglm2-6b-q8f16_1")

# Generate a response for a given prompt
output = cm.generate(
    prompt="Who are you?",
    progress_callback=StreamToStdout(callback_interval=2),
)

# Print prefill and decode performance statistics
print(f"Statistics: {cm.stats()}\n")

#Note, some python dependencies may need to be installed manually
python demo.py
#Test output
arm_release_ver of this libmali is 'g6p0-01eac0', rk_so_ver is '7'.
I am an artificial intelligence assistant named ChatGLM2-6B, developed based on a language model jointly trained by Tsinghua University’s KEG Laboratory and Zhipu AI Company in 2023. My role is to provide appropriate responses and support to users' questions and requests.
Statistics: prefill: 1.4 tok/s, decode: 2.2 tok/s