Deploying the chatglm2-6b model on Orange Pi 5
Environment:
Deployment server: Orange Pi 5, 16 GB version
System version: Ubuntu 22.04.3 LTS
Reference documentation:
Running LLMs on Orange Pi with GPU acceleration: https://zhuanlan.zhihu.com/p/650110025
Far ahead! Step-by-step deployment of Tsinghua's GPT-comparable AI language model on the domestic Orange Pi; can the Raspberry Pi do that?: https://zhuanlan.zhihu.com/p/663853222
1. Basic environment setup
(1) Replace apt download source
#Update package lists
apt update
#Install vim
apt install vim -y
#Back up the original sources.list file
cp /etc/apt/sources.list /etc/apt/sources.list_bak
vim /etc/apt/sources.list
#Replace the contents with the following mirror entries
deb http://repo.huaweicloud.com/ubuntu-ports/ jammy main restricted universe multiverse
#deb-src http://repo.huaweicloud.com/ubuntu-ports/ jammy main restricted universe multiverse
deb http://repo.huaweicloud.com/ubuntu-ports/ jammy-security main restricted universe multiverse
#deb-src http://repo.huaweicloud.com/ubuntu-ports/ jammy-security main restricted universe multiverse
deb http://repo.huaweicloud.com/ubuntu-ports/ jammy-updates main restricted universe multiverse
#deb-src http://repo.huaweicloud.com/ubuntu-ports/ jammy-updates main restricted universe multiverse
deb http://repo.huaweicloud.com/ubuntu-ports/ jammy-backports main restricted universe multiverse
#deb-src http://repo.huaweicloud.com/ubuntu-ports/ jammy-backports main restricted universe multiverse
#Update packages
apt update && apt upgrade -y
#Install cmake
apt install cmake -y
2. Install miniforge
(1) Obtain the miniforge installation package
#Install wget
apt install wget -y
#Method 1: direct download (may be slow without a proxy)
wget https://github.com/conda-forge/miniforge/releases/download/23.3.1-1/Miniforge3-Linux-aarch64.sh
#Method 2: download locally first, then upload with scp (or a desktop tool such as WinSCP)
scp -P 22 username@ip:/path/on/local/machine/Miniforge3-Linux-aarch64.sh ~/
(2) Install miniforge
cd ~/
sh Miniforge3-Linux-aarch64.sh
Installation process: press Enter to continue, press q to exit the license view, then type yes to accept.
Miniforge3 will now be installed into this location:
/root/miniforge3

  - Press ENTER to confirm the location
  - Press CTRL-C to abort the installation
  - Or specify a different location below
To keep things simple, press Enter to accept the default; to install somewhere else, type the target directory.
After installation is complete:
#Activate conda's python environment
source ~/.bashrc
conda -V
conda 23.3.1
3. Download LLVM
(1) Download the compressed package
#Method 1: direct download (may be slow)
wget https://github.com/llvm/llvm-project/releases/download/llvmorg-17.0.2/clang+llvm-17.0.2-aarch64-linux-gnu.tar.xz
#Method 2: download locally first, then upload with scp (or a desktop tool such as WinSCP)
scp -P 22 username@ip:/path/on/local/machine/clang+llvm-17.0.2-aarch64-linux-gnu.tar.xz ~/
#Decompress
tar -xvf clang+llvm-17.0.2-aarch64-linux-gnu.tar.xz
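Large archives transferred over scp can get truncated or corrupted; before decompressing, it is worth comparing a checksum against the one published on the release page. A minimal stdlib-only sketch (the filename is the one used in this guide; the reference digest is whatever the release page publishes, not something shown here):

```python
import hashlib

def sha256sum(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the SHA-256 digest of a file, reading in chunks to bound memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

# Usage on the board (compare against the published checksum):
#   print(sha256sum("clang+llvm-17.0.2-aarch64-linux-gnu.tar.xz"))
```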
4. Download mlc-ai
(1) Obtain the project code
#Method 1: direct clone (may be slow)
git clone --recursive https://github.com/mlc-ai/relax.git tvm_unity && cd tvm_unity/
#Method 2: as before, upload the archive directly
scp -P 22 username@ip:/path/on/local/machine/tvm_unity.tar.gz ~/
tar -zxvf tvm_unity.tar.gz
(2) Compile
cd tvm_unity/
mkdir -p build && cd build
cp ../cmake/config.cmake .
vi config.cmake
#Edit the following options:
set(CMAKE_BUILD_TYPE RelWithDebInfo) #This item is not in the file and needs to be added
set(USE_OPENCL ON) #This item is in the file and needs to be modified
set(HIDE_PRIVATE_SYMBOLS ON) #This item is not in the file and needs to be added
set(USE_LLVM /root/clang+llvm-17.0.2-aarch64-linux-gnu/bin/llvm-config) #This item is in the file and needs to be modified
#Install the build toolchain
apt install g++ zlib1g-dev -y
apt-get install libvcflib-tools
cmake ..
#Start compiling tvm
make -j8
(3) Install Python dependencies
cd ../python
apt install pip git -y
#Install dependencies
pip3 install --user .
#If the direct install fails, you can point pip at a mirror index
pip3 install --user . -i http://pypi.douban.com/simple --trusted-host pypi.douban.com
(4) Add environment variables
vim /root/.bashrc
#Add the following environment variable
export PATH="$PATH:/root/.local/bin"
#Reload the environment
source /root/.bashrc
#Test the installation
tvmc
#Output:
usage: tvmc [--config CONFIG] [-v] [--version] [-h] {run,tune,compile} ...

TVM compiler driver

options:
  --config CONFIG  configuration json file
  -v, --verbose    increase verbosity
  --version        print the version and exit
  -h, --help       show this help message and exit.

commands:
  {run,tune,compile}
    run            run a compiled module
    tune           auto-tune a model
    compile        compile a model.

TVMC - TVM driver command-line interface
5. Install mlc-llm
(1) Download the source code
#Method 1: direct clone (may be slow)
git clone --recursive https://github.com/mlc-ai/mlc-llm.git && cd mlc-llm
#Method 2: download locally first, then upload with scp (or a desktop tool such as WinSCP)
scp -P 22 username@ip:/path/on/local/machine/mlc-llm.tar.gz ~/
#Decompress
tar -xvf mlc-llm.tar.gz
cd mlc-llm/
(2) Install Python dependencies
cd ~/mlc-llm/
pip3 install --user .
#If the direct install fails, you can point pip at a mirror index
pip3 install --user . -i http://pypi.douban.com/simple --trusted-host pypi.douban.com
#Verify after installation (note: type a double hyphen, --help; a Unicode dash such as –help is rejected as an unrecognized argument)
python3 -m mlc_llm.build --help
#Output:
usage: build.py [-h] [--model MODEL] [--hf-path HF_PATH]
                [--quantization {autogptq_llama_q4f16_0,autogptq_llama_q4f16_1,q0f16,q0f32,q3f16_0,q3f16_1,q4f16_0,q4f16_1,q4f16_2,q4f16_ft,q4f32_0,q4f32_1,q8f16_ft,q8f16_1}]
                [--max-seq-len MAX_SEQ_LEN] [--target TARGET] [--reuse-lib REUSE_LIB] [--artifact-path ARTIFACT_PATH]
                [--use-cache USE_CACHE] [--convert-weight-only] [--build-model-only] [--debug-dump] [--debug-load-script]
                [--llvm-mingw LLVM_MINGW] [--cc-path CC_PATH] [--system-lib] [--sep-embed] [--use-safetensors]
                [--enable-batching] [--no-cutlass-attn] [--no-cutlass-norm] [--no-cublas] [--use-cuda-graph]
                [--num-shards NUM_SHARDS] [--use-flash-attn-mqa] [--pdb] [--use-vllm-attention]
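The quantization schemes above encode the weight width in their names: q4 ≈ 4-bit weights, q8 ≈ 8-bit, q0f16 means unquantized fp16. A rough weight-only memory estimate (parameters × bits ÷ 8) explains why q8f16_1 fits comfortably on the 16 GB board. This sketch assumes roughly 6.2e9 parameters for ChatGLM2-6B and ignores per-scheme scale/metadata overhead:

```python
def weight_memory_gib(n_params: float, bits_per_weight: float) -> float:
    """Rough weight-only memory estimate for a quantized model, in GiB."""
    return n_params * bits_per_weight / 8 / 2**30

# ~6.2e9 parameters assumed for ChatGLM2-6B
for scheme, bits in [("q4f16_1", 4), ("q8f16_1", 8), ("q0f16", 16)]:
    print(f"{scheme}: ~{weight_memory_gib(6.2e9, bits):.1f} GiB")
```

Activations, the KV cache (especially at --max-seq-len 32768), and the runtime add on top of this, so the real footprint is noticeably larger.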
6. Download chatglm2-6b model
(1) Download the model
#Create a directory to store the model
mkdir -p dist/models && cd dist/models
#Method 1: clone the model (a proxy may be needed to reach Hugging Face)
git lfs install && git clone https://huggingface.co/THUDM/chatglm2-6b
#Method 2: local upload
scp -P 22 username@ip:/path/on/local/machine/chatglm2-6b.tar.gz ~/
tar -zxvf chatglm2-6b.tar.gz
7. Install OpenCL driver
(1) Install the driver
#Step-by-step installation tutorial (may require a proxy to access):
#https://llm.mlc.ai/docs/install/gpu.html#orange-pi-5-rk3588-based-sbc
#Download and install the Ubuntu 22.04 image for your board
#Download and install libmali-g610.so
cd /usr/lib && sudo wget https://github.com/JeffyCN/mirrors/raw/libmali/lib/aarch64-linux-gnu/libmali-valhall-g610-g6p0-x11-wayland-gbm.so
#Check whether mali_csffw.bin exists under /lib/firmware; if not, download it:
cd /lib/firmware && sudo wget https://github.com/JeffyCN/mirrors/raw/libmali/firmware/g610/mali_csffw.bin
#Download the OpenCL ICD loader and manually register libmali as an ICD
apt update
apt install mesa-opencl-icd
mkdir -p /etc/OpenCL/vendors
echo "/usr/lib/libmali-valhall-g610-g6p0-x11-wayland-gbm.so" | sudo tee /etc/OpenCL/vendors/mali.icd
#Download and install libOpenCL
apt install ocl-icd-opencl-dev
#Download and install dependencies for Mali OpenCL
apt install libxcb-dri2-0 libxcb-dri3-0 libwayland-client0 libwayland-server0 libx11-xcb1
#Download and install clinfo to check whether OpenCL installed successfully
apt install clinfo
#To verify the OpenCL runtime and Mali GPU driver, run clinfo and check that the GPU is reported.
#You are expected to see output like the following:
clinfo
arm_release_ver: g13p0-01eac0, rk_so_ver: 3
Number of platforms      2
  Platform Name          ARM Platform
  Platform Vendor        ARM
  Platform Version       OpenCL 2.1 v1.g6p0-01eac0.2819f9d4dbe0b5a2f89c835d8484f9cd
  Platform Profile       FULL_PROFILE
...
#Note: the files above can also be downloaded manually, uploaded to the physical machine, transferred into the container, and placed in the corresponding directories.
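Eyeballing clinfo output works, but for a scripted sanity check you can extract the platform names programmatically. A stdlib-only sketch, demonstrated here on the sample output above so it runs anywhere (on the board you would feed it `subprocess.run(["clinfo"], capture_output=True, text=True).stdout` instead):

```python
import re

def platform_names(clinfo_output: str) -> list[str]:
    """Extract OpenCL platform names from clinfo-style output."""
    return re.findall(r"Platform Name\s+(\S.*?)\s*$", clinfo_output, re.MULTILINE)

sample = """\
Number of platforms      2
  Platform Name          ARM Platform
  Platform Vendor        ARM
"""
print(platform_names(sample))  # ['ARM Platform']
```

If the list is empty or the ARM platform is missing, the mali.icd registration step above is the first thing to re-check.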
8. Install Rust
(1) Installation
#Download and install Rust
#Method 1
apt install curl -y
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
#Method 2
apt install rustc cargo -y
#Add environment variables
vim /root/.bashrc
#Add the line:
export PATH="$PATH:/root/.cargo/bin"
source ~/.bashrc
rustc --version
#Output
rustc 1.66.1 (90743e729 2023-01-10) (built from a source tarball)
9. Compile the model
(1) Compile the model
#Switch to the mlc-llm directory
cd /root/mlc-llm
#Adjust chatglm2-6b below to match the name of your model directory
python3 -m mlc_llm.build --model chatglm2-6b --target opencl --max-seq-len 32768 --quantization q8f16_1
#Compilation produces a .so file
ll dist/chatglm2-6b-q8f16_1/chatglm2-6b-q8f16_1-opencl.so
-rwxr-xr-x 1 root root 3065776 Nov 10 15:27 dist/chatglm2-6b-q8f16_1/chatglm2-6b-q8f16_1-opencl.so*
#Build the CLI, executing in the /root/mlc-llm directory
mkdir -p build && cd build
cmake .. && cmake --build . --parallel $(nproc) && cd ..
#After compilation completes
ls -l ./build/
total 89208
-rw-r--r--  1 root root    26782 Nov 10 15:31 CMakeCache.txt
drwxr-xr-x 11 root root     4096 Nov 10 15:34 CMakeFiles
-rw-r--r--  1 root root     6529 Nov 10 15:31 cmake_install.cmake
-rw-r--r--  1 root root     3095 Nov 10 15:31 CPackConfig.cmake
-rw-r--r--  1 root root     3387 Nov 10 15:31 CPackSourceConfig.cmake
-rw-r--r--  1 root root 16976604 Nov 10 15:34 libmlc_llm.a
-rwxr-xr-x  1 root root 35807824 Nov 10 15:34 libmlc_llm_module.so
-rwxr-xr-x  1 root root 35807824 Nov 10 15:34 libmlc_llm.so
-rw-r--r--  1 root root    23948 Nov 10 15:31 Makefile
-rwxr-xr-x  1 root root  2659392 Nov 10 15:34 mlc_chat_cli
drwxr-xr-x  6 root root     4096 Nov 10 15:34 tokenizers
drwxr-xr-x  3 root root     4096 Nov 10 15:34 tvm
-rw-r--r--  1 root root     1886 Nov 10 15:31 TVMBuildOptions.txt
#Note: use a double hyphen for --help
./build/mlc_chat_cli --help
Usage: mlc_chat_cli [--help] [--version] [--model VAR] [--model-lib-path VAR] [--device VAR] [--evaluate] [--eval-prompt-len VAR] [--eval-gen-len VAR]

MLCChat CLI is the command line tool to run MLC-compiled LLMs out of the box.
Note: the --model argument is required. It can either be the model name with its quantization scheme
or a full path to the model folder. In the former case, the provided name will be used to search for
the model folder over possible paths. --model-lib-path argument is optional. If unspecified, the
--model argument will be used to search for the library file over possible paths.

Optional arguments:
  -h, --help         shows help message and exits
  -v, --version      prints version information and exits
  --model            [required] the model to use
  --model-lib-path   [optional] the full path to the model library file to use
  --device           [default: "auto"]
  --evaluate
  --eval-prompt-len  [default: 128]
  --eval-gen-len     [default: 1024]
#Run the model
./build/mlc_chat_cli --model chatglm2-6b-q8f16_1 --device opencl
#Output
Use MLC config: "/root/mlc-llm/dist/chatglm2-6b-q8f16_1/params/mlc-chat-config.json"
Use model weights: "/root/mlc-llm/dist/chatglm2-6b-q8f16_1/params/ndarray-cache.json"
Use model library: "/root/mlc-llm/dist/chatglm2-6b-q8f16_1/chatglm2-6b-q8f16_1-opencl.so"
You can use the following special commands:
  /help    print the special commands
  /exit    quit the cli
  /stats   print out the latest stats (token/sec)
  /reset   restart a fresh chat
  /reload [model]  reload model `model` from disk, or reload the current model if `model` is not specified

Loading model...
arm_release_ver of this libmali is 'g6p0-01eac0', rk_so_ver is '7'.
Loading finished
Running system prompts...
System prompts finished
Q: who are you
Answer: I am an artificial intelligence assistant named ChatGLM2-6B, developed on the basis of a language model jointly trained by Tsinghua University's KEG Laboratory and Zhipu AI in 2023. My role is to provide appropriate responses and support for users' questions and requests.
Q: /stats
prefill: 0.7 tok/s, decode: 1.9 tok/s
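When comparing quantization schemes or driver tweaks, it helps to turn the /stats line into numbers you can log. A small stdlib-only helper, assuming only the line format shown in the output above:

```python
import re

def parse_stats(line: str) -> dict[str, float]:
    """Parse an mlc_chat /stats line like 'prefill: 0.7 tok/s, decode: 1.9 tok/s'."""
    return {m.group(1): float(m.group(2))
            for m in re.finditer(r"(\w+): ([\d.]+) tok/s", line)}

print(parse_stats("prefill: 0.7 tok/s, decode: 1.9 tok/s"))
# {'prefill': 0.7, 'decode': 1.9}
```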
10. Code calling demo
(1) Test code
vi ~/.bashrc
#Add environment variables; replace $(pwd) with the actual installation directory
export TVM_HOME=$(pwd)/tvm_unity
export MLC_LLM_HOME=$(pwd)/mlc-llm
export PYTHONPATH=$TVM_HOME/python:$MLC_LLM_HOME/python:${PYTHONPATH}
#Reload the environment
source ~/.bashrc
#Create demo.py in the /root/mlc-llm directory (required so the model can be located) and insert the following content:
from mlc_chat import ChatModule
from mlc_chat.callback import StreamToStdout

cm = ChatModule(model="chatglm2-6b-q8f16_1")
# Generate a response for a given prompt
output = cm.generate(
    prompt="Who are you?",
    progress_callback=StreamToStdout(callback_interval=2),
)
# Print prefill and decode performance statistics
print(f"Statistics: {cm.stats()}\n")
#Note: some python dependencies may need to be installed manually
python demo.py
#Test output
arm_release_ver of this libmali is 'g6p0-01eac0', rk_so_ver is '7'.
I am an artificial intelligence assistant named ChatGLM2-6B, developed based on a language model jointly trained by Tsinghua University's KEG Laboratory and Zhipu AI in 2023. My role is to provide appropriate responses and support for users' questions and requests.
Statistics: prefill: 1.4 tok/s, decode: 2.2 tok/s