PaddleNLP: Conversational Bots Based on LLaMA and Vicuna


Introduction

  • Use PaddleNLP with pre-trained Vicuna parameters to build a conversational bot based on the LLaMA model.

  • PS: This project requires a V100 32G or better environment and must be run with the latest development versions of PaddlePaddle and PaddleNLP. The model parameters are for non-commercial use only.

  • UPDATE: Added Vicuna 13B model parameters, a larger model with better results.

References

  • facebookresearch/llama

  • lm-sys/FastChat

  • lmsys/vicuna-7b-delta-v1.1

  • lmsys/vicuna-13b-delta-v1.1

  • PaddlePaddle/PaddleNLP

LLaMA model

  • LLaMA is a large language model open-sourced by Meta.

  • Its full name is Large Language Model Meta AI, and it comes in sizes ranging from 7 billion to 65 billion parameters.

  • For example, the 13-billion-parameter LLaMA model outperforms the 175-billion-parameter GPT-3 on most benchmarks and can run on a single V100 GPU.

  • The largest LLaMA model, with 65 billion parameters, is comparable to DeepMind's Chinchilla-70B and Google's PaLM-540B.

Vicuna model

  • Vicuna is an open-source large model released jointly by researchers from UC Berkeley, CMU, Stanford, and other institutions.

  • It is based on Meta's open-source LLaMA model and fine-tuned on user-shared conversation data from the ShareGPT platform.

  • Open-source pre-trained weights are available for both the 7B and 13B variants, published as deltas against the base LLaMA weights (see the sketch below).
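The lmsys repositories in the references distribute Vicuna as delta weights that must be added, parameter by parameter, to the original LLaMA weights. A minimal sketch of that merge is below, with hypothetical file paths and ignoring details such as vocabulary resizing; the AI Studio repositories cloned in the next step already contain fully merged weights, so this step is not needed for this project.

import paddle

# Conceptual delta merge (hypothetical paths; the checkpoints downloaded
# below are already merged, so this is for illustration only).
base = paddle.load('llama-13b/model.pdparams')               # original LLaMA weights
delta = paddle.load('vicuna-13b-delta-v1.1/model.pdparams')  # published Vicuna delta

# Vicuna weight = LLaMA weight + delta, for every parameter tensor
merged = {name: base[name] + delta[name] for name in delta}
paddle.save(merged, 'vicuna-13b-v1.1/model.pdparams')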

Download model

  • Download the model you need; Vicuna 13B is downloaded by default.
# Download Vicuna 7B
# !git lfs clone http://git.aistudio.baidu.com/180581/vicuna-7b-v1.1.git

# Download Vicuna 13B
!git lfs clone http://git.aistudio.baidu.com/180581/vicuna-13b-v1.1.git
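
After cloning, it is worth checking that git lfs actually pulled the checkpoint shards (a quick sanity check, not part of the original project; a shard of only a few hundred bytes is an un-downloaded LFS pointer rather than real weights):

import os
import glob

# Print each downloaded shard and its size in MB; tiny files indicate
# that git lfs only fetched pointer files, not the weights themselves.
for path in sorted(glob.glob(os.path.join('vicuna-13b-v1.1', 'paddle-model-?-of-?.pdparams'))):
    print(path, os.path.getsize(path) // (1024 * 1024), 'MB')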

Environment configuration

  • Some of the features used here are only supported by the latest Paddle and PaddleNLP, so install the development versions of both.

!pip install --pre --upgrade paddlenlp -f https://www.paddlepaddle.org.cn/whl/paddlenlp.html --user
!pip install paddlepaddle-gpu==0.0.0.post112 -f https://www.paddlepaddle.org.cn/whl/linux/gpu/develop.html --user
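
To confirm that the development builds are the ones actually in use, the versions and the GPU setup can be checked (a minimal sanity check, not part of the original project):

import paddle
import paddlenlp

# The develop wheel of PaddlePaddle reports version 0.0.0
print('paddle:', paddle.__version__)
print('paddlenlp:', paddlenlp.__version__)

# Runs a small program to verify that the GPU install works
paddle.utils.run_check()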

Model loading

  • Create an LLaMA model

  • Load model parameters for Vicuna 7B / 13B

import os
import glob
import paddle

from tqdm import tqdm
from paddlenlp.transformers import LlamaForCausalLM, LlamaConfig, LlamaTokenizer

pattern = 'paddle-model-?-of-?.pdparams'

# Vicuna 7B
# ckpt_dir = 'vicuna-7b-v1.1'
# config_dict = {
#     "hidden_size": 4096,
#     "initializer_range": 0.02,
#     "intermediate_size": 11008,
#     "max_position_embeddings": 2048,
#     "model_type": "llama",
#     "num_attention_heads": 32,
#     "num_hidden_layers": 32,
#     "rms_norm_eps": 1e-06,
#     "vocab_size": 32000,
#     "bos_token_id": 1,
#     "eos_token_id": 2,
#     "pad_token_id": 0,
#     "use_cache": True,
#     "use_recompute": False,
#     "use_flash_attention": False,
# }

# Vicuna 13B
ckpt_dir = 'vicuna-13b-v1.1'
config_dict = {
    "hidden_size": 5120,
    "initializer_range": 0.02,
    "intermediate_size": 13824,
    "max_position_embeddings": 2048,
    "model_type": "llama",
    "num_attention_heads": 40,
    "num_hidden_layers": 40,
    "rms_norm_eps": 1e-06,
    "vocab_size": 32000,
    "bos_token_id": 1,
    "eos_token_id": 2,
    "pad_token_id": 0,
    "use_cache": True,
    "use_recompute": False,
    "use_flash_attention": False,
}

# Build the model in float16 to halve GPU memory usage
paddle.set_default_dtype('float16')

tokenizer = LlamaTokenizer.from_pretrained(ckpt_dir)

config = LlamaConfig(**config_dict)

model = LlamaForCausalLM(config)
model.eval()

# Keep the rotary embedding frequencies in float32 for numerical accuracy
for name, layer in model.named_sublayers():
    if 'rotary_emb' in name:
        layer.inv_freq = layer.inv_freq.cast(paddle.float32)

paddle.device.cuda.empty_cache()


# Load each checkpoint shard in turn, freeing memory between shards
for file_path in tqdm(glob.glob(os.path.join(ckpt_dir, pattern))):
    params = paddle.load(file_path)
    assert model.set_dict(params)[1] == [], 'Load error.'
    del params
    paddle.device.cuda.empty_cache()
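
With all shards loaded, a simple parameter count makes a useful sanity check (an optional addition, not part of the original project; Vicuna 13B should come to roughly 13 billion parameters):

# Optional sanity check: the total should be ~13B parameters for Vicuna 13B
total_params = sum(int(p.numel()) for p in model.parameters())
print(f'{total_params / 1e9:.2f}B parameters')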

Dialogue Bot

  • The dialogue bot is implemented through a fixed prompt template:

    'USER: {input}\n\nASSISTANT: {output}'
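
Multi-turn dialogue works by appending each completed exchange to the prompt, so the model always sees the full conversation history. The interactive loop below updates `prompt` in place; the same logic written as a standalone helper (a hypothetical function, not part of the original script) looks like this:

def build_prompt(history, user_input):
    # history: list of (user, assistant) turns completed so far
    prompt = ''
    for user, assistant in history:
        prompt += f'USER: {user}\n\nASSISTANT: {assistant}\n\n'
    # The trailing 'ASSISTANT: ' cues the model to generate the next reply
    return prompt + f'USER: {user_input}\n\nASSISTANT: '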
    
input_text = input('USER: ')
prompt = f'USER: {input_text}\n\nASSISTANT: '
with paddle.no_grad():
    with paddle.amp.auto_cast(False, level='O2', dtype='float16'):
        while True:
            if input_text == 'exit':
                break
            inputs = tokenizer(
                prompt,
                return_tensors="pd",
                return_attention_mask=True,
                return_position_ids=True
            )
            # Sample a reply, capping generation so that prompt + output
            # stays within the model's 2048-token context window
            outputs = model.generate(
                input_ids=inputs.input_ids,
                attention_mask=inputs.attention_mask,
                position_ids=inputs.position_ids,
                max_length=2048 - inputs.input_ids.shape[1],
                min_length=0,
                decode_strategy="sampling",
                temperature=0.8,
                top_k=40,
                top_p=0.95,
                repetition_penalty=1.1,
                bos_token_id=tokenizer.bos_token_id,
                eos_token_id=tokenizer.eos_token_id,
                pad_token_id=tokenizer.pad_token_id,
                use_cache=True,
                use_fast=True,
                use_fp16_decoding=True)
            response = tokenizer.decode(outputs[0][0], skip_special_tokens=True)
            print('ASSISTANT: ' + response)
            input_text = input('USER: ')
            # Append the finished turn so the next query sees the full history
            prompt += f'{response}\n\nUSER: {input_text}\n\nASSISTANT: '
            # Free tensors between turns to keep GPU memory usage down
            del inputs
            del outputs
            del response
            paddle.device.cuda.empty_cache()
USER: Hello, who are you?


ASSISTANT: Hello! I am Assistant, a language model.


USER: Write a poem about artificial intelligence


ASSISTANT: Artificial intelligence is so amazing
it can think about complex problems
Possess a wealth of knowledge and information
can help us solve the problem

However, it also has limitations
Inability to feel emotion and real experience
just a highly intelligent machine

So, although we can use it to get various services
But we also have to realize it's not a human replacement
It is an important means of co-existing with human beings

Therefore, we should learn how to work with
Take advantage of its advantages and overcome its disadvantages
make this world a better place
This is the mission of artificial intelligence


USER: Help me write a code to read jpg images using python and opencv.


ASSISTANT: When you want to read JPEG images, you can use the PIL library or OpenCV library in Python. Here is a sample code for reading a JPEG image using the OpenCV library:
```python
import cv2

# open the image
image = cv2.imread('example.jpg')

# display image
cv2.imshow('Image', image)
cv2.waitKey(0)
cv2.destroyAllWindows()
```
In the above code, `cv2.imread()` function is used to read JPEG image and save it in `image` variable. Then, you can use the `cv2.imshow()` function to display the image on the console, and use the `cv2.waitKey()` function to wait for the user to press a key event. Finally, you can use the `cv2.destroyAllWindows()` function to release the image and close the graphics window.


USER: Where is the capital of China?


ASSISTANT: The capital of China is Beijing.


USER: What about America?


ASSISTANT: The capital of the United States is Washington.


USER: exit

Summary

  • The dialogue above shows that Vicuna 13B performs reasonably well: it understands Chinese fairly well, can carry out basic multi-turn conversations while keeping track of context, and can handle simple code-writing tasks.

  • However, Vicuna's results are not astonishing, and there is still a clear gap between it and commercial models such as Wenxin Yiyan (ERNIE Bot) and ChatGPT.
