Personal blog: Sekyoro’s blog cabin
Personal website: Proanimer’s personal website
This is a very popular topic during this period. Large models have many parameters and occupy a large space, which is difficult to train and generally requires fine-tuning technology for specific tasks.
AnimeBot.ipynb – Colaboratory (google.com)My complete code
What is large model LLM
LLM, short for Large Language Model, is the latest innovation in artificial intelligence and machine learning. In December 2022, with the release of ChatGPT, this powerful new artificial intelligence went viral on the Internet. For those open-minded enough to live outside the buzz of artificial intelligence and the tech news cycle, ChatGPT is a chat interface running on an LLM called GPT-3.
The latest big models are Meta’s llama2, of course openai’s GPT4, Google’s PaLM2. In China, there are Tsinghua’s ChatGLM and so on.
Large model fine-tuning is to change its parameters or some layers on this basis to better cope with some downstream tasks. When you want to adapt a pre-existing model to a specific task or field, fine-tuning the model is crucial in machine learning. The decision to fine-tune your model depends on your goals, which are often domain or task specific.
There are many techniques for fine-tuning now. These techniques are all designed to solve their own specified tasks and generally require specific data.
There are generally three methods involved. Prompt Engineering, embedding and finetune are fine-tuning.
Prompt Engineering
To put it simply, it means giving some known information in advance when talking to the model.
This approach is simple, but due to the limitations of prompt size and associated costs of passing large text to LLM, using large document sets or web pages as input to LLM is not optimal.
Embeddings
Embedding is a way of representing information, whether text, images or audio, into digital form
Embedding works well when a large number of documents or web pages need to be passed to LLM. This approach works well, for example, when a chatbot is built to provide users with responses to a set of policy documents.
When using it, text and other content needs to be generated into embedding, which requires the seq2seq model to be embedded. When the user wants to query LLM, the embedding will be retrieved from the vector storage and passed to LLM. LLM uses embedding to generate responses from custom data.
Fine tuning
Fine-tuning is a way of teaching a model how to handle input queries and how to represent responses. For example, LLM can be fine-tuned by providing data on customer reviews and corresponding sentiments.
Fine-tuning is typically used to tune the LLM for a specific task and obtain a response within that range. The task could be email classification, sentiment analysis, entity extraction, generating product descriptions based on specifications, etc.
Specific fine-tuning technologies include Lora, QLora, Peft, etc.
Fine tuning technology
old school
In the old-school approach, there are various ways to fine-tune pre-trained language models, each tailored to specific needs and resource constraints.
- Feature-based: It uses pre-trained LLM as feature extractor to convert the input text into a fixed-size array. A separate classifier network predicts the classification probability of text in NLP tasks. During training, only the weights of the classifier change, which makes it resource-friendly but potentially less performant.
- Fine-tuning I: Fine-tuning I enhances the pre-trained LLM by adding additional dense layers. During training, only the weights of newly added layers are adjusted while keeping the pre-trained LLM weights frozen. In experiments,it shows slightly better performancethan feature-based methods.
- Fine-tuning II: In this approach, the entire model, including the pre-trained language model (LLM), is unfrozen for training, allowing all model weights to be updated. However, it can lead tocatastrophic forgetting, where new features overwrite old knowledge. Trim II is resource intensive but provides superior results when maximum performance is required. General language model fine-tuning
- ULMFiT is a transfer learning method that can be applied to NLP tasks. It involves a 3-layer AWD-LSTM architecture for representation. ULMFiT is a method for fine-tuning pre-trained language models for specific downstream tasks.
- Gradient-based parameter importance ranking: These methods are used to rank the importance of features or parameters in a model. In gradient-based ranking, the importance of a parameter depends on how much accuracy decreases when excluding the parameter. In random forest-based ranking, the impurity reduction for each feature can be averaged and the features ranked based on this metric.
Leading-edge strategy for LLM fine-tuning
- Low-Rank Adaptation (LoRA): LoRA is a technique for fine-tuning large language models. It uses low-rank approximation methods to reduce the computational and financial costs of adapting models with billions of parameters, such as GPT-3, to specific tasks or domains.
- Quantized LoRA (QLoRA): QLoRA is an efficient fine-tuning method for large language models (LLMs) that significantly reduces memory usage while maintaining full 16-bit fine-tuning performance. It achieves this by backpropagating the gradients of a frozen 4-bit quantized pretrained language model into a low-rank adapter.
- Parameter Efficient Fine-tuning (PEFT): PEFT is an NLP technology that reduces computing and storage costs by fine-tuning only a small set of parameters, allowing pre-trained language models to effectively adapt to various applications. It eliminates catastrophic forgetting, tunes key parameters for specific tasks, and delivers performance comparable to comprehensive fine-tuning of modes such as image classification and stable diffusion dreambooth. This is a valuable approach to achieve high performance with minimal trainable parameters.
- DeepSpeed: DeepSpeed is a deep learning software library for accelerating the training of large language models. It includes ZeRO (Zero Redundancy Optimizer), a memory-efficient approach to distributed training. DeepSpeed can automatically optimize fine-tuning jobs using Hugging Face’s Trainer API and provide an alternative script to run existing fine-tuning scripts.
- ZeRO: ZeRO is a set of memory optimization technologies that enable efficient training of large models with trillions of parameters, such as GPT-2 and Turing NLG 17B. A major attraction of ZeRO is that no model code modifications are required. This is a memory-efficient form of data parallelism that allows you to access the aggregated GPU memory of all available GPU devices without the inefficiencies caused by data copying in data parallelism.
Nowadays, lora and its derivative methods and PEFT are generally used.
You can make the data set for fine-tuning yourself or find it everywhere, such as hugging face or Google dataset or github.
As for the model, it is generally called directly using tool libraries such as hugging face or langchain. There is no need to download it manually. After obtaining general language or other types of data, preprocessing steps such as embedding are generally required. The embedding model generally needs to be consistent with the model that handles the task. There must be a corresponding relationship.
Next, Hugging Face’s transformers and other libraries are used to fine-tune large models. AutoModel
, AutoTokenizer
and AutoConfig
are often used, by calling from_pretrained
code>Get relevant information. The following is the general training process.
Training process
# Transformers installation pip install transformers datasets # To install from source instead of the last release, comment the command above and uncomment the following one. pip install git + https://github.com/huggingface/transformers.git
from datasets import load_dataset from transformers import AutoTokenizer from transformers import AutoModelForSequenceClassification from transformers import TrainingArguments dataset = load_dataset("yelp_review_full") #dataset["train"][100] tokenizer = AutoTokenizer.from_pretrained("bert-base-cased") def tokenize_function(examples): return tokenizer(examples["text"], padding="max_length", truncation=True) tokenized_datasets = dataset.map(tokenize_function, batched=True) small_train_dataset = tokenized_datasets["train"].shuffle(seed=42).select(range(1000)) small_eval_dataset = tokenized_datasets["test"].shuffle(seed=42).select(range(1000)) model = AutoModelForSequenceClassification.from_pretrained("bert-base-cased", num_labels=5) training_args = TrainingArguments(output_dir="test_trainer") trainer = Trainer( model=model, args=training_args, train_dataset=small_train_dataset, eval_dataset=small_eval_dataset, compute_metrics=compute_metrics, ) trainer.train()
The above compute_metrics
is used to evaluate the model. training_args
is the parameter set during training.
import numpy as np import evaluate metric = evaluate.load("accuracy") def compute_metrics(eval_pred): logits, labels = eval_pred predictions = np.argmax(logits, axis=-1) return metric.compute(predictions=predictions, references=labels)
You can use trainer.push_to_hub()
to push to your own warehouse. This will automatically add the training hyperparameters, training results and framework version to your model card
PEFT training adapters
Adapters trained with PEFT are also typically an order of magnitude smaller than full models, making them easier to share, store, and load. Usually paired with Lora model.
from transformers import AutoModelForCausalLM, AutoTokenizer peft_model_id = "ybelkada/opt-350m-lora" model = AutoModelForCausalLM.from_pretrained(peft_model_id)
To load and use the PEFT adapter type, please ensure that the Hub repository or local directory contains the adapter_config.json file and adapter weights.
You can also load the basic model first and then use load_adapter
from transformers import AutoModelForCausalLM, AutoTokenizer model_id = "facebook/opt-350m" peft_model_id = "ybelkada/opt-350m-lora" model = AutoModelForCausalLM.from_pretrained(model_id) model.load_adapter(peft_model_id)
load_in_8bit
and device_map
relate to where to place the model and how much it occupies.
Add adapter
from transformers import AutoModelForCausalLM, OPTForCausalLM, AutoTokenizer from peft import PeftConfig model_id = "facebook/opt-350m" model = AutoModelForCausalLM.from_pretrained(model_id) lora_config = LoraConfig( target_modules=["q_proj", "k_proj"], init_lora_weights=False ) model.add_adapter(lora_config, adapter_name="adapter_1")
Train an adapter
from peft import LoraConfig peft_config = LoraConfig( lora_alpha=16, lora_dropout=0.1, r=64, bias="none", task_type="CAUSAL_LM", ) model.add_adapter(peft_config) trainer = Trainer(model=model, ...) trainer.train() model.save_pretrained(save_dir) model = AutoModelForCausalLM.from_pretrained(save_dir)
Each PEFT method is defined by the PeftConfig
class, which stores all the important parameters used to build the PeftModel
.
from peft import LoraConfig, TaskType peft_config = LoraConfig(task_type=TaskType.SEQ_2_SEQ_LM, inference_mode=False, r=8, lora_alpha=32, lora_dropout=0.1) peft_config = LoraConfig( r=lora_r, lora_alpha=lora_alpha, lora_dropout=lora_dropout, target_modules=lora_target_modules, bias="none", task_type="CAUSAL_LM", )
Use the get_peft_model
function to wrap the basic model and peft_config to create a PeftModel. And use print_trainable_parameters
to print the parameters that need to be updated.
from transformers import AutoModelForSeq2SeqLM from peft import get_peft_model model_name_or_path = "bigscience/mt0-large" tokenizer_name_or_path = "bigscience/mt0-large" model = AutoModelForSeq2SeqLM.from_pretrained(model_name_or_path) model = get_peft_model(model, peft_config) model.print_trainable_parameters()
Save and push the model to the warehouse
model.save_pretrained("output_dir") model.push_to_hub("my_awesome_peft_model")
This only saves incremental trained PEFT weights, meaning it’s very efficient at storing, transferring and loading. For example, the bigscience/To_3B model trained using LoRA on the twitter_complaints subset of the RAFT dataset contains only two files: adapter_config.json and adapter_model.bin.
Download model
The logic of the following method is to first obtain the configuration of peft through PeftConfig, obtain the location of the basic model, use the basic model to obtain its model and tokenizer, and finally use PeftModel to obtain the model.
from transformers import AutoModelForSeq2SeqLM from peft import PeftModel, PeftConfig peft_model_id = "smangrul/twitter_complaints_bigscience_T0_3B_LORA_SEQ_2_SEQ_LM" config = PeftConfig.from_pretrained(peft_model_id) model = AutoModelForSeq2SeqLM.from_pretrained(config.base_model_name_or_path) model = PeftModel.from_pretrained(model, peft_model_id) tokenizer = AutoTokenizer.from_pretrained(config.base_model_name_or_path) model = model.to(device) model.eval() inputs = tokenizer("Tweet text : @HondaCustSvc Your customer service has been horrible during the recall process. I will never purchase a Honda again. Label :", return_tensors="pt") with torch.no_grad(): outputs = model.generate(input_ids=inputs["input_ids"].to("cuda"), max_new_tokens=10) print(tokenizer.batch_decode(outputs.detach().cpu().numpy(), skip_special_tokens=True)[0]) 'complaint'
You can also simply use
from peft import AutoPeftModelForCausalLM peft_model = AutoPeftModelForCausalLM.from_pretrained("ybelkada/opt-350m-lora") from peft import AutoPeftModel model = AutoPeftModel.from_pretrained(peft_model_id)
Practical combat
Download required packages
Generally, they are hugging face transformers, datasets and xformers, accelerate, trl, bitsandbytes, peft and other libraries
!pip install -Uqqq pip --progress-bar off !pip install -qqq torch==2.0.1 --progress-bar off !pip install -qqq transformers==4.32.1 --progress-bar off !pip install -qqq datasets==2.14.4 --progress-bar off !pip install -qqq peft==0.5.0 --progress-bar off !pip install -qqq bitsandbytes==0.41.1 --progress-bar off !pip install -qqq trl==0.7.1 --progress-bar off
Data processing
There are many ways to process data and there are many implementation methods. Here we mainly use pandas and datasets to process csv data.
animes_dataset = load_dataset("csv", data_files = "/content/animes.csv") reviews_dataset = load_dataset("csv", data_files = "/content/reviews.csv") animes_df = pd.DataFrame(animes_dataset["train"]) reviews_df = pd.DataFrame(reviews_dataset["train"]) merged_df = pd.merge(animes_df,reviews_df,left_on="uid",right_on="anime_uid") # remove /n/r def clean_text(x): #remove multiple whitespace new_string = str(x).strip() pattern = r"\s{3,}" new_string = re.sub(pattern, " ", new_string) #remove \r \ \t pattern = r"[\ \r\t]" new_string = re.sub(pattern,"", new_string) return new_string merged_df["synopsis"] = merged_df["synopsis"].map(clean_text) merged_df["text"] = merged_df["text"].map(clean_text) # split merged_df into train and test train_df, test_df = train_test_split(merged_df, test_size=0.1, random_state=42) dataset_dict = DatasetDict({<!-- --> "train": Dataset.from_pandas(train_df), "validation": Dataset.from_pandas(test_df) }) DEFAULT_SYSTEM_PROMPT = "Below is a name of an anime,write some intro about it" #@param {type:"string"} DEFAULT_SYSTEM_PROMPT = DEFAULT_SYSTEM_PROMPT.strip() def generate_training_prompt(data_point): # Remove square brackets and spaces from the string genres = data_point["genre"].strip("[]").replace(" ", "").replace("'","") synopsis_len = len(data_point["synopsis"]) split_len = random.randint(1,synopsis_len) synopsis_input = data_point["synopsis"][1:split_len] input = data_point["title"] + genres + synopsis_input output = data_point["synopsis"] + data_point["text"] return {<!-- --> "text":f"""### Instruction: {<!-- -->DEFAULT_SYSTEM_PROMPT} ### Input: {<!-- -->input.strip()} ### Response: {<!-- -->output.strip()} """.strip() } def process_dataset(data: Dataset): return ( data.shuffle(seed=42) .map(generate_training_prompt) .remove_columns( [ "uid_x", "aired", "members", "img_url", "uid_y", "profile", "anime_uid", "score_y", "link_y" ] ) ) dataset_dict["train"] = process_dataset(dataset_dict["train"]) dataset_dict["validation"] = process_dataset(dataset_dict["validation"])
The processing logic here is actually complicated. You only need to use pandas to read the data, then divide it into a training set and a test set and then convert it to a Dataset. In the middle, you need to remove some blank characters from the dataframe data.
Training settings
Due to the use of PEFT
lora_r = 16 lora_alpha = 64 lora_dropout = 0.1 lora_target_modules = [ "q_proj", "up_proj", "o_proj", "k_proj", "down_proj", "gate_proj", "v_proj", ] peft_config = LoraConfig( r=lora_r, lora_alpha=lora_alpha, lora_dropout=lora_dropout, target_modules=lora_target_modules, bias="none", task_type="CAUSAL_LM", )
Set trainingArgument and use trl for training.
OUTPUT_DIR = "experiments" training_arguments = TrainingArguments( per_device_train_batch_size=4, gradient_accumulation_steps=4, optim="paged_adamw_32bit", logging_steps=1, learning_rate=1e-4, fp16=True, max_grad_norm=0.3, num_train_epochs=2, evaluation_strategy="steps", eval_steps=0.2, warmup_ratio=0.05, save_strategy="epoch", group_by_length=True, output_dir=OUTPUT_DIR, report_to="tensorboard", save_safetensors=True, lr_scheduler_type="cosine", seed=42, ) trainer = SFTTrainer( model=model, train_dataset=dataset["train"], eval_dataset=dataset["validation"], peft_config=peft_config, dataset_text_field="text", max_seq_length=4096, tokenizer=tokenizer, args=training_arguments, )
Training and follow-up evaluation tests
trainer.train() from peft import AutoPeftModelForCausalLM # Load Lora adapter # model = PeftModel.from_pretrained( # base_model, # "/content/Finetuned_adapter", # ) # merged_model = model.merge_and_unload() trained_model = AutoPeftModelForCausalLM.from_pretrained( OUTPUT_DIR, low_cpu_mem_usage=True, ) merged_model = base_model.merge_and_unload() merged_model.save_pretrained("merged_model", safe_serialization=True) tokenizer.save_pretrained("merged_model") # trainer.push_to_hub("anime_chatbot") merged_model.push_to_hub("anime_chatbot") print("Pushed to hub") # @title test fine tune model # @title test base model DEFAULT_SYSTEM_PROMPT = "Below is a name of an anime,write some intro about it" #@param {type:"string"} DEFAULT_SYSTEM_PROMPT = DEFAULT_SYSTEM_PROMPT.strip() user_prompt = lambda input:f"""### Instruction: {<!-- -->DEFAULT_SYSTEM_PROMPT} ### Input: {<!-- -->input.strip()} ### Response: """.strip() pipe = pipeline('text-generation',model=merged_model,tokenizer=tokenizer,max_length=150) result = pipe(user_prompt("please introduce shingekinokyojin")) print(result[0]['generated_text'])
Attention
from transformers import AutoModelForSeq2SeqLM import torch model_base = AutoModelForCausalLM.from_pretrained("facebook/opt-350m", torch_dtype=torch.bfloat16) tokenizer = AutoTokenizer.from_pretrained("facebook/opt-350m")
Here model_base is
OPTForCausalLM( (model): OPTModel( (decoder): OPTDecoder( (embed_tokens): Embedding(50272, 512, padding_idx=1) (embed_positions): OPTLearnedPositionalEmbedding(2050, 1024) (project_out): Linear(in_features=1024, out_features=512, bias=False) (project_in): Linear(in_features=512, out_features=1024, bias=False) (layers): ModuleList( (0-23): 24 x OPTDecoderLayer( (self_attn): OPTAttention( (k_proj): Linear(in_features=1024, out_features=1024, bias=True) (v_proj): Linear(in_features=1024, out_features=1024, bias=True) (q_proj): Linear(in_features=1024, out_features=1024, bias=True) (out_proj): Linear(in_features=1024, out_features=1024, bias=True) ) (activation_fn): ReLU() (self_attn_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True) (fc1): Linear(in_features=1024, out_features=4096, bias=True) (fc2): Linear(in_features=4096, out_features=1024, bias=True) (final_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True) ) ) ) ) (lm_head): Linear(in_features=512, out_features=50272, bias=False) )
from peft import get_peft_model lora_config = LoraConfig( target_modules=["q_proj", "k_proj"], init_lora_weights=False ) peft_model = get_peft_model(peft_model_base, lora_config) peft_model.print_trainable_parameters()
Use lora_config to get peft_model
PeftModel( (base_model): LoraModel( (model): OPTForCausalLM( (model): OPTModel( (decoder): OPTDecoder( (embed_tokens): Embedding(50272, 512, padding_idx=1) (embed_positions): OPTLearnedPositionalEmbedding(2050, 1024) (project_out): Linear(in_features=1024, out_features=512, bias=False) (project_in): Linear(in_features=512, out_features=1024, bias=False) (layers): ModuleList( (0-23): 24 x OPTDecoderLayer( (self_attn): OPTAttention( (k_proj): Linear( in_features=1024, out_features=1024, bias=True (lora_dropout): ModuleDict( (default): Identity() ) (lora_A): ModuleDict( (default): Linear(in_features=1024, out_features=8, bias=False) ) (lora_B): ModuleDict( (default): Linear(in_features=8, out_features=1024, bias=False) ) (lora_embedding_A): ParameterDict() (lora_embedding_B): ParameterDict() ) (v_proj): Linear(in_features=1024, out_features=1024, bias=True) (q_proj): Linear( in_features=1024, out_features=1024, bias=True (lora_dropout): ModuleDict( (default): Identity() ) (lora_A): ModuleDict( (default): Linear(in_features=1024, out_features=8, bias=False) ) (lora_B): ModuleDict( (default): Linear(in_features=8, out_features=1024, bias=False) ) (lora_embedding_A): ParameterDict() (lora_embedding_B): ParameterDict() ) (out_proj): Linear(in_features=1024, out_features=1024, bias=True) ) (activation_fn): ReLU() (self_attn_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True) (fc1): Linear(in_features=1024, out_features=4096, bias=True) (fc2): Linear(in_features=4096, out_features=1024, bias=True) (final_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True) ) ) ) ) (lm_head): Linear(in_features=512, out_features=50272, bias=False) ) ) )
Use peft_model.merge_and_unload()
to get the fused model
OPTForCausalLM( (model): OPTModel( (decoder): OPTDecoder( (embed_tokens): Embedding(50272, 512, padding_idx=1) (embed_positions): OPTLearnedPositionalEmbedding(2050, 1024) (project_out): Linear(in_features=1024, out_features=512, bias=False) (project_in): Linear(in_features=512, out_features=1024, bias=False) (layers): ModuleList( (0-23): 24 x OPTDecoderLayer( (self_attn): OPTAttention( (k_proj): Linear(in_features=1024, out_features=1024, bias=True) (v_proj): Linear(in_features=1024, out_features=1024, bias=True) (q_proj): Linear(in_features=1024, out_features=1024, bias=True) (out_proj): Linear(in_features=1024, out_features=1024, bias=True) ) (activation_fn): ReLU() (self_attn_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True) (fc1): Linear(in_features=1024, out_features=4096, bias=True) (fc2): Linear(in_features=4096, out_features=1024, bias=True) (final_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True) ) ) ) ) (lm_head): Linear(in_features=512, out_features=50272, bias=False) )
Some problems encountered
- Data set processing, how to write fine-tuned templates
Examples found
14.fine-tuning-llama-2-7b-on-custom-dataset.ipynb – Colaboratory (google.com)
Fine_tuned_Llama_PEFT_QLora.ipynb – Colaboratory (google.com)
Use a template during training
DEFAULT_SYSTEM_PROMPT = """ Below is a conversation between a human and an AI agent. Write a summary of the conversation. """.strip() def generate_training_prompt( conversation: str, summary: str, system_prompt: str = DEFAULT_SYSTEM_PROMPT ) -> str: return f"""### Instruction: {<!-- -->system_prompt} ### Input: {<!-- -->conversation.strip()} ### Response: {<!-- -->summary} """.strip()
during testing
def generate_prompt( conversation: str, system_prompt: str = DEFAULT_SYSTEM_PROMPT ) -> str: return f"""### Instruction: {<!-- -->system_prompt} ### Input: {<!-- -->conversation.strip()} ### Response: """.strip()
-
Is the model obtained after training a peftmodel or what type of model it is?
One way is
repo_id = "meta-llama/Llama-2-7b-chat-hf" use_ram_optimized_load=False base_model = AutoModelForCausalLM.from_pretrained( repo_id, device_map='auto', trust_remote_code=True, ) base_model.config.use_cache = False
base_model
is a LlamaForCausalLM
class, used after training
trainer.save_model("Finetuned_adapter")
Save the model, and then use PeftModel.from_pretrained
to get the PeftModel
model = PeftModel.from_pretrained( base_model, "/content/Finetuned_adapter", ) merged_model = model.merge_and_unload()
Then save the model
merged_model.save_pretrained("/content/Merged_model") tokenizer.save_pretrained("/content/Merged_model")
The other is to use AutoPeftModelForCausalLM
from peft import AutoPeftModelForCausalLM trained_model = AutoPeftModelForCausalLM.from_pretrained( OUTPUT_DIR, low_cpu_mem_usage=True, ) merged_model = model.merge_and_unload() merged_model.save_pretrained("merged_model", safe_serialization=True) tokenizer.save_pretrained("merged_model")
Reference materials
- Training Large Language Model (LLM) on your data | by Mohit Soni | Walmart Global Tech Blog | Aug, 2023 | Medium
- A Practical Introduction to LLMs | By: Shawhin Talebi | Towards Data Science
- The Ultimate Guide to LLM Fine Tuning: Best Practices & Tools | Lakera – Protecting AI teams that disrupt the world.
- tutorial https://learn.deeplearning.ai/finetuning-large-language-models
If you have any questions, you are welcome to communicate!
Server configuration
Pagoda: Pagoda server panel, one-click all-round deployment and management
Cloud server: Alibaba Cloud Server
Vultr server
GPU server:Vast.ai