The principles of the wildly popular ChatGPT and its relationship to us

Article directory

      • 1 Introduction to ChatGPT
      • 2 ChatGPT development history
      • 3 Principle of ChatGPT
      • 4 The relationship between ChatGPT and us
        • 4.1 Relationship between ChatGPT and big data
        • 4.2 Relationship between ChatGPT and Java
        • 4.3 Opportunities in the ChatGPT era
        • 4.4 Problems with ChatGPT
        • 4.5 Thoughts on the development of ChatGPT

1 Introduction to ChatGPT

As we all know, ChatGPT, the large language model recently launched by the American artificial intelligence company OpenAI, has become popular all over the world; its popularity in China keeps rising, and the IT industry has gone even crazier over it. Amid all the noise around ChatGPT, many ordinary people around me see it as a machine that can converse with humans, or one that could replace many language-related workers, including, of course, some of us programmers. I think its appearance may be a turning point in the history of artificial intelligence.

2 ChatGPT development history

ChatGPT is a derivative of a class of machine-learning natural language processing models called Large Language Models (LLMs). LLMs have these characteristics:

  • They can digest large amounts of text data and infer relationships between words in the text.
  • These models have grown considerably over the past few years as computing power has improved.
  • As the input datasets and parameter space expand, the capabilities of the LLM increase accordingly.

The most basic training of a language model involves predicting a word within a sequence of words. Two approaches are most common: next-token prediction, where the goal is to predict the most likely next word or token given some text (this task is the basis of the language model and is used in text generation, machine translation, speech recognition, and so on), and masked language modeling, where some tokens or words in the input text are masked out and the model must predict the masked tokens.
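As a toy illustration of next-token prediction (my own sketch, not from the article; a real LLM learns these statistics with a neural network rather than by counting), a simple bigram model just picks the word that most often follows the current one:

from collections import Counter, defaultdict

# Toy corpus; a real model would train on billions of tokens.
corpus = "jacob likes reading and jacob likes writing very much".split()

# Count, for each word, which words follow it and how often.
next_counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    next_counts[prev][nxt] += 1

def predict_next(word):
    """Return the statistically most likely next token after `word`."""
    counts = next_counts.get(word)
    return counts.most_common(1)[0][0] if counts else None

print(predict_next("jacob"))  # -> "likes"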

[Figure: an LSTM fills in the blank in "Jacob ___ reading" with the statistically most likely word]

The figure above is an example of a basic sequence-modeling technique, typically implemented with a Long Short-Term Memory (LSTM) model, a special type of recurrent neural network (RNN). Given the context, the LSTM fills in the blank with the statistically most likely word. This sequence-modeling structure has two main limitations:

  • The model cannot give more weight to some parts of the context than others. In the example above, "reading" may be associated with "hates" by default; but if the data contains a character named "Jacob" who loves reading, then when processing the sentence "Jacob hates reading" the model should pay more attention to the information about "Jacob" rather than drawing a conclusion from the generic association between "reading" and "hates" in the context. A model that relies only on the surrounding words, without fully considering the relationships between entities in the text, can therefore reach wrong conclusions in practice.
  • Second, LSTMs process input data sequentially, step by step, rather than processing the whole corpus at once. When training an LSTM, the context window is therefore fixed and can only extend across a few steps of the sequence, not the entire sequence. This limits the model's ability to capture more complex relationships between words and derive deeper meaning from them.

In response to these problems, a team at Google Brain introduced the transformer in 2017. Unlike LSTMs, transformers can process all of the input data simultaneously. They are built on the self-attention mechanism: for each word, self-attention strengthens or weakens the word's representation by computing the strength of its relationship with every other word, capturing semantic information better. The model can thus assign different weights to different parts of the input according to their relationship to any position in the sequence. This feature was a huge improvement in infusing meaning into LLMs and made it practical to process much larger datasets.
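To make the self-attention idea concrete, here is a minimal Python sketch (my own illustration; a real transformer also uses learned query/key/value projections and multiple heads, which are omitted here). Each token's new representation is a weighted average of all tokens, with weights given by pairwise similarity:

import numpy as np

def self_attention(X):
    """Single-head scaled dot-product self-attention without learned
    projections: weight each token by its similarity to every other
    token, then re-average the representations."""
    d = X.shape[-1]
    scores = X @ X.T / np.sqrt(d)                    # pairwise relationship strength
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over each row
    return weights @ X                               # context-weighted representations

X = np.random.rand(4, 8)         # 4 tokens with 8-dimensional embeddings (toy data)
print(self_attention(X).shape)   # (4, 8): one updated vector per token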

  • In 2018, OpenAI launched the first Generative Pre-trained Transformer model, GPT-1.

  • In 2019, this model continued to evolve into GPT-2.

  • In 2020, the model evolved into GPT-3, and later into InstructGPT and, in November 2022, ChatGPT. Before human feedback was integrated into the system, the biggest advances in the GPT line were driven by gains in computational efficiency, which allowed GPT-3 to train on far more data than GPT-2, giving it a more diverse knowledge base and the ability to perform a wider range of tasks.

    [Figure: evolution of the GPT model family]

3 Principle of ChatGPT

ChatGPT is an upgrade of InstructGPT, and its novelty is that it incorporates human feedback into the training process in order to better align the model's output with user intent. In 2022, OpenAI published the paper "Training language models to follow instructions with human feedback," introducing Reinforcement Learning from Human Feedback (RLHF). This is the core idea behind ChatGPT, and it works as follows:

[Figure: the three training steps of ChatGPT: supervised fine-tuning, reward modeling, and PPO reinforcement learning]

  • Step 1: Collect demonstration data and train a supervised fine-tuned (SFT) model
    • Sample an appropriate number of prompts from the prompt dataset
    • Labelers write appropriate responses to the prompts, creating a known output for each input; the dataset must be diverse and accurate
    • The GPT model is fine-tuned on this dataset via supervised learning, producing GPT-3.5, also known as the SFT model
  • Step 2: Collect comparison data and train a reward model (RM)
    • The SFT model from Step 1 is given a prompt, and several of its outputs are sampled
    • Labelers rank the sampled outputs from best to worst, forming new comparison datasets that improve the diversity and generalization ability of the model
    • These ranked datasets are used to train the reward model (a minimal sketch of the ranking loss follows this list)
  • Step 3: Optimize the policy against the reward model using PPO reinforcement learning
    • A new prompt is sampled from the dataset (e.g., "write a story about otters")
    • The PPO (Proximal Policy Optimization) model is initialized from the supervised policy; this policy is updated as the model generates responses
    • For the new prompt, an output is generated according to the PPO policy
    • The reward model computes a reward for the output
    • The reward is used to update the PPO policy; then generate, score, and update again, and so on…
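To make Step 2 concrete, here is a minimal sketch of the pairwise ranking loss described for the reward model (my own illustration, assuming the reward model outputs a scalar score per response; this is not OpenAI's actual code):

import numpy as np

def reward_ranking_loss(r_preferred, r_rejected):
    """Pairwise ranking loss for the reward model:
    -log(sigmoid(r_w - r_l)), which pushes the score of the
    labeler-preferred output above that of the rejected one."""
    return -np.log(1.0 / (1.0 + np.exp(-(r_preferred - r_rejected))))

# Toy scalar scores the reward model might assign to two sampled outputs.
print(reward_ranking_loss(2.0, -1.0))  # small loss: ranking already correct
print(reward_ranking_loss(-1.0, 2.0))  # large loss: ranking is wrong

In Step 3, the scalar reward trained this way is exactly what the PPO policy update maximizes.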

4 The relationship between ChatGPT and us

4.1 Relationship between ChatGPT and big data

  • In the working principles of GPT, every model requires a dataset. In real application scenarios the data volume is large, complex, and spread across many servers, so big-data infrastructure is needed for collection.
  • The collected data must then be processed according to the ChatGPT engineers' requirements, e.g., deduplication, filtering, and selection (a toy sketch follows this list).
  • Any artificial intelligence product is a combination of models and data; high-quality data greatly improves the entire product and the experience of using it.
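As promised above, here is a toy sketch of that preprocessing (my own illustration; a production pipeline would run these steps distributed across many servers):

def clean_corpus(docs, min_len=20):
    """Deduplicate documents and drop very short ones: a toy
    version of the cleanup performed before model training."""
    seen, cleaned = set(), []
    for doc in docs:
        key = doc.strip().lower()
        if len(key) >= min_len and key not in seen:
            seen.add(key)
            cleaned.append(doc.strip())
    return cleaned

docs = ["ChatGPT is a large language model.",
        "chatgpt is a large language model.",  # duplicate up to case
        "Too short."]                          # filtered out by length
print(clean_corpus(docs))  # keeps only the first document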

4.2 Relationship between ChatGPT and Java

  • From the principles of ChatGPT we know that every step involves several interrelated models. Applying the final model in production requires server-side engineers (for example, Java developers) to deploy the model in a distributed fashion and apply it to real data.

4.3 Opportunities in the ChatGPT era

4.4 Problems with ChatGPT

During testing, some problems with ChatGPT also surfaced, which I would like to share here.

Figure 4-5: ChatGPT lacks underlying logical reasoning ability when answering math word problems [1]

The first question involved simple logic, and ChatGPT could mostly solve it. The second question showed that ChatGPT cannot reason about complex problems. Take the ratio problem as an example: although the answer sounds logical, it is actually nonsense. My guess is that it can generate such a passage only because similar word problems appeared during training; it never actually understood the principles and solution process behind the problem, that is, the reasoning itself.

Figure 5: ChatGPT falls back on templates for safety when answering subjective questions [1]

For questions that call for subjective evaluation, templates are applied during training for safety reasons, so the generated replies have an obviously templated feel.
"How do I view China's development? How do you view the development of the United States?" The two screenshots of those answers have been censored, so try the real-estate question instead, or search for topics in a field you care about and see whether the answers are templated.


Figure 6: ChatGPT lacks factuality checking

ChatGPT lacks some factual checking: the author of Journey to the West is Wu Cheng'en, not Shi Nai'an, yet the model failed to flag the error.

Some answers deviate from real business scenarios; due to presentation or training limitations, the answers are one-sided or less than ideal:


Figure 7: Testing ChatGPT's one-sidedness

ChatGPT's programming ability is not yet very deep, but it handles basic problems without serious issues: its code runs directly or after minor modifications, which really can make programming a pleasure:

[Screenshot: asking ChatGPT to generate Yang Hui's (Pascal's) triangle in Python]

The specific Python code is as follows:

def generate_triangle(numRows):
    """Build the first numRows rows of Yang Hui's (Pascal's) triangle."""
    triangle = []
    for i in range(numRows):
        row = []
        for j in range(i + 1):
            if j == 0 or j == i:
                row.append(1)  # the edges of every row are 1
            else:
                # interior entries are the sum of the two entries above
                row.append(triangle[i-1][j-1] + triangle[i-1][j])
        triangle.append(row)
    return triangle

def print_triangle(triangle):
    """Print each row centered so the output forms a triangle shape."""
    for row in triangle:
        print(" ".join([str(i) for i in row]).center(50))

numRows = int(input("Please enter the number of rows of Yang Hui's triangle: "))
triangle = generate_triangle(numRows)
print_triangle(triangle)

4.5 Thoughts on the development of ChatGPT

ChatGPT's performance has surprised everyone, but it has also unsettled everyone. Given both its strengths and its flaws, how should we make good use of ChatGPT and the ideas behind it?

  1. Study some lower-level problems that apply to both large and small models

    For example: how to improve the robustness and generalization ability of models, and how to improve their logical reasoning ability. Even a model as strong as ChatGPT still struggles to learn the underlying logic of some complex reasoning problems; more often it merely generates by analogy from data it has already seen.

  2. Research some tasks that are combined with specific domains

    Combine with other fields, such as medicine, finance, and biopharmaceuticals: by integrating domain-specific knowledge, designing the model structure accordingly, and adding some ingenuity, one can do well on specific tasks. For example, scBERT, which I saw a while ago, tackles the task of judging cell types from mRNA expression; by combining the characteristics of mRNA with domain knowledge, it designs unique category encodings, gene encodings, and pre-training tasks, bringing pretrained models into this field.

  3. Do data-centric tasks

    OpenAI staff have pointed out that high-quality data is crucial when training large models. Andrew Ng has likewise proposed Data-centric AI (DCAI) in the past two years, shifting the focus from model development to the data itself and studying how to get more and better value out of limited data.

References:

[1] Official website link: https://openai.com/blog/chatgpt

[2] Network link: https://mp.weixin.qq.com/s?__biz=MzI4MDYzNzg4Mw==&mid=2247554744&idx=2&sn=3b93ca4720cd86fb13978d40a2c691c6&chksm=ebb72e6cdcc0a77a56a7ab0e1b315baf7801e418af0d1f88c0446dd25e93c8b50a6cdc471cb0&scene=27

[3] Network link: https://baijiahao.baidu.com/s?id=1758693674943354647&wfr=spider&for=pc