A Detailed Comparative Evaluation of ChatGPT Alpha, GPT-3.5, and GPT-4

Author: Rainy, WeChat: cntheai

Learn AI here. For more AI news and tutorials, please visit the THEAI forum: https://www.cntheai.com

Note: This manuscript was last updated on November 7, 2023. ChatGPT keeps iterating, so over time some of the features evaluated here may differ from what is currently available. AI output is also somewhat random, so results will vary and the evaluation may contain errors. If you find any mistakes, corrections are welcome.

Recently, ChatGPT rolled out a new Alpha mode, free of charge, to a small number of users. The new mode integrates the GPT-4 (All Tools) function: users can use GPT4-32k without restrictions, along with image generation, web browsing, and file analysis. Below we conduct a detailed comparative evaluation of Alpha against GPT-3.5 and GPT-4.

What is GPT-4 (All Tools)?
GPT-4 (All Tools) is the latest GPT-4 offering launched by OpenAI. It combines natural language understanding and generation with a set of integrated tools, extending what earlier GPT models could do:

  1. Image generation (DALL-E): Generates images from text descriptions.
  2. Code execution (Python): Executes Python code, providing programming and data analysis capabilities.
  3. Web browser (Browser): Lets the model browse the web, search for information, and cite content, even though the model itself has no direct Internet access.
  4. MyFiles Browser: Browses and analyzes files uploaded to the conversation.

Through these tools, GPT-4 (All Tools) can assist with a wide range of tasks, such as image creation, programming questions, data analysis, web search, and understanding file contents. It is designed to be smarter, more flexible, and more efficient at handling complex tasks.

At the time of writing, ChatGPT has released some updates. The most visible change is a redesigned interface, but it has only reached some accounts: I logged in with two ChatGPT Plus accounts and saw two different interfaces.

After logging in, the new interface directly integrates Browse with Bing and Advanced Data Analysis, two functions that were previously available to some Plus users.

To make the screenshot comparisons easier, we continue to use GPT-4 in the old interface for comparison.

1. Knowledge cutoff

We asked each model for its training data cutoff. GPT-3.5 reported January 2022, while Alpha and GPT-4 both reported April 2023. This suggests that Alpha calls the GPT-4 model, but its actual language capabilities still need to be compared in detail with examples.

2. Text processing comparison

We start with the question most commonly used on the Internet to tell whether a model is GPT-4: "Why did Lu Xun beat Zhou Shuren?" Lu Xun is the pen name of the writer Zhou Shuren, so the question is a trap.

In our test, GPT-3.5 treated Lu Xun and Zhou Shuren as two people, while both Alpha and GPT-4 correctly identified them as the same person. At first glance, Alpha does appear to be using GPT-4.

To test text processing more thoroughly, we ran GPT-3.5, Alpha, and GPT-4 on the following prompt:

"Your task is to write a recommendation post on my given topic, following the article structure of a Xiaohongshu blogger. Your responses should include the use of emojis to add interest and interactivity, as well as images to match each paragraph. Start with an engaging introduction to set the tone for your recommendation. Then, provide at least three paragraphs related to the topic that highlight their unique features and appeal. Use emojis in your writing to make it more engaging and interesting. For each paragraph, please provide an image that matches the description. These images should be visually appealing and help your description come to life. The topic I gave is: [Anta wants crazy basketball shoes]"

In this comparison, GPT-3.5 still lags behind Alpha, the old version of GPT-4, and the new version of GPT-4 (All Tools) in text processing. Alpha and the new version of GPT-4 can call DALL·E 3 directly to generate matching pictures. The old version of GPT-4 cannot generate images directly.

As for Alpha, its text quality is quite good. The claimed integration of GPT-4 (All Tools) seems to be more than just talk, and its text processing is most likely driven by GPT-4.

3. Image generation comparison

Next, we use "Draw me a picture, panda eating bamboo, animation style" as the prompt to test the image output of the three modes.

As can be seen from the figure below, GPT-3.5 cannot produce images. Both Alpha and GPT-4 call DALL·E 3, and the image quality is similar.

4. Image recognition comparison

GPT-3.5 does not support file upload, but Alpha and GPT-4 support image upload. Supported image formats include jpg, png, and webp. To upload, drag the image directly into the dialog box.

Here we upload a picture and ask "Please tell me the address where this picture was taken" to test the image recognition capabilities of Alpha and GPT-4.

Both correctly identify that the picture was taken at West Lake, Hangzhou, but Alpha answers in English while GPT-4 answers in Chinese.

Then we upload another picture, ask "What fruits are in this picture?", and test again.

This time, both correctly identified the fruits in the picture, and both answered in Chinese. In our testing, whether Alpha and GPT-4 answer in English or Chinese appears to be somewhat random, but GPT-4 is much more likely to answer in Chinese. In daily use, whether with Alpha or GPT-4, we recommend adding the sentence "Output in Chinese" at the end of the prompt.

5. Web browsing comparison

We tested web browsing with the question "Please tell me today's date and the weather in Shanghai."

As can be seen from the figure below:

GPT-3.5 cannot connect to the Internet.

Alpha calls the Bing search engine directly to go online. However, if the prompt does not specify Chinese output, the answer often comes back in English, and you need to send another instruction to have it translated into Chinese. This is probably because Alpha's Bing search prefers English websites. Clicking the superscript 1 at the end of the output opens the source website directly; here the link is https://www.timeanddate.com/weather/china/shanghai, an English website.

In the old version of GPT-4, you can select the Browse with Bing function to go online. Browse with Bing also uses Bing search, and its output is likewise marked with a superscript 1 at the end. Clicking 1 shows that the source is still https://www.timeanddate.com/weather/china/shanghai. The difference is that GPT-4 answers directly in Chinese, without the English output seen with Alpha.

The new version of GPT-4 can be connected directly to the Internet, but the output results are sometimes in Chinese and sometimes in English.

Next we change the question to: "Please make a one-week travel plan for [Lijiang, Yunnan] provided by the user. I want you to plan a whole day for each day, including places to eat, activities and totally everything. I want you to write down an approximate amount of what they will spend on meals, activities, etc. I also want you to make hotel recommendations."

Let's compare the results of Alpha and the old version of GPT-4 with Browse with Bing:

Both mark links to their source websites with superscripts. Alpha's sources are English websites, while the sources of the old GPT-4 with Browse with Bing are Chinese websites.

Although both Alpha and GPT-4 call Bing search, Alpha and the new version of GPT-4 are more likely to retrieve English websites, while the old version of GPT-4 with Browse with Bing is more likely to retrieve Chinese websites. Likewise, Alpha and the new version of GPT-4 often answer in English, while GPT-4 with Browse with Bing almost always answers in Chinese.

In the old version of GPT-4 with Browse with Bing selected, every question goes online. In Alpha and the new version of GPT-4, however, some questions do not trigger web browsing, and the output is most likely in English, so the prompt needs to be optimized. Below is a prompt I wrote; I will explain why it is written this way in a separate article.

Use the web browsing function and search only Chinese websites. Do not translate keywords into English when searching; search only in Chinese.
(prompt)
Translate the output into Chinese; do not use English.

In other words, before the prompt, add the instruction to use web browsing, search only Chinese websites, and keep the search keywords in Chinese instead of translating them into English. After the prompt, add the instruction to translate the output into Chinese and not use English. With these two controls, Chinese websites are retrieved and the output is in Chinese with high probability.
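
The prefix-and-suffix template above can be applied mechanically to any question. The following is a minimal sketch, not part of ChatGPT itself: the function name `wrap_prompt` and the constants are hypothetical, and the instruction strings are an English rendering of the template. In practice you would put the instructions in the wording you actually send.

```python
# English rendering of the instructions from the article; adjust to the
# wording you actually send. All names here are illustrative assumptions.
BROWSE_PREFIX = (
    "Use the web browsing function and search only Chinese websites. "
    "Do not translate keywords into English when searching; search only in Chinese."
)
OUTPUT_SUFFIX = "Translate the output into Chinese; do not use English."


def wrap_prompt(prompt: str, prefix: str = BROWSE_PREFIX, suffix: str = OUTPUT_SUFFIX) -> str:
    """Return the prompt placed between the browsing prefix and the language suffix."""
    body = prompt.strip()
    if not body:
        raise ValueError("prompt must not be empty")
    # One instruction per line, in the same order as the template.
    return "\n".join([prefix, body, suffix])


if __name__ == "__main__":
    print(wrap_prompt("Please tell me today's date and the weather in Shanghai."))
```

Keeping the prefix and suffix as separate parameters makes it easy to drop the browsing instruction for questions that do not need web access while still forcing Chinese output.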

6. Document analysis comparison

GPT-3.5 does not support document uploading. Alpha and the new version of GPT-4 support direct uploading of documents for analysis, while the old version of GPT-4 requires the Advanced Data Analysis function to be checked to support document uploading.

6.1 Document format support

Although Alpha, the old version of GPT-4 with Advanced Data Analysis, and the new version of GPT-4 all accept uploads in common formats such as doc, xls, ppt, pdf, txt, zip, and rar, their ability to actually read the documents differs.

Next, we upload a paper with the .epub extension. The upload succeeds, but neither Alpha nor the new version of GPT-4 can read the document correctly, and both display an error, while the old version of GPT-4 with Advanced Data Analysis reads the content successfully.

Through a series of comparisons, it can be speculated that Alpha can be regarded as a test version of the new version of GPT-4. It is unknown how long the new version of GPT-4 and the old version of GPT-4 can coexist.

6.2 Document size limit

We asked Alpha, the old version of GPT-4 with Advanced Data Analysis, and the new version of GPT-4 the same question: "What is the maximum upload attachment supported by this language model?" Each answered that the maximum supported attachment size is 25 MB.

In actual testing, however, a 30 MB file uploaded successfully to Alpha, the old version of GPT-4 with Advanced Data Analysis, and the new version of GPT-4.

I also checked the official OpenAI documentation and did not find a stated file size limit.

6.3 Comparison of analytical capabilities

Here we simulate a scenario where we need to refine and summarize the key points of a paper and make a PPT for presentation.

First, we upload a paper in doc format, 28 pages long with more than 6,900 characters, together with the following prompt:

"Help me extract all the headings of this article. The headings must keep their hierarchy, and every heading at every level must match the original text. Summarize the content under each heading in 200 words, suitable for a PPT presentation, and output in Chinese."

In the results, Alpha did not base its summary on the original document at all. The old version of GPT-4 with Advanced Data Analysis tried a variety of methods and got stuck in an endless loop, while the new version of GPT-4 read the document but did not summarize it completely.

If we click Show work in the old version of GPT-4 with Advanced Data Analysis, we can see that Python code is executed automatically at each step.

This shows ChatGPT trying to distinguish titles based on font size.
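
The idea of telling headings apart by font size can be sketched as follows. This is not the code ChatGPT ran. It is a minimal illustration that assumes the paragraphs have already been extracted as (text, font size) pairs; the function name `classify_by_font_size` and the sample data are hypothetical. The most common size is treated as body text, and larger sizes are ranked as heading levels.

```python
from collections import Counter


def classify_by_font_size(paragraphs):
    """Label each (text, size) pair as body text (level 0) or a heading level.

    Assumption: the most frequent font size is body text, and every larger
    size is a heading, with the largest size being level 1.
    """
    if not paragraphs:
        return []
    sizes = [size for _, size in paragraphs]
    body_size = Counter(sizes).most_common(1)[0][0]
    # Distinct sizes larger than the body size, biggest first.
    heading_sizes = sorted({s for s in sizes if s > body_size}, reverse=True)
    level_of = {size: rank for rank, size in enumerate(heading_sizes, start=1)}
    return [(text, level_of.get(size, 0)) for text, size in paragraphs]


if __name__ == "__main__":
    # Hypothetical sample: three heading sizes over 12 pt body text.
    sample = [
        ("Paper title", 22.0),
        ("1. Introduction", 16.0),
        ("Body paragraph one.", 12.0),
        ("Body paragraph two.", 12.0),
        ("1.1 Background", 14.0),
        ("Body paragraph three.", 12.0),
    ]
    for text, level in classify_by_font_size(sample):
        print(level, text)
```

A document whose headings share the body font size gives this heuristic nothing to work with, which is one reason consistent formatting matters.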

Therefore, in daily work it is best to format the article properly before having it analyzed. In addition, try to upload doc or txt files. With other formats such as pdf, ChatGPT not only cannot tell the layout apart by font size but also has to correctly recognize the text first, so reading text from doc and txt is much more convenient than from other formats.

6.4 Issues found during testing

① Alpha's document upload function is extremely unstable: uploads often fail, and the behavior is inconsistent from one attempt to the next.

② The old version of GPT-4 with Advanced Data Analysis cannot read the doc format, but it can read the docx format.

③ The new version of GPT-4 can read the docx format, but cannot read the doc format.
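
The format and size observations from this section can be folded into a small check to run before uploading. This is a minimal sketch under stated assumptions: the function `upload_warnings` and both constants are hypothetical, and the values come only from this evaluation (November 2023), not from any official limit.

```python
from pathlib import Path

# Observations from this evaluation only; not official limits.
UNRELIABLE_EXTENSIONS = {
    ".doc": "could not be read in the tests; convert to .docx first",
    ".epub": "uploaded but failed to parse in Alpha and the new GPT-4",
}
REPORTED_LIMIT_MB = 25  # value the models reported; a 30 MB upload still succeeded


def upload_warnings(filename: str, size_mb: float) -> list[str]:
    """Return human-readable warnings for a file before uploading it."""
    warnings = []
    ext = Path(filename).suffix.lower()
    if ext in UNRELIABLE_EXTENSIONS:
        warnings.append(f"{ext}: {UNRELIABLE_EXTENSIONS[ext]}")
    if size_mb > REPORTED_LIMIT_MB:
        warnings.append(
            f"{size_mb:g} MB exceeds the {REPORTED_LIMIT_MB} MB the models reported"
        )
    return warnings


if __name__ == "__main__":
    for warning in upload_warnings("paper.doc", 30):
        print(warning)
```

The function only warns and never blocks, since the tests showed that a file above the reported limit can still upload successfully.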

My knowledge is limited, so omissions are inevitable. Criticism and suggestions are welcome.
