ChatGPT’s access_token acquisition (latest!!!)

Preface

I've recently been doing mobile app development and wrote a ChatGPT app on a whim, but the API is only available from the official service. My own account has no free quota, and my friend's account has only $5 of quota and nothing more, so I decided to take on the ChatGPT web login myself!

Login process analysis

1. Preparation

First, open the following link in your browser and open the developer tools (F12). Then delete the cookies, as shown in the figure:

https://chat.openai.com/auth/login?iss=https://auth0.openai.com/

image-20230523214554803

Once the cookies are deleted, click the Log in button. The captured network requests now look like this:

image-20230523214840870

The requests we care about are, in order: csrf, auth0?prompt=login, authorize?client_id=..., and identifier?state=hKF... That completes our preparation.

2. Get csrfToken

Let's go through them in order, starting with csrf. Someone is bound to ask why we skip providers. Ha, because it is of no use to us.

image-20230523215147871

It is a GET request, but when I clicked Preview in the browser I could not see any data. So I tested the endpoint in ApiFox and found that the response contains a csrfToken, as shown below:

image-20230523215620416

Well, well, so secretive that you don't even show up in the browser's capture, but I caught you!

3. Get authorize_url

In the previous step we got the csrfToken, so what is it for? To find out, look at the next request, auth0?prompt=login. Clicking on it shows the following:

image-20230523220526160

Note that this is a POST request. Its payload carries the following parameters:

image-20230523220629169

Surprise! There is our csrfToken. I clicked Preview to view the response data, but the browser could not load it, so I went back to ApiFox. This time the response was a big chunk of HTML. What's going on? The headers matched the browser's exactly, yet the request still didn't work. So I tried the request in Python instead, and when I printed the result I was pleasantly surprised to find a proper return value, and it was a URL!

image-20230523222805449

Comparing it with the captured requests shows that this URL is the authorize entry in the request list. Nice! One step closer.

4. Get identify_url

With the authorize URL from the previous step in hand, click on the authorize request in the developer tools to view its details, as shown below:

image-20230523223431582

As you can see, its status code is 302, which means redirection. The response headers contain a Location key whose value is the URL the redirect points to. In the request list on the left there is an identifier entry right below authorize, and comparing the two shows that it is exactly the redirect target. So what we need from this step is the value of Location in the response headers. That value is a relative path, so in Python we join it to the host to get the full URL, as shown in the figure:

image-20230523223822154
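The splicing step can also be done with the standard library. Below is a minimal sketch, assuming the Location value is either a relative path or an absolute URL; the helper name resolve_redirect and the sample values are made up for illustration and are not taken from the capture.

```python
from urllib.parse import urljoin


def resolve_redirect(request_url: str, location: str) -> str:
    """Return the absolute URL that a 3xx response points to.

    urljoin resolves a relative Location against the URL that was
    requested and leaves an already absolute Location unchanged.
    """
    return urljoin(request_url, location)


# Hypothetical values, only shaped like the ones in the capture.
authorize_url = "https://auth0.openai.com/authorize?client_id=abc"
location = "/u/login/identifier?state=hKFexample"
print(resolve_redirect(authorize_url, location))
```

Compared with hard-coding the host in front of the path, this keeps working if a later hop returns an absolute URL on a different host.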

At this point, we've already finished half of the work. Surprised?

5. Get password_url

In the previous step we got the identifier_url. Its page looks like this:

image-20230523224218249

Enter your account name, click Continue, and watch the captured requests, as shown in the following figure:

image-20230523224404846

You can see that the request to identifier (the link we just obtained) returned a 302 redirect! Its payload carries the following parameters:

image-20230523224658687

The payload contains the long state value from the end of our URL, the username we entered, and a few other parameters (I don't know what they are for, so just fill them in as captured). Now look at the response headers: there is again a Location header, and its value is the redirect URL, which matches the password entry further down the request list. Doesn't this feel like a set of nesting dolls, hahaha? Just take it one step at a time. Using the same Python approach, I successfully extracted the password URL.
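The state value has to be pulled out of the redirect URL before it can be posted back. Here is a small sketch of one way to do that with urllib.parse; the function name extract_state and the sample URL are assumptions for illustration only.

```python
from urllib.parse import parse_qs, urlsplit


def extract_state(location: str) -> str:
    """Return the value of the state query parameter of a URL or path."""
    query = urlsplit(location).query
    values = parse_qs(query).get("state")
    if not values:
        raise ValueError(f"no state parameter in {location!r}")
    return values[0]


# Hypothetical redirect target with an extra parameter after state.
print(extract_state("/u/login/identifier?state=hKFexample&ui_locales=en"))
```

Parsing the query string like this returns only the state value even when other parameters follow it, whereas splitting the string on "state=" would keep everything after it.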

6. Get resume_url

Now enter the password and log in to the ChatGPT account. The capture shows the following list:

image-20230523225455237

Well, well, the password request is a redirect too, and the redirect target is the resume entry. So we handle it the same way as identifier above and extract the resume URL.

7. Get auth0_url

Click on the resume request to see the following:

image-20230523225830602

Its redirect target is the auth0 entry right below it. ChatGPT, you really are something: endless nesting dolls! We get the auth0 link in the same way.

8. Get chat_openai_url

Continuing in the same way, we can request the chat.openai.com URL as well. We have now traced the whole process from login to landing on the page. At this point you are probably even more confused: what good is an HTML page at the end? Where is my token? Don't worry, the good part is coming up.

Revealing the redirect chain

Please clear your cookies before reading on!!! If you just followed the 8 steps above, you need to delete the cookies now. How to delete them is covered in the Preparation step.

Officially started

Let's go back to the beginning. Remember that I asked you to clear the cookies? Yes! The cookies are the most critical point. Go back to the login URL and check its cookies, and we find:

image-20230523233240315

Hey, the response comes with cookies attached. In the response headers there is a set-cookie key, and those are the response cookies shown in the picture. Of course, this alone doesn't tell us much. Let's look at the next request, session:

image-20230523233317820

Oh, this request also has response cookies. Interesting. Let's continue with the cookies attached to csrf, as shown in the figure:

image-20230523233347343

Interesting! Layer after layer. ChatGPT, you really know your nesting dolls. We keep checking the requests one by one:

image-20230523233358619

image-20230523233408719

Here the request carries no cookies, but response cookies are still received. Keep going:

image-20230523233423090

Here you may wonder why the request carries only these few cookies. Where did the earlier ones such as _dd_s and _cfuvid go? Think of the cookies as a list of key-value pairs, similar to a map: each request only takes the ones it needs.

image-20230523233438926

image-20230523233456601

Here __cf_bm is already present in the request, so what happens to the __cf_bm received in the response? That's right, it simply replaces the old value. Did it suddenly click? Below are the remaining steps, so you can compare the cookies at each step yourself.

image-20230523233510543

image-20230523233526189

image-20230523233536872

image-20230523233547331

image-20230523233715346
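The cookie behaviour seen in the screenshots (a key-value map, a later response replacing an earlier value, and each request carrying only some of the cookies) can be sketched in a few lines. This is a toy model under my own assumptions, not how the browser is actually implemented: I assume cookies are simply grouped by the host that set them, which is one likely reason a request does not carry every cookie. The class name CookieStore, the cookie name did, and all values are made up.

```python
from http.cookies import SimpleCookie


class CookieStore:
    """Toy per-host cookie store: {host: {name: value}}."""

    def __init__(self):
        self._by_host = {}

    def update_from_set_cookie(self, host, set_cookie_header):
        """Parse one Set-Cookie header value and store its cookies for host."""
        parsed = SimpleCookie()
        parsed.load(set_cookie_header)
        jar = self._by_host.setdefault(host, {})
        for name, morsel in parsed.items():
            # A cookie with the same name replaces the old value.
            jar[name] = morsel.value

    def header_for(self, host):
        """Build the Cookie request header from the cookies stored for host."""
        jar = self._by_host.get(host, {})
        return "; ".join(f"{name}={value}" for name, value in jar.items())


store = CookieStore()
store.update_from_set_cookie("chat.openai.com", "__cf_bm=first; Path=/; HttpOnly")
store.update_from_set_cookie("chat.openai.com", "_cfuvid=abc; Path=/")
store.update_from_set_cookie("auth0.openai.com", "did=xyz; Path=/")
store.update_from_set_cookie("chat.openai.com", "__cf_bm=second; Path=/; HttpOnly")

print(store.header_for("chat.openai.com"))   # __cf_bm holds the newer value
print(store.header_for("auth0.openai.com"))  # only the cookie set by this host
```

Real browsers also match on the cookie's Domain and Path attributes and on expiry, which this sketch ignores. The code later in this article takes the simpler route of keeping one flat map and sending everything.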

That is the whole process. Now the token we care about most finally appears. Click on the response data of the session request and we see the following:

image-20230523233804275

Wow! There's the token. After so many steps back and forth, we finally found it. By now you should be familiar with the whole process from logging in to obtaining the token. Simply put, every jump must carry the necessary parameters and request headers. The most important header is Cookie, which affects every request, so the key is to trace the cookies step by step until the whole process is clear. That's the idea; the code comes next.

Code Analysis

In the walkthrough above, we talked about cookies. We can simulate a browser by creating a cookie_list that stores all cookies, plus a function that adds and updates its entries. The code is as follows:

# Store cookies; entries can be added and updated. "cookies" are the cookies from the response to a request
def add_or_update_cookie_list(cookies, cookie_list):
    # Merge the response cookies into cookie_list; pass "" to skip the merge
    if cookies != "":
        for name, value in cookies.items():
            cookie_list[name] = value
    # new_cookie holds every cookie in cookie_list as a request Cookie string
    # in the form name=value;name=value;... (note the trailing ";")
    new_cookie = ""
    for key, value in cookie_list.items():
        new_cookie = new_cookie + key + "=" + value + ";"
    return new_cookie

Following the order, we first request the csrf_token. It is a plain GET request, nothing difficult, so I used the requests library. To my confusion, it returned 403 no matter what I tried, while the same request succeeded in both the browser and ApiFox. After some research, I found that the most likely cause is TLS/JA3 fingerprint verification, which tells a real browser apart from a plain Python client. So I found a solution:

Solve it with the curl_cffi library

pip install curl_cffi

from curl_cffi import requests
url = "xxx"
...
...
res = requests.get(url=url, impersonate="chrome101", headers=headers)
print(res.text)

With this library the problem was solved! Let's continue with the code:

# Get csrf_token
def csrf_token_get(cookie_list):
    try:
        csrf_url = "https://chat.openai.com/api/auth/csrf"
        res = requests.get(url=csrf_url, impersonate="chrome101")
        # Only parse the body once we know the request succeeded
        if res.status_code == 200:
            csrfToken = res.json()['csrfToken']
            # Print log, can be removed
            loggerConfig.Log.log.info(f"Get csrfToken successfully! {csrfToken}")
            # Pass the response cookies and cookie_list into add_or_update_cookie_list to update cookie_list
            csrf_cookies = add_or_update_cookie_list(res.cookies, cookie_list)
            # Make sure there is no ";" at the end of the cookie string
            csrf_cookies = csrf_cookies[:-1]
            return csrf_cookies, csrfToken
        else:
            loggerConfig.Log.log.warning(f"Error: Response<{res.status_code}>")
    except Exception as e:
        # Print log, can be removed
        loggerConfig.Log.log.warning(f"Error: Request failed! {e}")

After getting the csrfToken, we get the authorize_url. The code is as follows:

def auth_post(csrf_cookies, csrf_token, cookie_list):
    try:
        # We pass the cookies obtained in the previous step to every following request; it doesn't matter if they are more than the request actually needs
        auth0_url = "https://chat.openai.com/api/auth/signin/auth0?prompt=login"
        headers = {
            "Origin": "https://chat.openai.com",
            "Cookie": csrf_cookies,
            "Referer": "https://chat.openai.com/auth/login?sso",
            "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/113.0.0.0 Safari/537.36"
        }
        data = {
            "callbackUrl": "/",
            "csrfToken": csrf_token,
            "json": "true"
        }
        res = requests.post(url=auth0_url, headers=headers, data=data, impersonate="chrome101")
        auth_cookie = res.cookies
        # Update cookie_list
        add_or_update_cookie_list(auth_cookie, cookie_list)
        # Request succeeded
        if res.status_code == 200:
            print(res.text)
            login_url = res.json()['url']
            # Print log
            loggerConfig.Log.log.info(f"Get auth_url successfully! {login_url}")
            return login_url
        else:
            # Print log
            loggerConfig.Log.log.warning(f"Error: Response<{res.status_code}>")
    except Exception as e:
        # Print log
        loggerConfig.Log.log.warning(f"Error: Request failed! {e}")

The next step is to get the identifier_url, the code is as follows:

def login_indentify(url, cookie_list):
    try:
        headers = {
            "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/113.0.0.0 Safari/537.36",
        }
        # allow_redirects=False so that we can read the 302 response ourselves
        res = requests.get(url=url, headers=headers, allow_redirects=False, impersonate="chrome101")
        if res.status_code == 302:
            # Get the response cookies and convert them into the request Cookie string format
            set_cookie_string = res.cookies
            new_cookie = add_or_update_cookie_list(set_cookie_string, cookie_list)
            new_cookie = new_cookie[:-1]

            # Get the state parameter value
            location = res.headers['location']
            state = location.split("state=")[1]

            # identify request address; this address is used to verify the account
            identify_url = "https://auth0.openai.com" + location
            loggerConfig.Log.log.info(f"Get identify_url successfully! {identify_url}")

            return state, new_cookie, identify_url
        else:
            loggerConfig.Log.log.warning(f"Error: Response<{res.status_code}>")
    except Exception as e:
        loggerConfig.Log.log.warning(f"Error: Request failed! {e}")

After we get the identifier_url, we should get the password_url, the code is as follows:

def username_identify(state, new_cookie, identify_url):
    try:
        # Change the username here to your own
        data = {
            "state": state,
            "username": "xxx",
            "js-available": "true",
            "webauthn-available": "true",
            "is-brave": "false",
            "webauthn-platform-available": "true",
            "action": "default"
        }
        headers = {
            "Sec-Fetch-Site": "same-origin",
            "Cookie": new_cookie,
            "Origin": "https://auth0.openai.com",
            "Referer": identify_url,
            "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/113.0.0.0 Safari/537.36"
        }
        res = requests.post(url=identify_url, data=data, headers=headers, impersonate="chrome101",
                            allow_redirects=False)
        if res.status_code == 302:
            # The Location header is a relative path, so prepend the host
            password_url = "https://auth0.openai.com" + res.headers['location']
            loggerConfig.Log.log.info(f"Get password_url successfully! {password_url}")
            return password_url, state, new_cookie
        else:
            loggerConfig.Log.log.warning(f"Error: Response<{res.status_code}>")
    except Exception as e:
        loggerConfig.Log.log.warning(f"Error: Request failed! {e}")

Next in order is the resume_url. The code is as follows:

def password_identify(password_url, state, new_cookie, cookie_list):
    try:
        # Fill in your own account name and password
        data = {
            "state": state,
            "username": "xxx",
            "password": "xxx",
            "action": "default"
        }

        headers = {
            "Sec-Fetch-Site": "same-origin",
            "Cookie": new_cookie,
            "Origin": "https://auth0.openai.com",
            "Referer": password_url,
            "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/113.0.0.0 Safari/537.36"
        }
        res = requests.post(url=password_url, headers=headers, data=data, impersonate="chrome101",
                            allow_redirects=False)
        if res.status_code == 302:
            resume_url = "https://auth0.openai.com" + res.headers['location']
            loggerConfig.Log.log.info(f"Get resume_url successfully! {resume_url}")

            # Merge the response cookies and rebuild the request Cookie string
            password_set_cookie = res.cookies
            new_cookie = add_or_update_cookie_list(password_set_cookie, cookie_list)
            new_cookie = new_cookie[:-1]

            return resume_url, new_cookie
        else:
            loggerConfig.Log.log.warning(f"Error: Response<{res.status_code}>")
    except Exception as e:
        loggerConfig.Log.log.warning(f"Error: Request failed! {e}")

The next step is to get auth0_url (I wrote callback_url in the code), the code is as follows:

def resume_url_get(resume_url, new_cookie, cookie_list):
    try:
        headers = {
            "Cookie": new_cookie,
            "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/113.0.0.0 Safari/537.36"
        }

        res = requests.get(url=resume_url, headers=headers, impersonate="chrome101", allow_redirects=False)
        if res.status_code == 302:
            callback_url = res.headers['location']
            loggerConfig.Log.log.info(f"Get callback_url successfully! {callback_url}")
            callback_url_set_cookie = res.cookies
            # Rebuild the Cookie string from the updated cookie_list, then strip the trailing ";".
            # (Slicing the old new_cookie instead would cut off the last character of a cookie value.)
            new_cookie = add_or_update_cookie_list(callback_url_set_cookie, cookie_list)
            new_cookie = new_cookie[:-1]
            return callback_url, new_cookie
        else:
            loggerConfig.Log.log.warning(f"Error: Response<{res.status_code}>")
    except Exception as e:
        loggerConfig.Log.log.warning(f"Error: Request failed! {e}")

Then we get the chat_openai_url (hang in there, we're almost done!). The code is as follows:

def callback_url_get(callback_url, new_cookie, cookie_list):
    try:
        headers = {
            "Cookie": new_cookie,
            "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/113.0.0.0 Safari/537.36"
        }

        res = requests.get(url=callback_url, headers=headers, impersonate="chrome101", allow_redirects=False)
        if res.status_code == 302:
            chat_openai_url = res.headers['location']
            loggerConfig.Log.log.info(f"Get chat_openai_url successfully! {chat_openai_url}")
            # Store the final cookies in cookie_list; get_access_token reads them from there
            chat_openai_cookies = res.cookies
            add_or_update_cookie_list(chat_openai_cookies, cookie_list)
        else:
            loggerConfig.Log.log.warning(f"Error: Response<{res.status_code}>")
    except Exception as e:
        loggerConfig.Log.log.warning(f"Error: Request failed! {e}")

Last step: get the token! The code is as follows:

def get_access_token(cookie_list):
    try:
        # Passing "" adds nothing and just returns cookie_list as a Cookie string
        cookies = add_or_update_cookie_list("", cookie_list)
        cookies = cookies[:-1]
        headers = {
            "Cookie": cookies,
            "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/113.0.0.0 Safari/537.36"
        }
        session_url = "https://chat.openai.com/api/auth/session"
        res = requests.get(url=session_url, headers=headers, impersonate="chrome101")
        if res.status_code == 200:
            loggerConfig.Log.log.info("Get accessToken successfully!")
            return res.json()['accessToken']
        else:
            loggerConfig.Log.log.warning(f"Error: Response<{res.status_code}>")
    except Exception as e:
        loggerConfig.Log.log.warning(f"Error: Request failed! {e}")

At this point, we can obtain ChatGPT's access_token. Honestly, OpenAI really knows how to play with nesting dolls, jumping from one place to the next.

Summary

My journey

I suffered a lot while analyzing the requests step by step. I compared and checked them one by one, ran countless experiments, and read countless 403 error reports, but the moment the data finally came out I was thrilled. Because I have to go through a proxy to reach the site, I often waited a long time for responses and also hit a lot of 429s. Still, I'm very happy: the hard work paid off!

What I learned

Through this reverse engineering exercise, I've reached a higher level with crawlers (pretty good by my own standards). I now know that cookies can serve as the way in, and I learned about browser fingerprint verification, an anti-crawler mechanism I hadn't known about before. A very rewarding experience!