Get the access_token of ChatGPT (latest!!!)
Preface
Recently I have been doing mobile application development. On a whim I wrote a ChatGPT app, but the API is only available through the official site. My own account has no free quota left, and my friend's account only has a $5 quota and nothing more, so I decided to take on ChatGPT's web login myself!
Login process analysis
1. Preparation
First open the following link in your browser, open the packet capture tool (F12), and delete the cookies, as shown in the figure:
https://chat.openai.com/auth/login?iss=https://auth0.openai.com/
After the cookies are deleted, click the Log in button and look at the captured network requests:
We can see the following requests: csrf, auth0?prompt=login, authorize?client_id=..., identifier?state=hKF...
That completes our preparation.
2. Get csrfToken
Let's go through them in order, starting with csrf. Someone is bound to ask: why not look at providers? Because it doesn't help.
It is a GET request, but when I clicked Preview I could not see any data. So I tested the endpoint in ApiFox and found that the response contains a csrfToken, as shown below:
Sneaky! It doesn't even show up in the browser's packet capture, but I caught it anyway!
3. Get authorize_url
In the previous step we got the csrfToken, so what is it for? To find out, look at the next request, auth0?prompt=login. Click on it and we see the following picture:
Note that this is a POST request. Looking at the payload, we find it carries the following parameters:
Surprise! There is our csrfToken. Next I clicked Preview to view the response data, but it could not be loaded, so I sent the request from ApiFox and got back a large chunk of HTML. What was going on? I compared the headers and they were identical, yet the request still didn't work. So I tried it from Python, and when I printed the result I was pleasantly surprised: there was a return value, and it was a URL!
Comparing it with the captured requests shows that this URL is the authorize entry in the list. Nice, one step closer!
4. Get identify_url
With the authorize URL from the previous step, we click it in the packet capture tool to view its content, as shown below:
As you can see, its status code is 302. What does that mean? A 302 status code indicates a redirect. The response headers contain a key called Location, and its value is the URL the redirect points to. In the request list on the left there is an identifier entry right below authorize, and comparing the two shows that it is the redirect target. So what we need in this step is the value of Location in the response headers. Once implemented in Python, the URL can be assembled as shown in the figure:
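A Location value can be either a relative path or a full URL, so one way to assemble the next address is the standard library's urljoin rather than plain string concatenation. This is only a sketch: the helper name resolve_location and the example path are made up for illustration, and the base host is the one from the login link above.

```python
from urllib.parse import urljoin

# Host taken from the login link in the preparation step
AUTH0_BASE = "https://auth0.openai.com"


def resolve_location(base_url, location):
    """Turn a Location header value into an absolute URL.

    Relative values are resolved against base_url; absolute values
    are returned unchanged.
    """
    return urljoin(base_url, location)


# Hypothetical Location values, for illustration only
print(resolve_location(AUTH0_BASE, "/path/identifier?state=abc123"))
print(resolve_location(AUTH0_BASE, "https://chat.openai.com/some/callback"))
```

The second call shows why this can be safer than prepending the host by hand: a Location that is already absolute passes through untouched.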
By now we have already finished half of the work. Surprised?
5. Get password_url
In the previous step we got the identifier_url; the content of the page is shown in the figure.
Now we enter our account name, click Continue, and watch the captured requests, as shown in the following figure:
We can see that identifier (the link we just got) was answered with a 302 redirect! I checked the payload and found that it carries the following parameters:
The request carries the long state parameter from the end of our URL, the username we entered, and some other parameters (I don't know what they are for, so just fill them in as captured). With the parameters covered, let's look at the response headers. There is a Location header again, and its value is the URL after redirection, which is exactly the password entry further down the list. Doesn't this feel like a set of nesting dolls? Haha, just take it one step at a time. Using the same Python code, I successfully extracted the password URL.
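The state value has to be pulled out of the redirect target before it can be posted back. A plain string split on "state=" works as long as state is the last query parameter; a slightly more defensive sketch parses the query string instead. The helper name extract_state and the example URL are assumptions for illustration. Note that parse_qs also percent-decodes the value.

```python
from urllib.parse import urlsplit, parse_qs


def extract_state(location):
    """Return the value of the state query parameter in a redirect target."""
    query = urlsplit(location).query
    values = parse_qs(query).get("state")
    if not values:
        raise ValueError("no state parameter in redirect target")
    return values[0]


# Hypothetical redirect target with an extra parameter after state
location = "/path/identifier?state=abc123&lang=en"
print(extract_state(location))         # abc123
print(location.split("state=")[1])     # abc123&lang=en (the split keeps the tail)
```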
6. Get resume_url
We first log in to our ChatGPT account; after capturing packets we can see the following list:
Wow, password is yet another redirect, and its redirect target is the resume entry. So we can handle it just like identifier above and get the resume URL.
7. Get auth0_url
Clicking the resume URL shows the following picture:
Its redirect target is the auth0 link right below it. ChatGPT, you really are something: endless nesting dolls! In the same way, we go on to get the auth0 link.
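The GET hops in this chain all follow one pattern: request without following redirects, read Location, and request again. That pattern can be sketched as a small loop. Here fetch is a hypothetical callable that returns a (status_code, headers) pair, so the sketch runs against fake responses instead of the real site; the username and password steps are POST requests with form data and are not covered by it.

```python
from urllib.parse import urljoin


def follow_redirects(fetch, start_url, max_hops=10):
    """Walk a 302 chain by hand and return every URL visited, in order.

    fetch is a hypothetical callable: fetch(url) -> (status_code, headers).
    """
    visited = [start_url]
    url = start_url
    for _ in range(max_hops):
        status, headers = fetch(url)
        if status != 302:
            return visited
        # Location may be relative, so resolve it against the current URL
        url = urljoin(url, headers["location"])
        visited.append(url)
    raise RuntimeError("too many redirects")


# Fake responses standing in for the real servers (illustration only)
fake_responses = {
    "https://a.example/authorize": (302, {"location": "/identifier"}),
    "https://a.example/identifier": (302, {"location": "/password"}),
    "https://a.example/password": (200, {}),
}
chain = follow_redirects(lambda u: fake_responses[u], "https://a.example/authorize")
print(chain)
```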
8. Get chat_openai_url
Continuing the same way, we can also request the chat.openai.com URL. We now have a clear picture of the whole process from login to entering the page. At this point you are probably even more confused: what use is it to end up with some HTML? Where is my token? Don't worry, the good part is coming.
The redirect chain revealed
Please clear your cookies before reading on!!! If you just followed the 8 steps above, you have to delete the cookies now; how to delete them is described in the preparation step.
Let's begin
Let's go back to the beginning and think about why I told you to clear the cookies. Yes! The key point is the cookie. Go back to the login URL and check the cookies at this point. We find:
Hey, the response comes with a cookie attached. In the headers we can see a set-cookie key, and that is the response cookie shown in the picture. That alone doesn't tell us much, so let's look at the next request, session, where we find:
Oh, this request also gets a response cookie. Interesting. Let's go on to the cookies attached to csrf, as shown in the figure:
Interesting! Layer upon layer; ChatGPT really knows its nesting dolls. We keep checking the requests one by one:
Here the request carries no cookie, but a response cookie is received. Keep going:
Here you may wonder why the request carries only these few cookies. Where did the earlier cookies such as _dd_s and _cfuvid go? Think of the cookies as a list, similar to a map of key-value pairs: each request only takes the ones it needs.
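The map idea can be sketched with a plain dictionary plus the standard library's SimpleCookie to parse Set-Cookie values. The helper names merge_set_cookie and pick and all cookie values are made up for illustration; the point is only that a later value for the same name replaces the earlier one, and that a request can take just a subset of the store.

```python
from http.cookies import SimpleCookie


def merge_set_cookie(store, set_cookie_header):
    """Parse one Set-Cookie header value and merge it into store (a dict)."""
    parsed = SimpleCookie()
    parsed.load(set_cookie_header)
    for name, morsel in parsed.items():
        store[name] = morsel.value
    return store


def pick(store, names):
    """Return only the cookies a given request needs."""
    return {name: store[name] for name in names if name in store}


store = {}
# Hypothetical Set-Cookie values from three successive responses
merge_set_cookie(store, "__cf_bm=first; Path=/; HttpOnly")
merge_set_cookie(store, "_cfuvid=xyz; Path=/")
merge_set_cookie(store, "__cf_bm=second; Path=/; HttpOnly")

print(store)                     # __cf_bm now holds the newer value
print(pick(store, ["__cf_bm"]))  # a request that only needs __cf_bm
```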
Here __cf_bm already exists in the request, so what happens to the __cf_bm obtained in the response? That's right: it simply replaces the existing value. Did it suddenly click? Below I show the whole flow, and you can compare the cookies at each step yourself.
That is the whole flow. Now for the token we care about most: click the response data of session and we see the following:
Wow! There is the token. After so many steps back and forth, we finally found it. By now everyone should be familiar with the whole process from logging in to obtaining the token. Simply put, we need to carry the necessary parameters and request headers at every hop. The most important request header is the cookie, which affects every request, so the way to work out the whole process is to analyze and check the cookies step by step. That covers the idea; the code comes next.
Code Analysis
In the walkthrough above we talked about cookies. We can simulate a browser by creating a cookie_list that stores all cookies, and implement a function that adds to and updates the content of cookie_list. The code is as follows:
# Store cookies so they can be added and updated.
# "cookies" here means the cookies from the response headers of a request.
def add_or_update_cookie_list(cookies, cookie_list):
    # new_cookie holds every cookie in cookie_list, converted to the
    # cookie string format carried by a request
    new_cookie = ""
    if cookies != "":
        for name, value in cookies.items():
            cookie_list[name] = value
    for key, value in cookie_list.items():
        # The format of the cookie is name=value;name=value...
        new_cookie = new_cookie + key + "=" + value + ";"
    return new_cookie
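As a side note, the same request string can be built with str.join, which never produces a trailing separator and so needs no slicing afterwards. This is an alternative sketch, not the author's function; the name build_cookie_header and the sample values are assumptions.

```python
def build_cookie_header(cookie_list):
    """Join a name -> value mapping into a Cookie request header value."""
    return "; ".join(f"{name}={value}" for name, value in cookie_list.items())


# Hypothetical cookie values, for illustration only
print(build_cookie_header({"a": "1", "b": "2"}))  # a=1; b=2
```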
Going in order, we first request csrf_token. It is a simple GET request, so I used the requests library, but no matter what I tried I kept getting 403, while the same request succeeded from both the browser and ApiFox. After some research I found that this is most likely caused by the server checking the client's TLS/JA3 fingerprint to tell a real browser from a script, so I found a solution:
Solve it with the curl_cffi library:
pip install curl_cffi
from curl_cffi import requests

url = "xxx"
...
...
res = requests.get(url=url, impersonate="chrome101", headers=headers)
print(res.text)
With this library the problem was solved! Let's continue with the code:
# Get csrf_token
def csrf_token_get(cookie_list):
    try:
        csrf_url = "https://chat.openai.com/api/auth/csrf"
        res = requests.get(url=csrf_url, impersonate="chrome101")
        # Only parse the body once the request is known to have succeeded
        if res.status_code == 200:
            csrfToken = res.json()['csrfToken']
            # Print log, can be removed (loggerConfig is the author's own
            # logging helper and is not shown in this article)
            loggerConfig.Log.log.info(f"Get csrfToken successfully! {csrfToken}")
            # Pass the response cookies and cookie_list into
            # add_or_update_cookie_list to update the cookie_list content
            csrf_cookies = add_or_update_cookie_list(res.cookies, cookie_list)
            # Make sure there is no ";" at the end of the cookie
            csrf_cookies = csrf_cookies[:-1]
            return csrf_cookies, csrfToken
    except Exception:
        # Print log, can be removed
        loggerConfig.Log.log.warning("Error: Request failed!")
After getting the csrfToken, we get the authorize_url. The code is as follows:
def auth_post(csrf_cookies, csrf_token, cookie_list):
    try:
        # We pass the cookies obtained in the previous step on to the next
        # request every time; it doesn't matter if they contain more than
        # the request actually needs
        auth0_url = "https://chat.openai.com/api/auth/signin/auth0?prompt=login"
        headers = {
            "Origin": "https://chat.openai.com",
            "Cookie": csrf_cookies,
            "Referer": "https://chat.openai.com/auth/login?sso",
            "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/113.0.0.0 Safari/537.36"
        }
        data = {
            "callbackUrl": "/",
            "csrfToken": csrf_token,
            "json": "true"
        }
        res = requests.post(url=auth0_url, headers=headers, data=data, impersonate="chrome101")
        auth_cookie = res.cookies
        # update cookie_list
        add_or_update_cookie_list(auth_cookie, cookie_list)
        # Request succeeded
        if res.status_code == 200:
            print(res.text)
            login_url = res.json()['url']
            # print log
            loggerConfig.Log.log.info(f"Get auth_url successfully! {login_url}")
            return login_url
        else:
            # print log
            loggerConfig.Log.log.warning(f"Error: Response<{res.status_code}>")
    except Exception:
        # print log
        loggerConfig.Log.log.warning("Error: Request failed!")
The next step is to get the identifier_url, the code is as follows:
def login_indentify(url, cookie_list):
    try:
        headers = {
            "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/113.0.0.0 Safari/537.36",
        }
        res = requests.get(url=url, headers=headers, allow_redirects=False, impersonate="chrome101")
        if res.status_code == 302:
            # Get the response cookies and convert them to the request cookie format
            set_cookie_string = res.cookies
            new_cookie = add_or_update_cookie_list(set_cookie_string, cookie_list)
            new_cookie = new_cookie[:-1]
            # Get the state parameter value
            location = res.headers['location']
            state = location.split("state=")[1]
            # identify request address, used to verify the account
            identify_url = "https://auth0.openai.com" + res.headers['location']
            loggerConfig.Log.log.info(f"Get identify_url successfully! {identify_url}")
            return state, new_cookie, identify_url
        else:
            loggerConfig.Log.log.warning(f"Error: Response<{res.status_code}>")
    except:
        loggerConfig.Log.log.warning("Error: Request failed!")
After we get the identifier_url, we get the password_url. The code is as follows:
def username_identify(state, new_cookie, identify_url):
    try:
        # Change the username here to your own
        data = {
            "state": state,
            "username": "xxx",
            "js-available": "true",
            "webauthn-available": "true",
            "is-brave": "false",
            "webauthn-platform-available": "true",
            "action": "default"
        }
        headers = {
            "Sec-Fetch-Site": "same-origin",
            "Cookie": new_cookie,
            "Origin": "https://auth0.openai.com",
            "Referer": identify_url,
            "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/113.0.0.0 Safari/537.36"
        }
        res = requests.post(url=identify_url, data=data, headers=headers, impersonate="chrome101", allow_redirects=False)
        if res.status_code == 302:
            password_url = "https://auth0.openai.com" + res.headers['location']
            loggerConfig.Log.log.info(f"Get password_url successfully! {password_url}")
            return password_url, state, new_cookie
        else:
            loggerConfig.Log.log.warning(f"Error: Response<{res.status_code}>")
    except:
        loggerConfig.Log.log.warning("Error: Request failed!")
Continuing in order, we get the resume_url. The code is as follows:
def password_identify(password_url, state, new_cookie, cookie_list):
    try:
        # Enter your own account name and password
        data = {
            "state": state,
            "username": "xxx",
            "password": "xxx",
            "action": "default"
        }
        headers = {
            "Sec-Fetch-Site": "same-origin",
            "Cookie": new_cookie,
            "Origin": "https://auth0.openai.com",
            "Referer": password_url,
            "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/113.0.0.0 Safari/537.36"
        }
        res = requests.post(url=password_url, headers=headers, data=data, impersonate="chrome101", allow_redirects=False)
        if res.status_code == 302:
            resume_url = "https://auth0.openai.com" + res.headers['location']
            loggerConfig.Log.log.info(f"Get resume_url successfully! {resume_url}")
            password_set_cookie = res.cookies
            new_cookie = add_or_update_cookie_list(password_set_cookie, cookie_list)
            new_cookie = new_cookie[:-1]
            return resume_url, new_cookie
        else:
            loggerConfig.Log.log.warning(f"Error: Response<{res.status_code}>")
    except:
        loggerConfig.Log.log.warning("Error: Request failed!")
The next step is to get auth0_url (I wrote callback_url in the code), the code is as follows:
def resume_url_get(resume_url, new_cookie, cookie_list):
    try:
        headers = {
            "Cookie": new_cookie,
            "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/113.0.0.0 Safari/537.36"
        }
        res = requests.get(url=resume_url, headers=headers, impersonate="chrome101", allow_redirects=False)
        if res.status_code == 302:
            callback_url = res.headers['location']
            loggerConfig.Log.log.info(f"Get callback_url successfully! {callback_url}")
            callback_url_set_cookie = res.cookies
            # Use the rebuilt cookie string; slicing the old new_cookie
            # would cut off the last character of an already trimmed string
            new_cookie = add_or_update_cookie_list(callback_url_set_cookie, cookie_list)
            new_cookie = new_cookie[:-1]
            return callback_url, new_cookie
        else:
            loggerConfig.Log.log.warning(f"Error: Response<{res.status_code}>")
    except:
        loggerConfig.Log.log.warning("Error: Request failed!")
Then we get the chat_openai_url (hold on, it's almost over!). The code is as follows:
def callback_url_get(callback_url, new_cookie, cookie_list):
    try:
        headers = {
            "Cookie": new_cookie,
            "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/113.0.0.0 Safari/537.36"
        }
        res = requests.get(url=callback_url, headers=headers, impersonate="chrome101", allow_redirects=False)
        if res.status_code == 302:
            chat_openai_url = res.headers['location']
            loggerConfig.Log.log.info(f"Get chat_openai_url successfully! {chat_openai_url}")
            chat_openai_cookies = res.cookies
            add_or_update_cookie_list(chat_openai_cookies, cookie_list)
        else:
            loggerConfig.Log.log.warning(f"Error: Response<{res.status_code}>")
    except:
        loggerConfig.Log.log.warning("Error: Request failed!")
Last step! Get the token! The code is as follows:
def get_access_token(cookie_list):
    try:
        # Passing "" adds nothing new and just builds the string from cookie_list
        cookies = add_or_update_cookie_list("", cookie_list)
        cookies = cookies[:-1]
        headers = {
            "Cookie": cookies,
            "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/113.0.0.0 Safari/537.36"
        }
        session_url = "https://chat.openai.com/api/auth/session"
        res = requests.get(url=session_url, headers=headers, impersonate="chrome101")
        if res.status_code == 200:
            loggerConfig.Log.log.info("Get accessToken successfully!")
            return res.json()['accessToken']
        else:
            loggerConfig.Log.log.warning(f"Error: Response<{res.status_code}>")
    except:
        loggerConfig.Log.log.warning("Error: Request failed!")
At this point we can obtain ChatGPT's access_token. Honestly, OpenAI really loves its nesting dolls, bouncing from one redirect to the next.
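The article does not show how the functions are called together, so here is one possible driver. It is a sketch under assumptions: the functions are passed in through a hypothetical steps object (so the example runs with stubs instead of real network calls), their signatures and return values are taken from the code above, and each of them is assumed to return None when it fails.

```python
from types import SimpleNamespace


def fetch_access_token(steps):
    """Run the steps in order; return the token, or None if a step fails.

    steps is a hypothetical object exposing the functions defined above.
    """
    cookie_list = {}  # shared cookie store, updated by every step

    result = steps.csrf_token_get(cookie_list)
    if result is None:
        return None
    csrf_cookies, csrf_token = result

    login_url = steps.auth_post(csrf_cookies, csrf_token, cookie_list)
    if login_url is None:
        return None

    result = steps.login_indentify(login_url, cookie_list)
    if result is None:
        return None
    state, new_cookie, identify_url = result

    result = steps.username_identify(state, new_cookie, identify_url)
    if result is None:
        return None
    password_url, state, new_cookie = result

    result = steps.password_identify(password_url, state, new_cookie, cookie_list)
    if result is None:
        return None
    resume_url, new_cookie = result

    result = steps.resume_url_get(resume_url, new_cookie, cookie_list)
    if result is None:
        return None
    callback_url, new_cookie = result

    steps.callback_url_get(callback_url, new_cookie, cookie_list)
    return steps.get_access_token(cookie_list)


# Stub functions standing in for the real ones (illustration only)
stubs = SimpleNamespace(
    csrf_token_get=lambda cl: ("c=1", "tok"),
    auth_post=lambda c, t, cl: "https://example.invalid/authorize",
    login_indentify=lambda u, cl: ("st", "c=1", "https://example.invalid/identifier"),
    username_identify=lambda s, c, u: ("https://example.invalid/password", s, c),
    password_identify=lambda u, s, c, cl: ("https://example.invalid/resume", c),
    resume_url_get=lambda u, c, cl: ("https://example.invalid/callback", c),
    callback_url_get=lambda u, c, cl: None,
    get_access_token=lambda cl: "stub-token",
)
print(fetch_access_token(stubs))  # stub-token
```

Checking for None after each step matters because the functions above only log on failure; unpacking a None result directly would raise a TypeError instead of stopping cleanly.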
Summary
The journey
I suffered a lot while analyzing the requests step by step. I compared and checked them one by one, ran countless experiments, and read countless 403 error reports, but the moment the data came out I was thrilled. Because I had to go through a proxy to reach the site, I often waited a long time for a response and also hit plenty of 429s. Still, I am very happy: the hard work paid off!
Lessons learned
This reverse engineering exercise took my crawler skills up a level (by my own standards, at least). I now know that cookies can serve as the way in, and I learned about browser fingerprint verification, an anti-crawler mechanism I did not know before. It was a very rewarding experience!