Hi everyone, this is the Demon King~!
Environment:
- Python 3.8
- PyCharm
- Node.js (the JavaScript runtime that execjs calls)
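Modules:
- import requests -> pip install requests
- import execjs -> pip install pyexecjs
- import json (standard library, nothing to install)
- import pandas as pd -> pip install pandas openpyxl (used in the save step to write the .xlsx files)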
Implementing the crawler program:
- Packet capture analysis: where does the data come from?
  - Capture packets with the developer tools -> F12
    Note: this site may prohibit opening the developer tools.
  - Select data for another year on the page and look at the new request -> XHR
    Data interface: https://www.aqistudy.cn/historydata/api/historyapi.php
  - Encrypted parameters:
    Request parameter: hA4Nse2cT (encrypted)
    Response data: encrypted as well
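To make the captured request concrete, here is a minimal sketch of how the single encrypted form field travels in the POST body. It uses the standard library's urllib instead of requests so that it runs without third-party packages, and it only builds the request without sending it. The cipher text is a placeholder, `build_history_request` is a hypothetical helper name, and the field name hA4Nse2cT is simply the one seen in this capture, so it may be different when you capture it yourself.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Interface found in the packet capture
API_URL = "https://www.aqistudy.cn/historydata/api/historyapi.php"


def build_history_request(cipher_text, user_agent="Mozilla/5.0"):
    """Build (but do not send) the POST request that carries the encrypted parameter."""
    # Form-encode the single field; the field name is the one observed in this capture
    body = urlencode({"hA4Nse2cT": cipher_text}).encode("utf-8")
    return Request(API_URL, data=body, headers={"User-Agent": user_agent}, method="POST")


# Placeholder value: the real cipher text comes from the site's JS code
req = build_history_request("PLACEHOLDER_CIPHER_TEXT")
print(req.get_method(), req.full_url)
print(req.data)
```

Sending this request with a placeholder would not return useful data; the point is only to show that the whole payload is one form field whose value is the cipher text.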
- Analyze how the encrypted data is generated -> it is produced by JS code
  Certain values are passed in, and a JS function turns them into cipher text.
How to find where the encrypted parameter is generated:
- Search directly for the keyword hA4Nse2cT
- Or set a breakpoint by following the call stack of send in the Initiator tab of the request
- First request the link with Python code to obtain the encrypted response data
- Make bold guesses and verify them carefully
- hex_md5 -> MD5 (strictly speaking a hash, not encryption)
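When a function named hex_md5 shows up at the breakpoint, one quick way to test the "MD5" guess is to compare its output with Python's hashlib for the same input. The sketch below assumes the site's hex_md5 is a standard MD5 hex digest over UTF-8 text, which you should confirm in the debugger before relying on it.

```python
import hashlib


def hex_md5(text):
    """Return the MD5 hex digest of a string (assumed equivalent of the JS hex_md5)."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


# Compare this value with what the JS function returns for the same input
print(hex_md5("hello"))  # 5d41402abc4b2a76b9719d911017c592
```

If the two values differ for the same input, the JS function is probably a modified or salted variant, and running the original JS through execjs is the safer route.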
Code display
“””Import module”””
# Import the module that compiles and runs JS code
import execjs
# Import the data request module
import requests
# Import the json module
import json
# Import pandas for writing the Excel files
import pandas as pd
“””Send request”””
month_list = ['202301', '202302', '202303', '202304', '202305', '202306', '202308', '202309', '202310']
for month in month_list:
    # Simulate a browser
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/118.0.0.0 Safari/537.36'
    }
    # Request link
    url = 'https://www.aqistudy.cn/historydata/api/historyapi.php'
“””Calling JS code to obtain encryption parameters”””
    # Read the JS file that builds the encrypted request parameter
    with open('data.js', 'r', encoding='utf-8') as f:
        data_file = f.read()
    # Compile the JS code
    data_code = execjs.compile(data_file)
    # Parameters
    m0fhOhhGL = "GETDAYDATA"
    oNLhNQ = {
        # Assumption: the site may expect the Chinese city name; use the value seen in the captured request
        "city": "Beijing",
        "month": month
    }
    # Call the JS function
    hA4Nse2cT = data_code.call('post_data', m0fhOhhGL, oNLhNQ)
    print('Encrypted request parameters: ', hA4Nse2cT)
    # Request parameters
    data = {
        'hA4Nse2cT': hA4Nse2cT
    }
    # Send the request; the response body is still encrypted
    response = requests.post(url=url, data=data, headers=headers).text
“””Decrypt response encrypted data”””
    # Read the JS file that decrypts the response
    with open('response.js', 'r', encoding='utf-8') as f:
        response_file = f.read()
    # Compile the file
    response_code = execjs.compile(response_file)
    # Call the JS function
    result = response_code.call('dxvERkeEvHbS', response)
    print('Encrypted response data: ', response)
    print('Clear text response data: ', result)
“””Save data”””
    # Parse the decrypted text as JSON
    json_data = json.loads(result)
    # Collect the daily records for this month
    content_list = []
    for item in json_data['result']['data']['items']:
        content_list.append(item)
    # Write one Excel file per month
    df_data = pd.DataFrame(content_list)
    df_data.to_excel(f'{month}.xlsx', index=False)
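The save step above indexes straight into result -> data -> items, which raises a KeyError if a month returns an error payload instead of data. As a hedged alternative, the sketch below walks the same path defensively and writes CSV text with the standard library only. The helper names and the sample field names (time_point, aqi) are hypothetical and chosen for illustration; the real items may carry different keys.

```python
import csv
import io
import json


def extract_items(result_text):
    """Return the list under result -> data -> items, or [] if the shape differs."""
    node = json.loads(result_text)
    for key in ("result", "data", "items"):
        if not isinstance(node, dict) or key not in node:
            return []
        node = node[key]
    return node if isinstance(node, list) else []


def items_to_csv(items):
    """Render a list of dicts as CSV text, using the first item's keys as the header."""
    if not items:
        return ""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=list(items[0].keys()),
        extrasaction="ignore",
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(items)
    return buffer.getvalue()


# Hypothetical decrypted response, shaped like the structure the script expects
sample = (
    '{"result": {"data": {"items": ['
    '{"time_point": "2023-01-01", "aqi": 50}, '
    '{"time_point": "2023-01-02", "aqi": 72}]}}}'
)
csv_text = items_to_csv(extract_items(sample))
print(csv_text)
```

In the loop, `extract_items(result)` could replace the direct indexing, and an empty list would then signal a month worth inspecting by hand.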
Module installation issues:
How to install third-party Python modules:
- Press Win + R, type cmd, click OK, then enter the installation command pip install <module name> (for example, pip install requests) and press Enter
- Or click Terminal in PyCharm and enter the installation command
Reasons for installation failure:
- Failure 1: pip is not recognized as an internal or external command
  Solution: Add Python (and its Scripts folder) to the PATH environment variable.
- Failure 2: A large amount of red error output (Read timed out)
  Solution: The network connection timed out, so switch to a mirror source:
  Tsinghua: https://pypi.tuna.tsinghua.edu.cn/simple
  Alibaba Cloud: https://mirrors.aliyun.com/pypi/simple/
  University of Science and Technology of China: https://pypi.mirrors.ustc.edu.cn/simple/
  Huazhong University of Science and Technology: https://pypi.hustunique.com/
  Shandong University of Technology: https://pypi.sdutlinux.org/
  Douban: https://pypi.douban.com/simple/
  For example: pip3 install -i https://pypi.doubanio.com/simple/ <module name>
- Failure 3: cmd shows that the module is already installed, or that the installation succeeded, but it still cannot be imported in PyCharm.
  Solution: Multiple Python versions may be installed (you only need one of Anaconda or Python), so uninstall the extra one.
  Or the Python interpreter in your PyCharm is not set up properly.
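For Failure 2 and Failure 3, a small diagnostic can help: it shows which interpreter is actually running, whether a module is importable from it, and the pip command that installs into that same interpreter through a mirror. This is a sketch rather than an official procedure; `pip_install_command` and `is_importable` are hypothetical helper names, and the Tsinghua URL is taken from the list above.

```python
import importlib.util
import sys

TSINGHUA_MIRROR = "https://pypi.tuna.tsinghua.edu.cn/simple"


def pip_install_command(package, index_url=None):
    """Build a pip command bound to the interpreter that is running this script."""
    command = [sys.executable, "-m", "pip", "install", package]
    if index_url:
        # -i selects the package index (mirror) to download from
        command += ["-i", index_url]
    return command


def is_importable(module_name):
    """Return True if the current interpreter can find the top-level module."""
    return importlib.util.find_spec(module_name) is not None


print("Interpreter in use:", sys.executable)
print("requests importable:", is_importable("requests"))
print("Install command:", pip_install_command("requests", TSINGHUA_MIRROR))
```

Run it from inside PyCharm: if the printed interpreter path differs from the one your cmd window uses, the module was installed into a different Python than the one PyCharm runs.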
How to configure the Python interpreter in PyCharm:
- Select File > Settings > Project > Python Interpreter
- Click the gear icon and select Add
- Add the Python installation path
How to install plug-ins in PyCharm:
- Select File > Settings > Plugins
- Click Marketplace and enter the name of the plug-in you want to install. For example, enter translation for a translation plug-in, or Chinese for the Chinese language plug-in
- Select the corresponding plug-in and click Install
- After the installation succeeds, a prompt to restart PyCharm appears. Click OK; the plug-in takes effect after the restart
Epilogue
Finally, thank you for reading my article~ That's all for this one.
I hope this article has been helpful and that you learned something from it~
The hidden stars are also working hard to shine, and you should work hard too (let’s work hard together).