Reverse-engineering JS encryption with Python to collect website data

Hi, everyone! This is the Demon King.


Environment:

  • Python 3.8

  • PyCharm

  • Node.js (the JavaScript runtime that PyExecJS calls)
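
PyExecJS runs JavaScript through an external runtime, so it can save time to confirm that Node.js is on the PATH before going further. A minimal sketch using only the standard library; the helper name find_node and the candidate executable names are assumptions made for illustration:

```python
import shutil


def find_node(candidates=("node", "nodejs")):
    """Return the path of the first candidate executable found on PATH, or None."""
    for name in candidates:
        path = shutil.which(name)
        if path:
            return path
    return None


if __name__ == "__main__":
    node_path = find_node()
    if node_path:
        print("Node.js found at:", node_path)
    else:
        print("Node.js not found; install it and reopen the terminal")
```

If this reports that Node.js is missing right after installing it, reopening the terminal or PyCharm usually picks up the updated PATH.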

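Modules used:

  • import requests –> pip install requests

  • import execjs –> pip install pyexecjs

  • import json –> standard library, no installation needed

  • import pandas –> pip install pandas openpyxl (pandas needs openpyxl to write .xlsx files)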

Implementing the crawler:

  1. Capture packets to find where the data comes from

    • Developer tools packet capture –> F12

      When the site prohibits opening the developer tools

  2. Select data for another year –> watch the XHR requests

    Data interface: https://www.aqistudy.cn/historydata/api/historyapi.php

    • Encrypted parameters:

      Request parameter (encrypted): hA4Nse2cT

      Response data: encrypted

  3. Analyze how the encrypted data is generated –> by JS code

    Certain values are passed in, and a JS function produces the ciphertext

    How to locate where the encrypted parameter is generated:

    1. Search directly for the keyword hA4Nse2cT

    2. In the request's Initiator tab, follow the call stack to send and set a breakpoint there

    To analyze the response, first request the link from Python to obtain the encrypted response data

    Guess boldly, verify carefully

    hex_md5 –> MD5 hashing
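
If hex_md5 is the standard MD5 helper, as its name suggests, its output can be reproduced with Python's hashlib. This is a quick way to check a guess while stepping through the JS in the debugger. A minimal sketch; the input string is made up and is not a value taken from the site:

```python
import hashlib


def hex_md5(text: str) -> str:
    """Return the MD5 digest of a UTF-8 string as 32 lowercase hex characters."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    # Compare this with what hex_md5("hello") returns in the browser console
    print(hex_md5("hello"))  # 5d41402abc4b2a76b9719d911017c592
```

If the browser console gives a different digest for the same input, the site's hex_md5 has probably been modified and should be executed from the extracted JS instead.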

Code

"""Import modules"""

# JS execution module (PyExecJS)
import execjs
# HTTP request module
import requests
# JSON parsing module
import json
# pandas, used to save the data to Excel
import pandas as pd

"""Send request"""

month_list = ['202301', '202302', '202303', '202304', '202305', '202306', '202308', '202309', '202310']
for month in month_list:
    # Simulate a browser
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/118.0.0.0 Safari/537.36'
    }
    # Request URL
    url = 'https://www.aqistudy.cn/historydata/api/historyapi.php'

"""Call the JS code to obtain the encrypted parameter"""

    # Read the JS file
    with open('data.js', 'r', encoding='utf-8') as f:
        data_file = f.read()
    # Compile the JS code
    data_code = execjs.compile(data_file)
    # Arguments for the JS function
    m0fhOhhGL = "GETDAYDATA"
    oNLhNQ = {
        "city": "Beijing",
        "month": month
    }
    # Call the JS function to build the encrypted request parameter
    hA4Nse2cT = data_code.call('post_data', m0fhOhhGL, oNLhNQ)
    print('Encrypted request parameter: ', hA4Nse2cT)
    # Request form data
    data = {
        'hA4Nse2cT': hA4Nse2cT
    }
    # Send the request; the response body is still encrypted
    response = requests.post(url=url, data=data, headers=headers).text

"""Decrypt the encrypted response data"""

    # Read the JS file
    with open('response.js', 'r', encoding='utf-8') as f:
        response_file = f.read()
    # Compile the JS code
    response_code = execjs.compile(response_file)
    # Call the JS decryption function
    result = response_code.call('dxvERkeEvHbS', response)
    print('Encrypted response data: ', response)
    print('Plaintext response data: ', result)

"""Save data"""

    # Parse the decrypted JSON text
    json_data = json.loads(result)
    # Each item is one record
    content_list = json_data['result']['data']['items']

    # Save this month's data to an Excel file
    df_data = pd.DataFrame(content_list)
    df_data.to_excel(f'{month}.xlsx', index=False)
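
If a CSV file is preferred over Excel, the decrypted JSON can also be written with the standard library csv module, which avoids the pandas dependency. A minimal sketch, assuming the result.data.items structure used in the script above; the sample payload and its field names (time_point, aqi) are invented for illustration and are not real output from the site:

```python
import csv
import io
import json


def items_from_result(result_text):
    """Parse decrypted JSON text and return the list under result.data.items."""
    payload = json.loads(result_text)
    return payload.get("result", {}).get("data", {}).get("items", [])


def write_items_csv(items, fileobj):
    """Write a list of dicts as CSV, using the first row's keys as the header."""
    if not items:
        return 0
    writer = csv.DictWriter(fileobj, fieldnames=list(items[0].keys()))
    writer.writeheader()
    writer.writerows(items)
    return len(items)


if __name__ == "__main__":
    # Hypothetical sample standing in for the decrypted response text
    sample = json.dumps({"result": {"data": {"items": [
        {"time_point": "2023-01-01", "aqi": 50},
        {"time_point": "2023-01-02", "aqi": 72},
    ]}}})
    buffer = io.StringIO()
    count = write_items_csv(items_from_result(sample), buffer)
    print(count, "rows written")
    print(buffer.getvalue())
```

To write a real file instead of an in-memory buffer, pass a file opened with open(f'{month}.csv', 'w', newline='', encoding='utf-8'). Returning an empty list for a missing key means a month with no data produces no rows rather than a KeyError.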

Module installation issues:

  • How to install third-party Python modules:

    1. Press Win + R, enter cmd, and click OK; then type the installation command pip install module name (for example, pip install requests) and press Enter

    2. Or click Terminal in PyCharm and enter the installation command there

  • Reasons for installation failure:

    • Failure 1: pip is not recognized as an internal command

      Solution: Add Python and its Scripts folder to the PATH environment variable

    • Failure 2: A lot of red error output (read timed out)

      Solution: The network connection timed out; switch to a mirror source:

        Tsinghua: https://pypi.tuna.tsinghua.edu.cn/simple
        Alibaba Cloud: https://mirrors.aliyun.com/pypi/simple/
        University of Science and Technology of China: https://pypi.mirrors.ustc.edu.cn/simple/
        Huazhong University of Science and Technology: https://pypi.hustunique.com/
        Shandong University of Technology: https://pypi.sdutlinux.org/
        Douban: https://pypi.douban.com/simple/
        For example: pip3 install -i https://pypi.doubanio.com/simple/ module name
      
    • Failure 3: cmd shows that the module is already installed, or that installation succeeded, but it still cannot be imported in PyCharm.

      Solution: Several Python versions may be installed (only one of Anaconda or Python is needed); uninstall the extras.
      Alternatively, the Python interpreter in PyCharm may not be configured correctly.
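
Failures 1 and 3 can often be sidestepped by invoking pip through a specific interpreter (python -m pip). That installs into exactly that interpreter and does not need pip itself on the PATH. A minimal sketch that builds such a command with a mirror; the helper name pip_install_command is made up for illustration, and the default mirror is the Tsinghua URL listed above:

```python
import sys

TSINGHUA_MIRROR = "https://pypi.tuna.tsinghua.edu.cn/simple"


def pip_install_command(package, mirror=TSINGHUA_MIRROR, python=sys.executable):
    """Build a 'python -m pip install -i <mirror> <package>' argument list."""
    return [python, "-m", "pip", "install", "-i", mirror, package]


if __name__ == "__main__":
    command = pip_install_command("requests")
    # Print the command so it can be copied into a terminal
    print(" ".join(command))
```

The returned list can be passed to subprocess.run(command, check=True) to perform the installation from a script. Running the script with PyCharm's configured interpreter makes sys.executable point at the same interpreter the project imports from.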

How to configure the Python interpreter in PyCharm?

  1. Select File >>> Settings >>> Project >>> Python Interpreter

  2. Click the gear icon and select Add

  3. Add the Python installation path

How to install plug-ins in PyCharm?

  1. Select File >>> Settings >>> Plugins

  2. Click Marketplace and enter the name of the plug-in you want. For example, enter translation for a translation plug-in, or Chinese for the Chinese language pack

  3. Select the plug-in and click Install

  4. After installation succeeds, PyCharm offers to restart. Click OK; the plug-in takes effect after the restart

Epilogue

Finally, thank you for reading. This article ends here.

I hope it was helpful and that you learned something.

The hidden stars are also working hard to shine, and you should work hard too (let’s work hard together).
