Selenium: waiting for elements to load, element operations, executing JS, switching tabs, logging in to cnblogs (Chrome), and semi-automatic upvoting on Chouti

Waiting for elements to load in Selenium

Code runs very fast → some tags have not finished loading yet → looking them up finds nothing → an error is raised.

The fix is to set a wait. There are two kinds: explicit waits and implicit waits.

# Search for a certain tag. If it cannot be found, wait up to 10 seconds.
bro.implicitly_wait(10)
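
The implicit wait above applies to every element lookup. An explicit wait instead polls for one specific condition until a timeout expires. The sketch below shows that polling idea in plain Python; `wait_until` and its parameters are hypothetical names chosen for illustration and are not part of Selenium. Selenium ships its own explicit wait as `WebDriverWait` together with `expected_conditions`, shown in the trailing comment with an example locator.

```python
import time


def wait_until(condition, timeout=10.0, poll=0.5):
    """Call condition() repeatedly until it returns a truthy value.

    Returns that value, or raises TimeoutError once `timeout` seconds pass.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(poll)


# With a real driver, Selenium's built-in explicit wait looks like this
# (the locator 'kw' is only an example):
#
#   from selenium.webdriver.common.by import By
#   from selenium.webdriver.support.ui import WebDriverWait
#   from selenium.webdriver.support import expected_conditions as EC
#
#   element = WebDriverWait(bro, 10).until(
#       EC.presence_of_element_located((By.ID, 'kw'))
#   )
```

An explicit wait is usually the better fit when only one slow element matters, because the rest of the lookups then fail quickly instead of each waiting for the full implicit timeout.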

Selenium element operations

  1. Click operation: click()
  2. Write text: send_keys("content")
  3. Clear text: clear()
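
The three operations are often combined, for example to replace whatever is already in an input box. The helper below is a minimal sketch; `replace_text` is a hypothetical name, and the locators in the usage comment are examples only.

```python
def replace_text(element, text):
    """Clear an input element, then type `text` into it."""
    element.clear()
    element.send_keys(text)


# With a real driver (the locators 'kw' and 'su' are only examples):
#
#   from selenium.webdriver.common.by import By
#
#   box = bro.find_element(By.ID, 'kw')
#   replace_text(box, 'selenium')
#   bro.find_element(By.ID, 'su').click()
```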

Executing JS

When driving the browser with Selenium, you can write your own JS and have the browser execute it. What is this useful for?

  • Creating a new tab
  • Printing variables (variables that belong to the page currently being crawled)
  • Getting the cookies of the current login
  • Scrolling the page

  1. Basic usage: bro.execute_script('alert("Beauty")')
  2. Print a variable to the browser console: res=bro.execute_script('console.log(urlMap)'). Note that res is None here; to get the value back in Python, the script must return it: res=bro.execute_script('return urlMap')
  3. Create a new tab: bro.execute_script('open()')
  4. Scroll the page (to the bottom): bro.execute_script('scrollTo(0,document.documentElement.scrollHeight)')
  5. Get the current URL:
    • bro.execute_script('alert(location)')
    • Change the current URL: bro.execute_script('location="http://www.baidu.com"')
  6. Print cookies: bro.execute_script('alert(document.cookie)')
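
`execute_script` only hands a value back to Python when the script itself uses `return`; `alert` and `console.log` show the value in the browser instead. The sketch below wraps two of the uses listed above. The helper names are hypothetical, and `read_page_variable` assumes the variable (such as `urlMap`) is reachable as a property of `window`, which may not hold for variables declared with `let` or `const`.

```python
def read_page_variable(driver, name):
    """Return the value of a global JavaScript variable from the current page.

    Extra arguments to execute_script are visible in the script as
    `arguments[0]`, `arguments[1]`, and so on.
    """
    return driver.execute_script("return window[arguments[0]];", name)


def scroll_to_bottom(driver):
    """Scroll the page to the bottom and return the new vertical offset."""
    return driver.execute_script(
        "window.scrollTo(0, document.documentElement.scrollHeight);"
        "return window.scrollY;"
    )


# With a real driver:
#
#   url_map = read_page_variable(bro, 'urlMap')
#   scroll_to_bottom(bro)
```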

Switch tab

from selenium import webdriver
import time

bro = webdriver.Firefox()
bro.get('https://www.cnblogs.com/liuqingzheng/p/16005896.html')
bro.implicitly_wait(10)

# Open tab
bro.execute_script('window.open()')

# Switch to a tab
bro.switch_to.window(bro.window_handles[1])
bro.get('https://www.baidu.com/')
time.sleep(2)
bro.get('http://www.taobao.com')
time.sleep(2)

# go back
bro.back()
time.sleep(2)

# go ahead
bro.forward()
time.sleep(2)

# Close the current tab
bro.close()

# Quit the browser and end the session (nothing can be called on bro after this)
bro.quit()
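
Indexing `bro.window_handles[1]` works when exactly two tabs are open. A slightly more defensive sketch compares the handles before and after opening the tab, and switches back to a known handle after closing it. The helper names `open_new_tab` and `close_tab_and_return` are hypothetical; `window_handles`, `current_window_handle`, `switch_to.window` and `close` are the driver members they rely on.

```python
def open_new_tab(driver):
    """Open a blank tab, switch the driver to it, and return its handle."""
    before = set(driver.window_handles)
    driver.execute_script("window.open()")
    # The new tab is whichever handle was not present before.
    new_handles = [h for h in driver.window_handles if h not in before]
    if not new_handles:
        raise RuntimeError("no new tab was opened")
    driver.switch_to.window(new_handles[0])
    return new_handles[0]


def close_tab_and_return(driver, return_to):
    """Close the current tab, then switch back to the handle `return_to`."""
    driver.close()
    driver.switch_to.window(return_to)


# With a real driver:
#
#   original = bro.current_window_handle
#   open_new_tab(bro)
#   bro.get('https://www.baidu.com/')
#   close_tab_and_return(bro, original)
```

Switching back matters because closing a tab does not move the driver's focus automatically; further commands would otherwise target a window that no longer exists.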

Log in to cnblogs (chrome)

The data to be crawled later can only be seen after logging in.

  • Using Selenium for everything is slow → it is hard to run multi-threaded → crawling is not fast.

  • Using requests alone makes logging in difficult, and the login cannot be automated → you cannot get the cookies.

  • Log in using Selenium → get the cookies → move to another machine, use these cookies, and you are still logged in.

Log in and get the cookies

import time

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
import json
from selenium.webdriver.common.by import By
# Hide the "controlled by automated test software" flag so the site is less likely to detect automation
options = Options()
options.add_argument("--disable-blink-features=AutomationControlled")
bro = webdriver.Chrome(options=options)

bro.get('https://www.cnblogs.com/')
bro.implicitly_wait(10)
bro.maximize_window()
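# The link text must match the page exactly; on the live Chinese-language site it is probably '登录' rather than 'Login'
login_btn = bro.find_element(By.LINK_TEXT, 'Login')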
login_btn.click()

time.sleep(2)

# Find the username and password input boxes
username = bro.find_element(By.CSS_SELECTOR, '#mat-input-0')
password = bro.find_element(By.ID, 'mat-input-1')

submit_btn = bro.find_element(By.CSS_SELECTOR,
                              'body > app-root > app-sign-in-layout > div > div > app-sign-in > app-content-container > div > div > div > form > div > button')
# Verification code
code=bro.find_element(By.ID,'Shape3')
time.sleep(1)


username.send_keys('@qq.com')
time.sleep(1)
password.send_keys('#')
time.sleep(1)
submit_btn.click()  # Either the login succeeds directly, or a verification code pops up
code.click()
time.sleep(10)

# The program pauses here -> complete the verification code manually in the browser -> the program then continues.
# At this point the login has succeeded.
# Get the cookies and save them
cookies = bro.get_cookies()
with open('cnblogs.json', 'w', encoding='utf-8') as f:
    json.dump(cookies, f)

time.sleep(2)
bro.close()

On another machine, use the saved cookies

import time

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
import json
from selenium.webdriver.common.by import By

options = Options()
options.add_argument("--disable-blink-features=AutomationControlled")
bro = webdriver.Chrome(options=options)

bro.get('https://www.cnblogs.com/')
bro.implicitly_wait(10)
bro.maximize_window()

time.sleep(5)
# Load the cookies -> write them to the browser -> refresh the browser -> logged in
with open('cnblogs.json', 'r', encoding='utf-8') as f:
    cookies = json.load(f)
# Write them to the browser
for item in cookies:
    bro.add_cookie(item)  # Cookies saved from a session that was not logged in may raise an error when written


# Refresh browser
bro.refresh()

time.sleep(5)
bro.close()
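
`add_cookie` only accepts cookies whose domain matches the page that is currently loaded, which is why the script opens the site before writing them. If the saved file also contains cookies from other domains, filtering them first avoids an invalid-cookie-domain error. The sketch below is one way to do that; `cookies_for_host` is a hypothetical helper, and the matching rule is a simplification of real cookie domain matching.

```python
def cookies_for_host(cookies, host):
    """Keep only the cookies whose domain matches `host` or a parent domain of it.

    Cookies that carry no domain at all are kept, since the browser then
    assigns them to the current page's domain.
    """
    kept = []
    for cookie in cookies:
        domain = cookie.get("domain", "").lstrip(".")
        if not domain or host == domain or host.endswith("." + domain):
            kept.append(cookie)
    return kept


# With a real driver that has already loaded https://www.cnblogs.com/:
#
#   for item in cookies_for_host(cookies, 'www.cnblogs.com'):
#       bro.add_cookie(item)
```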

Semi-automatic upvoting on Chouti

Log in using Selenium → get the cookies. Then send the upvote with requests, carrying those cookies.

Upvoting with requests

# Visit the home page and parse out the article IDs
import json

import requests
from bs4 import BeautifulSoup

# Access the site with the saved cookies
session = requests.Session()
cookie = {}  # Filled from the local file below
with open('chouti.json', 'r') as f:
    cookie_list = json.load(f)
# Selenium's cookie format differs from what requests expects; convert it to {key: value, key: value}
for item in cookie_list:
    cookie[item['name']] = item['value']
# Store the cookies on the session; cookies passed to a single request are not reused by later ones
session.cookies.update(cookie)
header = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/119.0.0.0 Safari/537.36'}
res = session.get('https://dig.chouti.com/', headers=header)
soup = BeautifulSoup(res.text, 'html.parser')

print(res.text)

# Each article sits in a div with class "link-item"; its ID is in the data-id attribute
divs = soup.find_all(name='div', class_='link-item')
for div in divs:
    article_id = div.attrs.get('data-id')
    data = {
        'linkId': article_id
    }

    # Send the upvote for this article
    res1 = session.post('https://dig.chouti.com/link/vote', data=data, headers=header)
    print(res1.text)
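
The conversion loop above can be pulled out into a small reusable function. This is a minimal sketch; `selenium_cookies_to_dict` and `load_cookie_dict` are hypothetical names, and the file is assumed to hold the list returned by the driver's `get_cookies()`.

```python
import json


def selenium_cookies_to_dict(cookie_list):
    """Flatten Selenium's list of cookie dicts into {name: value} for requests."""
    return {item["name"]: item["value"] for item in cookie_list}


def load_cookie_dict(path):
    """Read a JSON file written from driver.get_cookies() and flatten it."""
    with open(path, "r", encoding="utf-8") as f:
        return selenium_cookies_to_dict(json.load(f))


# With a requests session:
#
#   session = requests.Session()
#   session.cookies.update(load_cookie_dict('chouti.json'))
```

Only the name and value survive the conversion; fields such as domain, path and expiry are dropped, which is usually fine when every request goes to the same site.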