Waiting for elements to load in Selenium
Code runs very fast → some tags have not finished loading yet → looking them up finds nothing → an error is raised
Setting a wait: explicit wait, implicit wait
# Implicit wait: when looking up a tag, keep trying for up to 10 seconds if it cannot be found
bro.implicitly_wait(10)
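The explicit wait is named above but not shown. In Selenium it is normally written with `WebDriverWait` and the `expected_conditions` helpers. The sketch below is only a plain-Python illustration of the polling idea behind it. `wait_until` is a hypothetical helper name, not a Selenium API.

```python
import time


def wait_until(condition, timeout=10.0, poll_interval=0.5):
    """Poll condition() until it returns a truthy value or timeout seconds pass.

    Hypothetical helper that mimics the idea of an explicit wait: it waits for
    one specific condition instead of one global timeout for every lookup.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            result = condition()
            if result:
                return result
        except Exception:
            # Treat "not found yet" style errors as "keep waiting"
            pass
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(poll_interval)
```

With a real driver the condition could be something like `lambda: bro.find_element(By.ID, 'some-id')`, where `'some-id'` is a placeholder for whatever element the page actually needs.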
Selenium element operations
- Click operation:
click()
- Write text:
send_keys("content")
- Clear text:
clear()
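These three operations are usually combined when filling in a form. A minimal sketch, where `fill_and_submit` is a hypothetical helper and both arguments are assumed to be elements already located with `find_element`:

```python
def fill_and_submit(input_box, submit_button, text):
    """Clear an input box, type text into it, then click a button.

    input_box and submit_button are assumed to be Selenium WebElement objects
    (anything exposing clear(), send_keys() and click() works).
    """
    input_box.clear()          # remove any pre-filled text first
    input_box.send_keys(text)  # type the new content
    submit_button.click()      # submit the form
```

Clearing first avoids appending to text that the page may have pre-filled.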
Executing JavaScript
When driving the browser with Selenium, you can write your own JavaScript and have it executed in the page. What can you do with this?
- Open a new tab
- Print out variables that belong to the page currently being crawled
- Get the cookies of the current login session
- Scroll the page
- Basic usage:
bro.execute_script('alert("Beauty")')
- Print out some variables:
bro.execute_script('console.log(urlMap)')  # writes to the browser console only
res = bro.execute_script('return urlMap')  # use return to get the value back in Python
- Open a new tab:
bro.execute_script('open()')
- Scroll the page (to the bottom):
bro.execute_script('scrollTo(0,document.documentElement.scrollHeight)')
- Show the current URL:
bro.execute_script('alert(location)')
- Navigate to a different URL:
bro.execute_script('location="http://www.baidu.com"')
- Show the cookies:
bro.execute_script('alert(document.cookie)')
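`alert` and `console.log` only show a value inside the browser. To bring a value back into Python, the script can use `return`, since `execute_script` hands the returned value to the caller. A sketch, assuming `driver` is an already started WebDriver; `read_page_state` is a hypothetical helper name:

```python
def read_page_state(driver):
    """Collect a few values from the current page through execute_script."""
    return {
        # Each script uses `return` so the value travels back to Python
        'url': driver.execute_script('return location.href'),
        'cookie': driver.execute_script('return document.cookie'),
        'scroll_height': driver.execute_script(
            'return document.documentElement.scrollHeight'
        ),
    }
```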
Switch tab
from selenium import webdriver
import time

bro = webdriver.Firefox()
bro.get('https://www.cnblogs.com/liuqingzheng/p/16005896.html')
bro.implicitly_wait(10)

# Open a new tab
bro.execute_script('window.open()')
# Switch to the new tab
bro.switch_to.window(bro.window_handles[1])
bro.get('https://www.baidu.com/')
time.sleep(2)
bro.get('http://www.taobao.com')
time.sleep(2)
# Go back
bro.back()
time.sleep(2)
# Go forward
bro.forward()
time.sleep(2)
# Close the current tab
bro.close()
# Quit the browser (closes every remaining window and ends the session;
# do not call close() again after this, the session no longer exists)
bro.quit()
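One detail the script does not cover: after `close()` the driver still points at the tab that was just closed, so any further command needs a `switch_to.window(...)` to a remaining handle first. A sketch under that assumption; `close_tab_and_return` is a hypothetical helper, and `original_handle` is assumed to have been read from `bro.current_window_handle` before the new tab was opened:

```python
def close_tab_and_return(driver, original_handle):
    """Close the current tab and move focus back to original_handle."""
    driver.close()
    # After close() the driver has no valid current window until we switch
    driver.switch_to.window(original_handle)
    return driver.window_handles
```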
Log in to cnblogs (chrome)
Some of the data you will want to crawl is only visible after logging in.
- Using Selenium for everything is slow → multi-threading cannot be used → crawling is not fast
- Using requests for everything makes logging in difficult, and automatic login will not work → no cookies can be obtained
- So: log in with Selenium → get the cookies → move to another machine, load these cookies, and you are still logged in
Log in to get cookies
import json
import time

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By

# Turn off the flag that lets sites detect a browser controlled by automation software
options = Options()
options.add_argument("--disable-blink-features=AutomationControlled")
bro = webdriver.Chrome(options=options)
bro.get('https://www.cnblogs.com/')
bro.implicitly_wait(10)
bro.maximize_window()
login_btn = bro.find_element(By.LINK_TEXT, 'Login')
login_btn.click()
time.sleep(2)

# Find the username and password input boxes
username = bro.find_element(By.CSS_SELECTOR, '#mat-input-0')
password = bro.find_element(By.ID, 'mat-input-1')
submit_btn = bro.find_element(By.CSS_SELECTOR, 'body > app-root > app-sign-in-layout > div > div > app-sign-in > app-content-container > div > div > div > form > div > button')
# Verification code element
code = bro.find_element(By.ID, 'Shape3')
time.sleep(1)
username.send_keys('@qq.com')
time.sleep(1)
password.send_keys('#')
time.sleep(1)
submit_btn.click()

# Either the login succeeds directly, or a verification code pops up
code.click()
time.sleep(10)
# The program pauses here -> complete the verification code by hand in the browser -> the program continues
# At this point the login has succeeded

# Take out the cookies and save them
cookies = bro.get_cookies()
with open('cnblogs.json', 'w', encoding='utf-8') as f:
    json.dump(cookies, f)

time.sleep(2)
bro.close()
Using the cookies on another machine
import json
import time

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By

options = Options()
options.add_argument("--disable-blink-features=AutomationControlled")
bro = webdriver.Chrome(options=options)
bro.get('https://www.cnblogs.com/')
bro.implicitly_wait(10)
bro.maximize_window()

time.sleep(5)
# Load the saved cookies -> write them to the browser -> refresh the browser -> logged in
with open('cnblogs.json', 'r') as f:
    cookies = json.load(f)
# Write them to the browser
for item in cookies:
    bro.add_cookie(item)
# Note: cookies saved from a session that was not logged in will raise an error when written
# Refresh the browser
bro.refresh()
time.sleep(5)
bro.close()
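The save and restore steps can be wrapped into two small helpers. A sketch: `save_cookies` and `load_cookies` are hypothetical names, and `driver` is assumed to be a WebDriver that has already opened a page on the cookies' domain, since Selenium only accepts cookies for the domain currently loaded.

```python
import json


def save_cookies(driver, path):
    """Write the driver's current cookies to a JSON file."""
    with open(path, 'w', encoding='utf-8') as f:
        json.dump(driver.get_cookies(), f)


def load_cookies(driver, path):
    """Read cookies from a JSON file and add them to the driver one by one.

    Returns the number of cookies that were added.
    """
    with open(path, 'r', encoding='utf-8') as f:
        cookies = json.load(f)
    for cookie in cookies:
        driver.add_cookie(cookie)
    return len(cookies)
```

After `load_cookies(...)`, a `driver.refresh()` is still needed before the page reflects the logged-in state.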
Semi-automatic upvoting on Chouti (dig.chouti.com)
Log in with Selenium → get the cookies
Then upvote with requests, sending those cookies
Upvoting with requests
# Visit the home page and parse out the article IDs
import json

import requests
from bs4 import BeautifulSoup

# Access the site with cookies
session = requests.Session()
cookie = {}
# Load the cookies saved locally
with open('chouti.json', 'r') as f:
    cookie_list = json.load(f)
# Selenium's cookie format differs from what requests expects,
# so convert it to {name: value, name: value}
for item in cookie_list:
    cookie[item['name']] = item['value']
header = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/119.0.0.0 Safari/537.36'}
res = session.get('https://dig.chouti.com/', cookies=cookie, headers=header)
soup = BeautifulSoup(res.text, 'html.parser')
print(res.text)
divs = soup.find_all(name='div', class_='link-item')
for div in divs:
    article_id = div.attrs.get('data-id')
    data = {'linkId': article_id}
    # Cookies passed to a single request are not kept on the session, so pass them again here
    res1 = session.post('https://dig.chouti.com/link/vote', data=data, cookies=cookie, headers=header)
    print(res1.text)
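The format conversion in the script is the step that is easiest to get wrong, so it may be worth isolating. A minimal sketch; `selenium_cookies_to_dict` is a hypothetical helper name, and the input is assumed to be the list returned by Selenium's `get_cookies()`:

```python
def selenium_cookies_to_dict(cookie_list):
    """Convert Selenium's list of cookie dicts into a {name: value} mapping.

    Selenium stores each cookie as a dict with keys such as 'name', 'value',
    'domain' and 'path'; requests only needs the name/value pairs.
    """
    return {item['name']: item['value'] for item in cookie_list}
```

The result can be passed as `cookies=` on a single request, or loaded once with `session.cookies.update(...)` so that every later request on that session sends it.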