Selenium/WebDriver operating principle and mechanism

I’ve been looking at some low-level details recently. The "driver" in WebDriver means just what it says, someone who drives, and comparing WebDriver to a driver fits very well.

We can compare WebDriver driving a browser to a taxi driver driving a taxi. There are three roles when driving a taxi:

· Passenger: The passenger tells the taxi driver where to go and roughly how to get there.

· Taxi driver: The driver controls the taxi according to the passenger’s instructions.

· Taxi: The taxi completes the actual driving according to the driver’s control and delivers the passengers to their destination.

There are also three similar roles in WebDriver:

· Automated test code: The test code sends requests to the browser driver (such as the Firefox driver or the Chrome driver).

· Browser driver: It parses the requests sent by the test code and passes the resulting instructions to the browser.

· Browser: It executes the instructions sent by the browser driver and completes the operations the engineer wanted.

So in this analogy:

· The automated test code written by engineers is equivalent to passengers.

· Browser drivers are like taxi drivers.

· The browser is like a taxi.

Now let’s look at how WebDriver works technically. The same three roles appear:

· WebDriver API (language bindings for Java, Python, C# and other languages). For Java, this is the downloaded Selenium jar package; for example, selenium-java-3.8.1.zip is the package for Selenium 3.8.1.

· Browser driver: each browser has its own driver, which is a standalone executable (an .exe file on Windows). Examples are chromedriver.exe for Google Chrome, geckodriver.exe for Firefox, and IEDriverServer.exe for IE.

· Browser: the commonly used browsers we are all familiar with.

So how do they communicate with each other while a WebDriver script is running? Why can the same browser driver handle both Java scripts and Python scripts? Let’s take a look at what happens behind the scenes when a Selenium script is executed:

· For each Selenium command in the script, an HTTP request is created and sent to the browser driver.

· The browser driver contains an HTTP server that receives these HTTP requests.

· After receiving a request, the HTTP server controls the corresponding browser according to the request.

· The browser performs the specific test step.

· The browser returns the result of the step to the HTTP server.

· The HTTP server returns the result to the Selenium script. If the HTTP status code indicates an error, we see the corresponding error message on the console.

Why use HTTP protocol?

Because HTTP is the standard protocol for communication between browsers and web servers, and almost every programming language provides rich HTTP libraries, requests and responses between client and server are easy to handle. WebDriver has a typical client/server (C/S) structure: the WebDriver API is the client, and the small browser driver is the server.
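To make the client/server split concrete, here is a toy sketch using only the Python standard library. It is not how ChromeDriver is implemented: ToyDriverHandler, start_toy_driver and post_json are names made up for this illustration, and the "driver" only echoes what it receives instead of launching a browser.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class ToyDriverHandler(BaseHTTPRequestHandler):
    """Stand-in for a browser driver: accepts JSON over HTTP, replies with JSON."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length) or b"{}")
        if self.path == "/session":
            # A real driver would start a browser here; this toy only echoes.
            self._reply(200, {"sessionId": "toy-session", "status": 0,
                              "value": {"echo": payload}})
        else:
            self._reply(404, {"value": {"message": "unknown command"}})

    def _reply(self, code, body):
        data = json.dumps(body).encode("utf-8")
        self.send_response(code)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)

    def log_message(self, *args):
        pass  # keep the console quiet


def start_toy_driver():
    """Start the toy server on a free local port in a background thread."""
    server = HTTPServer(("127.0.0.1", 0), ToyDriverHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def post_json(url, payload):
    """Client side: any language with an HTTP library could send this request."""
    request = urllib.request.Request(
        url, data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"}, method="POST")
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())
```

Calling start_toy_driver(), reading the port from server.server_address[1], and then posting to /session with post_json shows the whole round trip. Nothing on the server side cares which language produced the request, which is why one driver can serve both Java and Python scripts.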

The protocol WebDriver is based on: JSON Wire protocol.

The JSON Wire protocol is based on the http protocol and further standardizes the data in the body part of the http request and response.

We know that HTTP requests and responses often include the following parts: http request method, http request and response content body, http response status code, etc.

Common http request methods:

GET: Used to obtain information from the server. For example, get the title information of the web page.

POST: Send an operation request to the server. Such as findElement, Click, etc.
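A language binding can be thought of as a lookup table from command names to an HTTP method and a path. The sketch below is an illustration only: COMMANDS and build_request are hypothetical names, the command names are chosen for readability, and the table covers just the endpoints used in this article plus the title lookup.

```python
from string import Template

# (HTTP method, path template) for a handful of commands.
COMMANDS = {
    "newSession": ("POST", "/session"),
    "get": ("POST", "/session/$sessionId/url"),
    "getTitle": ("GET", "/session/$sessionId/title"),
    "findElement": ("POST", "/session/$sessionId/element"),
    "clickElement": ("POST", "/session/$sessionId/element/$id/click"),
    "quit": ("DELETE", "/session/$sessionId"),
}


def build_request(base_url, command, params=None):
    """Return (method, url, json_body) for a command without sending it."""
    params = dict(params or {})
    method, path_template = COMMANDS[command]
    # Fill in :sessionId / :id style placeholders from the parameters.
    path = Template(path_template).substitute(params)
    # GET and DELETE carry no body in this sketch; POST sends the parameters.
    body = params if method == "POST" else None
    return method, base_url.rstrip("/") + path, body
```

For example, build_request("http://localhost:9515", "get", {"sessionId": "abc", "url": "https://www.baidu.com"}) yields a POST to http://localhost:9515/session/abc/url with both parameters in the body.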

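Response status codes:

To give users clearer feedback, WebDriver supplements the HTTP status code with more detailed status codes of its own. In the JSON Wire protocol these are carried in the "status" field of the JSON response body, for example:

0: Success

7: NoSuchElement

11: ElementNotVisible

The HTTP status code itself is still present; 200 means everything is OK at the HTTP level.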
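In the legacy JSON Wire protocol, codes such as 7 and 11 travel in the "status" field of the JSON reply, next to the ordinary HTTP status (the new-session reply later in this article shows "status": 0). A client might check both along these lines. This is a sketch: check_response and DriverError are hypothetical names, and the table lists only the codes mentioned here.

```python
# Only the codes mentioned in this article; the protocol defines more.
LEGACY_STATUS_NAMES = {
    0: "Success",
    7: "NoSuchElement",
    11: "ElementNotVisible",
}


class DriverError(Exception):
    """Hypothetical exception type used by this sketch."""


def check_response(http_status, body):
    """Return body['value'] on success, raise DriverError otherwise."""
    status = body.get("status")
    # Success needs HTTP 200 and a zero (or absent) protocol status.
    if http_status == 200 and status in (0, None):
        return body.get("value")
    name = LEGACY_STATUS_NAMES.get(status, "unrecognized")
    raise DriverError("HTTP %d, status %r (%s)" % (http_status, status, name))
```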

Now for the most important part, the body of the HTTP request and response:

The body carries the actual data. In WebDriver, this data is stored and transmitted as JSON. This is the JSON Wire protocol.

Selenium’s WebDriver API wraps each browser’s native API behind a protocol designed and defined by Selenium itself, called the WebDriver Wire Protocol.

Operational level:

1. Testers write UI automation test scripts (Java, Python, etc.). When a script runs, the program starts the specified browser driver, and the driver opens the browser.

2. The browser driver acts as a remote server that accepts the script’s commands. This web service opens a port, for example http://localhost:9515, and listens on it.

3. The web service translates the script’s commands, which arrive in JSON format, and passes them to the browser to execute.

Logical level:

1. When the tester runs the test script, the script creates a session and sends RESTful requests to the web service over HTTP.

2. The web service translates each RESTful request into instructions the browser can understand, and then receives the execution result.

3. The web service wraps the result as JSON and returns it to the client (the test script). The client then knows whether the operation succeeded, and the test can verify the result.

We can verify:

Download chromedriver, add its directory to the PATH environment variable, make sure its version matches your Chrome browser version, and then run chromedriver.

As you can see, a server starts and listens on port 9515:

andersons-iMac:~ anderson$ chromedriver
Starting ChromeDriver 2.39.562713 (dd642283e958a93ebf6891600db055f1f1b4f3b2) on port 9515
Only local connections are allowed.
GVA info: Successfully connected to the Intel plugin, offline Gen9
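Before sending commands it can be handy to confirm that something is really listening on the port. One way is to request the driver's /status endpoint; the sketch below assumes the driver is at http://localhost:9515 as in the output above, and driver_status is a hypothetical helper name. The exact contents of the reply differ between driver versions, so the function just returns whatever JSON comes back.

```python
import json
import urllib.error
import urllib.request


def driver_status(base_url="http://localhost:9515", timeout=2.0):
    """Return the parsed /status reply, or None if nothing usable answers."""
    try:
        with urllib.request.urlopen(base_url + "/status", timeout=timeout) as response:
            return json.loads(response.read().decode("utf-8"))
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, HTTP error, or a non-JSON reply.
        return None
```

With chromedriver running, print(driver_status()) shows the driver's status document; with nothing on the port it prints None.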

Note that the startup output stresses that only local connections are allowed. As mentioned earlier, when the passenger (the test code) sends a request to the driver, what it actually does is construct an HTTP request. The constructed request looks like this:

Request method: POST

Request address: http://localhost:9515/session

Request body:

capabilities = {
    "capabilities": {
        "alwaysMatch": {
            "browserName": "chrome"
        },
        "firstMatch": [
            {}
        ]
    },
    "desiredCapabilities": {
        "platform": "ANY",
        "browserName": "chrome",
        "version": "",
        "chromeOptions": {
            "args": [],
            "extensions": []
        }
    }
}
 
We can try sending this request to ChromeDriver ourselves with Python requests:

import json

import requests

session_url = 'http://localhost:9515/session'
# Same capabilities as above, sent as the JSON body of the request
session_pars = {
    "capabilities": {
        "firstMatch": [{}],
        "alwaysMatch": {"browserName": "chrome", "platformName": "any",
                        "goog:chromeOptions": {"extensions": [], "args": []}},
    },
    "desiredCapabilities": {"browserName": "chrome", "version": "", "platform": "ANY",
                            "goog:chromeOptions": {"extensions": [], "args": []}},
}
# Creating a session starts the browser
r_session = requests.post(session_url, json=session_pars)
print(json.dumps(r_session.json(), indent=2))
 
Result:

{
    "sessionId": "44fdb7b1b048a76c0f625545b0d2567b",
    "status": 0,
    "value": {
        "acceptInsecureCerts": false,
        "acceptSslCerts": false,
        "applicationCacheEnabled": false,
        "browserConnectionEnabled": false,
        "browserName": "chrome",
        "chrome": {
            "chromedriverVersion": "2.40.565386 (45a059dc425e08165f9a10324bd1380cc13ca363)",
            "userDataDir": "/var/folders/yd/dmwmz84x5rj354qkz9rwwzbc0000gn/T/.org.chromium.Chromium.RzlABs"
        },
        "cssSelectorsEnabled": true,
        "databaseEnabled": false,
        "handlesAlerts": true,
        "hasTouchScreen": false,
        "javascriptEnabled": true,
        "locationContextEnabled": true,
        "mobileEmulationEnabled": false,
        "nativeEvents": true,
        "networkConnectionEnabled": false,
        "pageLoadStrategy": "normal",
        "platform": "Mac OS X",
        "rotatable": false,
        "setWindowRect": true,
        "takesHeapSnapshot": true,
        "takesScreenshot": true,
        "unexpectedAlertBehaviour": "",
        "version": "71.0.3578.80",
        "webStorageEnabled": true
    }
}
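Every later request needs the session id from this reply. In the reply shown here it sits at the top level. Drivers that follow the W3C WebDriver specification nest it under "value" instead; that shape does not appear in this article and is included below as an assumption about newer drivers. extract_session_id is a hypothetical helper name.

```python
def extract_session_id(reply):
    """Return the session id from a new-session reply, or None if absent."""
    # Legacy shape, as in the reply shown in this article.
    if reply.get("sessionId"):
        return reply["sessionId"]
    # W3C shape (assumption for newer drivers): {"value": {"sessionId": ...}}.
    value = reply.get("value")
    if isinstance(value, dict):
        return value.get("sessionId")
    return None
```

With the request code from this article, session_id = extract_session_id(r_session.json()) gives the value to substitute for ":sessionId" in the URLs that follow.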

How to open a web page, similar to driver.get(url)

The constructed request is:

Request method: POST

Request address: http://localhost:9515/session/:sessionId/url

Note: replace ":sessionId" in the address above with the sessionId value returned by the request that started the browser.

For example: I just sent the request that started the browser, and the result contained "sessionId": "44fdb7b1b048a76c0f625545b0d2567b"

So the request to open the URL is:

Request address: http://localhost:9515/session/44fdb7b1b048a76c0f625545b0d2567b/url

Request body: {"url": "https://www.baidu.com", "sessionId": "44fdb7b1b048a76c0f625545b0d2567b"}

That is:

import requests

url = 'http://localhost:9515/session/44fdb7b1b048a76c0f625545b0d2567b/url'
pars = {"url": "https://www.baidu.com", "sessionId": "44fdb7b1b048a76c0f625545b0d2567b"}
r = requests.post(url, json=pars)
print(r.json())

How to locate elements, similar to driver.find_element_by_xx:

Request method: POST

Request address: http://localhost:9515/session/:sessionId/element

Note: replace ":sessionId" in the address above with the sessionId value returned by the request that started the browser.

For example: I just sent the request that started the browser, and the result contained "sessionId": "b2801b5dc58b15e76d0d3295b04d295c"

So I construct the request address for finding a page element:

Request address: http://localhost:9515/session/b2801b5dc58b15e76d0d3295b04d295c/element

Request body: {"using": "css selector", "value": ".postTitle a", "sessionId": "b2801b5dc58b15e76d0d3295b04d295c"}

That is:

import requests

url = 'http://localhost:9515/session/b2801b5dc58b15e76d0d3295b04d295c/element'
pars = {"using": "css selector", "value": ".postTitle a", "sessionId": "b2801b5dc58b15e76d0d3295b04d295c"}
r = requests.post(url, json=pars)
print(r.json())

How to operate on elements, similar to click():

Request method: POST

Request address: http://localhost:9515/session/:sessionId/element/:id/click

Note: replace ":sessionId" in the address above with the sessionId value returned by the request that started the browser, and replace ":id" with the ELEMENT value returned by the request that located the element.

For example: I just sent the request that started the browser, and the result contained "sessionId": "b2801b5dc58b15e76d0d3295b04d295c"

Locating the element returned the ELEMENT value "0.11402119390850629-1"

So I construct the request address for clicking the page element:

Request address: http://localhost:9515/session/b2801b5dc58b15e76d0d3295b04d295c/element/0.11402119390850629-1/click

Request body: {"id": "0.11402119390850629-1", "sessionId": "b2801b5dc58b15e76d0d3295b04d295c"}

That is:

import requests

url = 'http://localhost:9515/session/b2801b5dc58b15e76d0d3295b04d295c/element/0.11402119390850629-1/click'
# The id in the body must be the same element id that appears in the URL
pars = {"id": "0.11402119390850629-1", "sessionId": "b2801b5dc58b15e76d0d3295b04d295c"}
r = requests.post(url, json=pars)
print(r.json())
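The click URL is built from the reply to the find-element request, so it helps to pull the element id out programmatically rather than copying it by hand. The sketch below assumes the legacy reply carries the id under "ELEMENT" inside "value", as described above. The long W3C key is what newer, spec-following drivers use and is included as an assumption, since no such reply appears in this article. extract_element_id and click_url are hypothetical helper names.

```python
LEGACY_ELEMENT_KEY = "ELEMENT"
# Element key used by drivers that follow the W3C WebDriver specification.
W3C_ELEMENT_KEY = "element-6066-11e4-a52e-4f735466cecf"


def extract_element_id(reply):
    """Return the element id from a find-element reply."""
    value = reply.get("value")
    if isinstance(value, dict):
        for key in (W3C_ELEMENT_KEY, LEGACY_ELEMENT_KEY):
            if key in value:
                return value[key]
    raise KeyError("no element reference in reply")


def click_url(base_url, session_id, element_id):
    """Build the address for POST /session/:sessionId/element/:id/click."""
    return "%s/session/%s/element/%s/click" % (base_url, session_id, element_id)
```

Chaining the two, click_url(base, session_id, extract_element_id(r.json())) gives the address to POST to, which keeps the id in the URL and the id in the body from drifting apart.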

As can be seen from the above, UI automation can actually be written as API automation.

It’s just that doing so is cumbersome, nowhere near as convenient as the wrapped webdriver commands. It feels like going to a lot of trouble for nothing.

Let’s write some code to get a feel for it:

import time

import requests

capabilities = {
    "capabilities": {
        "alwaysMatch": {
            "browserName": "chrome"
        },
        "firstMatch": [
            {}
        ]
    },
    "desiredCapabilities": {
        "platform": "ANY",
        "browserName": "chrome",
        "version": "",
        "chromeOptions": {
            "args": [],
            "extensions": []
        }
    }
}

# Open the browser: POST http://127.0.0.1:9515/session
res = requests.post('http://127.0.0.1:9515/session', json=capabilities).json()
session_id = res['sessionId']

# Open Baidu
requests.post('http://127.0.0.1:9515/session/%s/url' % session_id,
              json={"url": "http://www.baidu.com", "sessionId": session_id})
time.sleep(3)

# Close the browser and delete the session
requests.delete('http://127.0.0.1:9515/session/%s' % session_id, json={"sessionId": session_id})
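One weakness of the script above is that if any step raises, the DELETE never runs and the browser stays open. A context manager is one way to guarantee cleanup. This is a sketch, not part of Selenium: driver_session is a hypothetical name, and the http parameter exists only so another object with post/delete methods can be swapped in (it defaults to the requests module).

```python
from contextlib import contextmanager


@contextmanager
def driver_session(base_url, capabilities, http=None):
    """Create a session, yield its id, and always delete it afterwards."""
    if http is None:
        import requests as http  # default transport, as used in this article
    reply = http.post(base_url + "/session", json=capabilities).json()
    # The id is at the top level in legacy replies, under "value" otherwise.
    session_id = reply.get("sessionId") or (reply.get("value") or {}).get("sessionId")
    try:
        yield session_id
    finally:
        # Runs even if the body of the with-block raised.
        http.delete("%s/session/%s" % (base_url, session_id))
```

Usage would look like: with driver_session('http://127.0.0.1:9515', capabilities) as session_id: followed by the same requests.post call to the /url endpoint as above.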

Understanding how things really work makes it easier to track down and solve problems when debugging.

And if your interface (API) automation needs to invoke a small amount of UI automation, this approach is worth considering.
