What is a headless browser?
A headless browser is a browser without a graphical user interface (GUI). It is not operated through windows and mouse clicks; it is controlled from the command line or from code instead.
Why use Chrome headless?
Headless Chrome is used for scraping (e.g. by Google), testing (by developers), and hacking (by hackers). Typical uses:
- Search engines use it to render pages, generate dynamic content, and index data from single-page web applications.
- SEO tools use it to analyze websites and suggest how to improve them.
- Monitoring tools use it to measure the execution time of JavaScript in web applications.
- Testing tools use it to render pages and compare them with previous versions to track user interface changes.
- The main advantage of Headless Chrome is that you can write scripts that drive the browser programmatically and perform tasks such as crawling, analyzing, or taking screenshots of websites quickly and at scale, without opening the browser's GUI and clicking through a million things by hand.
- Three things are needed to do this: Headless Chrome, the Chrome DevTools Protocol, and Puppeteer.
- Headless Chrome was covered above. The DevTools Protocol lets external tools inspect, debug, and control a running Chrome instance; through it you can, for example, attach Chrome DevTools from another browser and see what headless Chrome sees without running a browser GUI. Puppeteer is a Node.js library that gives developers a high-level API for controlling headless Chrome through the DevTools Protocol.
- Combining the three, you can write scripts for repetitive, large-scale tasks and run them quickly.
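On the wire, the DevTools Protocol consists of JSON messages sent over a WebSocket: each command carries an id, a method such as Page.navigate, and a params object, and Chrome replies with a message carrying the same id. Puppeteer (and jvppeteer, used later) builds these messages for you. The sketch below is a hypothetical helper, not part of any library, that only shows what a command looks like; it uses no JSON library and supports string-valued params only. Such a message would be sent to a page's WebSocket URL, which Chrome lists at http://localhost:9222/json/list when started with --remote-debugging-port=9222.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper that builds Chrome DevTools Protocol (CDP) command messages.
// Only string-valued params are supported, which is enough for Page.navigate.
final class CdpMessages {

    // Quotes s as a JSON string, escaping the characters JSON requires.
    static String jsonString(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        // Other control characters: backslash, 'u', then 4 hex digits
                        sb.append('\\').append('u').append(String.format("%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.append('"').toString();
    }

    // Builds {"id":<id>,"method":"<method>","params":{...}}
    static String command(int id, String method, Map<String, String> params) {
        StringBuilder sb = new StringBuilder();
        sb.append("{\"id\":").append(id)
          .append(",\"method\":").append(jsonString(method))
          .append(",\"params\":{");
        boolean first = true;
        for (Map.Entry<String, String> e : params.entrySet()) {
            if (!first) {
                sb.append(',');
            }
            sb.append(jsonString(e.getKey())).append(':').append(jsonString(e.getValue()));
            first = false;
        }
        return sb.append("}}").toString();
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("url", "https://www.baidu.com");
        System.out.println(command(1, "Page.navigate", params));
    }
}
```

Running main prints {"id":1,"method":"Page.navigate","params":{"url":"https://www.baidu.com"}}. A LinkedHashMap keeps the params in insertion order, which makes the output predictable.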
Install Chrome and test it
Most programmers already have Chrome installed; if not, download and install it. Chrome can then run as a headless browser directly from the command line. Assuming Chrome is installed at C:\Users\administrator\AppData\Local\Google\Chrome\Application\chrome.exe, run the following command:
C:\Users\administrator\AppData\Local\Google\Chrome\Application\chrome.exe --headless --hide-scrollbars --disable-gpu --screenshot=e:\chrome.jpg --window-size=1280,1696 https://www.baidu.com
This writes a screenshot of the page to e:\chrome.jpg.
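The same call can be made from Java, which avoids shell quoting and makes the screenshot step scriptable. A minimal sketch using ProcessBuilder, assuming the Chrome path and output file from the command above; the class name and the 60-second timeout are arbitrary choices for illustration:

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeUnit;

// Sketch: runs the headless screenshot command from Java instead of a shell.
final class HeadlessScreenshot {

    // Builds the same argument list as the command-line example.
    static List<String> command(String chromePath, String outFile, String url) {
        List<String> cmd = new ArrayList<>();
        cmd.add(chromePath);
        cmd.add("--headless");
        cmd.add("--hide-scrollbars");
        cmd.add("--disable-gpu");
        cmd.add("--screenshot=" + outFile);
        cmd.add("--window-size=1280,1696");
        cmd.add(url);
        return cmd;
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        List<String> cmd = command(
                "C:\\Users\\administrator\\AppData\\Local\\Google\\Chrome\\Application\\chrome.exe",
                "e:\\chrome.jpg",
                "https://www.baidu.com");
        // Each list element is passed to Chrome as one argument, so no quoting is needed.
        Process process = new ProcessBuilder(cmd).inheritIO().start();
        // Give Chrome up to 60 s to render the page and write the file.
        if (!process.waitFor(60, TimeUnit.SECONDS)) {
            process.destroyForcibly();
            System.err.println("Chrome did not finish in time");
        } else {
            System.out.println("Chrome exited with code " + process.exitValue());
        }
    }
}
```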
Running with Docker
- Pull the image:
docker pull browserless/chrome:latest
- Run the container:
docker run -p 3000:3000 browserless/chrome:latest
- Open it in a browser:
http://localhost:3000/
It looks very powerful.
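Because the container needs a moment to start, scripts that use it may want to wait until port 3000 answers before connecting. A minimal sketch using the JDK's java.net.http.HttpClient (Java 11+), assuming (not verified against the browserless documentation) that the root page returns a 2xx status, possibly after a redirect, once the service is ready; the class name, retry count, and delay are illustrative:

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

// Sketch: polls an HTTP endpoint until it answers, e.g. a freshly started browserless container.
final class WaitForBrowserless {

    // Returns true once GET url answers with a 2xx status (after following redirects),
    // or false if that has not happened after maxAttempts tries.
    static boolean waitUntilUp(String url, int maxAttempts, Duration delay) {
        HttpClient client = HttpClient.newBuilder()
                .version(HttpClient.Version.HTTP_1_1)
                .followRedirects(HttpClient.Redirect.NORMAL)
                .connectTimeout(Duration.ofSeconds(2))
                .build();
        HttpRequest request = HttpRequest.newBuilder(URI.create(url))
                .timeout(Duration.ofSeconds(5))
                .GET()
                .build();
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                HttpResponse<Void> response =
                        client.send(request, HttpResponse.BodyHandlers.discarding());
                if (response.statusCode() / 100 == 2) {
                    return true;
                }
            } catch (IOException e) {
                // Not listening yet (connection refused or timed out): retry.
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(delay.toMillis());
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return false;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        boolean up = waitUntilUp("http://localhost:3000/", 30, Duration.ofSeconds(1));
        System.out.println(up ? "browserless is up" : "browserless did not respond");
    }
}
```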
k8s deployment
- Write the deployment YAML file and name it browserless-chrome.yaml:
---
apiVersion: v1
kind: Service
metadata:
  name: browserless-chrome
  namespace: kube-public
  labels:
    app: browserless-chrome
spec:
  type: NodePort
  ports:
    - name: websocket
      port: 30000
      targetPort: 3000
      nodePort: 30000
  selector:
    app: browserless-chrome
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: browserless-chrome
  namespace: kube-public
spec:
  replicas: 1
  revisionHistoryLimit: 0 # number of old ReplicaSets to keep (0 disables rollback)
  selector:
    matchLabels:
      app: browserless-chrome
  template:
    metadata:
      labels:
        app: browserless-chrome
    spec:
      containers:
        - name: browserless-chrome
          imagePullPolicy: Always
          image: browserless/chrome:latest
          env:
            - name: PORT
              value: "3000"
          securityContext:
            runAsNonRoot: true
            runAsUser: 999
            runAsGroup: 999
          ports:
            - containerPort: 3000
          livenessProbe:
            tcpSocket:
              port: 3000
            initialDelaySeconds: 5
            failureThreshold: 2
            periodSeconds: 60
          readinessProbe:
            tcpSocket:
              port: 3000
            initialDelaySeconds: 5
            periodSeconds: 10
          startupProbe:
            tcpSocket:
              port: 3000
            failureThreshold: 30
            periodSeconds: 10
          resources:
            requests:
              cpu: 0.2
              memory: 300Mi
            limits:
              cpu: 1
              memory: 1Gi
      imagePullSecrets:
        - name: puller
- Apply it:
kubectl apply -f browserless-chrome.yaml
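The liveness, readiness, and startup probes in this manifest are tcpSocket probes: they only check that a TCP connection to port 3000 can be opened. A minimal sketch of the same kind of check in Java, which can be pointed at the NodePort (30000) from outside the cluster; the class name, default host, and 1-second timeout are placeholders to adjust:

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Sketch of a tcpSocket-style check: success only means the port accepted a TCP connection.
final class TcpProbe {

    static boolean canConnect(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Refused, unreachable, or timed out: the probe fails.
            return false;
        }
    }

    public static void main(String[] args) {
        // args: [host] [port]; 30000 is the nodePort declared in the Service above.
        String host = args.length > 0 ? args[0] : "localhost";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 30000;
        System.out.println(host + ":" + port + (canConnect(host, port, 1000) ? " is open" : " is closed"));
    }
}
```

Like the Kubernetes probe, this says nothing about whether Chrome inside the container is healthy, only that something is listening.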
Push the image to the private repository
- Retag the image:
docker tag browserless/chrome:latest xxx.cn/base/browserless-chrome:latest
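- Push it to the private registry:
docker push imgsreg.ipipa.cn:20443/base/browserless-chrome:latest
The registry prefix must be the same in both commands: xxx.cn in the tag command stands for your registry address, such as imgsreg.ipipa.cn:20443 in the push command.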
Java example
- Add the following dependency to pom.xml:
<dependency>
    <groupId>io.github.fanyong920</groupId>
    <artifactId>jvppeteer</artifactId>
    <version>1.1.5</version>
</dependency>
- Sample code that drives the locally installed Chrome:
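import com.ruiyun.jvppeteer.core.Puppeteer;
import com.ruiyun.jvppeteer.core.browser.Browser;
import com.ruiyun.jvppeteer.core.page.Page;
import com.ruiyun.jvppeteer.options.LaunchOptions;
import com.ruiyun.jvppeteer.options.LaunchOptionsBuilder;
import com.ruiyun.jvppeteer.options.PDFOptions;
import lombok.SneakyThrows;
import org.junit.jupiter.api.Test;

import java.util.ArrayList;

public class BrowserTest {
    @SneakyThrows
    @Test
    void test() {
        // Downloads Chromium on first use and reuses it afterwards;
        // not needed here because the local Chrome is used.
        // BrowserFetcher.downloadIfNotExist(null);
        ArrayList<String> arrayList = new ArrayList<>();
        arrayList.add("--no-sandbox");
        arrayList.add("--disable-setuid-sandbox");
        // PDF generation only works in headless mode
        LaunchOptions options = new LaunchOptionsBuilder()
                .withExecutablePath("C:\\Users\\administrator\\AppData\\Local\\Google\\Chrome\\Application\\chrome.exe")
                .withArgs(arrayList)
                .withHeadless(true)
                .build();
        Browser browser = Puppeteer.launch(options);
        try {
            Page page = browser.newPage();
            page.goTo("https://www.baidu.com");
            PDFOptions pdfOptions = new PDFOptions();
            pdfOptions.setPath("test.pdf");
            page.pdf(pdfOptions);
            page.close();
        } finally {
            // Always shut Chrome down, even if navigation or PDF export fails
            browser.close();
        }
    }
}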
- Sample code that connects to the remote Chrome (browserless) over WebSocket:
import com.ruiyun.jvppeteer.core.Puppeteer;
import com.ruiyun.jvppeteer.core.browser.Browser;
import com.ruiyun.jvppeteer.core.page.Page;
import com.ruiyun.jvppeteer.options.LaunchOptions;
import com.ruiyun.jvppeteer.options.LaunchOptionsBuilder;
import com.ruiyun.jvppeteer.options.PDFOptions;
import lombok.SneakyThrows;
import org.junit.jupiter.api.Test;

public class BrowserTest {
    @SneakyThrows
    @Test
    void test() {
        // No local Chrome is launched, so launch args such as --no-sandbox have no effect here
        LaunchOptions options = new LaunchOptionsBuilder()
                .withHeadless(true)
                .build();
        // Docker: ws://localhost:3000; with the k8s NodePort Service: ws://<node-ip>:30000
        Browser browser = Puppeteer.connect(options, "ws://localhost:3000", null, null);
        try {
            Page page = browser.newPage();
            page.goTo("https://www.baidu.com");
            PDFOptions pdfOptions = new PDFOptions();
            pdfOptions.setPath("test.pdf");
            page.pdf(pdfOptions);
            page.close();
        } finally {
            browser.close();
        }
    }
}
Either example generates test.pdf in the project directory; open it to see the result.
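To check the result in a test instead of by eye, you can assert that test.pdf exists and starts with the %PDF- signature that PDF files begin with. A minimal sketch; the class and method names are made up for illustration, and a passing check only means the header is right, not that the page rendered as expected:

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;

// Sketch: a cheap sanity check that a generated file is a PDF.
final class PdfCheck {

    private static final byte[] MAGIC = "%PDF-".getBytes(StandardCharsets.US_ASCII);

    static boolean looksLikePdf(Path file) {
        if (!Files.isRegularFile(file)) {
            return false;
        }
        try (InputStream in = Files.newInputStream(file)) {
            // Read only the first few bytes and compare them with the signature
            byte[] head = in.readNBytes(MAGIC.length);
            return Arrays.equals(head, MAGIC);
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        Path pdf = Paths.get("test.pdf"); // resolved against the working directory
        System.out.println(looksLikePdf(pdf) ? "test.pdf looks valid" : "test.pdf missing or not a PDF");
    }
}
```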