[Chrome] Deploying headless Chrome with k8s and Docker, with a Java calling example

What is a headless browser?

A headless browser is a browser without a graphical user interface (GUI). Instead of being operated through a GUI, it is controlled from the command line or programmatically.

Why use Chrome headless?

  • Chrome Headless is used for scraping (for example by Google), testing (by developers), and hacking (by attackers).
  • Search engines use it to render pages, execute dynamically generated content, and index data from single-page web applications.
  • SEO tools use it to analyze a website and suggest how to improve it.
  • Monitoring tools use it to measure the execution time of JavaScript in web applications.
  • Testing tools use it to render pages and compare them with previous versions to track user interface changes.
  • The main advantage of Headless Chrome is that users can write scripts that drive the browser programmatically and perform tasks such as crawling, analyzing, or taking screenshots of websites quickly and at scale, without having to open the browser's GUI and click through everything by hand.
  • Three things are needed to do this: headless Chrome, the DevTools protocol, and Puppeteer.
  • Headless Chrome was introduced above. The DevTools protocol is the remote interface that Chrome DevTools uses to instrument, inspect, and debug the browser; through it a client can observe and control headless Chrome without running a browser GUI. Puppeteer is a Node.js library that gives developers tools to control headless Chrome programmatically through the DevTools protocol.
  • Combining the three, you can script repetitive, large-scale browser actions with Headless Chrome and run them quickly.
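To make the DevTools protocol a little less abstract: each command sent to Chrome is a JSON message with an id, a method name, and parameters. The sketch below builds a Page.navigate command by hand. The class and helper names are illustrative assumptions; in practice a library such as Puppeteer builds and sends these messages for you.

```java
public class CdpMessageSketch {

    /** Escapes backslashes and double quotes so the value is safe inside a JSON string. */
    static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    /**
     * Builds a DevTools protocol "Page.navigate" command.
     * The id lets the client match Chrome's reply to this request.
     */
    static String navigateCommand(int id, String url) {
        return "{\"id\":" + id
                + ",\"method\":\"Page.navigate\""
                + ",\"params\":{\"url\":\"" + escape(url) + "\"}}";
    }

    public static void main(String[] args) {
        // Prints the JSON text that would be sent over the WebSocket connection.
        System.out.println(navigateCommand(1, "https://www.baidu.com"));
    }
}
```

The message is only printed here; sending it requires an open WebSocket connection to a running browser.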

Install the Chrome browser and test

Most programmers already have the Chrome browser installed; if not, download and install it. Once installed, Chrome itself can be run as a headless browser from the command line. Assuming the Chrome installation path is C:\Users\administrator\AppData\Local\Google\Chrome\Application\chrome.exe, execute the following command:

C:\Users\administrator\AppData\Local\Google\Chrome\Application\chrome.exe --headless --hide-scrollbars --disable-gpu --screenshot=e:\chrome.jpg --window-size=1280,1696 https://www.baidu.com

A chrome.jpg file will be generated at e:\chrome.jpg.
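The same command can also be started from Java. The following is a minimal sketch using the standard ProcessBuilder class; the class name, the buildCommand helper, and passing the Chrome path as a program argument are assumptions made for illustration, while the flags are the ones from the command above.

```java
import java.util.ArrayList;
import java.util.List;

public class HeadlessScreenshot {

    /** Assembles the headless screenshot command line, one list element per argument. */
    static List<String> buildCommand(String chromePath, String outputFile,
                                     int width, int height, String url) {
        List<String> cmd = new ArrayList<>();
        cmd.add(chromePath);
        cmd.add("--headless");
        cmd.add("--hide-scrollbars");
        cmd.add("--disable-gpu");
        cmd.add("--screenshot=" + outputFile);
        cmd.add("--window-size=" + width + "," + height);
        cmd.add(url);
        return cmd;
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 1) {
            System.err.println("usage: HeadlessScreenshot <path-to-chrome>");
            return;
        }
        // args[0] is the machine-specific path to the Chrome executable.
        List<String> cmd = buildCommand(args[0], "chrome.jpg", 1280, 1696, "https://www.baidu.com");
        Process process = new ProcessBuilder(cmd).inheritIO().start();
        System.out.println("Chrome exited with code " + process.waitFor());
    }
}
```

Passing each flag as a separate list element avoids quoting problems with paths that contain spaces.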

Running with Docker

  • Pull the image: docker pull browserless/chrome:latest
  • Run the container: docker run -p 3000:3000 browserless/chrome:latest
  • Use a browser to access: http://localhost:3000/

The page that opens looks very powerful.
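Before pointing client code at the container, it can help to confirm that port 3000 is actually accepting connections. The sketch below is one way to do that with a plain TCP connect; the class and method names are assumptions, and the host and port are the ones from the docker run command above.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class PortCheck {

    /** Returns true if a TCP connection to host:port succeeds within the timeout. */
    static boolean isOpen(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Connection refused, timed out, or host unreachable.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("localhost:3000 reachable: " + isOpen("localhost", 3000, 2000));
    }
}
```

A successful connect only shows that something is listening on the port, not that the browser behind it is healthy.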

k8s deployment

  • Write the deployment YAML file and name it browserless-chrome.yaml
---
apiVersion: v1
kind: Service
metadata:
  name: browserless-chrome
  namespace: kube-public
  labels:
    app: browserless-chrome
spec:
  type: NodePort
  ports:
    - name: websocket
      port: 30000
      targetPort: 3000
      nodePort: 30000
  selector:
    app: browserless-chrome
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: browserless-chrome
  namespace: kube-public
spec:
  replicas: 1
  revisionHistoryLimit: 0 # number of old ReplicaSets to keep for rollback
  selector:
    matchLabels:
      app: browserless-chrome
  template:
    metadata:
      labels:
        app: browserless-chrome
    spec:
      containers:
        - name: browserless-chrome
          imagePullPolicy: Always
          image: browserless/chrome:latest
          env:
            - name: PORT
              value: "3000"
          securityContext:
            runAsNonRoot: true
            runAsUser: 999
            runAsGroup: 999
          ports:
            - containerPort: 3000
          livenessProbe:
            tcpSocket:
              port: 3000
            initialDelaySeconds: 5
            failureThreshold: 2
            periodSeconds: 60
          readinessProbe:
            tcpSocket:
              port: 3000
            initialDelaySeconds: 5
            periodSeconds: 10
          startupProbe:
            tcpSocket:
              port: 3000
            failureThreshold: 30
            periodSeconds: 10
          resources:
            requests:
              cpu: 0.2
              memory: 300Mi
            limits:
              cpu: 1
              memory: 1Gi
      imagePullSecrets:
        - name: puller
  • Apply the manifest: kubectl apply -f browserless-chrome.yaml
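Once the Service is running, a client outside the cluster reaches the browser through the NodePort (30000 in the manifest above) rather than the container port 3000. The sketch below builds the corresponding WebSocket endpoint. The class name, the helper, and the node address 192.0.2.10 are placeholders chosen for illustration; substitute the address of one of your own nodes.

```java
import java.net.URI;

public class BrowserlessEndpoint {

    /** nodePort value taken from the Service manifest. */
    static final int NODE_PORT = 30000;

    /** Builds the ws:// endpoint for a node address and port. */
    static URI webSocketEndpoint(String nodeHost, int port) {
        if (port < 1 || port > 65535) {
            throw new IllegalArgumentException("port out of range: " + port);
        }
        return URI.create("ws://" + nodeHost + ":" + port);
    }

    public static void main(String[] args) {
        // 192.0.2.10 is a documentation-only address, not a real node.
        System.out.println(webSocketEndpoint("192.0.2.10", NODE_PORT));
    }
}
```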

Push the image to a private registry

  • Retag the image: docker tag browserless/chrome:latest xxx.cn/base/browserless-chrome:latest
  • Push to the private registry (the name must match the tag above): docker push xxx.cn/base/browserless-chrome:latest

Java call example

  • Add the following dependencies in pom.xml
<dependency>
  <groupId>io.github.fanyong920</groupId>
  <artifactId>jvppeteer</artifactId>
  <version>1.1.5</version>
</dependency>
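  • Sample code that calls a local Chrome program
import com.ruiyun.jvppeteer.core.Puppeteer;
import com.ruiyun.jvppeteer.core.browser.Browser;
import com.ruiyun.jvppeteer.core.page.Page;
import com.ruiyun.jvppeteer.options.LaunchOptions;
import com.ruiyun.jvppeteer.options.LaunchOptionsBuilder;
import com.ruiyun.jvppeteer.options.PDFOptions;
import lombok.SneakyThrows;
import org.junit.jupiter.api.Test;

import java.util.ArrayList;

public class BrowserTest {

    @SneakyThrows
    @Test
    void test() {
        // Optional automatic download; the browser is not downloaded again after the first time.
        // BrowserFetcher.downloadIfNotExist(null);
        ArrayList<String> arrayList = new ArrayList<>();
        arrayList.add("--no-sandbox");
        arrayList.add("--disable-setuid-sandbox");
        // PDF generation only works in headless mode
        LaunchOptions options = new LaunchOptionsBuilder()
                // Backslashes in a Windows path must be escaped in a Java string literal
                .withExecutablePath("C:\\Users\\administrator\\AppData\\Local\\Google\\Chrome\\Application\\chrome.exe")
                .withArgs(arrayList)
                .withHeadless(true)
                .build();
        Browser browser = Puppeteer.launch(options);
        Page page = browser.newPage();
        page.goTo("https://www.baidu.com");
        // Save the rendered page as test.pdf in the working directory
        PDFOptions pdfOptions = new PDFOptions();
        pdfOptions.setPath("test.pdf");
        page.pdf(pdfOptions);
        page.close();
        browser.close();
    }
}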
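  • Sample code that calls a remote Chrome over WebSocket
import com.ruiyun.jvppeteer.core.Puppeteer;
import com.ruiyun.jvppeteer.core.browser.Browser;
import com.ruiyun.jvppeteer.core.page.Page;
import com.ruiyun.jvppeteer.options.LaunchOptions;
import com.ruiyun.jvppeteer.options.LaunchOptionsBuilder;
import com.ruiyun.jvppeteer.options.PDFOptions;
import lombok.SneakyThrows;
import org.junit.jupiter.api.Test;

import java.util.ArrayList;

public class BrowserTest {

    @SneakyThrows
    @Test
    void test() {
        ArrayList<String> arrayList = new ArrayList<>();
        arrayList.add("--no-sandbox");
        arrayList.add("--disable-setuid-sandbox");
        // PDF generation only works in headless mode
        LaunchOptions options = new LaunchOptionsBuilder()
                .withArgs(arrayList)
                .withHeadless(true)
                .build();
        // ws://localhost:3000 is the Docker container started earlier;
        // for the k8s Service, use a node address with NodePort 30000 instead.
        Browser browser = Puppeteer.connect(options, "ws://localhost:3000", null, null);
        Page page = browser.newPage();
        page.goTo("https://www.baidu.com");
        // Save the rendered page as test.pdf in the working directory
        PDFOptions pdfOptions = new PDFOptions();
        pdfOptions.setPath("test.pdf");
        page.pdf(pdfOptions);
        page.close();
        browser.close();
    }
}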

The test.pdf file is generated in the project directory; open it to check the result.
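Besides opening the file by hand, a quick automated sanity check is possible: PDF files begin with the bytes "%PDF-". The sketch below tests for that header. The class and method names are assumptions, and the check says nothing about whether the page rendered correctly.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;

public class PdfCheck {

    /** Returns true if the file starts with the PDF header bytes "%PDF-". */
    static boolean looksLikePdf(Path file) throws IOException {
        byte[] magic = "%PDF-".getBytes(StandardCharsets.US_ASCII);
        byte[] head = new byte[magic.length];
        int read = 0;
        try (InputStream in = Files.newInputStream(file)) {
            // A single read() may return fewer bytes than requested, so loop.
            while (read < head.length) {
                int n = in.read(head, read, head.length - read);
                if (n < 0) {
                    break;
                }
                read += n;
            }
        }
        return read == magic.length && Arrays.equals(head, magic);
    }

    public static void main(String[] args) throws IOException {
        Path pdf = Paths.get("test.pdf");
        if (Files.exists(pdf)) {
            System.out.println("test.pdf has a PDF header: " + looksLikePdf(pdf));
        } else {
            System.out.println("test.pdf not found in the working directory");
        }
    }
}
```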