Spring Boot: using Jsoup to parse HTML


1. Requirement
The HTML sent from the front end to the backend must be parsed by the backend, and parts of it replaced with data.
2. Solution
Use Jsoup, added as a Maven dependency:

<dependency>
    <groupId>org.jsoup</groupId>
    <artifactId>jsoup</artifactId>
    <version>1.15.4</version>
</dependency>

3. Main features of Jsoup
1) Parses HTML from a URL, a file, or a string
2) Finds and extracts data using DOM traversal or CSS selectors
3) Manipulates HTML elements, attributes, and text
Note: jsoup is released under the MIT license and can be used in commercial projects.
4. Usage
4.1. Loading from a URL

/* GET request */
/* Document doc = Jsoup.connect("http://www.baidu.com/").get(); */
/* POST request */
/* Document doc = Jsoup.connect("http://www.baidu.com/").post(); */
/* Add request parameters, a cookie and a timeout in milliseconds; get() and post() throw IOException on failure */
Document doc = Jsoup.connect("http://www.baidu.com/")
        .data("user", "test")
        .cookie("user", "test")
        .timeout(3000)
        .post();

4.2. Loading from a file

File f = new File("input.html");
/* The third parameter is the base URI used to resolve relative URLs; parse throws IOException if the file cannot be read */
Document doc = Jsoup.parse(f, "UTF-8", "http://www.baidu.com/");
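A self-contained sketch of 4.2: it first writes a small page to a temporary file (the page content and the class name ParseFileDemo are made up for illustration), then parses it with Jsoup.parse(File, String, String) and resolves the relative link against the base URI.

```java
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

class ParseFileDemo {

    // Writes a sample page to a temporary file, parses it, and returns "<title> <absolute link>"
    static String parseTempFile() throws IOException {
        File f = File.createTempFile("input", ".html");
        f.deleteOnExit();
        String html = "<html><head><title>Demo</title></head>"
                + "<body><a href=\"/news\">News</a></body></html>";
        Files.write(f.toPath(), html.getBytes(StandardCharsets.UTF_8));

        // The third argument is the base URI used to resolve relative URLs such as "/news"
        Document doc = Jsoup.parse(f, "UTF-8", "http://www.baidu.com/");
        return doc.title() + " " + doc.selectFirst("a").absUrl("href");
    }

    public static void main(String[] args) throws IOException {
        System.out.println(parseTempFile()); // Demo http://www.baidu.com/news
    }
}
```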

4.3. Loading from a string

String html = "<html><head><title></title></head><body></body></html>";
Document doc = Jsoup.parse(html);

5. Parsing
Jsoup provides a series of static parse methods that produce a Document object:

 static Document parse(File in, String charsetName)
 static Document parse(File in, String charsetName, String baseUri)
 static Document parse(InputStream in, String charsetName, String baseUri)
 static Document parse(String html)
 static Document parse(String html, String baseUri)
 static Document parse(URL url, int timeoutMillis)
 static Document parseBodyFragment(String bodyHtml)
 static Document parseBodyFragment(String bodyHtml, String baseUri)

baseUri is the URL against which relative URLs in the document are resolved
charsetName is the character set used to decode the input
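The effect of baseUri, and the difference between parse and parseBodyFragment, can be seen in a short sketch; the HTML snippets and the class name ParseStringDemo are illustrative.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

class ParseStringDemo {

    // Resolves a relative href against the baseUri passed to Jsoup.parse(String, String)
    static String resolveLink() {
        Document doc = Jsoup.parse("<a href=\"img/logo.png\">logo</a>", "http://www.baidu.com/static/");
        Element a = doc.selectFirst("a");
        // The "abs:" prefix asks for the absolute URL, same as a.absUrl("href")
        return a.attr("abs:href");
    }

    // parseBodyFragment places the fragment inside the <body> of a new, empty document
    static String fragmentBody() {
        Document doc = Jsoup.parseBodyFragment("<p>Hello</p>");
        return doc.body().html();
    }

    public static void main(String[] args) {
        System.out.println(resolveLink());  // http://www.baidu.com/static/img/logo.png
        System.out.println(fragmentBody()); // <p>Hello</p>
    }
}
```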
6. Connection connect(String url) creates a connection to the given URL (the protocol must be http or https)
Connection provides methods for configuring the request and fetching web content:

Connection cookie(String name, String value) sets a cookie to send with the request
Connection data(Map<String,String> data) adds request parameters
Connection data(String... keyvals) adds request parameters as key, value pairs
Document get() sends the request as GET and parses the response
Document post() sends the request as POST and parses the response
Connection userAgent(String userAgent) sets the User-Agent request header
Connection header(String name, String value) adds a request header
Connection referrer(String referrer) sets the Referer header (the request source)
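A sketch that configures a request with the Connection methods above but never sends it, so it runs offline; the settings are read back through Connection.request(). The URL reuses the one from 4.1, while the header values and the class name ConnectionDemo are made-up examples.

```java
import org.jsoup.Connection;
import org.jsoup.Jsoup;

class ConnectionDemo {

    // Builds a configured request without executing it
    static Connection buildRequest() {
        return Jsoup.connect("http://www.baidu.com/")
                .userAgent("Mozilla/5.0 (demo)")     // sets the User-Agent header
                .header("Accept-Language", "en")      // any additional request header
                .referrer("http://www.example.com/")  // sets the Referer header
                .cookie("user", "test")               // cookie sent with the request
                .data("user", "test", "page", "1")    // parameters user=test and page=1
                .timeout(3000);                       // request timeout in milliseconds
    }

    public static void main(String[] args) {
        Connection.Request req = buildRequest().request();
        System.out.println(req.header("User-Agent")); // Mozilla/5.0 (demo)
        System.out.println(req.header("Referer"));    // http://www.example.com/
        System.out.println(req.cookie("user"));       // test
        System.out.println(req.data().size());        // 2
        // Calling buildRequest().get() or .post() would send the request and parse the response
    }
}
```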

7. Like JavaScript, jsoup provides DOM methods for getting HTML elements:

getElementById(String id) gets the element with the given id
getElementsByTag(String tag) gets elements with the given tag name
getElementsByClass(String className) gets elements with the given class
getElementsByAttribute(String key) gets elements that have the given attribute
The following methods are also provided for getting sibling elements:
siblingElements(), firstElementSibling(), lastElementSibling(), nextElementSibling(), previousElementSibling()
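The lookup and sibling methods applied to a small, made-up menu (the HTML and the class name DomLookupDemo are illustrative):

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

class DomLookupDemo {

    static final String HTML = "<ul id=\"menu\">"
            + "<li class=\"item\">Home</li>"
            + "<li class=\"item active\" data-id=\"2\">News</li>"
            + "<li class=\"item\">About</li>"
            + "</ul>";

    static Document parse() {
        return Jsoup.parse(HTML);
    }

    public static void main(String[] args) {
        Document doc = parse();
        Element menu = doc.getElementById("menu");                        // a single element, or null
        System.out.println(menu.children().size());                        // 3
        System.out.println(doc.getElementsByTag("li").size());             // 3
        System.out.println(doc.getElementsByAttribute("data-id").text());  // News

        Element news = doc.getElementsByClass("active").first();
        System.out.println(news.previousElementSibling().text());          // Home
        System.out.println(news.nextElementSibling().text());              // About
        System.out.println(news.firstElementSibling().text());             // Home
        System.out.println(news.lastElementSibling().text());              // About
        System.out.println(news.siblingElements().size());                 // 2, the element itself is excluded
    }
}
```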

8. Getting and setting element data

 attr(String key) gets an attribute value
 attr(String key, String value) sets an attribute value
 attributes() gets all attributes
 id(), className(), classNames() get the id, the class attribute, and the set of class names
 text() gets the text content
 text(String value) sets the text content
 html() gets the inner HTML
 html(String value) sets the inner HTML
 outerHtml() gets the outer HTML (the element itself plus its contents)
 data() gets the data content, e.g. the contents of a <script> or <style> element
 tag() gets the Tag object and tagName() gets the tag name
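A sketch of these getters and setters on a sample element. Pretty printing is switched off (an illustrative choice) so that html() and outerHtml() return the markup exactly, which makes the inner/outer difference easy to see; the class name ElementDataDemo is hypothetical.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

class ElementDataDemo {

    static Element sample() {
        Document doc = Jsoup.parse("<div id=\"box\" class=\"card big\"><b>Hi</b> there</div>");
        // Disable pretty printing so html()/outerHtml() are not re-indented
        doc.outputSettings().prettyPrint(false);
        return doc.getElementById("box");
    }

    public static void main(String[] args) {
        Element box = sample();
        System.out.println(box.id());          // box
        System.out.println(box.className());   // card big
        System.out.println(box.classNames());  // [card, big]
        System.out.println(box.text());        // Hi there
        System.out.println(box.html());        // <b>Hi</b> there  (inner HTML)
        System.out.println(box.outerHtml());   // <div id="box" class="card big"><b>Hi</b> there</div>
        System.out.println(box.tagName());     // div

        box.attr("title", "greeting");         // set an attribute
        box.text("Replaced");                  // replace all children with one text node
        System.out.println(box.outerHtml());   // <div id="box" class="card big" title="greeting">Replaced</div>
    }
}
```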

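9. Manipulating HTML elements:

 append(String html), prepend(String html) parse the HTML and add it as the last or first children
 appendText(String text), prependText(String text) add a text node as the last or first child
 appendElement(String tagName), prependElement(String tagName) create a new element and add it as the last or first child
 html(String value) replaces the element's contents with the given HTML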
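The manipulation methods applied to a hypothetical list, again with pretty printing off so the serialized result stays on one line (ManipulateDemo is an illustrative name):

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

class ManipulateDemo {

    static Element list() {
        Document doc = Jsoup.parse("<ul id=\"list\"><li>b</li></ul>");
        doc.outputSettings().prettyPrint(false);
        return doc.getElementById("list");
    }

    static String build() {
        Element ul = list();
        ul.prepend("<li>a</li>");          // parsed and inserted as the first child
        ul.append("<li>c</li>");           // parsed and inserted as the last child
        ul.appendElement("li").text("d");  // new <li> appended, then its text is set
        ul.child(0).appendText("!");       // text node added at the end of the first <li>
        return ul.outerHtml();
    }

    static String replace() {
        Element ul = list();
        ul.html("<li>only</li>");          // existing children are replaced
        return ul.outerHtml();
    }

    public static void main(String[] args) {
        System.out.println(build());   // <ul id="list"><li>a!</li><li>b</li><li>c</li><li>d</li></ul>
        System.out.println(replace()); // <ul id="list"><li>only</li></ul>
    }
}
```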

10. Jsoup also provides jQuery-like selectors, used through select(String cssQuery), for retrieving data

 tagname locates elements by tag name, e.g. a
 ns|tag locates elements by tag name within a namespace, e.g. fb|name finds <fb:name> elements
 #id locates elements by id, e.g. #logo
 .class locates elements by class name, e.g. .head
 * matches all elements
 [attribute] locates elements that have the attribute, e.g. [href] finds all elements with an href attribute
 [^attr] locates elements by an attribute-name prefix, e.g. [^data-] finds elements with HTML5 data-* attributes
 [attr=value] locates elements by attribute value, e.g. [width=500] finds all elements whose width attribute is 500
 [attr^=value], [attr$=value], [attr*=value] match attribute values that start with, end with, or contain value, respectively
 [attr~=regex] filters attribute values with a regular expression, e.g. img[src~=(?i)\.(png|jpe?g)]
 These are the basic selectors; they can also be combined
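A sketch that counts how many elements each basic selector matches in a made-up fragment (SelectorDemo and its HTML are illustrative), using Element.select(String):

```java
import org.jsoup.Jsoup;

class SelectorDemo {

    static final String HTML = "<div id=\"logo\" class=\"head\">"
            + "<a href=\"/home\">Home</a>"
            + "<img src=\"a.PNG\" data-size=\"s\">"
            + "<img src=\"b.jpg\" width=\"500\">"
            + "<img src=\"c.gif\">"
            + "</div>";

    // Number of elements in HTML matched by the given CSS query
    static int count(String cssQuery) {
        return Jsoup.parse(HTML).select(cssQuery).size();
    }

    public static void main(String[] args) {
        String[] queries = {
            "img", "#logo", ".head", "[href]", "[^data-]", "[width=500]",
            "img[src^=c]", "img[src$=.gif]", "img[src*=.]", "img[src~=(?i)\\.(png|jpe?g)]"
        };
        for (String q : queries) {
            System.out.println(q + " -> " + count(q));
        }
    }
}
```

The regex selector matches both a.PNG and b.jpg because (?i) makes it case-insensitive.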

11. Combining selectors

el#id locates an element with the given tag and id, e.g. a#logo -> <a id=logo href= … >
el.class locates elements with the given tag and class, e.g. div.head -> <div class=head>xxxx</div>
el[attr] locates elements with the given tag that define an attribute, e.g. a[href]
Any combination of the above three, e.g. a[href]#logo, a[name].outerlink
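The combined forms above, applied to a small illustrative fragment (CombinedSelectorDemo is a hypothetical name); Elements.text() joins the text of all matches with spaces:

```java
import org.jsoup.Jsoup;

class CombinedSelectorDemo {

    static final String HTML = "<a id=\"logo\" href=\"/\">Logo</a>"
            + "<a name=\"top\" class=\"outerlink\" href=\"http://example.com\">Out</a>"
            + "<div class=\"head\">Header</div>"
            + "<div class=\"body\">Body</div>";

    // Combined text of every element matched by the query
    static String texts(String cssQuery) {
        return Jsoup.parse(HTML).select(cssQuery).text();
    }

    public static void main(String[] args) {
        System.out.println(texts("a#logo"));            // Logo
        System.out.println(texts("div.head"));          // Header
        System.out.println(texts("a[href]"));           // Logo Out
        System.out.println(texts("a[href]#logo"));      // Logo
        System.out.println(texts("a[name].outerlink")); // Out
    }
}
```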

12. Besides the basic syntax and its combinations, jsoup also supports pseudo-selectors for filtering elements

 :lt(n) elements whose sibling index (position among the parent's child elements, starting at 0) is less than n, e.g. td:lt(3) finds the first three cells of each row
 :gt(n) elements whose sibling index is greater than n, e.g. div p:gt(2) finds p elements inside a div that come after the third child element of their parent
 :eq(n) elements whose sibling index equals n, e.g. form input:eq(1) finds input elements in a form that are the second child element of their parent
 :has(selector) elements that contain an element matching the selector, e.g. div:has(p) finds divs containing a p element
 :not(selector) elements that do not match the selector, e.g. div:not(.logo) finds all divs that do not have class logo
 :contains(text) elements whose text (including descendants) contains the given text, case-insensitive, e.g. p:contains(oschina)
 :containsOwn(text) elements whose own text (excluding descendants) contains the given text, case-insensitive
 :matches(regex) elements whose text matches the regular expression, e.g. div:matches((?i)login)
 :matchesOwn(regex) elements whose own text matches the regular expression
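A sketch that makes the pseudo-selectors concrete, in particular that n in :lt, :gt and :eq is compared with each element's sibling index rather than with a count of children. The table, the divs and the class name PseudoSelectorDemo are made up for illustration.

```java
import org.jsoup.Jsoup;

class PseudoSelectorDemo {

    static final String HTML =
            "<table><tr><td>c0</td><td>c1</td><td>c2</td><td>c3</td><td>c4</td></tr></table>"
            + "<div class=\"logo\"><p>Login here</p></div>"
            + "<div id=\"plain\"><span>OSChina <b>news</b></span></div>";

    // Combined text of every element matched by the query
    static String texts(String cssQuery) {
        return Jsoup.parse(HTML).select(cssQuery).text();
    }

    public static void main(String[] args) {
        System.out.println(texts("td:lt(3)"));               // c0 c1 c2
        System.out.println(texts("td:gt(2)"));               // c3 c4
        System.out.println(texts("td:eq(1)"));               // c1
        System.out.println(texts("div:has(p)"));             // Login here
        System.out.println(texts("div:not(.logo)"));         // OSChina news
        System.out.println(texts("span:contains(oschina)")); // OSChina news (case-insensitive)
        System.out.println(texts("b:containsOwn(news)"));    // news
        System.out.println(texts("span:containsOwn(news)")); // empty: "news" belongs to the child <b>
        System.out.println(texts("div:matches((?i)login)")); // Login here
        System.out.println(texts("p:matchesOwn(^Login)"));   // Login here
    }
}
```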

17. Using Jsoup in the project
17.1. HTML sent from the front end to the backend

<p style="text-align: center;">Niannujiao Chibi Nostalgia</p><p> &nbsp; &nbsp; &nbsp; &nbsp; On the west side of the old base, the human way is: Zhou Lang Chibi of the Three Kingdoms. The rocks pierced through the sky, the stormy waves hit the shore, and thousands of piles of snow were rolled up. Picturesque, a moment how many hero.</p><p> &nbsp; Feather fans and scarves, while talking and laughing, masts and sculls are wiped out in ashes. If you travel in the motherland, you should laugh at me passionately, and you will be born early. Life is like a dream, and one statue is still in Jiangyue. &nbsp;</p><p><br></p><p>Name<span id="customer_name" title="customer name" sde-model="{&quot;tableSchema&quot;:&quot;confinement&quot;,&quot;tableName&quot;:&quot;customer_cmt&quot;,&quot;columnName&quot;:&quot;customer_name&quot;,&quot;dataType&quot;:&quot;text&quot;,&quot;value&quot;:&quot;customer name&quot;}" contenteditable="false" class="sde-bg">
<span class="sde-left" style="color:#0000FF" contenteditable="false">[</span>
<span id="customerName" title="Customer Name" style="color:#000000;" class="sde-value" contenteditable="true">Customer Name</span>
<span style="color:#0000FF" contenteditable="false" class="sde-right">]</span></span></p><p>Phone<span id="custom_phone" title="Customer Phone" sde-model="{&quot;tableSchema&quot;:&quot;confinement&quot;,&quot;tableName&quot;:&quot;customer_cmt&quot;,&quot;columnName&quot;:&quot;custom_phone&quot;,&quot;dataType&quot;:&quot;text&quot;,&quot;value&quot;:&quot;customer phone number&quot;}" contenteditable="false" class="sde-bg">
<span class="sde-left" style="color:#0000FF" contenteditable="false">[</span>
<span id="customPhone" title="Customer Phone" style="color:#000000;" class="sde-value" contenteditable="true">Customer Phone</span>
<span style="color:#0000FF" contenteditable="false" class="sde-right">]</span></span>
</p>

17.2. Data the backend loads from the database

{"customPhone":"15693167830","customerName":"Zhang Dandan"}

17.3. HTML after the backend parses it and fills in the data

<p style="text-align: center;">Niannujiao Chibi Nostalgia</p>
<p> &nbsp; &nbsp; &nbsp; &nbsp; On the west side of the old base, the human way is: Zhou Lang Chibi of the Three Kingdoms. The rocks pierced through the sky, the stormy waves hit the shore, and thousands of piles of snow were rolled up. Picturesque, a moment how many hero.</p>
<p> &nbsp; Feather fans and scarves, while talking and laughing, masts and sculls are wiped out in ashes. If you travel in the motherland, you should laugh at me passionately, and you will be born early. Life is like a dream, and one statue is still in Jiangyue. &nbsp;</p>
<p><br></p>
<p>Name<span id="customer_name" title="customer name" sde-model="{&quot;tableSchema&quot;:&quot;confinement&quot;,&quot;tableName&quot;:&quot;customer_cmt&quot;,&quot;columnName&quot;:&quot;customer_name&quot;,&quot;dataType&quot;:&quot;text&quot;,&quot;value&quot;:&quot;customer name&quot;}" contenteditable="false" class="sde-bg"> <span class="sde-left" style="color:#0000FF" contenteditable="false">[</span> <span id="customerName" title="Customer Name" style="color:#000000;" class="sde-value" contenteditable="true">Zhang Dandan</span> <span style="color:#0000FF" contenteditable="false" class="sde-right">]</span></span></p>
<p>Phone<span id="custom_phone" title="Customer Phone" sde-model="{&quot;tableSchema&quot;:&quot;confinement&quot;,&quot;tableName&quot;:&quot;customer_cmt&quot;,&quot;columnName&quot;:&quot;custom_phone&quot;,&quot;dataType&quot;:&quot;text&quot;,&quot;value&quot;:&quot;customer phone number&quot;}" contenteditable="false" class="sde-bg"> <span class="sde-left" style="color:#0000FF" contenteditable="false">[</span> <span id="customPhone" title="Customer Phone" style="color:#000000;" class="sde-value" contenteditable="true">15693167830</span> <span style="color:#0000FF" contenteditable="false" class="sde-right">]</span></span></p>

17.4. Backend processing code

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

// The data to fill in is loaded from the database as a JSONObject (e.g. fastjson's, which has keySet() and getString())
JSONObject json = XXXX;

// html is the string sent from the front end to the backend; convert it to a Document with Jsoup
Document document = Jsoup.parse(html);
// Go through the values loaded from the database
for (String key : json.keySet()) {
    // Each key is the id of the element whose text is replaced
    Element element = document.getElementById(key);
    // Skip keys with no matching element instead of throwing a NullPointerException
    if (element != null) {
        element.text(json.getString(key));
    }
}
// Get the elements by the p tag; e.g. pEl.outerHtml() returns the processed <p> elements
Elements pEl = document.getElementsByTag("p");
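A self-contained sketch of the same fill step. It assumes the database values have already been copied from the JSONObject into a Map<String, String>; the class TemplateFiller and the method fill are hypothetical names. It returns only the body content, because Jsoup.parse wraps a fragment in a full <html> document.

```java
import java.util.LinkedHashMap;
import java.util.Map;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

class TemplateFiller {

    // Replaces the text of the element whose id equals each map key; keys without a matching id are skipped
    static String fill(String html, Map<String, String> values) {
        Document document = Jsoup.parse(html);
        document.outputSettings().prettyPrint(false); // keep the markup compact
        for (Map.Entry<String, String> e : values.entrySet()) {
            Element el = document.getElementById(e.getKey());
            if (el != null) {
                el.text(e.getValue());
            }
        }
        return document.body().html();
    }

    public static void main(String[] args) {
        String html = "<p>Name<span id=\"customer_name\" class=\"sde-bg\">"
                + "<span class=\"sde-left\">[</span>"
                + "<span id=\"customerName\" class=\"sde-value\">Customer Name</span>"
                + "<span class=\"sde-right\">]</span></span></p>";
        Map<String, String> values = new LinkedHashMap<>();
        values.put("customerName", "Zhang Dandan");
        values.put("customPhone", "15693167830"); // no such id in this fragment, so it is skipped
        // Prints the fragment with the customerName span filled in
        System.out.println(fill(html, values));
    }
}
```

Because getElementById returns only the first element with a given id, a field that appears several times is filled only once; that is the case handled in 17.4.1.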

17.4.1. Handling duplicate ids
HTML sent by the front end in which the same field, and therefore the same id, appears several times:

<p style="white-space: normal; text-align: center;">Niannujiao Chibi Nostalgia</p><p style="white-space: normal;"><br></p><p style="white-space: normal;"> &nbsp; &nbsp; &nbsp; &nbsp; On the west side of the old base, the human way is: Zhou Lang Chibi of the Three Kingdoms. The rocks pierced through the sky, the stormy waves hit the shore, and thousands of piles of snow were rolled up. Picturesque, a moment how many hero.</p><p style="white-space: normal;"> &nbsp; &nbsp; &nbsp; &nbsp; hair. Feather fans and scarves, while talking and laughing, masts and sculls are wiped out in ashes. If you travel in the motherland, you should laugh at me passionately, and you will be born early. Life is like a dream, and one statue is still in Jiangyue.</p><p style="white-space: normal;"><br></p><p style="white-space: normal;">Name<span id="customer_name" title="customer name" sde-model="{&quot;tableSchema&quot;:&quot;confinement&quot;,&quot;tableName&quot;:&quot;customer_cmt&quot;,&quot;columnName&quot;:&quot;customer_name&quot;,&quot;dataType&quot;:&quot;text&quot;,&quot;value&quot;:&quot;Customer name&quot;}" contenteditable="false" class="sde-bg">
<span class="sde-left" style="color:#0000FF" contenteditable="false">[</span>
<span id="customerName" title="Customer Name" style="color:#000000;" class="sde-value" contenteditable="true">Customer Name</span>
<span style="color:#0000FF" contenteditable="false" class="sde-right">]</span></span> &nbsp; &nbsp; &nbsp;Emergency Contact Name<span id="customer_name" title="customer name" sde-model="{&quot;tableSchema&quot;:&quot;confinement&quot;,&quot;tableName&quot;:&quot;customer_cmt&quot;,&quot;columnName&quot;:&quot;customer_name&quot;,&quot;dataType&quot;:&quot;text&quot;,&quot;value&quot;:&quot;customer name&quot;}" contenteditable="false" class="sde-bg">
<span class="sde-left" style="color:#0000FF" contenteditable="false">[</span>
<span id="customerName" title="Customer Name" style="color:#000000;" class="sde-value" contenteditable="true">Customer Name</span>
<span style="color:#0000FF" contenteditable="false" class="sde-right">]</span></span></p><p style="white-space: normal;">Family Member Name<span id="customer_name" title="customer name" sde-model="{&quot;tableSchema&quot;:&quot;confinement&quot;,&quot;tableName&quot;:&quot;customer_cmt&quot;,&quot;columnName&quot;:&quot;customer_name&quot;,&quot;dataType&quot;:&quot;text&quot;,&quot;value&quot;:&quot;Customer name&quot;}" contenteditable="false" class="sde-bg">
<span class="sde-left" style="color:#0000FF" contenteditable="false">[</span>
<span id="customerName" title="Customer Name" style="color:#000000;" class="sde-value" contenteditable="true">Customer Name</span>
<span style="color:#0000FF" contenteditable="false" class="sde-right">]</span></span> &nbsp; &nbsp; &nbsp; Phone<span id="custom_phone" title="customer phone" sde-model="{&quot;tableSchema&quot;:&quot;confinement&quot;,&quot;tableName&quot;:&quot;customer_cmt&quot;,&quot;columnName&quot;:&quot;custom_phone&quot;,&quot;dataType&quot;:&quot;text&quot;,&quot;value&quot;:&quot;customer phone&quot;}" contenteditable="false" class="sde-bg">
<span class="sde-left" style="color:#0000FF" contenteditable="false">[</span>
<span id="customPhone" title="Customer Phone" style="color:#000000;" class="sde-value" contenteditable="true">Customer Phone</span>
<span style="color:#0000FF" contenteditable="false" class="sde-right">]</span></span></p><p style="white-space: normal;">Emergency Contact Phone<span id="custom_phone" title="Customer Phone" sde-model="{&quot;tableSchema&quot;:&quot;confinement&quot;,&quot;tableName&quot;:&quot;customer_cmt&quot;,&quot;columnName&quot;:&quot;custom_phone&quot;,&quot;dataType&quot;:&quot;text&quot;,&quot;value&quot;:&quot;Customer phone number&quot;}" contenteditable="false" class="sde-bg">
<span class="sde-left" style="color:#0000FF" contenteditable="false">[</span>
<span id="customPhone" title="Customer Phone" style="color:#000000;" class="sde-value" contenteditable="true">Customer Phone</span>
<span style="color:#0000FF" contenteditable="false" class="sde-right">]</span></span> &nbsp; &nbsp; Family Member Phone<span id="custom_phone" title="Customer Phone" sde-model="{&quot;tableSchema&quot;:&quot;confinement&quot;,&quot;tableName&quot;:&quot;customer_cmt&quot;,&quot;columnName&quot;:&quot;custom_phone&quot;,&quot;dataType&quot;:&quot;text&quot;,&quot;value&quot;:&quot;customer phone&quot;}" contenteditable="false" class="sde-bg">
<span class="sde-left" style="color:#0000FF" contenteditable="false">[</span>
<span id="customPhone" title="Customer Phone" style="color:#000000;" class="sde-value" contenteditable="true">Customer Phone</span>
<span style="color:#0000FF" contenteditable="false" class="sde-right">]</span></span>
</p>
Backend code:

// map holds the values loaded from the database (key = element id, value = new text)
Document document = Jsoup.parse(html);
Elements pList = document.getElementsByTag("p");
// Mainly handles repeated data, i.e. repeated tag ids
for (Element p : pList) { // repeats across lines
    for (String key : map.keySet()) {
        Element elementById = p.getElementById(key);

        if (elementById != null) {
            Elements spans = p.getElementsByTag("span");
            for (Element sp : spans) { // repeats within the same line
                // getElementById searches sp itself and its descendants and returns the first match
                Element byId = sp.getElementById(key);
                if (byId != null) {
                    byId.text(map.get(key).toString());
                }
            }
        }
    }
}
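As an alternative to the nested loops above (a different technique, not the author's code), a single pass over document.getAllElements() can look up each element's id in the value map, which fills every repeated id directly. The class DuplicateIdFiller, the Map<String, ?> value type and the sample HTML are assumptions for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

class DuplicateIdFiller {

    // Sets the text of every element whose id is a key in values, however many times that id repeats
    static String fillAll(String html, Map<String, ?> values) {
        Document document = Jsoup.parse(html);
        document.outputSettings().prettyPrint(false);
        // getAllElements() returns a list collected up front, so changing text while looping is safe
        for (Element el : document.getAllElements()) {
            Object value = values.get(el.id()); // el.id() is "" for elements without an id
            if (value != null) {
                el.text(value.toString());
            }
        }
        return document.body().html();
    }

    public static void main(String[] args) {
        String html = "<p>Name<span id=\"customerName\">Customer Name</span>"
                + " Emergency Contact Name<span id=\"customerName\">Customer Name</span></p>"
                + "<p>Phone<span id=\"customPhone\">Customer Phone</span></p>";
        Map<String, Object> values = new LinkedHashMap<>();
        values.put("customerName", "Zhang Dandan");
        values.put("customPhone", "15693167830");
        // Both customerName spans and the customPhone span are filled
        System.out.println(fillAll(html, values));
    }
}
```

This visits each element once instead of re-scanning every span of every <p> for every key.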

17.5. How the content looks when the front end submits it

17.6. How the content looks after backend processing