Using Jsoup to parse HTML in Spring Boot
1. Requirement
The back end must parse the HTML sent to it by the front end and replace the placeholder values in it with real data.
2. Solution
Use Jsoup. Add the Maven dependency:
<dependency>
    <groupId>org.jsoup</groupId>
    <artifactId>jsoup</artifactId>
    <version>1.15.4</version>
</dependency>
3. Main features of Jsoup
1) Parse HTML from a URL, a file or a string
2) Find and extract data using DOM methods or CSS selectors
3) Manipulate HTML elements, attributes and text
Note: jsoup is released under the MIT license, so it can safely be used in commercial projects.
4. Usage
4.1. Loading from a URL
// GET request:
// Document doc = Jsoup.connect("http://www.baidu.com/").get();
// POST request:
// Document doc = Jsoup.connect("http://www.baidu.com/").post();
// Add request parameters, cookies, a timeout (in milliseconds), etc.
Document doc = Jsoup.connect("http://www.baidu.com/")
        .data("user", "test")
        .cookie("user", "test")
        .timeout(3000)
        .post();
4.2. Loading from a file
File f = new File("input.html");
// The third argument is the base URI used to resolve relative links in the document
Document doc = Jsoup.parse(f, "UTF-8", "http://www.baidu.com/");
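A runnable sketch of loading from a file, under a few assumptions: a temporary file stands in for input.html, and the sample markup, the https://example.com/ base URI and the class and method names are illustrative only. It shows a relative href in the file being resolved against the base URI passed as the third argument.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

class FileParseDemo {

    // Writes the given HTML to a temporary file (a stand-in for input.html).
    static File writeTempHtml(String html) throws IOException {
        File f = File.createTempFile("input", ".html");
        f.deleteOnExit();
        Files.write(f.toPath(), html.getBytes(StandardCharsets.UTF_8));
        return f;
    }

    // Parses the file as UTF-8 and returns the absolute URL of its first link,
    // resolved against baseUri, or "" if the file contains no link.
    static String firstLinkAbsUrl(File f, String baseUri) throws IOException {
        Document doc = Jsoup.parse(f, "UTF-8", baseUri);
        Element link = doc.selectFirst("a[href]");
        return link == null ? "" : link.absUrl("href");
    }

    // Writes and parses in one step, wrapping I/O errors as unchecked exceptions.
    static String demo(String html, String baseUri) {
        try {
            return firstLinkAbsUrl(writeTempHtml(html), baseUri);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String html = "<html><head><title>Demo</title></head>"
                + "<body><a href=\"/docs/index.html\">Docs</a></body></html>";
        System.out.println(demo(html, "https://example.com/")); // https://example.com/docs/index.html
    }
}
```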
4.3. Loading from a string
String html = "<html><head><title></title></head><body></body></html>";
Document doc = Jsoup.parse(html);
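A small sketch of string parsing (the sample strings and class name are illustrative). jsoup always returns a complete Document, so even a bare fragment gets <html>, <head> and <body> wrappers and body() can be used directly.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

class StringParseDemo {

    // Parses an HTML string and returns the text of its <title> element.
    static String titleOf(String html) {
        Document doc = Jsoup.parse(html);
        return doc.title();
    }

    // Parses an incomplete HTML string; jsoup adds the missing
    // <html>, <head> and <body> elements around it.
    static String bodyText(String html) {
        Document doc = Jsoup.parse(html);
        return doc.body().text();
    }

    public static void main(String[] args) {
        String html = "<html><head><title>Home</title></head><body><p>Hello</p></body></html>";
        System.out.println(titleOf(html));                 // Home
        System.out.println(bodyText("<p>No wrapper</p>")); // No wrapper
    }
}
```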
5. Parsing
Jsoup provides a set of static parse methods that produce a Document object:
static Document parse(File in, String charsetName)
static Document parse(File in, String charsetName, String baseUri)
static Document parse(InputStream in, String charsetName, String baseUri)
static Document parse(String html)
static Document parse(String html, String baseUri)
static Document parse(URL url, int timeoutMillis)
static Document parseBodyFragment(String bodyHtml)
static Document parseBodyFragment(String bodyHtml, String baseUri)
baseUri is the base URL against which relative URLs in the document (for example href and src values) are resolved.
charsetName is the character set used to decode the input, e.g. "UTF-8". For the File and InputStream variants it may be null, in which case jsoup takes the charset from the document's meta tag and otherwise falls back to UTF-8.
parseBodyFragment treats the input as the contents of <body>, which suits HTML snippets.
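The role of baseUri can be seen with parseBodyFragment. This sketch assumes an illustrative image path and the base URI https://example.com/app/; the abs: attribute prefix (equivalent to absUrl) returns the resolved URL, and an empty string when the value cannot be made absolute.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

class BaseUriDemo {

    // Parses an HTML fragment as the contents of <body>, resolving relative
    // URLs against baseUri, and returns the absolute src of the first image.
    static String absImageSrc(String fragment, String baseUri) {
        Document doc = Jsoup.parseBodyFragment(fragment, baseUri);
        Element img = doc.selectFirst("img[src]");
        // The "abs:" prefix makes attr() resolve the value against the base URI
        return img == null ? "" : img.attr("abs:src");
    }

    public static void main(String[] args) {
        String fragment = "<img src=\"images/logo.png\">";
        System.out.println(absImageSrc(fragment, "https://example.com/app/"));
        // https://example.com/app/images/logo.png
    }
}
```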
6. Jsoup.connect(String url) returns a Connection for the given URL, which must use http or https.
A Connection provides methods for configuring and sending the request:
Connection cookie(String name, String value): add a cookie to the request
Connection data(Map<String, String> data): add request parameters
Connection data(String... keyvals): add request parameters as alternating keys and values
Document get(): send a GET request and parse the response
Document post(): send a POST request and parse the response
Connection userAgent(String userAgent): set the User-Agent header
Connection header(String name, String value): add a request header
Connection referrer(String referrer): set the Referer header (the request source)
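The sketch below chains the Connection methods listed above without sending anything: the request is only made when get(), post() or execute() is called, so the configured values can be read back through request(). The URL, header values and cookie are placeholders, not values from the original project.

```java
import org.jsoup.Connection;
import org.jsoup.Jsoup;

class ConnectionConfigDemo {

    // Builds a configured connection. Nothing is sent over the network
    // until get(), post() or execute() is called on it.
    static Connection buildConnection(String url) {
        return Jsoup.connect(url)
                .userAgent("Mozilla/5.0 (demo)")    // sets the User-Agent header
                .referrer("https://example.com/")  // sets the Referer header
                .header("Accept-Language", "en")   // any other request header
                .cookie("user", "test")            // cookie sent with the request
                .data("user", "test")              // request parameter
                .timeout(3000);                    // total request timeout in milliseconds
    }

    public static void main(String[] args) {
        Connection.Request req = buildConnection("https://example.com/login").request();
        System.out.println(req.header("User-Agent")); // Mozilla/5.0 (demo)
        System.out.println(req.header("Referer"));    // https://example.com/
        System.out.println(req.cookie("user"));       // test
        System.out.println(req.timeout());            // 3000
        // buildConnection(...).post() would actually send the request
    }
}
```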
7. Like the JavaScript DOM API, jsoup provides methods for getting HTML elements:
getElementById(String id): get the element with the given id
getElementsByTag(String tag): get elements with the given tag name
getElementsByClass(String className): get elements with the given class
getElementsByAttribute(String key): get elements that have the given attribute
The following methods are also provided for navigating to sibling elements:
siblingElements(), firstElementSibling(), lastElementSibling(), nextElementSibling(), previousElementSibling()
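A sketch of these lookup and sibling methods on a small, made-up menu list (the HTML and class name are illustrative):

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

class DomLookupDemo {

    static final String HTML = "<ul id=\"menu\">"
            + "<li class=\"item\">Home</li>"
            + "<li class=\"item active\" data-page=\"about\">About</li>"
            + "<li class=\"item\">Contact</li>"
            + "</ul>";

    public static void main(String[] args) {
        Document doc = Jsoup.parse(HTML);

        Element menu = doc.getElementById("menu");                   // the <ul>
        Elements items = doc.getElementsByTag("li");                 // all three items
        Elements active = doc.getElementsByClass("active");          // the "About" item
        Elements withData = doc.getElementsByAttribute("data-page"); // also the "About" item

        System.out.println(menu.children().size()); // 3
        System.out.println(items.size());           // 3
        System.out.println(active.text());          // About
        System.out.println(withData.size());        // 1

        // Sibling navigation, starting from the middle item
        Element about = active.first();
        System.out.println(about.previousElementSibling().text()); // Home
        System.out.println(about.nextElementSibling().text());     // Contact
        System.out.println(about.firstElementSibling().text());    // Home
        System.out.println(about.lastElementSibling().text());     // Contact
        System.out.println(about.siblingElements().size());        // 2 (does not include itself)
    }
}
```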
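8. Getting and setting element data
attr(String key): get an attribute value
attr(String key, String value): set an attribute value
attributes(): get all attributes
id(), className(), classNames(): get the id, the class attribute as a string, and the set of class names
text(): get the combined text of the element and its descendants
text(String value): set the text content
html(): get the inner HTML
html(String value): set the inner HTML
outerHtml(): get the outer HTML (the element's own tag plus its content)
data(): get the data content, e.g. the contents of <script> and <style> elements
tag(), tagName(): get the Tag object and the tag name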
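The sketch below exercises these getters and setters on a one-line paragraph (sample markup and class name are illustrative). Note the difference between html(), which returns only the inner HTML, and outerHtml(), which also includes the element's own tag.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Element;

class ElementDataDemo {

    // Parses a small paragraph and returns the <p> element.
    static Element sample() {
        return Jsoup.parse("<p id=\"intro\" class=\"lead note\">Hi <b>there</b></p>").selectFirst("p");
    }

    public static void main(String[] args) {
        Element p = sample();
        System.out.println(p.id());          // intro
        System.out.println(p.className());   // lead note
        System.out.println(p.classNames());  // [lead, note]
        System.out.println(p.tagName());     // p
        System.out.println(p.text());        // Hi there  (text only, tags stripped)
        System.out.println(p.html());        // inner HTML: Hi <b>there</b>
        System.out.println(p.outerHtml());   // the same content wrapped in its own <p ...> tag

        // Setters return the element itself, so they can be chained
        p.attr("title", "greeting").text("Replaced");
        System.out.println(p.attr("title"));        // greeting
        System.out.println(p.text());               // Replaced
        System.out.println(p.attributes().size());  // 3 (id, class, title)
    }
}
```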
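9. Manipulating HTML elements:
append(String html), prepend(String html): parse the HTML and add it at the end / start of the element's children
appendText(String text), prependText(String text): add a text node at the end / start
appendElement(String tagName), prependElement(String tagName): create a new child element at the end / start and return it
html(String value): replace the element's inner HTML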
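A sketch of the manipulation methods, using made-up list and paragraph markup; each helper returns a value so the effect is easy to check:

```java
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Element;

class ManipulationDemo {

    // Inserts items at both ends of a list and returns the text of each item.
    static List<String> buildList() {
        Element ul = Jsoup.parse("<ul><li>b</li></ul>").selectFirst("ul");
        ul.prepend("<li>a</li>");          // parses the HTML and inserts it as the first child
        ul.append("<li>c</li>");           // parses the HTML and inserts it as the last child
        ul.appendElement("li").text("d");  // creates a new <li>, appends it, then sets its text
        return ul.children().eachText();
    }

    // Adds plain text before and after the existing content of a paragraph.
    static String greet() {
        Element p = Jsoup.parse("<p>world</p>").selectFirst("p");
        p.prependText("Hello ");           // added as a text node, never parsed as HTML
        p.appendText("!");
        return p.text();
    }

    // Replaces the whole inner HTML of an element.
    static String replaceInner() {
        Element div = Jsoup.parse("<div><p>old</p></div>").selectFirst("div");
        div.html("<b>new</b>");
        return div.child(0).tagName() + ":" + div.text();
    }

    public static void main(String[] args) {
        System.out.println(buildList());    // [a, b, c, d]
        System.out.println(greet());        // Hello world!
        System.out.println(replaceInner()); // b:new
    }
}
```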
10. Jsoup also provides jQuery-like (CSS) selectors for retrieving data, used through the select(String cssQuery) method of Document, Element and Elements:
tagname: find elements by tag name, e.g. a
ns|tag: find elements by tag name in a namespace, e.g. fb|name finds <fb:name> elements
#id: find elements by id, e.g. #logo
.class: find elements by class name, e.g. .head
*: all elements
[attribute]: elements that have the attribute, e.g. [href] finds all elements with an href attribute
[^attr]: elements with an attribute whose name starts with the prefix, e.g. [^data-] finds elements with HTML5 dataset attributes
[attr=value]: elements with the given attribute value, e.g. [width=500] finds all elements whose width attribute is 500
[attr^=value], [attr$=value], [attr*=value]: elements whose attribute value starts with, ends with, or contains value, respectively
[attr~=regex]: elements whose attribute value matches the regular expression, e.g. img[src~=(?i)\.(png|jpe?g)]
These are the basic selectors; they can also be combined.
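To see the basic selectors in action, this sketch counts matches in a small, made-up document; the markup and class name are illustrative. In Java source the backslash in the regex example has to be written twice.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

class SelectorDemo {

    static final String HTML =
            "<div class=\"head\"><a id=\"logo\" href=\"/\">Logo</a></div>"
            + "<img src=\"a.PNG\" width=\"500\">"
            + "<img src=\"b.gif\" width=\"300\" data-lazy=\"true\">"
            + "<a href=\"https://example.com/page\">Ext</a>";

    // Returns how many elements of the sample document match the CSS selector.
    static int count(String cssQuery) {
        Document doc = Jsoup.parse(HTML);
        return doc.select(cssQuery).size();
    }

    public static void main(String[] args) {
        System.out.println(count("img"));                          // 2  tag name
        System.out.println(count("#logo"));                        // 1  id
        System.out.println(count(".head"));                        // 1  class
        System.out.println(count("[href]"));                       // 2  has attribute
        System.out.println(count("[^data-]"));                     // 1  attribute name prefix
        System.out.println(count("[width=500]"));                  // 1  attribute value
        System.out.println(count("[href^=https]"));                // 1  value starts with
        System.out.println(count("[src$=.gif]"));                  // 1  value ends with
        System.out.println(count("[href*=example]"));              // 1  value contains
        System.out.println(count("img[src~=(?i)\\.(png|jpe?g)]")); // 1  value matches regex
    }
}
```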
11. Combining selectors
el#id: an element with the given id, e.g. a#logo -> <a id=logo href= … >
el.class: an element with the given class, e.g. div.head -> <div class=head>xxxx</div>
el[attr]: elements that define the given attribute, e.g. a[href]
Any combination of the above, e.g. a[href]#logo, a[name].outerlink
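A sketch of combined selectors against illustrative markup modelled on the examples above; texts() returns the text of every match in document order.

```java
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

class CombinedSelectorDemo {

    static final String HTML =
            "<div class=\"head\">Header</div>"
            + "<div>Other</div>"
            + "<a id=\"logo\" href=\"/\">Logo</a>"
            + "<a name=\"top\" class=\"outerlink\">Top</a>"
            + "<a name=\"bottom\">Bottom</a>";

    // Returns the text of every element matching the selector, in document order.
    static List<String> texts(String cssQuery) {
        Document doc = Jsoup.parse(HTML);
        return doc.select(cssQuery).eachText();
    }

    public static void main(String[] args) {
        System.out.println(texts("a#logo"));            // [Logo]
        System.out.println(texts("div.head"));          // [Header]
        System.out.println(texts("a[href]"));           // [Logo]
        System.out.println(texts("a[href]#logo"));      // [Logo]
        System.out.println(texts("a[name].outerlink")); // [Top]
        System.out.println(texts("a[name]"));           // [Top, Bottom]
    }
}
```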
12. Besides the basic syntax and its combinations, jsoup also supports pseudo-selectors for filtering elements (the indexed ones count sibling positions from 0):
:lt(n): elements whose sibling index is less than n, e.g. td:lt(3) finds the first three cells of each row
:gt(n): elements whose sibling index is greater than n, e.g. div p:gt(2) finds p elements inside a div that are at least the fourth child of their parent
:eq(n): elements whose sibling index equals n, e.g. form input:eq(1) finds input elements inside a form that are the second child of their parent
:has(selector): elements that contain an element matching the selector, e.g. div:has(p) finds divs that contain a p element
:not(selector): elements that do not match the selector, e.g. div:not(.logo) finds all divs that do not have the logo class
:contains(text): elements whose text (including descendants) contains the text, case-insensitive, e.g. p:contains(oschina)
:containsOwn(text): elements whose own text (excluding descendants) contains the text, case-insensitive
:matches(regex): elements whose text matches the regular expression, e.g. div:matches((?i)login)
:matchesOwn(regex): elements whose own text matches the regular expression
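The pseudo-selectors can be checked the same way. This sketch uses illustrative markup; note that :lt, :gt and :eq compare an element's 0-based position among its parent's element children, not a count of matches.

```java
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

class PseudoSelectorDemo {

    static final String HTML =
            "<table><tr><td>1</td><td>2</td><td>3</td><td>4</td></tr></table>"
            + "<div class=\"logo\"><p>Login here</p></div>"
            + "<div><span>jsoup <b>parser</b></span></div>"
            + "<div>empty</div>";

    // Returns the text of every element matching the selector, in document order.
    static List<String> texts(String cssQuery) {
        Document doc = Jsoup.parse(HTML);
        return doc.select(cssQuery).eachText();
    }

    public static void main(String[] args) {
        System.out.println(texts("td:lt(2)"));                 // [1, 2]  sibling index 0 and 1
        System.out.println(texts("td:gt(2)"));                 // [4]     sibling index 3
        System.out.println(texts("td:eq(1)"));                 // [2]     sibling index 1
        System.out.println(texts("div:has(p)"));               // [Login here]
        System.out.println(texts("div:not(.logo)"));           // [jsoup parser, empty]
        System.out.println(texts("span:contains(JSOUP)"));     // [jsoup parser]  case-insensitive
        System.out.println(texts("span:containsOwn(parser)")); // []  "parser" is in the child <b>
        System.out.println(texts("div:matches((?i)login)"));   // [Login here]
        System.out.println(texts("p:matchesOwn(^Login)"));     // [Login here]
    }
}
```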
17. Using jsoup in the project
17.1. HTML passed from the front end to the back end
<p style="text-align: center;">Niannujiao Chibi Nostalgia</p>
<p> On the west side of the old base, the human way is: Zhou Lang Chibi of the Three Kingdoms. The rocks pierced through the sky, the stormy waves hit the shore, and thousands of piles of snow were rolled up. Picturesque, a moment how many hero.</p>
<p>&nbsp;Feather fans and scarves, while talking and laughing, masts and sculls are wiped out in ashes. If you travel in the motherland, you should laugh at me passionately, and you will be born early. Life is like a dream, and one statue is still in Jiangyue. </p>
<p><br></p>
<p>Name<span id="customer_name" title="customer name" sde-model="{&quot;tableSchema&quot;:&quot;confinement&quot;,&quot;tableName&quot;:&quot;customer_cmt&quot;,&quot;columnName&quot;:&quot;customer_name&quot;,&quot;dataType&quot;:&quot;text&quot;,&quot;value&quot;:&quot;customer name&quot;}" contenteditable="false" class="sde-bg"> <span class="sde-left" style="color:#0000FF" contenteditable="false">[</span> <span id="customerName" title="Customer Name" style="color:#000000;" class="sde-value" contenteditable="true">Customer Name</span> <span style="color:#0000FF" contenteditable="false" class="sde-right">]</span></span></p>
<p>Phone<span id="custom_phone" title="Customer Phone" sde-model="{&quot;tableSchema&quot;:&quot;confinement&quot;,&quot;tableName&quot;:&quot;customer_cmt&quot;,&quot;columnName&quot;:&quot;custom_phone&quot;,&quot;dataType&quot;:&quot;text&quot;,&quot;value&quot;:&quot;customer phone number&quot;}" contenteditable="false" class="sde-bg"> <span class="sde-left" style="color:#0000FF" contenteditable="false">[</span> <span id="customPhone" title="Customer Phone" style="color:#000000;" class="sde-value" contenteditable="true">Customer Phone</span> <span style="color:#0000FF" contenteditable="false" class="sde-right">]</span></span> </p>
17.2. Fill data retrieved from the database by the back end
{"customPhone":"15693167830","customerName":"Zhang Dandan"}
17.3. HTML after the back end has parsed it and replaced the values
<p style="text-align: center;">Niannujiao Chibi Nostalgia</p>
<p> On the west side of the old base, the human way is: Zhou Lang Chibi of the Three Kingdoms. The rocks pierced through the sky, the stormy waves hit the shore, and thousands of piles of snow were rolled up. Picturesque, a moment how many hero.</p>
<p>&nbsp;Feather fans and scarves, while talking and laughing, masts and sculls are wiped out in ashes. If you travel in the motherland, you should laugh at me passionately, and you will be born early. Life is like a dream, and one statue is still in Jiangyue. </p>
<p><br></p>
<p>Name<span id="customer_name" title="customer name" sde-model="{&quot;tableSchema&quot;:&quot;confinement&quot;,&quot;tableName&quot;:&quot;customer_cmt&quot;,&quot;columnName&quot;:&quot;customer_name&quot;,&quot;dataType&quot;:&quot;text&quot;,&quot;value&quot;:&quot;customer name&quot;}" contenteditable="false" class="sde-bg"> <span class="sde-left" style="color:#0000FF" contenteditable="false">[</span> <span id="customerName" title="Customer Name" style="color:#000000;" class="sde-value" contenteditable="true">Zhang Dandan</span> <span style="color:#0000FF" contenteditable="false" class="sde-right">]</span></span></p>
<p>Phone<span id="custom_phone" title="Customer Phone" sde-model="{&quot;tableSchema&quot;:&quot;confinement&quot;,&quot;tableName&quot;:&quot;customer_cmt&quot;,&quot;columnName&quot;:&quot;custom_phone&quot;,&quot;dataType&quot;:&quot;text&quot;,&quot;value&quot;:&quot;customer phone number&quot;}" contenteditable="false" class="sde-bg"> <span class="sde-left" style="color:#0000FF" contenteditable="false">[</span> <span id="customPhone" title="Customer Phone" style="color:#000000;" class="sde-value" contenteditable="true">15693167830</span> <span style="color:#0000FF" contenteditable="false" class="sde-right">]</span></span></p>
17.4. Back-end processing code
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

// Fill data obtained from the database as a JSONObject (from the project's JSON library)
JSONObject json = XXXX;
// html is the HTML string passed from the front end to the back end
Document document = Jsoup.parse(html);
// Iterate over the fill data; each key is the id of an element whose text is replaced
for (String key : json.keySet()) {
    Element element = document.getElementById(key);
    // Skip keys with no matching element instead of throwing a NullPointerException
    if (element != null) {
        element.text(json.getString(key));
    }
}
// Serialize the updated <body> contents back to an HTML string (the result shown in 17.3)
String result = document.body().html();
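A self-contained version of the step above, sketched under an assumption: the fill data arrives as a plain Map<String, String> rather than the JSONObject from the original, whose JSON library is not named. PlaceholderFiller and fill are hypothetical names.

```java
import java.util.Collections;
import java.util.Map;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

class PlaceholderFiller {

    // Replaces the text of each element whose id matches a key in values,
    // then returns the serialized <body> contents.
    static String fill(String html, Map<String, String> values) {
        Document document = Jsoup.parse(html);
        for (Map.Entry<String, String> entry : values.entrySet()) {
            Element element = document.getElementById(entry.getKey());
            if (element != null) {                // keys without a matching element are ignored
                element.text(entry.getValue());   // text() escapes the value, so it cannot inject markup
            }
        }
        return document.body().html();
    }

    public static void main(String[] args) {
        String html = "<p>Name<span id=\"customerName\" class=\"sde-value\">Customer Name</span></p>";
        System.out.println(fill(html, Collections.singletonMap("customerName", "Zhang Dandan")));
    }
}
```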
17.4.1. Handling duplicate ids
Front-end HTML in which the same ids occur more than once, both within one paragraph and across paragraphs:
<p style="white-space: normal; text-align: center;">Nian Nujiao Chibi Nostalgia</p>
<p style="white-space: normal;"><br></p>
<p style="white-space: normal;"> On the west side of the old base, the human way is: Zhou Lang Chibi of the Three Kingdoms. The rocks pierced through the sky, the stormy waves hit the shore, and thousands of piles of snow were rolled up. Picturesque, a moment how many hero.</p>
<p style="white-space: normal;"> hair. Feather fans and scarves, while talking and laughing, masts and sculls are wiped out in ashes. If you travel in the motherland, you should laugh at me passionately, and you will be born early. Life is like a dream, and one statue is still in Jiangyue.</p>
<p style="white-space: normal;"><br></p>
<p style="white-space: normal;">Name<span id="customer_name" title="customer name" sde-model="{&quot;tableSchema&quot;:&quot;confinement&quot;,&quot;tableName&quot;:&quot;customer_cmt&quot;,&quot;columnName&quot;:&quot;customer_name&quot;,&quot;dataType&quot;:&quot;text&quot;,&quot;value&quot;:&quot;Customer name&quot;}" contenteditable="false" class="sde-bg"> <span class="sde-left" style="color:#0000FF" contenteditable="false">[</span> <span id="customerName" title="Customer Name" style="color:#000000;" class="sde-value" contenteditable="true">Customer Name</span> <span style="color:#0000FF" contenteditable="false" class="sde-right">]</span></span>&nbsp;&nbsp;&nbsp;Emergency Contact Name<span id="customer_name" title="customer name" sde-model="{&quot;tableSchema&quot;:&quot;confinement&quot;,&quot;tableName&quot;:&quot;customer_cmt&quot;,&quot;columnName&quot;:&quot;customer_name&quot;,&quot;dataType&quot;:&quot;text&quot;,&quot;value&quot;:&quot;customer name&quot;}" contenteditable="false" class="sde-bg"> <span class="sde-left" style="color:#0000FF" contenteditable="false">[</span> <span id="customerName" title="Customer Name" style="color:#000000;" class="sde-value" contenteditable="true">Customer Name</span> <span style="color:#0000FF" contenteditable="false" class="sde-right">]</span></span></p>
<p style="white-space: normal;">Family Member Name<span id="customer_name" title="customer name" sde-model="{&quot;tableSchema&quot;:&quot;confinement&quot;,&quot;tableName&quot;:&quot;customer_cmt&quot;,&quot;columnName&quot;:&quot;customer_name&quot;,&quot;dataType&quot;:&quot;text&quot;,&quot;value&quot;:&quot;Customer name&quot;}" contenteditable="false" class="sde-bg"> <span class="sde-left" style="color:#0000FF" contenteditable="false">[</span> <span id="customerName" title="Customer Name" style="color:#000000;" class="sde-value" contenteditable="true">Customer Name</span> <span style="color:#0000FF" contenteditable="false" class="sde-right">]</span></span>&nbsp;&nbsp;&nbsp;Phone<span id="custom_phone" title="customer phone" sde-model="{&quot;tableSchema&quot;:&quot;confinement&quot;,&quot;tableName&quot;:&quot;customer_cmt&quot;,&quot;columnName&quot;:&quot;custom_phone&quot;,&quot;dataType&quot;:&quot;text&quot;,&quot;value&quot;:&quot;customer phone&quot;}" contenteditable="false" class="sde-bg"> <span class="sde-left" style="color:#0000FF" contenteditable="false">[</span> <span id="customPhone" title="Customer Phone" style="color:#000000;" class="sde-value" contenteditable="true">Customer Phone</span> <span style="color:#0000FF" contenteditable="false" class="sde-right">]</span></span></p>
<p style="white-space: normal;">Emergency Contact Phone<span id="custom_phone" title="Customer Phone" sde-model="{&quot;tableSchema&quot;:&quot;confinement&quot;,&quot;tableName&quot;:&quot;customer_cmt&quot;,&quot;columnName&quot;:&quot;custom_phone&quot;,&quot;dataType&quot;:&quot;text&quot;,&quot;value&quot;:&quot;Customer phone number&quot;}" contenteditable="false" class="sde-bg"> <span class="sde-left" style="color:#0000FF" contenteditable="false">[</span> <span id="customPhone" title="Customer Phone" style="color:#000000;" class="sde-value" contenteditable="true">Customer Phone</span> <span style="color:#0000FF" contenteditable="false" class="sde-right">]</span></span>&nbsp;&nbsp;Family Phone<span id="custom_phone" title="Customer Phone" sde-model="{&quot;tableSchema&quot;:&quot;confinement&quot;,&quot;tableName&quot;:&quot;customer_cmt&quot;,&quot;columnName&quot;:&quot;custom_phone&quot;,&quot;dataType&quot;:&quot;text&quot;,&quot;value&quot;:&quot;customer phone&quot;}" contenteditable="false" class="sde-bg"> <span class="sde-left" style="color:#0000FF" contenteditable="false">[</span> <span id="customPhone" title="Customer Phone" style="color:#000000;" class="sde-value" contenteditable="true">Customer Phone</span> <span style="color:#0000FF" contenteditable="false" class="sde-right">]</span></span> </p>
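import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

// s is the HTML string from the front end; map holds the fill data (key = element id)
Document document = Jsoup.parse(s);
Elements pList = document.getElementsByTag("p");
// Handle repeated data, i.e. the same element id used more than once
for (Element p : pList) {
    // The same id can repeat across paragraphs, so each <p> is checked separately
    for (String key : map.keySet()) {
        Element elementById = p.getElementById(key);
        if (elementById != null) {
            Elements spans = p.getElementsByTag("span");
            for (Element sp : spans) {
                // The id can also repeat within one paragraph; getElementById returns
                // only the first match under each span, so every span is checked
                Element byId = sp.getElementById(key);
                if (byId != null) {
                    byId.text(map.get(key).toString());
                }
            }
        }
    }
}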
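The nested loops above work because Element.getElementById returns only the first match under the element it is called on, so scanning every span reaches each duplicate. A shorter alternative, not the original author's code, is to collect every element carrying the id with getElementsByAttributeValue("id", key). This sketch assumes a Map<String, String> of fill data; DuplicateIdFiller and fillAll are hypothetical names. Per the jsoup Javadoc this value match is case-insensitive, so ids that differ only in case would also be filled.

```java
import java.util.Collections;
import java.util.Map;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

class DuplicateIdFiller {

    // Fills every element whose id equals a key, even when the same id occurs
    // several times in the document (invalid HTML, but common in editor output).
    static String fillAll(String html, Map<String, String> values) {
        Document document = Jsoup.parse(html);
        for (Map.Entry<String, String> entry : values.entrySet()) {
            for (Element element : document.getElementsByAttributeValue("id", entry.getKey())) {
                element.text(entry.getValue());
            }
        }
        return document.body().html();
    }

    public static void main(String[] args) {
        String html = "<p>Name<span id=\"customerName\">Customer Name</span>"
                + " Emergency Contact Name<span id=\"customerName\">Customer Name</span></p>"
                + "<p>Family Member Name<span id=\"customerName\">Customer Name</span></p>";
        System.out.println(fillAll(html, Collections.singletonMap("customerName", "Zhang Dandan")));
    }
}
```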
17.5. How the content looks when submitted by the front end
17.6. How the content looks after back-end processing