Foreword
Most of the theory in this article is excerpted from three IBM developerWorks articles, listed below. I excerpted them mainly to deepen my own understanding and to keep notes for future reference. The excerpt is not comprehensive; the originals are much richer, so see them for details.
Reference articles:
Parsing XML with StAX, Part 1: Introduction to the Streaming API for XML (StAX): http://www.ibm.com/developerworks/cn/xml/x-stax1.html
Parsing XML with StAX, Part 2: Pull parsing and events: http://www.ibm.com/developerworks/cn/xml/x-stax2.html
Parsing XML with StAX, Part 3: Using custom events and writing XML: http://www.ibm.com/developerworks/cn/xml/x-stax3.html
—————
Original link: https://blog.csdn.net/zhyh1986/article/details/8528649
I will not describe StAX itself at length here. Instead, let me describe the problem I ran into when parsing an XML file.
Requirement:
I want to parse every entity element in a 4GB XML file, including its nested child elements and their content, and distribute the parsed entity data evenly across 7 new XML files.
There are generally four ways to parse XML:
- DOM parsing
- SAX parsing
- DOM4J parsing
- JDOM parsing
The pros and cons of these four methods are compared below:
1. SAX parsing (Simple API for XML)
SAX parsing method: the parser scans the document sequentially and reports events (start of element, text, end of element) to a handler as it goes. Unlike DOM, SAX processing can be stopped at any point while the document is being parsed, which makes it a faster and more memory-efficient method.
Advantages: The entire document does not need to be loaded in advance, so it uses few resources. Parsing can start immediately and is fast, with little memory pressure.
Disadvantage: nodes cannot be modified.
Applicable: reading XML files
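To make the event-driven model concrete, here is a minimal sketch (not from the original articles) that counts entity elements with the SAX parser bundled in the JDK. The class name SaxEntityCounter and the method countEntities are hypothetical names chosen for illustration, and the tiny inline document stands in for a real file.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class, named for illustration only.
class SaxEntityCounter {

    // Counts <entity> start tags. The parser pushes each event to the handler,
    // so nothing but the counter is kept in memory.
    static int countEntities(InputStream in) throws Exception {
        final int[] count = {0};
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        parser.parse(in, new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attributes) {
                // qName is used because the default factory is not namespace-aware
                if ("entity".equals(qName)) {
                    count[0]++;
                }
            }
        });
        return count[0];
    }

    public static void main(String[] args) throws Exception {
        String xml = "<entities><entity id=\"1\"/><entity id=\"2\"/></entities>";
        InputStream in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
        System.out.println(countEntities(in)); // prints 2
    }
}
```

The handler only observes events as they pass; there is no tree to go back to, which is why SAX suits reading but not modifying a document.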
2. DOM parsing (Document Object Model)
DOM parsing method: DOM defines a set of interfaces for working with XML documents. The parser reads in the entire document and builds a tree structure in memory, which can then be manipulated through the DOM interfaces.
Advantages: The entire document tree is in memory and easy to operate on; it supports deletion, modification, rearrangement, and similar operations.
Disadvantage: The whole document is brought into memory, including nodes that are never used, which wastes time and space. For a large file this puts pressure on memory and lengthens parsing time.
Applicable: modifying XML data
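As a small illustration of in-memory modification, the sketch below (not from the original articles) loads a document with the JDK's DOM parser and changes the text of the first name element. The class DomEditSketch and the method renameFirstEntity are hypothetical names, and the approach only makes sense for documents small enough to fit in memory.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical helper class, named for illustration only.
class DomEditSketch {

    // Builds the full tree in memory, edits the first <name> element in place,
    // and returns the text now stored in the tree.
    static String renameFirstEntity(String xml, String newName) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        Element name = (Element) doc.getElementsByTagName("name").item(0);
        name.setTextContent(newName);
        return doc.getElementsByTagName("name").item(0).getTextContent();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<entities><entity id=\"1\"><name>OLD NAME</name></entity></entities>";
        System.out.println(renameFirstEntity(xml, "NEW NAME")); // prints NEW NAME
    }
}
```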
3. JDOM
JDOM is a pure Java API for processing XML. It uses concrete classes instead of interfaces, and it combines tree traversal with the Java conventions of SAX. JDOM differs from DOM in two main respects.
First, JDOM only uses concrete classes and not interfaces. This simplifies the API in some ways, but also limits flexibility.
Second, the API makes extensive use of the Java Collections classes, which simplifies its use for Java developers who are already familiar with them. JDOM itself does not contain a parser. It typically uses a SAX2 parser to parse and validate the input XML document (although it can also take a previously constructed DOM representation as input). It contains converters that output JDOM representations as SAX2 event streams, DOM models, or XML text documents.
Advantages: 1. It is a tree-based Java API for processing XML, and loads the tree into memory.
2. It has no backward-compatibility restrictions, so it is simpler than DOM.
3. It is fast.
4. It follows Java conventions, as SAX does.
Disadvantages: 1. It cannot handle documents larger than memory.
2. JDOM represents the logical model of an XML document and does not guarantee a byte-for-byte round trip.
3. It does not provide an actual model of the DTD or schema for instance documents.
4. It does not support the traversal packages available in DOM.
4. DOM4J
DOM4J has a more complex API, which gives it greater flexibility than JDOM. DOM4J performs very well; even Sun's JAXM uses it, and many open source projects rely on it heavily. The well-known Hibernate, for example, uses DOM4J to read its XML configuration files. If portability is not a concern, use DOM4J.
Advantages: highest flexibility, ease of use and powerful functions, excellent performance
Disadvantages: complex api, poor portability
I tried essentially all four of the methods above against this requirement.
I started with DOM parsing, but it can only handle smaller XML files; a file that is too large causes a memory overflow because DOM loads the entire document at once.
I then tried DOM4J and SAX, but given the memory available on my machine, the JVM still reported out-of-memory errors.
With no other option left, I finally found that StAX can also parse large XML files.
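For readers new to StAX, here is a minimal sketch of its pull model (not from the original articles): the application asks the reader for the next event and decides what to do with it. The class StaxPullSketch and the method entityIds are hypothetical names, and the inline document is a cut-down stand-in for the real file.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper class, named for illustration only.
class StaxPullSketch {

    // Pulls events one at a time and collects the id attribute of every <entity>.
    static List<String> entityIds(InputStream in) throws XMLStreamException {
        XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(in);
        List<String> ids = new ArrayList<>();
        try {
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT
                        && "entity".equals(reader.getLocalName())) {
                    // A null namespace URI means the namespace is not checked
                    ids.add(reader.getAttributeValue(null, "id"));
                }
            }
        } finally {
            reader.close();
        }
        return ids;
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<entities><entity id=\"1123831\"/><entity id=\"1124766\"/></entities>";
        InputStream in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
        System.out.println(entityIds(in)); // prints [1123831, 1124766]
    }
}
```

Because the loop is driven by the application rather than by callbacks, it is easy to hand the same reader to a helper method partway through the document, which is what the full example later in this post does.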
Here is an excerpt of the XML file to be parsed:
<?xml version='1.0' encoding='UTF-8'?>
<gwl>
  <version>20230417084108</version>
  <entities>
    <entity id="1123831" version="20230414163503">
      <name>ALMOND, LINCOLN CARTER</name>
      <listId>1021</listId>
      <listCode>USP</listCode>
      <entityType>03</entityType>
      <createdDate>09/02/2004</createdDate>
      <lastUpdateDate>04/14/2023</lastUpdateDate>
      <source>USP</source>
      <OriginalSource>PEP</OriginalSource>
      <dobs>
        <dob Y="1936">06/16/1936</dob>
      </dobs>
      <pobs>
        <pob>Pawtucket, Rhode Island, United States</pob>
      </pobs>
      <titles>
        <title>FORMER GOVERNOR OF RHODE ISLAND (JANUARY 3, 1995 - JANUARY 7, 2003). DECEASED JANUARY 02, 2023.</title>
      </titles>
      <sdfs>
        <sdf name="OtherInformation">Career: Governor of Rhode Island (January 03, 1995 - January 07, 2003); United State Attorney for the District of Rhode Island (October 09, 1981 - January 20, 1993); State Attorney for the District of Rhode Island (1969 - 1978).</sdf>
        <sdf name="DirectID">https://accuity.worldcompliance.com/signin.aspx?ent=d14d930f-7943-4363-b4d0-aa2c59437e1b</sdf>
        <sdf name="EffectiveDate">1981</sdf>
        <sdf name="EntityLevel">State</sdf>
        <sdf name="ExpirationDate">1993</sdf>
        <sdf name="Gender">MALE</sdf>
        <sdf name="NameSource">Website</sdf>
        <sdf name="Org_PID">1706394</sdf>
        <sdf name="OriginalID">7031</sdf>
        <sdf name="Relationship">Father</sdf>
        <sdf name="SubCategory">Former PEP</sdf>
      </sdfs>
      <addresses>
        <address>
          <country>US</country>
          <countryName>UNITED STATES</countryName>
        </address>
      </addresses>
    </entity>
    <entity id="1124766" version="20230414163503">
      <name>BAUCUS, MAX SIEBEN</name>
      <listId>1021</listId>
      <listCode>USP</listCode>
      <entityType>03</entityType>
      <createdDate>09/02/2004</createdDate>
      <lastUpdateDate>04/14/2023</lastUpdateDate>
      <source>USP</source>
      <OriginalSource>PEP</OriginalSource>
      <dobs>
        <dob Y="1941">12/11/1941</dob>
      </dobs>
      <pobs>
        <pob>Helena, Montana, United States</pob>
      </pobs>
      <aliases>
        <alias type="Alias">ENKE, MAX SIEBEN</alias>
      </aliases>
      <titles>
        <title>FORMER AMBASSADOR OF THE UNITED STATES TO CHINA (MARCH 20, 2014 - JANUARY 16, 2017).</title>
      </titles>
      <sdfs>
        <sdf name="OtherInformation">Political Party: Democratic. Career: Ambassador Extraordinary and Plenipotentiary of the United States to China, (March 20, 2014 - January 16, 2017); Member of the United States Congress, Senate from Montana (December 15, 1978 - February 06, 2014);</sdf>
        <sdf name="DirectID">https://accuity.worldcompliance.com/signin.aspx?ent=945fd382-f5b7-42c4-ad1f-a40c4bf0e285</sdf>
        <sdf name="EffectiveDate">1978</sdf>
        <sdf name="EntityLevel">National</sdf>
        <sdf name="ExpirationDate">2014</sdf>
        <sdf name="Gender">MALE</sdf>
        <sdf name="NameSource">Website</sdf>
        <sdf name="Org_PID">548118</sdf>
        <sdf name="OriginalID">7542</sdf>
        <sdf name="Relationship">Brother</sdf>
        <sdf name="SubCategory">Former PEP</sdf>
      </sdfs>
      <addresses>
        <address>
          <country>US</country>
          <countryName>UNITED STATES</countryName>
          <province>WASHINGTON, DC</province>
          <postalCode>20515</postalCode>
        </address>
        <address>
          <country>US</country>
          <countryName>UNITED STATES</countryName>
          <province>WASHINGTON, D.C.</province>
          <postalCode>20510</postalCode>
        </address>
        <address>
          <address1>55 ANJIALOU RD</address1>
          <city>BEIJING</city>
          <country>CN</country>
          <countryName>CHINA</countryName>
          <postalCode>100600</postalCode>
        </address>
      </addresses>
    </entity>
    <entity id="1124842" version="20230414163503">
      <name>THOMAS, CRAIG LYLE</name>
      <listId>1021</listId>
      <listCode>USP</listCode>
      <entityType>03</entityType>
      <createdDate>09/02/2004</createdDate>
      <lastUpdateDate>04/14/2023</lastUpdateDate>
      <source>USP</source>
      <OriginalSource>PEP</OriginalSource>
      <dobs>
        <dob Y="1933">02/17/1933</dob>
      </dobs>
      <pobs>
        <pob>Cody, Wyoming, United States</pob>
      </pobs>
      <titles>
        <title>FORMER MEMBER OF THE UNITED STATES CONGRESS (JANUARY 03, 1995 - JUNE 04, 2007). DECEASED JUNE 04, 2007.</title>
      </titles>
      <sdfs>
        <sdf name="OtherInformation">Political Party: Republican. Career: Member of the United States Congress, Senate, Class I (January 03, 1995 - June 04, 2007); Member of the United States Congress, House of Representatives , At-Large (April 27, 1989 - January 03, 1995). Member of the</sdf>
        <sdf name="DirectID">https://accuity.worldcompliance.com/signin.aspx?ent=4e7b1050-36b5-4b1c-9037-c2349c519d40</sdf>
        <sdf name="EffectiveDate">1989</sdf>
        <sdf name="EntityLevel">National</sdf>
        <sdf name="ExpirationDate">1995</sdf>
        <sdf name="Gender">MALE</sdf>
        <sdf name="NameSource">Website</sdf>
        <sdf name="Org_PID">1817490</sdf>
        <sdf name="OriginalID">7629</sdf>
        <sdf name="Relationship">Father</sdf>
        <sdf name="SubCategory">Former PEP</sdf>
      </sdfs>
      <addresses>
        <address>
          <country>US</country>
          <countryName>UNITED STATES</countryName>
          <province>WASHINGTON D.C.</province>
          <postalCode>20510</postalCode>
        </address>
        <address>
          <address1>200 WEST 24TH STREET</address1>
          <city>CHEYENNE</city>
          <state>WY</state>
          <stateName>WYOMING</stateName>
          <country>US</country>
          <countryName>UNITED STATES</countryName>
          <postalCode>82002</postalCode>
        </address>
      </addresses>
    </entity>
    <entity id="1125230" version="20230414163051">
      <name>PATRIAT, FRANCOIS</name>
      <listId>1020</listId>
      <listCode>PEP</listCode>
      <entityType>03</entityType>
      <createdDate>09/02/2004</createdDate>
      <lastUpdateDate>04/14/2023</lastUpdateDate>
      <source>PEP</source>
      <OriginalSource>PEP</OriginalSource>
      <dobs>
        <dob Y="1943">03/21/1943</dob>
      </dobs>
      <pobs>
        <pob>Semur-en-Auxois, , France</pob>
      </pobs>
      <titles>
        <title>MEMBER OF THE FRENCH PARLIAMENT (OCTOBER 01, 2008 - 2026).</title>
      </titles>
      <sdfs>
        <sdf name="OtherInformation">Political party: La Republique en marche (LREM) (currently known as Renaissance). Career: Member of the Executive Bureau of La Republique en Marche (LREM), The Republic on the Move (currently known as Renaissance), effective from November 18, 2017;</sdf>
        <sdf name="DirectID">https://accuity.worldcompliance.com/signin.aspx?ent=a4ffd4f3-5c75-440b-aeca-4e3a7d2ef642</sdf>
        <sdf name="EffectiveDate">2008</sdf>
        <sdf name="EntityLevel">National</sdf>
        <sdf name="ExpirationDate">2026</sdf>
        <sdf name="Gender">MALE</sdf>
        <sdf name="NameSource">Website</sdf>
        <sdf name="Org_PID">3759009</sdf>
        <sdf name="OriginalID">8117</sdf>
        <sdf name="Relationship">Associate</sdf>
        <sdf name="SubCategory">Govt Branch Member</sdf>
      </sdfs>
      <addresses>
        <address>
          <address1>15, RUE DE VAUGIRARD</address1>
          <city>PARIS</city>
          <country>FR</country>
          <countryName>FRANCE</countryName>
          <postalCode>75291</postalCode>
        </address>
      </addresses>
    </entity>
    <entity id="1125282" version="20230414163052">
      <name>BENOUTIQ, ABDELKRIM</name>
      <listId>1020</listId>
      <listCode>PEP</listCode>
      <entityType>03</entityType>
      <createdDate>09/02/2004</createdDate>
      <lastUpdateDate>04/14/2023</lastUpdateDate>
      <source>PEP</source>
      <OriginalSource>PEP</OriginalSource>
      <dobs>
        <dob Y="1959">08/19/1959</dob>
      </dobs>
      <pobs>
        <pob>Rabat, Rabat-Sale-Kenitra Region, Morocco</pob>
      </pobs>
      <aliases>
        <alias type="Alias">BEN ATIQ, ABDELKRIM</alias>
        <alias type="Alias">BENATIQ, ABDELKRIM</alias>
      </aliases>
      <nativeCharNames>
        <nativeCharName charSet="" latinCharName="BEN ATIQ, ABDELKRIM" type="Alias">? ?</nativeCharName>
        <nativeCharName charSet="" latinCharName="BENATIQ, ABDELKRIM" type="Alias">? ?</nativeCharName>
        <nativeCharName charSet="" latinCharName="BENOUTIQ, ABDELKRIM" type="Primary">? ?</nativeCharName>
      </nativeCharNames>
      <titles>
        <title>FORMER MEMBER OF THE POLITICAL BUREAU OF SOCIALIST UNION OF POPULAR FORCES PARTY, MOROCCO, ELECTED JUNE 10, 2017, EFFECTIVE UNTIL APRIL 24, 2022.</title>
      </titles>
      <sdfs>
        <sdf name="OtherInformation">Political Party: Union Socialiste Des Forces Populaires (USFP) Career: Member of the Political Bureau of Union Socialiste Des Forces Populaires (USFP), Socialist Union of Popular Forces Party, elected June 10, 2017 , effective until April 24, 2022;</sdf>
        <sdf name="DirectID">https://accuity.worldcompliance.com/signin.aspx?ent=35f8bcea-6169-4a8f-9715-81de730d1c17</sdf>
        <sdf name="EffectiveDate">2000</sdf>
        <sdf name="EntityLevel">National</sdf>
        <sdf name="ExpirationDate">2001</sdf>
        <sdf name="Gender">MALE</sdf>
        <sdf name="NameSource">Website</sdf>
        <sdf name="OriginalID">8181</sdf>
        <sdf name="SubCategory">Former PEP</sdf>
      </sdfs>
      <addresses>
        <address>
          <address1>9, AVENUE AL ARAAR</address1>
          <city>RABAT</city>
          <country>MA</country>
          <countryName>MOROCCO</countryName>
          <province>RABAT-SALE-KENITRA REGION</province>
        </address>
        <address>
          <address1>AVENUE F. ROOSEVELT</address1>
          <city>RABAT</city>
          <country>MA</country>
          <countryName>MOROCCO</countryName>
          <province>RABAT-SALE-KENITRA REGION</province>
        </address>
        <address>
          <address1>NO. 9 ARAR STREET</address1>
          <city>RABAT</city>
          <country>MA</country>
          <countryName>MOROCCO</countryName>
          <province>RABAT-SALE-KENITRA REGION</province>
        </address>
      </addresses>
    </entity>
    <entity id="1125443" version="20230414163053">
      <name>OLLING, SVEND</name>
      <listId>1020</listId>
      <listCode>PEP</listCode>
      <entityType>03</entityType>
      <createdDate>09/02/2004</createdDate>
      <lastUpdateDate>04/14/2023</lastUpdateDate>
      <source>PEP</source>
      <OriginalSource>PEP</OriginalSource>
      <dobs>
        <dob Y="1967">11/09/1967</dob>
      </dobs>
      <pobs>
        <pob>Glostrup, , Denmark</pob>
      </pobs>
      <titles>
        <title>AMBASSADOR OF DENMARK TO SOUTH KOREA, AS OF MARCH 30, 2023.</title>
      </titles>
      <sdfs>
        <sdf name="OtherInformation">Career: Ambassador of Denmark to South Korea, as of March 30, 2023; Ambassador of Denmark to Egypt, as of May 28, 2020, expiration reported March 20, 2023; Non-Resident Ambassador of Denmark to Azerbaijan, effective from March 26, 2017, expiration</sdf>
        <sdf name="DirectID">https://accuity.worldcompliance.com/signin.aspx?ent=ef160921-f06b-4942-9527-0ee7565467c0</sdf>
        <sdf name="EffectiveDate">2023</sdf>
        <sdf name="EntityLevel">International</sdf>
        <sdf name="Gender">MALE</sdf>
        <sdf name="NameSource">Website</sdf>
        <sdf name="Org_PID">8698914</sdf>
        <sdf name="OriginalID">8384</sdf>
        <sdf name="Relationship">Father</sdf>
        <sdf name="SubCategory">Diplomat</sdf>
      </sdfs>
      <addresses>
        <address>
          <address1>416, HANGANG-DAERO, JUNG-GU</address1>
          <city>SEOUL</city>
          <country>KR</country>
          <countryName>KOREA, REPUBLIC OF</countryName>
          <postalCode>04637</postalCode>
        </address>
        <address>
          <address1>TURAN GUENES BULVARI 106</address1>
          <city>ANKARA</city>
          <country>TR</country>
          <countryName>TURKEY</countryName>
          <postalCode>06550</postalCode>
        </address>
        <address>
          <address1>ASIATISK PLADS 2</address1>
          <city>COPENHAGEN</city>
          <country>DK</country>
          <countryName>DENMARK</countryName>
          <postalCode>1448</postalCode>
        </address>
        <address>
          <address1>NORTH AVENUE</address1>
          <city>DHAKA</city>
          <country>BD</country>
          <countryName>BANGLADESH</countryName>
          <postalCode>1212</postalCode>
        </address>
        <address>
          <city>CAIRO</city>
          <country>EG</country>
          <countryName>EGYPT</countryName>
        </address>
      </addresses>
    </entity>
    <entity id="1125610" version="20230414163054">
      <name>TAKAHASHI, KOICHI</name>
      <listId>1020</listId>
      <listCode>PEP</listCode>
      <entityType>03</entityType>
      <createdDate>09/02/2004</createdDate>
      <lastUpdateDate>04/14/2023</lastUpdateDate>
      <source>PEP</source>
      <OriginalSource>PEP</OriginalSource>
      <dobs>
        <dob Y="1944">1944</dob>
      </dobs>
      <nativeCharNames>
        <nativeCharName charSet="" latinCharName="TAKAHASHI, KOICHI" type="Primary">たかはしこういち</nativeCharName>
        <nativeCharName charSet="" latinCharName="TAKAHASHI, KOICHI" type="Primary">Takahashi Hengichi</nativeCharName>
      </nativeCharNames>
      <titles>
        <title>FORMER AMBASSADOR OF JAPAN TO THE CZECH REPUBLIC (FEBRUARY 03, 2003 - 2005).</title>
      </titles>
      <sdfs>
        <sdf name="OtherInformation">Career: Ambassador of Japan to the Czech Republic (February 03, 2003 - 2005); Deputy Vice-Minister in charge of Immigration Bureau, Ministry of Justice (1999 - 2001); Consul-General of Japan to Berlin City, Germany (1995 - 1997); Minister of Japan to</sdf>
        <sdf name="DirectID">https://accuity.worldcompliance.com/signin.aspx?ent=9b2a063e-8d55-4806-b2f2-f2c79d815a33</sdf>
        <sdf name="EffectiveDate">1999</sdf>
        <sdf name="EntityLevel">National</sdf>
        <sdf name="ExpirationDate">2001</sdf>
        <sdf name="Gender">MALE</sdf>
        <sdf name="NameSource">Website</sdf>
        <sdf name="OriginalID">8483</sdf>
        <sdf name="SubCategory">Former PEP</sdf>
      </sdfs>
      <addresses>
        <address>
          <country>JP</country>
          <countryName>JAPAN</countryName>
        </address>
      </addresses>
    </entity>
    <entity id="1125925" version="20230414163054">
      <name>PINTER, SANDOR</name>
      <listId>1020</listId>
      <listCode>PEP</listCode>
      <entityType>03</entityType>
      <createdDate>09/02/2004</createdDate>
      <lastUpdateDate>04/14/2023</lastUpdateDate>
      <source>PEP</source>
      <OriginalSource>PEP</OriginalSource>
      <dobs>
        <dob Y="1948">07/03/1948</dob>
      </dobs>
      <pobs>
        <pob>Budapest, , Hungary</pob>
      </pobs>
      <titles>
        <title>DEPUTY PRIME MINISTER OF HUNGARY, EFFECTIVE FROM MAY 04, 2018.</title>
      </titles>
      <sdfs>
        <sdf name="OtherInformation">Career: Deputy Prime Minister, effective from May 04, 2018; Minister of Interior, effective from May 29, 2010; Minister of Interior (July 08, 1998 - May 27, 2002); of the Hungarian National Police (September 18, 1991 - 1996).</sdf>
        <sdf name="DirectID">https://accuity.worldcompliance.com/signin.aspx?ent=cd135a22-6242-4999-bc6f-5aae5b0f92e2</sdf>
        <sdf name="EffectiveDate">2018</sdf>
        <sdf name="EntityLevel">National</sdf>
        <sdf name="Gender">MALE</sdf>
        <sdf name="NameSource">Website</sdf>
        <sdf name="Org_PID">2544374</sdf>
        <sdf name="OriginalID">11549</sdf>
        <sdf name="Relationship">Father</sdf>
        <sdf name="SubCategory">Govt Branch Member</sdf>
      </sdfs>
      <addresses>
        <address>
          <address1>TEVE U. 4-6.</address1>
          <city>BUDAPEST</city>
          <country>HU</country>
          <countryName>HUNGARY</countryName>
          <postalCode>1139</postalCode>
        </address>
        <address>
          <address1>JOZSEF ATTILA U. 2-4.</address1>
          <city>BUDAPEST</city>
          <country>HU</country>
          <countryName>HUNGARY</countryName>
          <postalCode>1051</postalCode>
        </address>
      </addresses>
    </entity>
  </entities>
</gwl>
The following code uses StAX to extract every entity element from the XML file above and write the entities evenly into 7 new XML files, each with a fixed custom format:
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.InputStream;
import java.io.OutputStream;

import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import javax.xml.stream.XMLStreamWriter;

public class StAXParserTest {

    public static void main(String[] args) {
        String inputFile = "D:\\Desktop\\PEP\\ENTITY.XML"; // input XML file path
        String outputPrefix = "D:\\Desktop\\PEP\\";        // output XML file prefix
        int numFiles = 7;                                   // number of new files

        try {
            // Create the XML input factory, the input stream, and the reader
            XMLInputFactory inputFactory = XMLInputFactory.newInstance();
            InputStream inputStream = new FileInputStream(inputFile);
            XMLStreamReader reader = inputFactory.createXMLStreamReader(inputStream);

            // Create the XML output factory, the output streams, and the writers
            XMLOutputFactory outputFactory = XMLOutputFactory.newInstance();
            OutputStream[] outputStreams = new OutputStream[numFiles];
            XMLStreamWriter[] writers = new XMLStreamWriter[numFiles];
            for (int i = 0; i < numFiles; i++) {
                String outputFileName = outputPrefix + (i + 1) + ".xml";
                outputStreams[i] = new FileOutputStream(outputFileName);
                // Pass the encoding explicitly so it matches the XML declaration
                writers[i] = outputFactory.createXMLStreamWriter(outputStreams[i], "UTF-8");
                // Write the XML declaration (version 1.0, encoding UTF-8)
                writers[i].writeStartDocument("UTF-8", "1.0");
                writers[i].writeCharacters("\n");
                // Open the <gwl> root element
                writers[i].writeStartElement("gwl");
                writers[i].writeCharacters("\n");
                // Write <version> with its value, then close it
                writers[i].writeStartElement("version");
                writers[i].writeCharacters("20230417084108");
                writers[i].writeEndElement();
                writers[i].writeCharacters("\n");
                // Open <entities>; it is closed after all entities are written
                writers[i].writeStartElement("entities");
            }

            // Parse the input and hand each entity to the next writer in turn
            int currentFileIndex = 0;
            int entityCount = 0;
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT
                        && "entity".equals(reader.getLocalName())) {
                    // Copy the entity element and all of its children
                    writeEntityElement(reader, writers[currentFileIndex]);
                    entityCount++;
                    // Switch to the next file, wrapping around after the last one
                    currentFileIndex = (currentFileIndex + 1) % numFiles;
                }
            }

            // Close the open elements, then the writers and output streams
            for (int i = 0; i < numFiles; i++) {
                writers[i].writeCharacters("\n");
                writers[i].writeEndElement(); // </entities>
                writers[i].writeCharacters("\n");
                writers[i].writeEndElement(); // </gwl>
                writers[i].writeCharacters("\n");
                writers[i].writeEndDocument();
                writers[i].flush();
                writers[i].close();
                outputStreams[i].close();
            }

            // Close the reader and the input stream
            reader.close();
            inputStream.close();

            System.out.println("Total number of entities: " + entityCount);
            System.out.println("Entities per file: " + (entityCount / numFiles));
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    // Copies the current <entity> element, with its attributes, child elements,
    // and text, from the reader to the writer. Returns once </entity> is written.
    private static void writeEntityElement(XMLStreamReader reader, XMLStreamWriter writer)
            throws XMLStreamException {
        writer.writeCharacters("\n");
        writer.writeStartElement("entity");
        writeAttributes(reader, writer); // id and version
        while (reader.hasNext()) {
            int event = reader.next();
            switch (event) {
                case XMLStreamConstants.START_ELEMENT:
                    writer.writeStartElement(reader.getLocalName());
                    // Child attributes (Y on <dob>, name on <sdf>) must be copied too
                    writeAttributes(reader, writer);
                    break;
                case XMLStreamConstants.END_ELEMENT:
                    writer.writeEndElement();
                    if ("entity".equals(reader.getLocalName())) {
                        // The entity element is fully copied
                        return;
                    }
                    break;
                case XMLStreamConstants.CHARACTERS:
                    writer.writeCharacters(reader.getText());
                    break;
                default:
                    break;
            }
        }
    }

    // Copies all attributes of the element the reader is currently positioned on.
    private static void writeAttributes(XMLStreamReader reader, XMLStreamWriter writer)
            throws XMLStreamException {
        for (int i = 0; i < reader.getAttributeCount(); i++) {
            writer.writeAttribute(reader.getAttributeLocalName(i), reader.getAttributeValue(i));
        }
    }
}
The XML excerpt above contains 8 entity elements. After parsing, each of the 7 output files receives one entity in turn, and the eighth wraps around to the first file. So the first XML file holds 2 entities and each of the other 6 holds 1.
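The distribution can be checked with a few lines of arithmetic. This is a small sketch, with the hypothetical names RoundRobinSketch and distribute, of the same modulo rule the program uses: entity number i (counting from 0) goes to file i % numFiles.

```java
import java.util.Arrays;

// Hypothetical helper class, named for illustration only.
class RoundRobinSketch {

    // Returns how many entities each file receives when entity i
    // is written to file (i % numFiles).
    static int[] distribute(int entityCount, int numFiles) {
        int[] perFile = new int[numFiles];
        for (int i = 0; i < entityCount; i++) {
            perFile[i % numFiles]++;
        }
        return perFile;
    }

    public static void main(String[] args) {
        // 8 entities over 7 files, as in the sample above
        System.out.println(Arrays.toString(distribute(8, 7)));
        // prints [2, 1, 1, 1, 1, 1, 1]
    }
}
```

With this rule the file sizes never differ by more than one entity, regardless of how many entities the input contains.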
I have parsed the complete 4GB Entity.xml file this way with no memory overflow, and the parsing is also very fast!