Recently I was asked to implement a rich-text feature that converts the content of an uploaded Word document into rich text (HTML). The formatting, styles, and pictures in the document had to display exactly as in the original, and I stepped into quite a few pitfalls along the way. I have read that a .docx file is actually a compressed package (a ZIP archive of XML files). I don't know the format in depth, but the idea makes sense to me. Treat this post as a reference only, and please go easy on me.
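Because a .docx is a ZIP archive, you can inspect its layout with nothing more than `java.util.zip`. The sketch below lists the entries of a .docx file. The class name `DocxEntries` is my own, hypothetical choice. For a typical document you will see `word/document.xml` for the body and a `word/media/` folder for embedded pictures. This applies to .docx only: a legacy .doc file is a binary OLE2 compound file, not a ZIP, which is why .doc and .docx need different libraries.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

/**
 * Hypothetical helper: lists the parts inside a .docx package,
 * which is a ZIP archive of XML parts plus embedded media.
 */
public class DocxEntries {

    /** Returns the names of all entries in the given .docx (ZIP) file. */
    public static List<String> listEntries(String docxPath) throws IOException {
        List<String> names = new ArrayList<>();
        try (ZipFile zip = new ZipFile(docxPath)) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                names.add(entries.nextElement().getName());
            }
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        // Pass the path of any .docx file as the first argument
        for (String name : listEntries(args[0])) {
            System.out.println(name);
        }
    }
}
```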
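Enough talk, let's look at the code. There are two ways to turn the extracted pictures into Base64, and they produce different sizes. The larger one (left commented out in the docx method) decodes the picture with ImageIO, re-encodes it as PNG, and then applies the `java.util.Base64` encoder that ships with JDK 8; the other encodes the original image bytes directly. For the same picture, I measured a difference of about 200 KB. The two are interchangeable, so use whichever fits your needs. The dependencies in the pom are: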
```xml
<!-- Keep the versions of poi, poi-ooxml and poi-scratchpad consistent -->
<dependency>
    <groupId>org.apache.poi</groupId>
    <artifactId>poi</artifactId>
    <version>4.1.2</version>
</dependency>
<!-- Handles doc and ppt (HWPF / HSLF); xls (HSSF) is in the core poi jar -->
<dependency>
    <groupId>org.apache.poi</groupId>
    <artifactId>poi-scratchpad</artifactId>
    <version>4.1.2</version>
</dependency>
<!-- Handles docx, pptx, xlsx -->
<dependency>
    <groupId>org.apache.poi</groupId>
    <artifactId>poi-ooxml</artifactId>
    <version>4.1.2</version>
</dependency>
<dependency>
    <groupId>fr.opensagres.xdocreport</groupId>
    <artifactId>fr.opensagres.poi.xwpf.converter.xhtml</artifactId>
    <version>2.0.2</version>
</dependency>
```
```java
import fr.opensagres.poi.xwpf.converter.core.BasicURIResolver;
import fr.opensagres.poi.xwpf.converter.core.FileImageExtractor;
import fr.opensagres.poi.xwpf.converter.xhtml.XHTMLConverter;
import fr.opensagres.poi.xwpf.converter.xhtml.XHTMLOptions;
import lombok.extern.slf4j.Slf4j;
import org.apache.poi.hwpf.HWPFDocument;
import org.apache.poi.hwpf.converter.WordToHtmlConverter;
import org.apache.poi.hwpf.usermodel.PictureType;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.springframework.web.multipart.MultipartFile;
import org.w3c.dom.Document;

import javax.imageio.ImageIO;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import java.awt.image.BufferedImage;
import java.io.*;
import java.net.URLConnection;
import java.nio.file.Files;
import java.util.Base64;

/**
 * @author: Xiaoning Fan
 * @date: Created in 2023-10-16 3:49 pm
 * @description: Upload a Word document, convert it into an HTML string and return it,
 *               keeping the styles unchanged and replacing images with Base64 data URIs
 * @version: 1.0
 */
@Slf4j
public class WordToHtmlStringConverter {

    /**
     * Converts an uploaded .doc or .docx file to HTML, dispatching on the file extension.
     *
     * @return the HTML string, or null if the format is unsupported or conversion fails
     */
    public static String wordToHtml(MultipartFile file) {
        // Extract the Word document name and suffix
        String filename = file.getOriginalFilename();
        if (filename == null) {
            log.error("Missing file name!");
            return null;
        }
        try {
            if (filename.endsWith(".docx")) {
                return docxToHtmlText(file);
            } else if (filename.endsWith(".doc")) {
                return docToHtmlText(file);
            } else {
                log.error("Unsupported file format!");
                return null;
            }
        } catch (FileNotFoundException e) {
            log.error("File not found exception!", e);
        } catch (IOException e) {
            log.error("IO conversion exception!", e);
        } catch (Exception e) {
            log.error("File conversion exception!", e);
        }
        return null;
    }

    /**
     * Upload a .doc (Word 97-2003) document and return the parsed HTML
     */
    public static String docToHtmlText(MultipartFile file) throws Exception {
        // Collect the serialized HTML in memory (no extra buffering layer, so nothing is left unflushed)
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        try (InputStream in = file.getInputStream()) {
            // Pass the uploaded file into the HWPF document model
            HWPFDocument wordDocument = new HWPFDocument(in);
            Document document = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            WordToHtmlConverter wordToHtmlConverter = new WordToHtmlConverter(document);
            // The returned string becomes the <img src>, so embed each picture as a data URI
            wordToHtmlConverter.setPicturesManager((imageStream, pictureType, name, width, height) -> {
                try {
                    // First, determine whether the image can be recognized
                    if (pictureType == PictureType.UNKNOWN) {
                        return "[Unrecognized picture]";
                    }
                    // Base64-encode the picture bytes and prefix the matching MIME type
                    return "data:" + pictureType.getMime() + ";base64,"
                            + Base64.getEncoder().encodeToString(imageStream);
                } catch (Exception e) {
                    log.info("upload exception", e);
                }
                return "[Picture upload failed]";
            });
            // Convert the Word document to an HTML DOM
            wordToHtmlConverter.processDocument(wordDocument);
            Document htmlDocument = wordToHtmlConverter.getDocument();
            // Serialize the DOM as UTF-8 HTML
            Transformer serializer = TransformerFactory.newInstance().newTransformer();
            serializer.setOutputProperty(OutputKeys.ENCODING, "utf-8");
            serializer.setOutputProperty(OutputKeys.INDENT, "yes");
            serializer.setOutputProperty(OutputKeys.METHOD, "html");
            serializer.transform(new DOMSource(htmlDocument), new StreamResult(baos));
            // Decode with the same charset the serializer wrote
            String content = baos.toString("UTF-8");
            log.info("docToHtmlText--->{}", content);
            return content;
        } catch (Exception e) {
            log.error("docToHtmlText exception", e);
        }
        return null;
    }

    /**
     * Upload a .docx document and return the parsed HTML
     */
    public static String docxToHtmlText(MultipartFile file) throws Exception {
        ByteArrayOutputStream htmlStream = new ByteArrayOutputStream();
        String htmlStr = null;
        try (InputStream in = file.getInputStream()) {
            // Pass the uploaded file into the XWPF document model
            XWPFDocument docxDocument = new XWPFDocument(in);
            XHTMLOptions options = XHTMLOptions.create();
            // Extract images into a temp directory and point each <img src> at the same path
            String path = System.getProperty("java.io.tmpdir");
            String firstImagePathStr = path + "/" + System.currentTimeMillis();
            options.setExtractor(new FileImageExtractor(new File(firstImagePathStr)));
            options.URIResolver(new BasicURIResolver(firstImagePathStr));
            // Convert to HTML
            docxDocument.createNumbering();
            XHTMLConverter.getInstance().convert(docxDocument, htmlStream, options);
            htmlStr = htmlStream.toString();
            // The extractor writes pictures under <temp dir>/word/media
            String imageDirStr = firstImagePathStr + "/word/media";
            String[] imageList = new File(imageDirStr).list();
            if (imageList != null) {
                for (String imageName : imageList) {
                    try {
                        String oneImagePathStr = imageDirStr + "/" + imageName;
                        File fileImage = new File(oneImagePathStr);
                        if (fileImage.exists()) {
                            log.info("Picture processing begins...");
                            // Read the whole file; InputStream.available() may not return the full size
                            byte[] data = Files.readAllBytes(fileImage.toPath());
                            // java.util.Base64 replaces sun.misc.BASE64Encoder, which is gone in JDK 9+
                            String encode = Base64.getEncoder().encodeToString(data);
                            log.info("Picture processing ended, Base64 length: {}", encode.length());
                            // Guess the MIME type from the file name, falling back to PNG
                            String mime = URLConnection.guessContentTypeFromName(imageName);
                            if (mime == null) {
                                mime = "image/png";
                            }
                            // Replace the temp file path in the HTML with a data URI
                            htmlStr = htmlStr.replace(oneImagePathStr, "data:" + mime + ";base64," + encode);
                            /* Alternative: decode with ImageIO and re-encode as PNG (the output is larger)
                            BufferedImage bi = ImageIO.read(fileImage);
                            ByteArrayOutputStream baos = new ByteArrayOutputStream();
                            ImageIO.write(bi, "png", baos);
                            String sd = Base64.getEncoder().encodeToString(baos.toByteArray());
                            htmlStr = htmlStr.replace(oneImagePathStr, "data:image/png;base64," + sd);
                            */
                        }
                    } catch (Exception e) {
                        log.info("upload docxToHtmlText exception", e);
                    }
                }
            }
            log.info("Processing result: {}", htmlStr);
        } catch (Exception e) {
            log.error("docxToHtmlText parsing exception", e);
        }
        // Returning outside finally, so exceptions are no longer silently swallowed by the finally block
        return htmlStr;
    }
}
```
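To see where the size difference between the two Base64 approaches comes from, here is a small self-contained comparison. It is a sketch under my own assumptions: the class and method names are hypothetical, and it uses a synthetic noisy JPEG instead of a real document picture, so the numbers will not match the roughly 200 KB I measured. Encoding the original bytes keeps the file's own (lossy JPEG) compression, while going through ImageIO and writing PNG replaces it with lossless compression, which is usually much larger for photo-like images. Base64 itself inflates both by the same factor of about 4/3.

```java
import javax.imageio.ImageIO;
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.Base64;
import java.util.Random;

/** Hypothetical demo comparing the two Base64 approaches on the same picture. */
public class Base64SizeDemo {

    /** Approach 1: Base64 of the original file bytes, no re-compression. */
    public static String encodeRaw(byte[] original) {
        return Base64.getEncoder().encodeToString(original);
    }

    /** Approach 2: decode with ImageIO, re-encode as PNG, then Base64. */
    public static String encodeViaPng(byte[] original) throws IOException {
        BufferedImage image = ImageIO.read(new ByteArrayInputStream(original));
        ByteArrayOutputStream png = new ByteArrayOutputStream();
        ImageIO.write(image, "png", png);
        return Base64.getEncoder().encodeToString(png.toByteArray());
    }

    /** Builds a synthetic noisy RGB picture and returns it as JPEG bytes. */
    public static byte[] sampleJpeg(int width, int height, long seed) throws IOException {
        BufferedImage image = new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB);
        Random random = new Random(seed);
        for (int y = 0; y < height; y++) {
            for (int x = 0; x < width; x++) {
                image.setRGB(x, y, random.nextInt(0x1000000));
            }
        }
        ByteArrayOutputStream jpeg = new ByteArrayOutputStream();
        ImageIO.write(image, "jpg", jpeg);
        return jpeg.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] jpeg = sampleJpeg(300, 300, 42L);
        System.out.println("raw Base64 length:     " + encodeRaw(jpeg).length());
        System.out.println("via PNG Base64 length: " + encodeViaPng(jpeg).length());
    }
}
```

Running `main` prints both lengths; with this synthetic picture the PNG route comes out noticeably larger, which is the same effect as the commented-out alternative in the docx method.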
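You can use the class as-is, but pay attention to how the controller returns the result. If a Spring MVC controller method returns the HTML string directly to the page, annotate it with @ResponseBody (or use @RestController); otherwise Spring treats the returned string as a view name instead of writing it to the response body, and the request fails. Of course, this does not matter if you only store the HTML, for example in a database.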
That's all for this time. Help yourselves, and don't hold back!