This article introduces how to use PDFBox to work with PDF documents: creating a document, loading a document, adding blank pages, deleting pages, getting the total number of pages, adding text content, and the PDFBox coordinate system.
Directory
1. PDFBox component
1.1. What is PDFBox
1.2. Create PDF documents
1.3. Load PDF document
1.4. Add a blank page
1.5. Delete a page
1.6. Get the total number of pages in PDF
1.7. Add text content
(1) Write a single line of content
(2) Write multiple lines of content
1.8. Coordinate system in PDFBox
1. PDFBox component
1.1. What is PDFBox
PDFBox is an open-source Java library from Apache for working with PDF documents. With PDFBox it is easy to perform common operations such as creating PDF documents, reading their content, loading existing documents, merging documents, and splitting documents. To use PDFBox you need to add the corresponding dependencies. This article uses the dependencies below to introduce some common PDFBox methods and how to use them.
PDFBox requires the following dependencies:
<dependencies>
    <!-- Import PDFBox-related dependencies start -->
    <dependency>
        <groupId>org.apache.pdfbox</groupId>
        <artifactId>pdfbox</artifactId>
        <version>2.0.29</version>
    </dependency>
    <dependency>
        <groupId>org.apache.pdfbox</groupId>
        <artifactId>fontbox</artifactId>
        <version>2.0.29</version>
    </dependency>
    <dependency>
        <groupId>org.apache.pdfbox</groupId>
        <artifactId>xmpbox</artifactId>
        <version>2.0.29</version>
    </dependency>
    <dependency>
        <groupId>org.apache.pdfbox</groupId>
        <artifactId>preflight</artifactId>
        <version>2.0.29</version>
    </dependency>
    <dependency>
        <groupId>org.apache.pdfbox</groupId>
        <artifactId>pdfbox-tools</artifactId>
        <version>2.0.29</version>
    </dependency>
    <dependency>
        <groupId>org.apache.pdfbox</groupId>
        <artifactId>jempbox</artifactId>
        <version>1.8.17</version>
    </dependency>
    <!-- Import PDFBox-related dependencies end -->
</dependencies>
1.2. Create PDF documents
PDF stands for Portable Document Format. A PDF document contains one or more pages, and each page can contain text, paragraphs, images, and other content.
To create a PDF document, you only need to create a [PDDocument] object, which represents the PDF document. A document created this way does not contain any pages, so a PDF viewer will report an error when you open the saved file.
package com.pdfbox.demo;

import org.apache.pdfbox.pdmodel.PDDocument;

import java.io.IOException;

public class PDFBoxExample {
    public static void main(String[] args) throws IOException {
        // 1. Create a document object
        PDDocument doc = new PDDocument();
        // 2. Save the document to a file
        // This generates demo.pdf on the D: drive; it contains no pages yet.
        // The backslash must be escaped inside a Java string literal.
        doc.save("D:\\demo.pdf");
        // 3. Close the document
        doc.close();
    }
}
1.3. Load PDF document
Sometimes we need to load a PDF document that already exists. In that case we can use the [load()] method of the PDDocument class to load the specified PDF file. The code is as follows:
(1) Load local PDF document
// 1. Load the document from a local file (requires java.io.File)
File file = new File("D:\\demo.pdf");
PDDocument doc = PDDocument.load(file);
// TODO do something
// ....
// Close the document
doc.close();
(2) Load a PDF document from the network
PDFBox can also load a PDF document from the network by reading it from an input stream, as follows:
package pdfbox.demo;

import org.apache.pdfbox.pdmodel.PDDocument;

import java.io.IOException;
import java.io.InputStream;
import java.net.URL;

public class PDFDemo {
    public static void main(String[] args) throws IOException {
        // 1. Load the PDF document from the network.
        // try-with-resources closes both the stream and the document.
        try (InputStream in = new URL("https://ip:port/demo.pdf").openStream();
             PDDocument doc = PDDocument.load(in)) {
            // .....
        }
    }
}
1.4. Add a blank page
Once a PDF document object has been created, we can add a blank page to it. A page is the visible content area, and its size can be set to A4, A5, A6, and so on; in general, A4 is sufficient. To create a page, create a [PDPage] object; you can pass [PDRectangle.A4] to the constructor to set the page size.
package com.pdfbox.demo;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.common.PDRectangle;

import java.io.IOException;

public class PDFBoxExample {
    public static void main(String[] args) throws IOException {
        // 1. Create a document object
        PDDocument doc = new PDDocument();
        // 2. Add a blank page the size of A4 paper
        PDPage page = new PDPage(PDRectangle.A4);
        doc.addPage(page);
        // 3. Save the document to a file
        // This generates demo.pdf on the D: drive with one blank page
        doc.save("D:\\demo.pdf");
        // 4. Close the document
        doc.close();
    }
}
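Page sizes such as A4 are defined in millimetres, while PDF pages are measured in points (1 pt = 1/72 inch). The following is a minimal, stdlib-only sketch of that conversion; the class and method names (`PageSizes`, `mmToPt`) are hypothetical and not part of PDFBox. The A4 result should agree with the dimensions of `PDRectangle.A4` up to rounding.

```java
import java.util.Locale;

// Hypothetical helper, not part of PDFBox: converts paper sizes from millimetres to points.
final class PageSizes {
    private static final double POINTS_PER_INCH = 72.0;
    private static final double MM_PER_INCH = 25.4;

    /** Converts a length in millimetres to PDF points (1 pt = 1/72 inch). */
    static double mmToPt(double mm) {
        return mm / MM_PER_INCH * POINTS_PER_INCH;
    }

    public static void main(String[] args) {
        // A4 paper is 210 mm x 297 mm
        System.out.println(String.format(Locale.ROOT, "A4 = %.2f x %.2f pt", mmToPt(210), mmToPt(297)));
        // prints "A4 = 595.28 x 841.89 pt"
    }
}
```

The same helper can be used to work out a custom page size in points before passing width and height to a page constructor.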
1.5. Delete a page
The PDDocument object provides a [removePage()] method, which can delete the page at a given index (starting from 0) or delete a specified PDPage object.
// 1. Create a document object and add a page so there is something to delete
PDDocument doc = new PDDocument();
PDPage page = new PDPage(PDRectangle.A4);
doc.addPage(page);
// 2. Delete the first page (index 0)
doc.removePage(0);
// Or delete the specified PDPage object instead
// doc.removePage(page);
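Because [removePage()] takes a 0-based index while page numbers shown to users usually start at 1, it can help to validate and convert the number first. The sketch below is one possible way to do that using only the standard library; `PageIndex` and `toZeroBased` are hypothetical names, and in real code the total would come from [getNumberOfPages()] and the result would be passed to [removePage()].

```java
// Hypothetical helper, not part of PDFBox: maps a 1-based page number to a 0-based index.
final class PageIndex {
    /**
     * Converts a 1-based page number (as shown in a PDF viewer) to the 0-based
     * index expected by index-based page methods, rejecting out-of-range values.
     */
    static int toZeroBased(int pageNumber, int totalPages) {
        if (pageNumber < 1 || pageNumber > totalPages) {
            throw new IllegalArgumentException(
                    "Page " + pageNumber + " is outside 1.." + totalPages);
        }
        return pageNumber - 1;
    }

    public static void main(String[] args) {
        int totalPages = 3; // assumed value for illustration
        System.out.println(toZeroBased(1, totalPages)); // prints 0
        System.out.println(toZeroBased(3, totalPages)); // prints 2
    }
}
```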
1.6. Get the total number of pages in PDF
The PDDocument object provides a [getNumberOfPages()] method, which returns the total number of pages in the current PDF as an int.
// 1. Create a document object
PDDocument doc = new PDDocument();
// 2. Get the total number of pages in the document
int pages = doc.getNumberOfPages();
System.out.println("Total page count: " + pages);
1.7. Add text content
So far we have covered creating PDF documents, adding blank pages, and getting the page count, but not how to write content to a page. Text, images, forms, and other content can be written to a PDF document; this section shows how to write plain text.
PDFBox represents the content of a page as a content stream, so any operation on the page content goes through that stream, which is represented by the PDPageContentStream object.
(1) Write a single line of content
Writing a single line means that no matter how long the text is, it is displayed on one line only, and anything that extends beyond the page is cut off. The example code is as follows:
package pdfbox.demo;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDPageContentStream;
import org.apache.pdfbox.pdmodel.common.PDRectangle;
import org.apache.pdfbox.pdmodel.font.PDType1Font;

import java.io.IOException;

public class PDFBoxDemo01 {
    public static void main(String[] args) throws IOException {
        // 1. Create a document object
        PDDocument doc = new PDDocument();
        // 2. Add a blank page the size of A4 paper
        PDPage page = new PDPage(PDRectangle.A4);
        doc.addPage(page);
        // 3. Add a single line of text; the stream is tied to the document and the page
        PDPageContentStream stream = new PDPageContentStream(doc, page);
        stream.beginText(); // begin the text block
        stream.setFont(PDType1Font.TIMES_ROMAN, 14); // set the font and font size
        stream.newLineAtOffset(10, 200); // set the starting position of the text
        String content = "hello world.hello world.hello world.hello world.hello world."
                + "hello world.hello world.hello world.hello world.";
        // Write the text. Note: to write Chinese text, the font must support Chinese
        stream.showText(content);
        stream.endText(); // end the text block
        stream.close(); // close the content stream
        // 4. Save the document to a file
        // This generates demo.pdf on the D: drive
        doc.save("D:\\demo.pdf");
        // 5. Close the document
        doc.close();
    }
}
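Running result: the text is drawn on a single line starting at (10, 200), and the part that runs past the right edge of the page is cut off.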
(2) Write multiple lines of content
To display multiple lines of text in PDFBox, use the [setLeading()] and [newLine()] methods. [setLeading()] sets the line spacing of the text, and [newLine()] moves to the next line. Note that although multiple lines are written here, a line that exceeds the width of the page is still cut off; it does not wrap automatically.
package pdfbox.demo;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDPageContentStream;
import org.apache.pdfbox.pdmodel.common.PDRectangle;
import org.apache.pdfbox.pdmodel.font.PDType1Font;

import java.io.IOException;

public class PDFBoxExample {
    public static void main(String[] args) throws IOException {
        // 1. Create a document object
        PDDocument doc = new PDDocument();
        // 2. Add a blank page the size of A4 paper
        PDPage page = new PDPage(PDRectangle.A4);
        doc.addPage(page);
        // 3. Add multiple lines of text; the stream is tied to the document and the page
        PDPageContentStream stream = new PDPageContentStream(doc, page);
        stream.beginText(); // begin the text block
        stream.setFont(PDType1Font.TIMES_ROMAN, 14); // set the font and font size
        stream.newLineAtOffset(10, 350); // set the starting position of the text
        for (int i = 0; i < 10; i++) {
            // Set the leading (line spacing). Without it, the lines overlap.
            // The value grows on each pass, so the lines drift further apart.
            stream.setLeading(20 + i * 2);
            String content = "hello world.hello world.hello world.hello world.hello world."
                    + "hello world.hello world.hello world.hello world.";
            // Write the text. Note: to write Chinese text, the font must support Chinese
            stream.showText(content);
            stream.newLine(); // move to the next line
        }
        stream.endText(); // end the text block
        stream.close(); // close the content stream
        // 4. Save the document to a file
        // This generates demo.pdf on the D: drive
        doc.save("D:\\demo.pdf");
        // 5. Close the document
        doc.close();
    }
}
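Running result: ten lines of text are drawn, each spaced slightly further from the previous one, and each line is still cut off at the right edge of the page.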
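Since long lines are not wrapped automatically, the text has to be split into lines before it is written. The following is a minimal sketch of a greedy word-wrap, written with the standard library only so it runs without PDFBox. `LineWrapper`, `wrap`, and the fixed 7 pt per character width are assumptions for illustration; with PDFBox the measuring function would typically be based on the font, for example `font.getStringWidth(s) / 1000 * fontSize`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.ToDoubleFunction;

// Hypothetical helper, not part of PDFBox: splits text into lines that fit a given width.
final class LineWrapper {
    /**
     * Greedy word wrap: adds words to the current line until the measured width
     * would exceed maxWidth, then starts a new line. A single word wider than
     * maxWidth is placed on its own line and will still overflow.
     */
    static List<String> wrap(String text, double maxWidth, ToDoubleFunction<String> measure) {
        List<String> lines = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (String word : text.trim().split("\\s+")) {
            if (word.isEmpty()) {
                continue; // happens only when the input is blank
            }
            String candidate = current.length() == 0 ? word : current + " " + word;
            if (current.length() > 0 && measure.applyAsDouble(candidate) > maxWidth) {
                lines.add(current.toString());
                current.setLength(0);
                current.append(word);
            } else {
                current.setLength(0);
                current.append(candidate);
            }
        }
        if (current.length() > 0) {
            lines.add(current.toString());
        }
        return lines;
    }

    public static void main(String[] args) {
        // Assumption: every character is 7 pt wide (a stand-in for real font metrics)
        ToDoubleFunction<String> fixedWidth = s -> s.length() * 7.0;
        String text = "hello world. hello world. hello world. hello world. hello world. "
                + "hello world. hello world. hello world. hello world.";
        // 575 pt is roughly the A4 width minus a 10 pt margin on each side
        for (String line : wrap(text, 575, fixedWidth)) {
            System.out.println(line);
        }
    }
}
```

Each returned line could then be written with [showText()] followed by [newLine()]. Note that the sketch breaks only at whitespace, so text without spaces is not split.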
1.8. Coordinate system in PDFBox
In PDFBox, the bottom-left corner of the page is the origin of the coordinate system. The x-axis runs horizontally and increases to the right, and the y-axis runs vertically and increases upward.
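Many layout tools measure y from the top of the page, so positions often need to be flipped before they are used in PDFBox. The sketch below shows that conversion with the standard library only; `PdfCoordinates` and its method names are hypothetical, and the page height of 841.89 pt assumes an A4 page.

```java
import java.util.Locale;

// Hypothetical helper, not part of PDFBox: converts top-left based positions to PDF coordinates.
final class PdfCoordinates {
    /** Converts a y value measured downward from the top edge to a y value measured upward from the bottom edge. */
    static double topLeftToPdfY(double yFromTop, double pageHeight) {
        return pageHeight - yFromTop;
    }

    /** Same conversion for a box: returns the PDF y of the box's lower edge. */
    static double topLeftBoxToPdfY(double yFromTop, double boxHeight, double pageHeight) {
        return pageHeight - yFromTop - boxHeight;
    }

    public static void main(String[] args) {
        double a4Height = 841.89; // assumed A4 page height in points
        // A point 100 pt below the top edge of the page
        System.out.println(String.format(Locale.ROOT, "%.2f", topLeftToPdfY(100, a4Height)));
        // A 50 pt tall box whose top edge is 100 pt below the top of the page
        System.out.println(String.format(Locale.ROOT, "%.2f", topLeftBoxToPdfY(100, 50, a4Height)));
    }
}
```

The x value needs no conversion, because both conventions measure x from the left edge.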
In addition, PDFBox uses points [pt] as its unit, where 1 pt is 1/72 inch. Sometimes sizes are given in pixels [px], so we need to convert between pt and px. Assuming the common screen density of 96 px per inch, the relationship is [pt = px * 3 / 4], that is, 1 px = 0.75 pt.
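The unit conversion can be sketched as a small helper. The 3/4 factor holds only at 96 pixels per inch, so this sketch takes the density as a parameter; `UnitConverter`, `pxToPt`, and `ptToPx` are hypothetical names, not PDFBox API.

```java
// Hypothetical helper, not part of PDFBox: converts between pixels and points.
final class UnitConverter {
    private static final double POINTS_PER_INCH = 72.0;

    /** Converts pixels to points for a given density in pixels per inch. */
    static double pxToPt(double px, double dpi) {
        return px * POINTS_PER_INCH / dpi;
    }

    /** Converts points to pixels for a given density in pixels per inch. */
    static double ptToPx(double pt, double dpi) {
        return pt * dpi / POINTS_PER_INCH;
    }

    public static void main(String[] args) {
        // At 96 DPI the factor is 72 / 96 = 3 / 4
        System.out.println(pxToPt(100, 96)); // prints 75.0
        System.out.println(ptToPx(75, 96));  // prints 100.0
    }
}
```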
To summarize, this article introduced how to use PDFBox to create PDF documents, load PDF documents, add blank pages, delete pages, get the total number of pages, and add text content, and described the PDFBox coordinate system.