[PDFBox] Working with PDF documents in PDFBox: creating and loading documents, adding blank pages, deleting pages, getting the page count, adding text, and the PDFBox coordinate system

This article shows how to use PDFBox to create and load PDF documents, add blank pages, delete pages, get the total number of pages and add text content. It also explains the PDFBox coordinate system.

Contents

1. PDFBox component

1.1. What is PDFBox

1.2. Create PDF documents

1.3. Load PDF document

1.4. Add a blank page

1.5. Delete a page

1.6. Get the total number of pages in PDF

1.7. Add text content

(1) Write a single line of content

(2) Write multiple lines of content

1.8. Coordinate system in PDFBox


1. PDFBox component

1.1. What is PDFBox

PDFBox is an open-source Java library from Apache for working with PDF documents. It makes many operations on PDF documents convenient, such as creating documents, reading and extracting their content, merging documents and splitting documents. To use PDFBox, add the corresponding dependencies to your project. This article is based on the dependencies below and introduces some common PDFBox methods and how to use them.

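The Maven dependencies used in this article are listed below. Strictly speaking, the examples only need the [pdfbox] artifact, which already pulls in [fontbox] as a transitive dependency. [xmpbox] (XMP metadata), [preflight] (PDF/A validation) and [pdfbox-tools] (command-line tools) are optional, and [jempbox] is a legacy 1.8.x library that PDFBox 2.x no longer uses.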

<dependencies>
    <!-- Import PDFBox-related dependencies start -->
    <dependency>
        <groupId>org.apache.pdfbox</groupId>
        <artifactId>pdfbox</artifactId>
        <version>2.0.29</version>
    </dependency>
    <dependency>
        <groupId>org.apache.pdfbox</groupId>
        <artifactId>fontbox</artifactId>
        <version>2.0.29</version>
    </dependency>
    <dependency>
        <groupId>org.apache.pdfbox</groupId>
        <artifactId>xmpbox</artifactId>
        <version>2.0.29</version>
    </dependency>
    <dependency>
        <groupId>org.apache.pdfbox</groupId>
        <artifactId>preflight</artifactId>
        <version>2.0.29</version>
    </dependency>
    <dependency>
        <groupId>org.apache.pdfbox</groupId>
        <artifactId>pdfbox-tools</artifactId>
        <version>2.0.29</version>
    </dependency>
    <dependency>
        <groupId>org.apache.pdfbox</groupId>
        <artifactId>jempbox</artifactId>
        <version>1.8.17</version>
    </dependency>
    <!-- Import PDFBox-related dependencies End -->
</dependencies>

1.2. Create PDF documents

PDF stands for Portable Document Format. A PDF document contains one or more pages, and each page can hold text, paragraphs, images and other content.

To create a PDF document, you only need to create a [PDDocument] object, which represents the PDF document. A document created this way contains no pages, so many PDF viewers will report an error when you open it.

package com.pdfbox.demo;

import org.apache.pdfbox.pdmodel.PDDocument;
import java.io.IOException;

public class PDFBoxExample {
    public static void main(String[] args) throws IOException {
        // 1. Create a document object
        PDDocument doc = new PDDocument();
        // 2. Save the document as a PDF file
        // This writes a blank demo.pdf to the root of drive D (the backslash must be escaped in a Java string)
        doc.save("D:\\demo.pdf");
        // 3. Close the document to release its resources
        doc.close();
    }
}

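1.3. Load a PDF document

Sometimes we need to load a PDF document that already exists. For this, use the static [load()] method of the PDDocument class, which loads the specified PDF file. (This is the PDFBox 2.x API used throughout this article; in PDFBox 3.x the load methods were replaced by Loader.loadPDF().) The code is as follows: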

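(1) Load a local PDF document

import org.apache.pdfbox.pdmodel.PDDocument;

import java.io.File;
import java.io.IOException;

public class PDFBoxLoadExample {
    public static void main(String[] args) throws IOException {
        // 1. Load the document object from a local file
        File file = new File("D:\\demo.pdf");
        PDDocument doc = PDDocument.load(file);
        // TODO do something with the document
        // 2. Close the document
        doc.close();
    }
}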

(2) Load a PDF document from the network

PDFBox can also load a PDF document from the network by passing the InputStream of a URL to load(), as follows:

package pdfbox.demo;

import org.apache.pdfbox.pdmodel.PDDocument;

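import java.io.IOException;
import java.io.InputStream;
import java.net.URL;

public class PDFDemo {
    public static void main(String[] args) throws IOException {
        // 1. Open a stream to the network PDF document and load it
        // (replace https://ip:port/demo.pdf with the real address)
        // try-with-resources closes both the document and the stream
        try (InputStream in = new URL("https://ip:port/demo.pdf").openStream();
             PDDocument doc = PDDocument.load(in)) {
            //..... do something with the document
            System.out.println("Loaded " + doc.getNumberOfPages() + " page(s)");
        }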
    }
}

1.4. Add a blank page

Once a PDF document object has been created, we can add blank pages to it. A page is the visible content area, and its size can be set, for example to A4, A5 or A6; in most cases A4 is sufficient. To create a page, create a [PDPage] object and pass [PDRectangle.A4] to its constructor to set the page size. (The no-argument PDPage() constructor uses the U.S. Letter size instead.)

package com.pdfbox.demo;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.common.PDRectangle;

import java.io.IOException;

public class PDFBoxExample {
    public static void main(String[] args) throws IOException {
        // 1. Create a document object
        PDDocument doc = new PDDocument();
        // 2. Add a blank page the size of A4 paper
        PDPage page = new PDPage(PDRectangle.A4);
        doc.addPage(page);
        // 3. Save the document as a PDF file
        // This writes demo.pdf, containing one blank A4 page, to the root of drive D
        doc.save("D:\\demo.pdf");
        // 4. Close the document to release its resources
        doc.close();
    }
}
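PDRectangle.A4 and its siblings are defined in millimetres and converted to PDF points, using 1 inch = 25.4 mm = 72 pt. The sketch below reproduces that arithmetic in plain Java so it runs without PDFBox. The class and method names are hypothetical. The results should match PDRectangle.A4.getWidth() and getHeight() up to float rounding.

```java
import java.util.Locale;

// Hypothetical helper: converts ISO paper sizes from millimetres to PDF points.
final class PaperSizes {
    static final double POINTS_PER_INCH = 72.0; // PDF user space unit: 1/72 inch
    static final double MM_PER_INCH = 25.4;

    static double mmToPt(double mm) {
        return mm / MM_PER_INCH * POINTS_PER_INCH;
    }

    public static void main(String[] args) {
        // A4 is 210 mm x 297 mm; A5 is 148 mm x 210 mm
        System.out.printf(Locale.ROOT, "A4: %.2f x %.2f pt%n", mmToPt(210), mmToPt(297)); // A4: 595.28 x 841.89 pt
        System.out.printf(Locale.ROOT, "A5: %.2f x %.2f pt%n", mmToPt(148), mmToPt(210)); // A5: 419.53 x 595.28 pt
    }
}
```

These values also bound the coordinates you can use on the page, which matters for the text examples in section 1.7.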

1.5. Delete a page

The PDDocument class provides a [removePage()] method with two overloads. removePage(int) deletes the page at the given index, where indexes start at 0. removePage(PDPage) deletes the given Page object.

// 1. Create a document object and add two pages, so there is a page to delete
PDDocument doc = new PDDocument();
doc.addPage(new PDPage(PDRectangle.A4));
doc.addPage(new PDPage(PDRectangle.A4));

// Delete the first page (index 0)
doc.removePage(0);

// Or delete a specific Page object that belongs to the document
// PDPage page = doc.getPage(0);
// doc.removePage(page);

1.6. Get the total number of pages in a PDF

The PDDocument class provides a [getNumberOfPages()] method. It returns the total number of pages in the current document as an int.

// 1. Create a document object and add one page
PDDocument doc = new PDDocument();
doc.addPage(new PDPage(PDRectangle.A4));

// Get the total number of pages in the document
int pages = doc.getNumberOfPages();
System.out.println("Total page count: " + pages); // prints "Total page count: 1"

1.7. Add text content

So far we have covered creating PDF documents, adding blank pages and getting the page count, but not how to write content onto a page. A PDF page can hold text, images, forms and more. This section covers plain text only.

PDFBox represents the content of a page as a content stream, so you write to a page through that stream. The stream is represented by the [PDPageContentStream] class.
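Under the hood, the PDPageContentStream calls used in the examples below write PDF content-stream operators:

- beginText() writes BT
- setFont() writes Tf
- newLineAtOffset() writes Td
- showText() writes Tj
- endText() writes ET
- setLeading() and newLine(), used in the multi-line example, correspond to TL and T*

The plain-Java sketch below is a hypothetical helper, not PDFBox code. It only builds the operator text so you can see roughly what a single-line text object looks like. The font resource name F1 is an assumption, because PDFBox chooses the actual name when it adds the font to the page resources, and PDFBox's exact number formatting may differ.

```java
// Hypothetical helper (not part of PDFBox): builds the content-stream text
// that a beginText/setFont/newLineAtOffset/showText/endText sequence roughly produces.
final class TextObjectSketch {
    // Escapes the characters that are special inside a PDF literal string: \ ( )
    static String escapeLiteral(String text) {
        StringBuilder sb = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (c == '\\' || c == '(' || c == ')') {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    static String singleLine(String fontResource, float fontSize, float x, float y, String text) {
        return "BT\n"                                                 // beginText()
                + "/" + fontResource + " " + fmt(fontSize) + " Tf\n" // setFont(font, size)
                + fmt(x) + " " + fmt(y) + " Td\n"                    // newLineAtOffset(x, y)
                + "(" + escapeLiteral(text) + ") Tj\n"               // showText(text)
                + "ET\n";                                            // endText()
    }

    // Prints whole numbers without a trailing ".0" to keep the output readable
    static String fmt(float v) {
        return v == Math.rint(v) ? String.valueOf((long) v) : String.valueOf(v);
    }

    public static void main(String[] args) {
        System.out.print(singleLine("F1", 14, 10, 200, "hello world."));
    }
}
```

Running it prints BT, /F1 14 Tf, 10 200 Td, (hello world.) Tj and ET on separate lines. This is why every text example must pair beginText() with endText().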

(1) Write a single line of content

When writing a single line, the text is always drawn on one line no matter how long it is. Any part that extends past the right edge of the page is cut off and not visible. The example code is as follows:

package pdfbox.demo;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDPageContentStream;
import org.apache.pdfbox.pdmodel.common.PDRectangle;
import org.apache.pdfbox.pdmodel.font.PDType1Font;

import java.io.IOException;

public class PDFBoxDemo01 {
    public static void main(String[] args) throws IOException {
        // 1. Create a document object
        PDDocument doc = new PDDocument();
        // 2. Add a blank page the size of A4 paper
        PDPage page = new PDPage(PDRectangle.A4);
        doc.addPage(page);

        // Add a single line of text: the content stream needs the document and the page
        PDPageContentStream stream = new PDPageContentStream(doc, page);
        stream.beginText(); // start a text object
        stream.setFont(PDType1Font.TIMES_ROMAN, 14); // set the font and font size
        stream.newLineAtOffset(10, 200); // set the starting position of the text (x, y in points)
        String content = "hello world.hello world.hello world.hello world.hello world." +
                "hello world.hello world.hello world.hello world.";
        // Draw the text. Note: to write Chinese text, the font must support Chinese characters
        // (e.g. a PDType0Font loaded from a TTF file); the standard Type 1 fonts do not
        stream.showText(content);
        stream.endText(); // end the text object
        stream.close(); // close the content stream

        // 3. Save the document as a PDF file
        // This writes demo.pdf to the root of drive D
        doc.save("D:\\demo.pdf");
        // 4. Close the document to release its resources
        doc.close();
    }
}

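You can check whether a single line will be cut off before drawing it. In PDFBox, PDFont.getStringWidth(text) returns the width in glyph-space units, which are 1/1000 of the font size. The width in points is therefore getStringWidth(text) / 1000 * fontSize.

The sketch below applies that formula in plain Java. To keep it runnable without PDFBox, it replaces the real font metrics with a hypothetical uniform glyph width of 500 units, so the numbers are illustrative only.

```java
import java.util.Collections;

// Sketch: does a single line of text run past the right edge of the page?
final class LineOverflowCheck {
    // Hypothetical stand-in for PDFont.getStringWidth(text): every glyph is 500 units wide
    static float stringWidthUnits(String text) {
        return text.length() * 500f;
    }

    // Width in points = width in glyph units / 1000 * font size
    static float widthInPoints(String text, float fontSize) {
        return stringWidthUnits(text) / 1000f * fontSize;
    }

    static boolean overflows(String text, float fontSize, float startX, float pageWidth) {
        return startX + widthInPoints(text, fontSize) > pageWidth;
    }

    public static void main(String[] args) {
        // Same text as the single-line example: "hello world." repeated 9 times
        String content = String.join("", Collections.nCopies(9, "hello world."));
        float a4Width = 595.28f; // PDRectangle.A4 width in points, rounded
        System.out.println(widthInPoints(content, 14));          // 756.0
        System.out.println(overflows(content, 14, 10, a4Width)); // true
    }
}
```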

(2) Write multiple lines of content

To display multiple lines of text in PDFBox, use the [setLeading()] and [newLine()] methods. [setLeading()] sets the leading, which is the distance between the baselines of consecutive lines. [newLine()] moves to the start of the next line, offset downward by the current leading. Note that even with multiple lines, any single line wider than the page is still cut off, because PDFBox does not wrap text automatically.

package pdfbox.demo;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDPageContentStream;
import org.apache.pdfbox.pdmodel.common.PDRectangle;
import org.apache.pdfbox.pdmodel.font.PDType1Font;

import java.io.IOException;

public class PDFBoxExample {
    public static void main(String[] args) throws IOException {
        // 1. Create a document object
        PDDocument doc = new PDDocument();
        // 2. Add a blank page, the size is as big as A4 paper
        PDPage page = new PDPage(PDRectangle.A4);
        doc.addPage(page);

        // TODO add text content, specify document object, page object
        PDPageContentStream stream = new PDPageContentStream(doc, page);
        stream.beginText(); // text begins
        stream.setFont(PDType1Font.TIMES_ROMAN, 14); // Set the font and font size of the content stream text
        stream.newLineAtOffset(10, 350); // Set the starting coordinate position of content stream text display
        for (int i = 0; i < 10; i ++ ) {
            stream.setLeading(20 + i*2); // Set the leading of the text, that is, the line spacing of the text. If this line spacing is not set, the text will overlap
            String content = "hello world.hello world.hello world.hello world.hello world." +
                    "hello world.hello world.hello world.hello world.";
            stream.showText(content); // Set the text content that needs to be added. Note: when writing Chinese content, you need to ensure that the font supports Chinese
            stream.newLine(); // add a new line display
        }
        stream.endText(); // end of text
        stream.close(); // close the content stream

        // 3. Save the document as a PDF file
        // This writes demo.pdf to the root of drive D
        doc.save("D:\\demo.pdf");
        // 4. Close the document to release its resources
        doc.close();
    }
}

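Because PDFBox does not wrap text, a common approach is to split the text into lines yourself. You then emit each line with showText() followed by newLine().

The sketch below is a simple greedy word wrap in plain Java. The width function is a parameter so that, in real code, it could be backed by font.getStringWidth(s) / 1000 * fontSize. That PDFBox call throws IOException, which you would need to handle. The class name and the one-point-per-character measure in main are assumptions for illustration. A single word wider than the limit is kept on its own line rather than split.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.ToDoubleFunction;

// Sketch: greedy word wrap driven by a caller-supplied width measure (in points).
final class GreedyWrap {
    static List<String> wrap(String text, double maxWidth, ToDoubleFunction<String> width) {
        List<String> lines = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (String word : text.trim().split("\\s+")) {
            if (word.isEmpty()) {
                continue; // only happens when the input is blank
            }
            String candidate = current.length() == 0 ? word : current + " " + word;
            if (width.applyAsDouble(candidate) <= maxWidth || current.length() == 0) {
                // The word fits (or the line is empty, so an over-long word gets its own line)
                current.setLength(0);
                current.append(candidate);
            } else {
                // The word does not fit: finish the current line and start a new one
                lines.add(current.toString());
                current.setLength(0);
                current.append(word);
            }
        }
        if (current.length() > 0) {
            lines.add(current.toString());
        }
        return lines;
    }

    public static void main(String[] args) {
        // Hypothetical measure: every character is 1 point wide, lines at most 26 points
        for (String line : wrap("hello world. hello world. hello world.", 26, s -> s.length())) {
            System.out.println(line);
        }
    }
}
```

With the stand-in measure this prints "hello world. hello world." and then "hello world.". In a PDFBox loop, each returned line would go to showText() followed by newLine().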

1.8. Coordinate system in PDFBox

In PDFBox, the origin (0, 0) of the coordinate system is the bottom-left corner of the page. The x-axis runs horizontally to the right, and the y-axis runs vertically upward, as shown in the figure below:

In addition, PDFBox measures lengths in points (pt), where 1 pt = 1/72 inch. Sometimes we have values in pixels (px) instead, so we need to convert between the two units. At the usual 96 pixels per inch, a length in pt equals the length in px multiplied by 3/4: pt = px × 3/4. For example, 16 px = 12 pt.
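Both rules in this section come down to simple arithmetic. Many layouts, such as screen coordinates, HTML/CSS and image editors, measure y downward from the top-left corner, while PDFBox measures y upward from the bottom-left, so converting is a subtraction from the page height. For a rectangle, the lower-left corner is what PDFBox expects, for example as the y argument of PDPageContentStream.addRect().

The pt/px conversion depends on the assumed pixel density, so the sketch below takes the DPI as a parameter with 96 as the default. The class and method names are hypothetical, and the code is plain Java.

```java
// Sketch for section 1.8: coordinate and unit conversions for PDFBox layouts.
// Class and method names are hypothetical; only plain Java is used.
final class PdfLayoutMath {
    static final double POINTS_PER_INCH = 72.0; // 1 pt = 1/72 inch
    static final double CSS_DPI = 96.0;         // assumed pixel density

    // y measured downward from the top edge -> y measured upward from the bottom edge
    static double yFromTop(double pageHeight, double yTop) {
        return pageHeight - yTop;
    }

    // Lower-left y of a box whose top edge lies yTop below the top of the page
    static double boxBottom(double pageHeight, double yTop, double boxHeight) {
        return pageHeight - yTop - boxHeight;
    }

    static double pxToPt(double px, double dpi) {
        return px * POINTS_PER_INCH / dpi;
    }

    static double ptToPx(double pt, double dpi) {
        return pt * dpi / POINTS_PER_INCH;
    }

    public static void main(String[] args) {
        double a4Height = 841.89; // PDRectangle.A4 height in points, rounded
        System.out.println(yFromTop(a4Height, 50));      // a point 50 pt below the top edge
        System.out.println(boxBottom(a4Height, 50, 20)); // 20 pt tall box, top edge 50 pt down
        System.out.println(pxToPt(16, CSS_DPI));         // 12.0
        System.out.println(ptToPx(12, CSS_DPI));         // 16.0
    }
}
```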

That concludes this article. It introduced how to use PDFBox to create and load PDF documents, add blank pages, delete pages, get the total number of pages and add text content. It also explained how the PDFBox coordinate system works.