Apache PDFBox is an open-source Java library for working with PDF documents. Its API lets you create new PDF documents, read and modify existing ones, and extract their content.
Add the Maven dependency
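<!-- https://mvnrepository.com/artifact/org.apache.pdfbox/pdfbox -->
<dependency>
    <groupId>org.apache.pdfbox</groupId>
    <artifactId>pdfbox</artifactId>
    <version>2.0.24</version>
</dependency>
The examples in this article use the PDFBox 2.0.x API; PDFBox 3.0 changed parts of it (for example, PDDocument.load() was replaced by Loader.loadPDF()).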
Example: generating a PDF with PDFBox
import java.io.IOException;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDPageContentStream;
import org.apache.pdfbox.pdmodel.common.PDRectangle;
import org.apache.pdfbox.pdmodel.font.PDType1Font;

// Create a blank PDF document; try-with-resources closes it even if an exception is thrown
try (PDDocument document = new PDDocument()) {
    // Create an A4 page and add it to the document
    PDPage page = new PDPage(PDRectangle.A4);
    document.addPage(page);

    // Create a content stream for the page; it is closed automatically before the document is saved
    try (PDPageContentStream contentStream = new PDPageContentStream(document, page)) {
        // Set the font and font size
        contentStream.setFont(PDType1Font.HELVETICA_BOLD, 12);
        // Draw text on the page
        contentStream.beginText();
        contentStream.newLineAtOffset(100, 700);
        contentStream.showText("Hello, World!");
        contentStream.endText();
    }

    // Save the PDF document
    document.save("output.pdf");
    System.out.println("PDF generated successfully!");
} catch (IOException e) {
    e.printStackTrace();
}
Common classes and methods
PDDocument class
The Javadoc of the PDDocument class in the PDFBox source describes it as follows:
This is the in-memory representation of the PDF document.
In a Java program you can think of a PDDocument object as the PDF file itself: every subsequent operation on the object is an operation on the PDF document.
Create a brand-new PDF document (it contains no pages yet):
PDDocument document = new PDDocument();
To fill an existing PDF template with dynamic data, load the prepared template with PDDocument.load():
PDDocument document = PDDocument.load(new ClassPathResource("/static/reportTemplate.pdf").getInputStream());
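You can also load the template from a File, but loading from an InputStream is recommended: Spring's ClassPathResource.getFile() only works when the resource exists as a file on the file system, and throws a FileNotFoundException once the template is packaged inside a JAR.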
PDDocument document = PDDocument.load(new ClassPathResource("/static/reportTemplate.pdf").getFile());
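If the PDF template is itself password-protected, use the PDDocument.load(InputStream input, String password) overload and pass the password needed to open it (123456 in the example below). Note that this opens (decrypts) an existing protected file; it does not encrypt the PDF you generate.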
PDDocument document = PDDocument.load(new ClassPathResource("/static/reportTemplate.pdf").getInputStream(),"123456");
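Loading with a password only opens a file that is already protected. To actually encrypt a PDF you generate, PDFBox 2.x offers StandardProtectionPolicy together with PDDocument.protect(). The sketch below assumes PDFBox 2.0.x on the classpath; the class name EncryptionDemo, its helper methods and the passwords are hypothetical and chosen only for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.encryption.AccessPermission;
import org.apache.pdfbox.pdmodel.encryption.StandardProtectionPolicy;

final class EncryptionDemo {

    /** Builds a one-page PDF encrypted with the given passwords and returns its bytes. */
    static byte[] createEncryptedPdf(String ownerPassword, String userPassword) throws IOException {
        try (PDDocument document = new PDDocument()) {
            document.addPage(new PDPage());

            // Permissions for readers who open the file with the user password
            AccessPermission permissions = new AccessPermission();
            permissions.setCanPrint(true);
            permissions.setCanModify(false);

            StandardProtectionPolicy policy =
                    new StandardProtectionPolicy(ownerPassword, userPassword, permissions);
            policy.setEncryptionKeyLength(128);
            // Registers the security handler; the content is encrypted when the document is saved
            document.protect(policy);

            ByteArrayOutputStream out = new ByteArrayOutputStream();
            document.save(out);
            return out.toByteArray();
        }
    }

    /** Opens the PDF with a password and reports whether it is encrypted. */
    static boolean opensAsEncrypted(byte[] pdf, String password) throws IOException {
        try (PDDocument document = PDDocument.load(pdf, password)) {
            return document.isEncrypted();
        }
    }
}
```

The owner password grants full access, while the user password opens the file with the permissions set on AccessPermission. Loading with a wrong password fails with an InvalidPasswordException.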
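PDDocument.load() has many other overloads (byte arrays, files, memory settings and so on), so they are not all listed here; see the PDFBox source or Javadoc if you are interested.
After modifying the document, save it with document.save(), either to an output stream or directly to a file: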
ByteArrayOutputStream baos = new ByteArrayOutputStream();
document.save(baos);          // save the document to an in-memory output stream
document.save("output.pdf");  // save the document to a file
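Saving to a byte array and loading it back is a convenient way to check the result without touching the file system. A minimal sketch, assuming PDFBox 2.0.x; SaveLoadDemo and its methods are hypothetical helpers:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.common.PDRectangle;

final class SaveLoadDemo {

    /** Saves a document into memory and returns the PDF bytes. */
    static byte[] toBytes(PDDocument document) throws IOException {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        document.save(baos);
        return baos.toByteArray();
    }

    /** Creates a document with the given number of A4 pages, saves it to bytes and reloads it. */
    static int roundTripPageCount(int pageCount) throws IOException {
        byte[] pdf;
        try (PDDocument document = new PDDocument()) {
            for (int i = 0; i < pageCount; i++) {
                document.addPage(new PDPage(PDRectangle.A4));
            }
            pdf = toBytes(document);
        }
        // PDDocument.load(byte[]) is one of the load() overloads in PDFBox 2.x
        try (PDDocument reloaded = PDDocument.load(pdf)) {
            return reloaded.getNumberOfPages();
        }
    }
}
```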
After saving to a byte stream, you often need to send the file to the front end for download. Inside a Spring MVC controller method this can be done by returning a ResponseEntity:
import java.io.ByteArrayInputStream;

import org.springframework.core.io.InputStreamResource;
import org.springframework.http.HttpHeaders;
import org.springframework.http.MediaType;
import org.springframework.http.ResponseEntity;

// Convert the PDF to a byte array
byte[] pdfBytes = baos.toByteArray();
// Wrap the bytes in an InputStreamResource
ByteArrayInputStream bis = new ByteArrayInputStream(pdfBytes);
InputStreamResource resource = new InputStreamResource(bis);
// Set the HTTP response headers
HttpHeaders headers = new HttpHeaders();
headers.add(HttpHeaders.CONTENT_DISPOSITION, "attachment; filename=output.pdf");
headers.add(HttpHeaders.CONTENT_TYPE, MediaType.APPLICATION_PDF_VALUE);
// Return a response entity containing the PDF
return ResponseEntity.ok()
        .headers(headers)
        .body(resource);
When you are done with the document, always call document.close() to release the memory and file handles it holds.
document.close();
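Because PDDocument implements Closeable, a try-with-resources block guarantees that close() is called even when an exception is thrown while editing. The sketch below (hypothetical class CloseDemo, PDFBox 2.0.x assumed) simulates a failure and returns the document only so the caller can confirm, through getDocument().isClosed(), that it was closed:

```java
import java.io.IOException;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;

final class CloseDemo {

    /** Opens a document, fails while "editing" it, and returns the reference for inspection. */
    static PDDocument editWithFailure() {
        PDDocument opened = null;
        try (PDDocument document = new PDDocument()) {
            opened = document;
            document.addPage(new PDPage());
            throw new IllegalStateException("simulated failure while editing");
        } catch (IOException | IllegalStateException e) {
            // By the time this block runs, try-with-resources has already called document.close()
            System.out.println("Editing failed: " + e.getMessage());
        }
        return opened;
    }
}
```

In real code you would not keep a reference to the document outside the block; it is done here only to make the closing observable.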
PDPage class
A PDPage represents a single page of a PDF document. To get the total number of pages in the document:
int pageNumber = document.getNumberOfPages();
To get a specific page (the index starts from 0):
PDPage page = document.getPage(0);
When working with a PDF template, use document.getPage(index) to get the page you want to modify (the index starts from 0). To create a brand-new page instead, use new PDPage(), optionally passing a page size such as PDRectangle.A4:
PDPage newPage = new PDPage(PDRectangle.A4);
A page created with new PDPage() is not part of any document until you add it to the document:
document.addPage(newPage);
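A short sketch of the page methods above, assuming PDFBox 2.0.x; PageDemo is a hypothetical name. A new document starts with no pages, addPage() appends to the end, and getPage() is zero-based:

```java
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.common.PDRectangle;

final class PageDemo {

    /** Returns a new document holding an A4 page followed by a Letter page; the caller must close it. */
    static PDDocument createTwoPages() {
        PDDocument document = new PDDocument();            // starts with zero pages
        document.addPage(new PDPage(PDRectangle.A4));      // becomes index 0
        document.addPage(new PDPage(PDRectangle.LETTER));  // appended at the end, index 1
        return document;
    }
}
```

The caller owns the returned document and is responsible for closing it.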
However, addPage() always appends the page to the end of the document. To insert a page at a specific position, use the insertAfter() and insertBefore() methods of PDPageTree:
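PDPage page = document.getPage(1);             // get the second page (index 1)
PDPage newPage = new PDPage(PDRectangle.A4);
PDPageTree pages = document.getPages();
pages.insertAfter(newPage, page);              // insert newPage after the second page
// Alternatively (use one call or the other for the same page object, not both):
// pages.insertBefore(newPage, page);          // insert newPage before the second page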
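The following sketch (hypothetical class InsertPageDemo, PDFBox 2.0.x assumed) builds a three-page document, inserts one new page before and another after the second page, and reports the resulting positions using PDPageTree.indexOf():

```java
import java.io.IOException;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDPageTree;
import org.apache.pdfbox.pdmodel.common.PDRectangle;

final class InsertPageDemo {

    /** Returns {index of "before", index of the original second page, index of "after", page count}. */
    static int[] insertAroundSecondPage() throws IOException {
        try (PDDocument document = new PDDocument()) {
            for (int i = 0; i < 3; i++) {
                document.addPage(new PDPage(PDRectangle.A4));
            }
            PDPageTree pages = document.getPages();
            PDPage second = document.getPage(1);

            PDPage before = new PDPage(PDRectangle.A4);
            PDPage after = new PDPage(PDRectangle.A4);
            pages.insertBefore(before, second); // "before" takes index 1, the second page moves to index 2
            pages.insertAfter(after, second);   // "after" goes directly behind the second page, index 3

            return new int[] {
                pages.indexOf(before), pages.indexOf(second), pages.indexOf(after), document.getNumberOfPages()
            };
        }
    }
}
```

Starting from pages p0, p1, p2, the order becomes p0, before, p1, after, p2, so the method returns {1, 2, 3, 5}.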
Get the page width and height; they are needed later to position text. Page coordinates are measured in points (1/72 inch), and the origin (0, 0) is at the lower-left corner of the page, with y increasing upward. So for an element with a left margin of 10 and a top margin of 10, use the coordinates (10, pageHeight - 10).
float pageWidth = page.getMediaBox().getWidth();
float pageHeight = page.getMediaBox().getHeight();
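Since the origin is at the lower-left corner, it helps to convert "distance from the top" into a y coordinate in one place. A plain-Java sketch with hypothetical helper names; the only assumption is the point unit (72 points per inch):

```java
final class PdfCoordinates {

    /** Converts a distance from the top edge of the page into a PDF y coordinate. */
    static float yFromTop(float pageHeight, float distanceFromTop) {
        return pageHeight - distanceFromTop;
    }

    /** y of the lower-left corner of a box whose top edge is distanceFromTop below the page top. */
    static float boxBottomFromTop(float pageHeight, float distanceFromTop, float boxHeight) {
        return pageHeight - distanceFromTop - boxHeight;
    }

    /** Converts millimetres to points (1 inch = 25.4 mm = 72 points). */
    static float mmToPoints(float mm) {
        return mm / 25.4f * 72f;
    }
}
```

For example, on an A4 page (pageHeight is about 841.89 points), a 100-point-high box whose top edge sits 50 points below the top of the page has its lower-left corner at boxBottomFromTop(pageHeight, 50, 100). That is the y value to pass to methods such as addRect() and drawImage(), which expect the lower-left corner.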
PDPageContentStream
The PDPageContentStream class writes to a page's content stream. It is bound to a PDF document and a specific page, so creating one is equivalent to opening the content stream of that page for writing.
PDPageContentStream contentStream = new PDPageContentStream(document, page);
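If no PDPageContentStream.AppendMode is specified, the stream is opened in OVERWRITE mode, which replaces the page's existing content: anything already on the page (for example the content of a loaded template) is lost. To keep the existing content and draw on top of it, open the stream in APPEND mode (the last argument, true, compresses the new content stream):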
PDPageContentStream contentStream = new PDPageContentStream(document, page, PDPageContentStream.AppendMode.APPEND, true);
| Mode constant | Mode | Description |
| --- | --- | --- |
| PDPageContentStream.AppendMode.OVERWRITE | Overwrite mode | Replaces all existing page content streams |
| PDPageContentStream.AppendMode.APPEND | Append mode | Appends the new content stream after all existing page content streams |
| PDPageContentStream.AppendMode.PREPEND | Prepend mode | Inserts the new content stream before all existing page content streams |
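The sketch below makes the difference between the modes visible: it writes a first line, then writes a second line through a new content stream opened in the chosen mode, and extracts the page text with PDFTextStripper. It assumes PDFBox 2.0.x and uses the five-argument constructor PDPageContentStream(document, page, appendMode, compress, resetContext); resetContext = true asks PDFBox to reset the graphics context when appending, so state changed by the existing content (scaling, rotation, colors) does not leak into the new content. AppendModeDemo and its methods are hypothetical names.

```java
import java.io.IOException;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDPageContentStream;
import org.apache.pdfbox.pdmodel.PDPageContentStream.AppendMode;
import org.apache.pdfbox.pdmodel.common.PDRectangle;
import org.apache.pdfbox.pdmodel.font.PDType1Font;
import org.apache.pdfbox.text.PDFTextStripper;

final class AppendModeDemo {

    /** Writes one line of text at (x, y) with a new content stream opened in the given mode. */
    static void writeLine(PDDocument document, PDPage page, AppendMode mode,
                          String text, float x, float y) throws IOException {
        // compress = true, resetContext = true
        try (PDPageContentStream cs = new PDPageContentStream(document, page, mode, true, true)) {
            cs.setFont(PDType1Font.HELVETICA, 12);
            cs.beginText();
            cs.newLineAtOffset(x, y);
            cs.showText(text);
            cs.endText();
        }
    }

    /** Writes "first", then "second" using secondMode, and returns the extracted page text. */
    static String writeTwice(AppendMode secondMode) throws IOException {
        try (PDDocument document = new PDDocument()) {
            PDPage page = new PDPage(PDRectangle.A4);
            document.addPage(page);
            float top = page.getMediaBox().getHeight();
            writeLine(document, page, AppendMode.OVERWRITE, "first", 50, top - 50);
            writeLine(document, page, secondMode, "second", 50, top - 80);
            return new PDFTextStripper().getText(document);
        }
    }
}
```

With APPEND (or PREPEND) both lines are extracted; with OVERWRITE only "second" remains, because the first content stream was replaced.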
When you have finished writing, close the content stream. This must be done before the document is saved; otherwise the page content may be incomplete.
contentStream.close();
Writing content to a PDF
About fonts
In Apache PDFBox, font-related classes are mainly located under the org.apache.pdfbox.pdmodel.font package. Here are some commonly used font classes:
-
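PDType1Font: This class represents a Type 1 font, an outline-based (PostScript) font format. PDFBox predefines the standard 14 Type 1 fonts (the Helvetica, Times Roman and Courier families plus Symbol and ZapfDingbats) as constants. PDF viewers are expected to provide these fonts, so they do not need to be embedded, but they can only encode a limited, mostly Latin, character set.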
Example:
PDType1Font font = PDType1Font.HELVETICA_BOLD;
// Standard 14 font constants as declared in the PDType1Font source (PDFBox 2.x)
public static final PDType1Font TIMES_ROMAN = new PDType1Font("Times-Roman");
public static final PDType1Font TIMES_BOLD = new PDType1Font("Times-Bold");
public static final PDType1Font TIMES_ITALIC = new PDType1Font("Times-Italic");
public static final PDType1Font TIMES_BOLD_ITALIC = new PDType1Font("Times-BoldItalic");
public static final PDType1Font HELVETICA = new PDType1Font("Helvetica");
public static final PDType1Font HELVETICA_BOLD = new PDType1Font("Helvetica-Bold");
public static final PDType1Font HELVETICA_OBLIQUE = new PDType1Font("Helvetica-Oblique");
public static final PDType1Font HELVETICA_BOLD_OBLIQUE = new PDType1Font("Helvetica-BoldOblique");
public static final PDType1Font COURIER = new PDType1Font("Courier");
public static final PDType1Font COURIER_BOLD = new PDType1Font("Courier-Bold");
public static final PDType1Font COURIER_OBLIQUE = new PDType1Font("Courier-Oblique");
public static final PDType1Font COURIER_BOLD_OBLIQUE = new PDType1Font("Courier-BoldOblique");
public static final PDType1Font SYMBOL = new PDType1Font("Symbol");
public static final PDType1Font ZAPF_DINGBATS = new PDType1Font("ZapfDingbats");
-
PDTrueTypeFont: This class represents a TrueType font, which is also an outline-based font format. TrueType fonts are also common in PDFs.
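// A TrueType font is loaded from a .ttf file (placeholder path); WinAnsiEncoding is in org.apache.pdfbox.pdmodel.font.encoding
PDTrueTypeFont font = PDTrueTypeFont.load(document, new File("path/to/font.ttf"), WinAnsiEncoding.INSTANCE);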
-
PDType0Font: This class represents a Type 0 (composite) font, which maps character codes to glyphs through a descendant CIDFont and can therefore address very large character sets. Type 0 fonts are the usual choice for multilingual text such as Chinese, and PDType0Font.load() lets you embed your own TrueType font file:
PDType0Font font = PDType0Font.load(document, new ClassPathResource("/static/wryhRegular.ttf").getInputStream());
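Standard 14 fonts such as Helvetica can only encode a limited, mostly Latin, character set; in PDFBox 2.x, showText() with a character outside that set (Chinese text, for example) fails with an IllegalArgumentException. The sketch below checks a string up front with PDFont.encode() so you can fall back to a PDType0Font loaded as shown above. It assumes PDFBox 2.0.x; FontSupport, canEncode() and firstSupporting() are hypothetical helpers.

```java
import java.io.IOException;

import org.apache.pdfbox.pdmodel.font.PDFont;

final class FontSupport {

    /**
     * Returns true if every character of text can be encoded by the font.
     * PDFBox 2.x throws IllegalArgumentException from encode() for characters the font cannot encode.
     */
    static boolean canEncode(PDFont font, String text) {
        try {
            font.encode(text);
            return true;
        } catch (IllegalArgumentException | IOException e) {
            return false;
        }
    }

    /** Returns the first candidate font that can encode the text, or null if none can. */
    static PDFont firstSupporting(String text, PDFont... candidates) {
        for (PDFont font : candidates) {
            if (canEncode(font, text)) {
                return font;
            }
        }
        return null;
    }
}
```

For instance, firstSupporting(text, PDType1Font.HELVETICA, customFont) returns Helvetica for plain English text and the embedded custom font for Chinese text, provided the custom font actually contains those glyphs.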
Write a single line of text
contentStream.setFont(PDType1Font.COURIER_BOLD_OBLIQUE, 16);
contentStream.beginText();
contentStream.newLineAtOffset(50, pageHeight - 50);
contentStream.showText("test text");
contentStream.endText();
Before writing text, set the font and font size with contentStream.setFont(PDFont font, float fontSize). Then start a text object with beginText() and position the text with newLineAtOffset(x, y). Here, (50, pageHeight - 50) places the text near the upper-left corner, 50 points from the left edge and 50 points below the top edge (the y value is the position of the text baseline). Next, draw the string with showText(String text), and finally close the text object with endText().
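Positioning text often requires its width, for example to center it. PDFont.getStringWidth() returns the width in 1/1000 text space units, so it must be scaled by fontSize / 1000 to get points. A sketch assuming PDFBox 2.0.x; CenteredText and its methods are hypothetical:

```java
import java.io.IOException;

import org.apache.pdfbox.pdmodel.PDPageContentStream;
import org.apache.pdfbox.pdmodel.font.PDFont;

final class CenteredText {

    /** Width of the text in points. */
    static float textWidth(PDFont font, float fontSize, String text) throws IOException {
        return font.getStringWidth(text) / 1000f * fontSize;
    }

    /** x coordinate that horizontally centers the text on a page of the given width. */
    static float centeredX(PDFont font, float fontSize, String text, float pageWidth) throws IOException {
        return (pageWidth - textWidth(font, fontSize, text)) / 2f;
    }

    /** Writes one centered line whose baseline is distanceFromTop below the top of the page. */
    static void showCentered(PDPageContentStream cs, PDFont font, float fontSize, String text,
                             float pageWidth, float pageHeight, float distanceFromTop) throws IOException {
        cs.setFont(font, fontSize);
        cs.beginText();
        cs.newLineAtOffset(centeredX(font, fontSize, text, pageWidth), pageHeight - distanceFromTop);
        cs.showText(text);
        cs.endText();
    }
}
```

showCentered() can replace the setFont() ... endText() sequence above when the line should be centered rather than left-aligned.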
Write multiple lines of text
contentStream.setFont(PDType1Font.HELVETICA_BOLD, 12);
// Set the starting coordinates of the text
float startX = 50;
float startY = page.getMediaBox().getHeight() - 50;
// Set the line spacing (leading)
float leading = 15;
// Lines of text to write
String[] lines = { "The first line of text", "The second line of text", "The third line of text" };
contentStream.beginText();
contentStream.newLineAtOffset(startX, startY);
for (String line : lines) {
    contentStream.showText(line);
    // Move to the start of the next line, one leading below the current one
    contentStream.newLineAtOffset(0, -leading);
}
contentStream.endText();
Writing multiple lines works like writing a single line: set the font and font size first, then choose the starting coordinates. The difference is that showText() and newLineAtOffset() are called repeatedly between beginText() and endText(). Because the offsets passed to newLineAtOffset() are relative to the start of the current line, newLineAtOffset(0, -leading) moves down by one line spacing each time, so the loop writes each string on its own line.
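The loop above assumes the lines are already split. For a long paragraph you can wrap it first. The sketch below is a plain greedy word wrap in Java; the width function is passed in so it can be tested without a PDF, and TextWrap and wrap() are hypothetical names. With PDFBox you would measure with font.getStringWidth(s) / 1000 * fontSize, wrapping its IOException in an unchecked exception because ToDoubleFunction cannot throw checked exceptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.ToDoubleFunction;

final class TextWrap {

    /**
     * Greedy word wrap: splits text on whitespace so that each line, measured by widthOf, fits in maxWidth.
     * A single word wider than maxWidth is placed on its own line rather than split.
     */
    static List<String> wrap(String text, double maxWidth, ToDoubleFunction<String> widthOf) {
        List<String> lines = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (String word : text.trim().split("\\s+")) {
            if (word.isEmpty()) {
                continue; // happens only for empty input
            }
            String candidate = current.length() == 0 ? word : current + " " + word;
            if (current.length() > 0 && widthOf.applyAsDouble(candidate) > maxWidth) {
                lines.add(current.toString()); // the current line is full
                current.setLength(0);
                current.append(word);
            } else {
                current.setLength(0);
                current.append(candidate);
            }
        }
        if (current.length() > 0) {
            lines.add(current.toString());
        }
        return lines;
    }
}
```

The returned lines can be passed to the showText()/newLineAtOffset() loop shown above. Splitting on whitespace suits space-separated languages; text without spaces, such as Chinese, would need to be split per character instead.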
Insert an image
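import java.io.File;

import org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject;

// Load the image file (replace the placeholder path with a real one)
PDImageXObject image = PDImageXObject.createFromFileByExtension(new File("path/to/image.jpg"), document);
float imageWidth = image.getWidth();    // width in pixels, drawn as points
float imageHeight = image.getHeight();  // height in pixels, drawn as points
// Lower-left corner: 50 pt from the left edge, top edge of the image 50 pt below the top of the page
float x = 50;
float y = pageHeight - 50 - imageHeight;
// APPEND keeps anything already drawn on the page; the stream is closed automatically
try (PDPageContentStream contentStream = new PDPageContentStream(document, page, PDPageContentStream.AppendMode.APPEND, true)) {
    contentStream.drawImage(image, x, y, imageWidth, imageHeight);
}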
PDImageXObject.createFromFileByExtension() loads the image file and creates a PDImageXObject; the file extension determines how the image is decoded. Replace "path/to/image.jpg" with the path of the actual image file. Here the image is drawn at its pixel width and height (one pixel per point), but you can pass any width and height you like. Finally, drawImage(image, x, y, imageWidth, imageHeight) writes the image into the PDF: (x, y) is the lower-left corner of the image, and imageWidth and imageHeight are its drawn width and height.
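When the image is larger than the available space, scale it down while keeping its aspect ratio. A plain-Java sketch with a hypothetical helper name, assuming positive image dimensions:

```java
final class ImageFit {

    /**
     * Scales (width, height) to fit inside (maxWidth, maxHeight) while keeping the aspect ratio.
     * Returns {scaledWidth, scaledHeight}; images that already fit are not enlarged.
     */
    static float[] fit(float width, float height, float maxWidth, float maxHeight) {
        float scale = Math.min(1f, Math.min(maxWidth / width, maxHeight / height));
        return new float[] { width * scale, height * scale };
    }
}
```

The result can be passed straight to drawImage(image, x, y, size[0], size[1]).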
Add a rectangle
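import java.awt.Color;

// Save the graphics state so the color and line width below do not affect later drawing
contentStream.saveGraphicsState();
// Set the border color
contentStream.setStrokingColor(new Color(213, 213, 213));
// Set the border width to 1
contentStream.setLineWidth(1);
// Add a 100 x 100 rectangle; (x, y) is its lower-left corner, so its top edge is 50 pt below the page top
contentStream.addRect(50, pageHeight - 50 - 100, 100, 100);
// Draw the border of the rectangle
contentStream.stroke();
// Restore the previous graphics state (stroking color and line width)
contentStream.restoreGraphicsState();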
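Table cells and highlighted areas usually need a fill color as well as a border. The sketch below (PDFBox 2.0.x assumed; BoxDemo and its methods are hypothetical) fills and strokes the rectangle with a single fillAndStroke() call and wraps the color changes in saveGraphicsState()/restoreGraphicsState(), so they do not affect anything drawn afterwards:

```java
import java.awt.Color;
import java.io.ByteArrayOutputStream;
import java.io.IOException;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDPageContentStream;
import org.apache.pdfbox.pdmodel.common.PDRectangle;

final class BoxDemo {

    /** Draws a light grey box with a darker border; top is the y coordinate of the box's top edge. */
    static void drawFilledBox(PDPageContentStream cs, float x, float top, float width, float height) throws IOException {
        cs.saveGraphicsState();
        cs.setNonStrokingColor(new Color(240, 240, 240)); // fill color
        cs.setStrokingColor(new Color(213, 213, 213));    // border color
        cs.setLineWidth(1);
        cs.addRect(x, top - height, width, height);       // addRect takes the lower-left corner
        cs.fillAndStroke();
        cs.restoreGraphicsState();                        // colors and line width revert here
    }

    /** Builds a one-page PDF with one box 50 pt from the top-left corner and returns its bytes. */
    static byte[] createPdfWithBox() throws IOException {
        try (PDDocument document = new PDDocument()) {
            PDPage page = new PDPage(PDRectangle.A4);
            document.addPage(page);
            float pageHeight = page.getMediaBox().getHeight();
            try (PDPageContentStream cs = new PDPageContentStream(document, page)) {
                drawFilledBox(cs, 50, pageHeight - 50, 100, 100);
            }
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            document.save(out);
            return out.toByteArray();
        }
    }
}
```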
Common methods for calculating text coordinates
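import java.io.IOException;

import org.apache.pdfbox.pdmodel.font.PDFont;
import org.apache.pdfbox.pdmodel.font.PDType0Font;

/**
 * Get the font height: the font bounding box height, scaled from glyph space (1/1000) to the font size.
 */
float getFontHeight(PDType0Font customFont, float fontSize) {
    return customFont.getFontDescriptor().getFontBoundingBox().getHeight() / 1000 * fontSize;
}

/**
 * Calculate the text width from the font's actual glyph widths
 * (fontSize * text.length() only works for a hypothetical font whose glyphs are all 1 em wide).
 */
float getTextWidth(PDFont font, String text, float fontSize) throws IOException {
    return font.getStringWidth(text) / 1000 * fontSize;
}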