This article uses CentOS 7.
1. Download the source archive
Download pdf2htmlEX-0.14.6.tar.gz.
2. Extract the archive
tar zxvf pdf2htmlEX-0.14.6.tar.gz
3. Building pdf2htmlEX directly fails at this point. Compile and install libfontforge first
Install all required dependencies:
sudo yum install automake unzip libtool autoconf pango-devel pangoxft-devel
sudo yum install cmake gcc gnu-getopt java-1.8.0-openjdk libpng-devel fontforge-devel cairo-devel poppler-devel libspiro-devel freetype-devel poppler-data libjpeg-turbo-devel git make gcc-c++
Then download, extract, and prepare the fontforge source:
wget https://github.com/coolwanglu/fontforge/archive/pdf2htmlEX.zip # Download the archive
unzip pdf2htmlEX.zip # Extract it
cd fontforge-pdf2htmlEX/ # Enter the source directory
./autogen.sh # Run the script
If the script finishes without reporting an error, this step succeeded.
Then run the following commands in order:
./configure
make
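One way to confirm that each build step succeeded is to check its exit status rather than read through the output. The sketch below is a hypothetical helper (`run_steps` is not part of fontforge or pdf2htmlEX) that runs commands in order and stops at the first one that fails.

```shell
# Hypothetical helper: run each argument as one command line, in order,
# and stop at the first one that exits with a non-zero status.
run_steps() {
    local step
    for step in "$@"; do
        printf '==> %s\n' "$step"
        if ! sh -c "$step"; then
            printf 'step failed: %s\n' "$step" >&2
            return 1
        fi
    done
    printf 'all steps succeeded\n'
}

# Usage, from inside fontforge-pdf2htmlEX/:
#   run_steps ./configure make
```

A plain `./configure && make` chain has the same stop-on-failure behavior. The helper only adds a message naming the step that failed.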
If make stops with a compile error at this point, edit the ufo.c file in the extracted source tree: /fontforge-pdf2htmlEX/fontforge/ufo.c
Search the file for the function name SplinePointListInterpretGlif.
Change the function's parameter list to:
SplineFont *sf, char *filename, char *memory, int memlen, int em_size, int ascent, int stroked
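To find the function before editing it, a fixed-string search with line numbers is enough. This is a minimal sketch: `find_symbol` and `backup_once` are hypothetical helper names, and the usage line assumes the current directory is fontforge-pdf2htmlEX/.

```shell
# Hypothetical helper: print every line of FILE ($2) that mentions
# SYMBOL ($1), each prefixed with its line number.
find_symbol() {
    grep -n -F -- "$1" "$2"
}

# Hypothetical helper: keep a copy of FILE as FILE.orig before a manual
# edit. An existing backup is never overwritten.
backup_once() {
    [ -e "$1.orig" ] || cp -p -- "$1" "$1.orig"
}

# Usage:
#   backup_once fontforge/ufo.c
#   find_symbol SplinePointListInterpretGlif fontforge/ufo.c
```

Keeping the backup makes it easy to compare the edited signature with the original using `diff`.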
Then continue with:
make install
If the command finishes without errors, libfontforge is installed.
4. Return to the pdf2htmlEX directory to continue the installation
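Configure the environment variable so that the build can find the libfontforge installed above. Appending to /etc/profile (which requires root) only affects future login shells, so also export the variable in the current shell:
export PKG_CONFIG_PATH=/usr/local/lib/pkgconfig
echo 'export PKG_CONFIG_PATH=/usr/local/lib/pkgconfig' >> /etc/profile
Build and install. Note the space between cmake and the dot:
cmake .
sudo make
sudo make install
Run both of the following so that the shared library is found at run time, in this shell and in later sessions:
export LD_LIBRARY_PATH=/usr/local/lib
echo 'export LD_LIBRARY_PATH=/usr/local/lib' >> /etc/profile
Finally, check whether the installation succeeded:
pdf2htmlEX -h
If the usage help is printed, the installation succeeded.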
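Note that `export LD_LIBRARY_PATH=/usr/local/lib` replaces any value the variable already had. If other software on the machine depends on an existing value, appending is the more cautious choice. The sketch below assumes a hypothetical helper named `path_append`, which is not part of any of the tools used here.

```shell
# Hypothetical helper: print the colon-separated list $2 with directory $1
# appended, unless $1 is already one of its entries.
path_append() {
    local dir=$1 list=$2
    case ":$list:" in
        *":$dir:"*) printf '%s\n' "$list" ;;       # already present
        "::")       printf '%s\n' "$dir" ;;        # list was empty
        *)          printf '%s\n' "$list:$dir" ;;  # append at the end
    esac
}

# Extend both variables for the current shell without dropping old entries.
export LD_LIBRARY_PATH="$(path_append /usr/local/lib "${LD_LIBRARY_PATH:-}")"
export PKG_CONFIG_PATH="$(path_append /usr/local/lib/pkgconfig "${PKG_CONFIG_PATH:-}")"
```

Because the helper skips directories that are already listed, running it more than once does not add duplicate entries.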
5. Basic command usage
The general form of the command is shown here. The individual options are described below.
pdf2htmlEX --zoom 1.5 --dest-dir [directory where the converted files are saved; created if it does not exist] [file to convert]
For example (the path is quoted because the file name contains a space):
pdf2htmlEX --zoom 1.5 --dest-dir /usr/local/pdf2html/temp/ "/usr/local/pdf2html/temp/test conversion.pdf"
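While the conversion runs, pdf2htmlEX prints its progress. When it finishes successfully, the converted file is in the destination directory.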
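The basic command can be wrapped in a small function so that paths containing spaces are always passed as single arguments. This is only a sketch: `convert_pdf` and the `PDF2HTMLEX` override variable are hypothetical names introduced here, and the options are the ones used in the example command (`--zoom 1.5`, `--dest-dir`).

```shell
# Hypothetical wrapper around the basic conversion command.
# $1 = PDF to convert, $2 = destination directory.
# PDF2HTMLEX may name a different executable; the default is the
# pdf2htmlEX found on PATH.
convert_pdf() {
    local pdf=$1 dest=$2
    if [ ! -f "$pdf" ]; then
        printf 'not a file: %s\n' "$pdf" >&2
        return 1
    fi
    # Create the destination directory up front if it is missing.
    mkdir -p -- "$dest" || return 1
    # Quoting "$dest" and "$pdf" keeps paths with spaces intact.
    "${PDF2HTMLEX:-pdf2htmlEX}" --zoom 1.5 --dest-dir "$dest" "$pdf"
}

# Usage:
#   convert_pdf "/usr/local/pdf2html/temp/test conversion.pdf" /usr/local/pdf2html/temp/
```

The function returns the exit status of pdf2htmlEX, so a caller can test it with `if convert_pdf ...; then`.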
Common parameter description
-f,--first-page <int>            First page to convert (default: 1)
-l,--last-page <int>             Last page to convert (default: 2147483647)
--zoom <fp>                      Zoom ratio
--fit-width <fp>                 Fit width to <fp> pixels
--fit-height <fp>                Fit height to <fp> pixels
--use-cropbox <int>              Use the CropBox instead of the MediaBox (default: 1)
--hdpi <fp>                      Horizontal resolution for graphics, in DPI (default: 144)
--vdpi <fp>                      Vertical resolution for graphics, in DPI (default: 144)
--embed <string>                 Specify which elements should be embedded in the output
--embed-css <int>                Embed CSS files in the output (default: 1)
--embed-font <int>               Embed font files in the output (default: 1)
--embed-image <int>              Embed image files in the output (default: 1)
--embed-javascript <int>         Embed JavaScript files in the output (default: 1)
--embed-outline <int>            Embed the outline (bookmarks) in the output (default: 1)
--split-pages <int>              Split pages into separate files (default: 0)
--dest-dir <string>              Destination directory (default: ".")
--css-filename <string>          File name of the generated CSS file (default: "")
--page-filename <string>         File name template for split pages (default: "")
--outline-filename <string>      File name of the generated outline file (default: "")
--process-nontext <int>          Render graphics in addition to text (default: 1)
--process-outline <int>          Show the outline in the HTML (default: 1)
--printing <int>                 Support printing (default: 1)
--fallback <int>                 Output in fallback mode (default: 0)
--embed-external-font <int>      Embed the local match for external fonts (default: 1)
--font-format <string>           Suffix of embedded font files (ttf, otf, woff, svg) (default: "woff")
--decompose-ligature <int>       Decompose ligatures, e.g. ﬁ -> fi (default: 0)
--auto-hint <int>                Use fontforge autohint on fonts without hints (default: 0)
--external-hint-tool <string>    External tool for hinting fonts (overrides --auto-hint) (default: "")
--stretch-narrow-glyph <int>     Stretch narrow glyphs instead of padding them (default: 0)
--squeeze-wide-glyph <int>       Shrink wide glyphs instead of truncating them (default: 1)
--override-fstype <int>          Clear the fstype bits in TTF/OTF fonts (default: 0)
--process-type3 <int>            Convert Type 3 fonts for the web (experimental) (default: 0)
--heps <fp>                      Horizontal threshold for merging text, in pixels (default: 1)
--veps <fp>                      Vertical threshold for merging text, in pixels (default: 1)
--space-threshold <fp>           Word-break threshold (threshold * em) (default: 0.125)
--font-size-multiplier <fp>      A value greater than 1 increases rendering accuracy (default: 4)
--space-as-offset <int>          Treat space characters as offsets (default: 0)
--tounicode <int>                How to handle ToUnicode CMaps (0=auto, 1=force, -1=ignore) (default: 0)
--optimize-text <int>            Try to reduce the number of HTML elements used for text (default: 0)
--bg-format <string>             Background image format (default: "png")
-o,--owner-password <string>     Owner password (for encrypted files)
-u,--user-password <string>      User password (for encrypted files)
--no-drm <int>                   Override the document's DRM settings (default: 0)
--clean-tmp <int>                Remove temporary files after conversion (default: 1)
--data-dir <string>              Data directory (default: ".\share\pdf2htmlEX")
--debug <int>                    Print debugging information (default: 0)
-v,--version                     Print copyright and version information
-h,--help                        Print usage information
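As an illustration of combining several of these options, the sketch below converts only a page range and writes one file per page. It uses only flags from the list above (`--first-page`, `--last-page`, `--split-pages`, `--dest-dir`). The function names and the `PDF2HTMLEX` override variable are hypothetical and introduced only for this example.

```shell
# Hypothetical helper: succeed if $1 is a non-empty string of decimal digits.
is_uint() {
    case "$1" in
        ''|*[!0-9]*) return 1 ;;
        *) return 0 ;;
    esac
}

# Hypothetical wrapper: convert pages FIRST..LAST of PDF into DEST,
# splitting the result into separate files per page.
# Usage: convert_page_range PDF FIRST LAST DEST
convert_page_range() {
    local pdf=$1 first=$2 last=$3 dest=$4
    # Reject non-numeric values, page 0, and reversed ranges before running.
    if ! is_uint "$first" || ! is_uint "$last" \
        || [ "$first" -lt 1 ] || [ "$first" -gt "$last" ]; then
        printf 'invalid page range: %s-%s\n' "$first" "$last" >&2
        return 2
    fi
    "${PDF2HTMLEX:-pdf2htmlEX}" \
        --first-page "$first" --last-page "$last" \
        --split-pages 1 \
        --dest-dir "$dest" "$pdf"
}

# Usage:
#   convert_page_range report.pdf 2 5 /usr/local/pdf2html/temp/
```

Validating the range first avoids starting a conversion that cannot produce the intended pages.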
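6. Calling pdf2htmlEX from Java
// Requires java.io.File and java.io.IOException. log, BusinessException, FileUtil and
// OfficeConstants come from the surrounding project and its libraries.
public static String pdfTohtml(File pdfTempFile) {
    String absolutePath = pdfTempFile.getAbsolutePath();
    // Write the HTML next to the source PDF
    String destDir = pdfTempFile.getAbsoluteFile().getParent();
    // Pass each argument separately instead of going through "bash -c", so that
    // paths containing spaces or shell metacharacters reach pdf2htmlEX unchanged
    ProcessBuilder processBuilder = new ProcessBuilder(
            "pdf2htmlEX", "--zoom", "1.5", "--dest-dir", destDir, absolutePath);
    // Forward the tool's output to this process so that a full pipe buffer cannot block it
    processBuilder.inheritIO();
    // Start from a non-zero value so that a failed start is not reported as success
    int exitCode = -1;
    try {
        // Start the process
        Process process = processBuilder.start();
        // Wait for the command to finish
        exitCode = process.waitFor();
    } catch (InterruptedException e) {
        // Restore the interrupt flag for the caller
        Thread.currentThread().interrupt();
    } catch (IOException e) {
        log.error("Could not start pdf2htmlEX", e);
    }
    if (exitCode != 0) {
        throw new BusinessException("Failed to convert pdf to html");
    }
    log.info("Pdf: {} converted successfully", absolutePath);
    // Replace only the trailing extension. String.replace would also rewrite "pdf"
    // elsewhere in the path, for example in a directory named pdf2html
    String ext = FileUtil.extName(absolutePath);
    // Return the path of the converted file
    return absolutePath.substring(0, absolutePath.length() - ext.length()) + OfficeConstants.OFFICE_HTML;
}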