Converting PDF to HTML from Java with the pdf2htmlEX tool

The steps in this article were carried out on CentOS 7.

1. Obtain the compressed package

Download pdf2htmlEX-0.14.6.tar.gz

2. Extract the archive

tar zxvf pdf2htmlEX-0.14.6.tar.gz

3. Installing pdf2htmlEX directly at this point will fail. Compile and install libfontforge first

Install all required dependencies, then download and prepare the fontforge source
sudo yum install automake unzip libtool autoconf pango-devel pangoxft-devel
sudo yum install cmake gcc gnu-getopt java-1.8.0-openjdk libpng-devel fontforge-devel cairo-devel poppler-devel libspiro-devel freetype-devel poppler-data libjpeg-turbo-devel git make gcc-c++
wget https://github.com/coolwanglu/fontforge/archive/pdf2htmlEX.zip # Download the archive
unzip pdf2htmlEX.zip # Extract it
cd fontforge-pdf2htmlEX/ # Change into the source directory
./autogen.sh # Generate the configure script


If autogen.sh finishes without reporting an error, this step has succeeded.
Continue by running the following in sequence

./configure
make


If make fails with an error at this point, edit the ufo.c file in the extracted source tree, fontforge-pdf2htmlEX/fontforge/ufo.c
Search the file for the SplinePointListInterpretGlif function
Change its parameter list to SplineFont *sf, char *filename, char *memory, int memlen, int em_size, int ascent, int stroked

Then rerun the build and install
make
make install


If make install finishes without errors, libfontforge is installed.

4. Return to the pdf2htmlEX directory to continue the installation

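Set PKG_CONFIG_PATH so that the build can find the libfontforge installed under /usr/local. The first command applies to the current shell; the second makes the setting permanent and must be run as root because it writes to /etc/profile
export PKG_CONFIG_PATH=/usr/local/lib/pkgconfig
echo 'export PKG_CONFIG_PATH=/usr/local/lib/pkgconfig' >> /etc/profile
Build and install
cmake .
sudo make
sudo make install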

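Run both of the following: the first sets the library path for the current shell, the second makes it permanent for future logins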
export LD_LIBRARY_PATH=/usr/local/lib
echo 'export LD_LIBRARY_PATH=/usr/local/lib' >> /etc/profile
Finally check whether the installation is successful
pdf2htmlEX -h


If the usage help is printed, the installation succeeded.
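
A Java process started by a service manager may never read /etc/profile, in which case the LD_LIBRARY_PATH export above does not reach it. The following is a minimal sketch, assuming the /usr/local/lib location used above, of setting the variable directly on a ProcessBuilder. The class name Pdf2HtmlEnv and the method helpCommand are hypothetical names chosen for illustration.

```java
import java.util.Map;

public class Pdf2HtmlEnv {

    // Assumption: libfontforge was installed under /usr/local/lib as in the steps above.
    static final String LIB_DIR = "/usr/local/lib";

    // Builds (but does not start) a "pdf2htmlEX -h" command whose environment
    // has LIB_DIR at the front of LD_LIBRARY_PATH.
    static ProcessBuilder helpCommand() {
        ProcessBuilder pb = new ProcessBuilder("pdf2htmlEX", "-h");
        Map<String, String> env = pb.environment();
        String existing = env.get("LD_LIBRARY_PATH");
        if (existing == null || existing.isEmpty()) {
            env.put("LD_LIBRARY_PATH", LIB_DIR);
        } else {
            // Keep whatever was already set, but search LIB_DIR first
            env.put("LD_LIBRARY_PATH", LIB_DIR + ":" + existing);
        }
        // Show the tool's output on this process's console
        pb.inheritIO();
        return pb;
    }

    public static void main(String[] args) {
        ProcessBuilder pb = helpCommand();
        System.out.println("Command: " + pb.command());
        System.out.println("LD_LIBRARY_PATH: " + pb.environment().get("LD_LIBRARY_PATH"));
    }
}
```

Calling helpCommand().start() would then launch pdf2htmlEX -h with the adjusted library path, independent of how the JVM itself was started.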

5. Basic use of commands

The general form is shown here; the individual parameters are described in the list further down. Quote any path that contains spaces
pdf2htmlEX --zoom 1.5 --dest-dir [directory where the converted files are saved; created if it does not exist] [file to be converted]
e.g.: pdf2htmlEX --zoom 1.5 --dest-dir /usr/local/pdf2html/temp/ "/usr/local/pdf2html/temp/test conversion.pdf"



Common parameter description

  -f,--first-page <int> First page to convert (default: 1)
  -l,--last-page <int> Last page to convert (default: 2147483647)
  --zoom <fp> Zoom ratio
  --fit-width <fp> Fit width to <fp> pixels
  --fit-height <fp> Fit height to <fp> pixels
  --use-cropbox <int> Use CropBox instead of MediaBox (default: 1)
  --hdpi <fp> Horizontal resolution for graphics, in DPI (default: 144)
  --vdpi <fp> Vertical resolution for graphics, in DPI (default: 144)
  --embed <string> Specify which elements should be embedded into the output
  --embed-css <int> Embed CSS files into the output (default: 1)
  --embed-font <int> Embed font files into the output (default: 1)
  --embed-image <int> Embed image files into the output (default: 1)
  --embed-javascript <int> Embed JavaScript files into the output (default: 1)
  --embed-outline <int> Embed the outline into the output (default: 1)
  --split-pages <int> Split pages into separate files (default: 0)
  --dest-dir <string> Specify the destination directory (default: ".")
  --css-filename <string> File name of the generated CSS file (default: "")
  --page-filename <string> File name template for split pages (default: "")
  --outline-filename <string> File name of the generated outline file (default: "")
  --process-nontext <int> Render graphics in addition to text (default: 1)
  --process-outline <int> Show the outline in the HTML (default: 1)
  --printing <int> Enable printing support (default: 1)
  --fallback <int> Output in fallback mode (default: 0)
  --embed-external-font <int> Embed the local match for external fonts (default: 1)
  --font-format <string> Suffix for embedded font files (ttf,otf,woff,svg) (default: "woff")
  --decompose-ligature <int> Decompose ligatures, such as ﬁ -> fi (default: 0)
  --auto-hint <int> Use fontforge autohint on fonts without hints (default: 0)
  --external-hint-tool <string> External tool for hinting fonts (overrides --auto-hint) (default: "")
  --stretch-narrow-glyph <int> Stretch narrow glyphs instead of padding them (default: 0)
  --squeeze-wide-glyph <int> Shrink wide glyphs instead of truncating them (default: 1)
  --override-fstype <int> Clear the fstype bits in TTF/OTF fonts (default: 0)
  --process-type3 <int> Convert Type 3 fonts for web (experimental) (default: 0)
  --heps <fp> Horizontal threshold for merging text, in pixels (default: 1)
  --veps <fp> Vertical threshold for merging text, in pixels (default: 1)
  --space-threshold <fp> Word break threshold (threshold * em) (default: 0.125)
  --font-size-multiplier <fp> A value greater than 1 increases rendering accuracy (default: 4)
  --space-as-offset <int> Treat space characters as offsets (default: 0)
  --tounicode <int> How to handle ToUnicode CMaps (0=auto, 1=force, -1=ignore) (default: 0)
  --optimize-text <int> Try to reduce the number of HTML elements used for text (default: 0)
  --bg-format <string> Specify the background image format (default: "png")
  -o,--owner-password <string> Owner password (for encrypted files)
  -u,--user-password <string> User password (for encrypted files)
  --no-drm <int> Override the document's DRM settings (default: 0)
  --clean-tmp <int> Remove temporary files after conversion (default: 1)
  --data-dir <string> Specify the data directory (default: ".\share\pdf2htmlEX")
  --debug <int> Print debugging information (default: 0)
  -v,--version Print copyright and version information
  -h,--help Print usage information
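
When these options are passed from Java, it can help to assemble them as a list of separate arguments rather than one command string, so that paths containing spaces need no quoting. The following is only a sketch: the class Pdf2HtmlCommand and the method buildCommand are hypothetical names, and the choice of options (--zoom, --first-page, --last-page, --dest-dir) is just an illustration taken from the list above.

```java
import java.util.ArrayList;
import java.util.List;

public class Pdf2HtmlCommand {

    // Assembles the argument list for one conversion of a page range.
    // Each option and each value is its own list element.
    static List<String> buildCommand(String pdfPath, String destDir,
                                     double zoom, int firstPage, int lastPage) {
        List<String> cmd = new ArrayList<>();
        cmd.add("pdf2htmlEX");
        cmd.add("--zoom");
        cmd.add(String.valueOf(zoom));
        cmd.add("--first-page");
        cmd.add(String.valueOf(firstPage));
        cmd.add("--last-page");
        cmd.add(String.valueOf(lastPage));
        cmd.add("--dest-dir");
        cmd.add(destDir);
        // The input file comes last, as in the command-line example
        cmd.add(pdfPath);
        return cmd;
    }

    public static void main(String[] args) {
        List<String> cmd = buildCommand(
                "/usr/local/pdf2html/temp/test conversion.pdf",
                "/usr/local/pdf2html/temp/", 1.5, 1, 3);
        // Print one argument per line to make the argument boundaries visible
        for (String arg : cmd) {
            System.out.println(arg);
        }
    }
}
```

The resulting list can be handed to new ProcessBuilder(cmd) unchanged.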

6. Calling pdf2htmlEX from Java

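    // Requires java.io.File and java.io.IOException; log, BusinessException,
    // FileUtil and OfficeConstants come from the surrounding project.
    public static String pdfTohtml(File pdfTempFile) {
        String absolutePath = pdfTempFile.getAbsolutePath();
        // Write the HTML into the same directory as the source PDF
        String destDir = pdfTempFile.getAbsoluteFile().getParent();
        // Pass each argument separately instead of going through "bash -c",
        // so that paths containing spaces or shell metacharacters are safe
        ProcessBuilder processBuilder = new ProcessBuilder(
                "pdf2htmlEX", "--zoom", "1.5", "--dest-dir", destDir, absolutePath);
        // Forward the tool's output so that a full pipe buffer cannot block it
        processBuilder.inheritIO();
        int exitCode;
        try {
            // Start pdf2htmlEX and wait for it to finish
            Process process = processBuilder.start();
            exitCode = process.waitFor();
        } catch (IOException e) {
            log.error("Could not start pdf2htmlEX", e);
            throw new BusinessException("Failed to convert pdf to html");
        } catch (InterruptedException e) {
            // Restore the interrupt flag before reporting the failure
            Thread.currentThread().interrupt();
            throw new BusinessException("Failed to convert pdf to html");
        }
        if (exitCode != 0) {
            throw new BusinessException("Failed to convert pdf to html");
        }
        log.info("Pdf: {} converted successfully", absolutePath);
        // Swap only the trailing extension; String.replace would also change
        // every other "pdf" in the path (for example a "pdf2html" directory)
        String ext = FileUtil.extName(absolutePath);
        return absolutePath.substring(0, absolutePath.length() - ext.length())
                + OfficeConstants.OFFICE_HTML;
    }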
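
Because waitFor() blocks until the tool exits, a very large or damaged PDF could hold the calling thread indefinitely. As a hedged sketch, not part of the method above, the wait can be bounded with Process.waitFor(long, TimeUnit). The class TimedCommand, the method runWithTimeout and the return value -1 for a timeout are assumptions made for this illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.TimeUnit;

public class TimedCommand {

    // Runs the command and returns its exit code, or -1 if it had to be
    // killed because it did not finish within timeoutSeconds.
    static int runWithTimeout(List<String> command, long timeoutSeconds) {
        Process process;
        try {
            process = new ProcessBuilder(command).inheritIO().start();
        } catch (IOException e) {
            throw new UncheckedIOException("Could not start " + command.get(0), e);
        }
        try {
            if (process.waitFor(timeoutSeconds, TimeUnit.SECONDS)) {
                // Finished in time: report the real exit code
                return process.exitValue();
            }
        } catch (InterruptedException e) {
            // Restore the interrupt flag and fall through to the kill below
            Thread.currentThread().interrupt();
        }
        // Timed out or interrupted: stop the child process
        process.destroyForcibly();
        return -1;
    }

    public static void main(String[] args) {
        // Usage: java TimedCommand pdf2htmlEX --zoom 1.5 --dest-dir <dir> <file>
        if (args.length > 0) {
            System.out.println("Exit code: " + runWithTimeout(Arrays.asList(args), 60));
        }
    }
}
```

The 60-second limit in main is an arbitrary example value; a suitable limit depends on the size of the documents being converted.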