Converting LaTeX mathematical formulas to images in Python: research and discussion of the options

Introduction

Recently I have been working on some formula recognition projects. The input is an image of a formula, and the output is the formula as a LaTeX string.

Such projects generally use deep learning, which requires building a data set of formula LaTeX strings paired with their rendered images in order to train the model.

From what I found, there are generally two sources for this data: manual annotation and synthesis. Given the sheer amount of data required to train a model, synthesis is the first choice. Synthesizing such a data set means rendering the LaTeX string of each formula into an image of that formula, so I did some research into solutions that can do this.

Option 1: Based on LaTeX environment

This solution requires a LaTeX installation; the installation package on macOS is about 5.2 GB.

The advantage is that it can render any LaTeX document; the disadvantage is that the environment takes up a lot of disk space.

If your use case involves complex and varied formulas, installing this environment and calling it from Python to do the rendering is the way to go.

Detailed setup instructions are easy to find online, so I won't go into them here.
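For orientation only, here is a minimal sketch of what "calling it from Python" could look like. It is not taken from any particular tutorial. It assumes that the pdflatex and pdftoppm (Poppler) command-line tools are on the PATH and that the standalone document class is installed. The template and the function names build_tex_source and render_with_latex are made up for this illustration.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path

# Hypothetical template: wraps one formula in a minimal "standalone" document
# so that the page is only as large as the formula itself.
TEX_TEMPLATE = r"""\documentclass[border=2pt]{standalone}
\usepackage{amsmath}
\begin{document}
$\displaystyle %s$
\end{document}
"""


def build_tex_source(latex: str) -> str:
    """Return the full LaTeX document for a single formula."""
    return TEX_TEMPLATE % latex


def render_with_latex(latex: str, out_png: str, dpi: int = 300) -> bool:
    """Render `latex` to a PNG file; return False if the tools are missing."""
    if shutil.which("pdflatex") is None or shutil.which("pdftoppm") is None:
        return False
    with tempfile.TemporaryDirectory() as tmp:
        tex_path = Path(tmp) / "formula.tex"
        tex_path.write_text(build_tex_source(latex), encoding="utf-8")
        # Compile formula.tex -> formula.pdf inside the temporary directory.
        subprocess.run(
            ["pdflatex", "-interaction=nonstopmode", "-halt-on-error", "formula.tex"],
            cwd=tmp, check=True, capture_output=True,
        )
        # Rasterize the single-page PDF; pdftoppm appends ".png" to the prefix.
        prefix = str(Path(out_png).with_suffix(""))
        subprocess.run(
            ["pdftoppm", "-png", "-r", str(dpi), "-singlefile",
             str(Path(tmp) / "formula.pdf"), prefix],
            check=True, capture_output=True,
        )
    return True
```

Calling render_with_latex(r"\frac{a}{b}", "frac.png") would then write frac.png if both tools are available. A failed compilation raises subprocess.CalledProcessError, which a data set generator would probably want to catch so it can skip that formula.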

Option 2: Based on KaTeX

KaTeX is a fast, easy-to-use JavaScript library for TeX math rendering on the web. It supports most LaTeX math syntax.

Using KaTeX to synthesize a training data set is only an idea of mine. You could run a separate KaTeX-based service that renders formulas, then have Python call it: send the formula's LaTeX string and get back the rendered formula image.

I should mention that I have not actually tried this solution, although I believe it is feasible. I also did not find any project on GitHub that takes this approach.
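To make the idea a little more concrete, the Python side could look roughly like the sketch below. Everything about the service is an assumption: the URL, the JSON fields and the fact that it answers with image bytes are all hypothetical, since no such service is described here. Note also that KaTeX itself produces HTML rather than images, so the service would presumably need something like a headless browser to turn the result into a picture.

```python
import json
import urllib.request

# Hypothetical endpoint of a self-hosted rendering service built around KaTeX.
RENDER_URL = "http://localhost:3000/render"


def build_render_request(latex: str, url: str = RENDER_URL) -> urllib.request.Request:
    """Build a POST request that carries the formula as JSON (assumed schema)."""
    payload = json.dumps({"latex": latex, "displayMode": True}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def render_via_service(latex: str, out_path: str, timeout: float = 10.0) -> None:
    """Send the formula to the service and save the returned image bytes."""
    with urllib.request.urlopen(build_render_request(latex), timeout=timeout) as resp:
        image_bytes = resp.read()
    with open(out_path, "wb") as f:
        f.write(image_bytes)
```

Only the standard library is used on the client side, so the cost of this option lies almost entirely in building and running the service itself.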

(Recommended) Option 3: Based on Matplotlib

I prefer the Matplotlib-based solution. There is no need to install a separate LaTeX environment, because Matplotlib implements its own lightweight TeX expression parser and layout engine; Mathtext is the subset of TeX markup that this engine supports. For details, see the official documentation: Writing mathematical expressions

Usage example:

import matplotlib.pyplot as plt

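fig = plt.figure(figsize=(3, 3), linewidth=1, edgecolor='black')
# No dollar signs: drawn as ordinary text.
fig.text(.2, .7, "plain text: alpha > beta")
# In a normal string the backslashes must be doubled, otherwise "\a" and "\b" are read as escape characters.
fig.text(.2, .5, "Mathtext: $\\alpha > \\beta$")
# A raw string keeps the backslashes exactly as written.
fig.text(.2, .3, r"raw string Mathtext: $\alpha > \beta$")
plt.show()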

The rendering result is as follows:

You don’t need to install TeX to use Mathtext because Matplotlib comes with a Mathtext parser and engine. The Mathtext layout engine is a fairly straightforward adaptation of the layout algorithm in Donald Knuth’s TeX.
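Because Mathtext only covers a subset of TeX, a data set generator will probably want to check in advance whether a given formula can be parsed at all. One possible way, sketched here, is to run the string through matplotlib.mathtext.MathTextParser and treat a ValueError as "not supported". The helper name is_mathtext_supported is my own.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend, no display needed

from matplotlib.mathtext import MathTextParser

# The "path" output only lays the formula out; nothing is drawn to a file.
_parser = MathTextParser("path")


def is_mathtext_supported(latex: str) -> bool:
    """Return True if Matplotlib's Mathtext engine can parse `latex`."""
    try:
        _parser.parse(f"${latex}$")
    except ValueError:
        # Mathtext reports unknown commands and syntax errors as ValueError.
        return False
    return True
```

With this, formulas that use a command the engine does not know can be filtered out, or routed to the full LaTeX option, before any rendering is attempted.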

The idea: on top of this Matplotlib capability, you can write a small tool that automatically synthesizes the data set mentioned at the beginning. It takes the LaTeX string of a formula as input and outputs the rendered image of that formula. To this end, I wrote some demo code. The overall flow is as follows:

flowchart LR
A(Formula LaTeX string) --> B(Matplotlib renders the image) --> C(Crop the excess part) --> D(Formula-only image)

The relevant code is as follows. First, the part that renders the formula with Matplotlib:

from matplotlib import pyplot as plt

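# White background so that a later threshold step can separate the formula from the background.
fig = plt.figure(linewidth=1, facecolor="white", layout="tight")
# The dollar signs switch on Mathtext; the raw string keeps any backslashes intact.
fig.text(0.2, 0.5, r"$c = a^2 + b^2$")
fig.savefig("equation.png")
plt.close(fig)  # release the figure, which matters when rendering many formulas in a loop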

Code to crop excess parts of an image:

import cv2
import numpy as np

class CropByProject:
    """Projection cropping"""

    def __init__(self, threshold: int = 250):
        self.threshold = threshold

    def __call__(self, origin_img):
        image = cv2.cvtColor(origin_img, cv2.COLOR_BGR2GRAY)

        # Inverse binarization: pixels brighter than threshold become 0 (background), all others become 255 (formula)
        retval, img = cv2.threshold(image, self.threshold, 255, cv2.THRESH_BINARY_INV)

        # Dilate with the default 3x3 kernel so that the strokes grow into blocks
        dilated = cv2.dilate(img, None, iterations=1)

        # Project onto the x-axis to find the left and right bounds
        x0, x1 = self.get_project_loc(dilated, direction="width")

        # Project onto the y-axis to find the top and bottom bounds
        y0, y1 = self.get_project_loc(dilated, direction="height")

        return origin_img[y0:y1, x0:x1]

    @staticmethod
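    def get_project_loc(img, direction):
        """Get the start and end index positions for cropping.

        Args:
            img (ndarray): binarized image (foreground pixels are 255)
            direction (str): "width" or "height"
        Raises:
            ValueError: unsupported projection direction
        Returns:
            tuple: (start, end) indices, with end exclusive so they can be used as a slice
        """
        if direction == "width":
            axis = 0
        elif direction == "height":
            axis = 1
        else:
            raise ValueError(f"direction {direction} is not supported!")

        # Count foreground pixels per column (axis=0) or per row (axis=1)
        loc_sum = np.sum(img == 255, axis=axis)
        loc_range = np.flatnonzero(loc_sum > 0)
        if loc_range.size == 0:
            # Blank image: keep the full extent instead of failing
            return 0, loc_sum.shape[0]
        # +1 so that the last foreground row/column is kept when slicing
        return int(loc_range[0]), int(loc_range[-1]) + 1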

if __name__ == "__main__":
    cropper = CropByProject()

    img_path = "equation.png"

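    img = cv2.imread(img_path)
    if img is None:
        # cv2.imread returns None instead of raising when the file cannot be read
        raise FileNotFoundError(img_path)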

    result = cropper(img)

    cv2.imwrite("res.png", result)
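The two steps above can also be combined in memory, without writing an intermediate file. The sketch below is one possible variant, not the demo code itself: it draws the formula on an Agg canvas, reads the pixels back as a NumPy array and applies the same projection idea with plain NumPy instead of OpenCV. The function name render_formula and the default values (canvas size, dpi, fontsize, pad) are assumptions chosen for illustration.

```python
import numpy as np
from matplotlib.backends.backend_agg import FigureCanvasAgg
from matplotlib.figure import Figure


def render_formula(latex: str, dpi: int = 200, fontsize: int = 20, pad: int = 4) -> np.ndarray:
    """Render `latex` with Mathtext and return a cropped grayscale uint8 array."""
    # A Figure created without pyplot is not tracked globally,
    # so nothing has to be closed afterwards.
    fig = Figure(figsize=(6, 2), dpi=dpi, facecolor="white")
    canvas = FigureCanvasAgg(fig)
    fig.text(0.5, 0.5, f"${latex}$", fontsize=fontsize, ha="center", va="center")
    canvas.draw()

    rgba = np.asarray(canvas.buffer_rgba())  # shape (H, W, 4)
    gray = rgba[..., :3].mean(axis=2)        # simple average as grayscale
    ink = gray < 250                         # True where something was drawn

    # Projection: which rows and which columns contain any ink?
    rows = np.flatnonzero(ink.any(axis=1))
    cols = np.flatnonzero(ink.any(axis=0))
    if rows.size == 0:
        raise ValueError("nothing was rendered")

    # Keep `pad` pixels of margin, clamped to the canvas.
    y0 = max(int(rows[0]) - pad, 0)
    y1 = min(int(rows[-1]) + 1 + pad, gray.shape[0])
    x0 = max(int(cols[0]) - pad, 0)
    x1 = min(int(cols[-1]) + 1 + pad, gray.shape[1])
    return gray[y0:y1, x0:x1].astype(np.uint8)
```

The returned array can be saved with cv2.imwrite next to its LaTeX string to form one sample of the data set. A formula wider than the fixed canvas would be cut off, so in practice the figure size would need to be adapted to the longest formula.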

Closing remarks

There are already many public formula recognition data sets, coming from formula recognition competitions and open source projects among other sources. I won't list them one by one here; they are easy to find.
