Front-end (25) – Detailed steps and sample code for front-end implementation of OCR image and text recognition

Blogger: The kitten is here
The core of the article: Detailed steps and sample code for front-end implementation of OCR image and text recognition

Article directory

  • Introduction to OCR technology
  • Step 1: Determine the OCR API to use
  • Step 2: Create the front-end interface
  • Step 3: Add image upload function
  • Step 4: Send the recognition request and process the recognition results
  • Step 5: Improve the code and add comments
  • Conclusion
  • Appendix

Introduction to OCR technology

What is OCR?

OCR is a technology that converts printed and handwritten text into editable, searchable electronic text. It recognizes the text in images or scanned documents and transcribes it into a machine-readable format, which makes automated text extraction possible. Typical applications include digitizing scanned documents, digital libraries, automated data entry, machine translation, and automated form filling. Used this way, OCR can make documents and data much faster to process and text information easier to access and reuse.

In modern applications, OCR (Optical Character Recognition) technology is widely used to convert images into editable text data. This article explains in detail how to implement OCR image and text recognition on the front end and provides corresponding code examples. Whether you are new to front-end development or an experienced developer, the simple steps below should help you implement this function.

Step 1: Determine the OCR API to use

This step requires preparation based on the OCR API you choose. Each OCR API has corresponding documentation and sample code, and you need to register an account and obtain an API key.

  • OCR API Overview and Selection:
    An OCR API is a service that provides image text recognition: it converts images containing text into editable text data. Before choosing one, compare each OCR API’s capabilities, supported image types, recognition accuracy, speed, documentation, and use cases.

  • Comparison and recommendation of commonly used OCR APIs:
    Some commonly used OCR APIs include Google Cloud Vision API, Microsoft Azure OCR API, Tencent OCR API, etc. When choosing an OCR API, you can consider its reliability, ease of use, performance, pricing, and applicable scenarios.

  • Steps to register and obtain an API key:
    In order to use the OCR API, you need to register an account and obtain an API key. Usually, on the official website of the OCR API provider, you can find the registration page and complete the registration. Once successfully registered, you will receive an API key that is used to authenticate and send recognition requests to the OCR API.

Step 2: Create the front-end interface

Create a basic HTML file with an image upload button and an area to display the recognition results.

  • Basic HTML structure:
    In the HTML file, create an initial structure including a title, buttons, and results display area. You can use basic HTML elements such as <h1>, <button>, and <div>.

  • Create an image upload button and an area to display the results:
    Add a button for image upload and an area to display the recognition results in the HTML file. You can use the <input type="file"> element to implement image selection, giving it a unique id so that JavaScript can refer to it.
    Here’s a detailed explanation and sample code:

<!DOCTYPE html>
<html>
<head>
    <title>OCR image and text recognition</title>
</head>
<body>
    <h1>OCR image and text recognition</h1>
    <input type="file" id="imageFile" accept="image/*" />
    <br />
    <button onclick="uploadImage()">Upload image</button>
    <br />
    <h2>Recognition results:</h2>
    <div id="result"></div>

    <script>
        //...Following code omitted...

        function uploadImage() {
            const fileInput = document.getElementById('imageFile');
            const selectedFile = fileInput.files[0];

            //...Following code omitted...
        }
    </script>
</body>
</html>

In this example, we created a simple HTML page containing an <h1> heading titled “OCR image and text recognition”, an <input type="file"> element, an upload button, and a <div> element for displaying the recognition results. You can customize the look and layout of the page to suit your needs.

Note that for JavaScript code to access and manipulate HTML elements, we assign unique ids to the file input (imageFile) and the result area (result). Later in the JavaScript code, you can retrieve the elements by these ids and add event handlers that respond to user actions.
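The id lookup described above can be made a little safer with a guard. This is only a sketch: getRequiredElement is a hypothetical helper that is not part of the original code, and it takes the document as a parameter so it can be tried outside a browser.

```javascript
// Hypothetical helper: looks up an element by id and fails with a clear message
// if it is missing, instead of letting a later property access fail on null.
function getRequiredElement(doc, id) {
    const element = doc.getElementById(id);
    if (!element) {
        throw new Error('Element with id "' + id + '" was not found.');
    }
    return element;
}
```

In the page from this step, it would be called as getRequiredElement(document, 'imageFile') or getRequiredElement(document, 'result').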

Step 3: Add image upload function

In this step, we will add an event listener for the image upload button in order to get the image file uploaded by the user and send it to the OCR API for recognition.

  • Get file input and listen for changes:
    Use the document.getElementById() method to obtain the element representing the image file input. By adding a change event listener to it, you can trigger the corresponding operation when the user selects an image file.

  • Use FileReader to read image file contents:
    In the event handler, instantiate the FileReader object and use the readAsDataURL() method to read the contents of the image file. This will convert the image file into a data URL for subsequent upload.

  • Upload image files to OCR API:
    After preparing the data URL for the image file, create a FormData object and add the image data to it using the append() method. Then use this FormData as the request body and send a POST request to the recognition endpoint of the OCR API with the Fetch API or AJAX.

// Get the file input and listen for changes
const fileInput = document.getElementById('imageFile');
fileInput.addEventListener('change', handleFileUpload);

function handleFileUpload(event) {
    const file = event.target.files[0];
    if (!file) {
        return; // The user closed the file dialog without selecting a file
    }

    // Use FileReader to read the image file content as a data URL
    const reader = new FileReader();
    reader.onload = function (e) {
        const imageDataURL = e.target.result;

        // Upload the image data to the OCR API
        uploadImageToOCR(imageDataURL);
    };
    reader.readAsDataURL(file);
}

function uploadImageToOCR(imageDataURL) {
    const apiUrl = 'OCR_API_URL'; // Placeholder: the recognition endpoint of your OCR API
    const apiKey = 'API_KEY';     // Placeholder: your API key

    // The data URL is sent as a form field named 'image'
    const formData = new FormData();
    formData.append('image', imageDataURL);

    fetch(apiUrl, {
        method: 'POST',
        headers: {
            'Authorization': 'Bearer ' + apiKey
        },
        body: formData
    })
    .then(response => response.json())
    .then(data => {
        // Process the response of the OCR API
        handleOCRResponse(data);
    })
    .catch(error => {
        console.error('Recognition request error:', error);
    });
}

function handleOCRResponse(data) {
    const resultDiv = document.getElementById('result');

    // Show the recognized text if the response contains it
    if (data && data.text) {
        resultDiv.textContent = data.text;
    } else {
        resultDiv.textContent = 'The text could not be recognized.';
    }
}

First, we use the document.getElementById() method to get the element representing the image file input and add a change event listener to it. When the user selects an image file, the handleFileUpload function is triggered.

In the handleFileUpload function, convert the image file into a data URL through the FileReader object. Then call the uploadImageToOCR function to upload the image file to the recognition endpoint of the OCR API.
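Before handing the file to FileReader, it can be worth checking that it is an image of reasonable size. The following is a minimal sketch, not part of the original code: validateImageFile is a hypothetical helper, and the 5 MB default is an arbitrary assumption rather than the limit of any particular OCR API.

```javascript
// Hypothetical helper: returns an error message, or null if the file looks acceptable.
// The 5 MB default is an assumption; check the actual limits of your OCR API.
function validateImageFile(file, maxBytes = 5 * 1024 * 1024) {
    if (!file) {
        return 'No file selected.';
    }
    // File.type holds the MIME type reported by the browser, e.g. "image/png"
    if (typeof file.type !== 'string' || !file.type.startsWith('image/')) {
        return 'The selected file is not an image.';
    }
    if (file.size > maxBytes) {
        return 'The image is too large.';
    }
    return null;
}
```

Inside handleFileUpload, the result could be shown in the result area, and the upload skipped whenever it is not null.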

In the uploadImageToOCR function, create a FormData object and add the image file data URL to it. Use the Fetch API to send a POST request to the recognition endpoint of the OCR API, and after the response is returned, call the handleOCRResponse function to process the recognition result.

Finally, in the handleOCRResponse function, the recognized text content is displayed in the result area of the page based on the response result of the OCR API.

Please replace OCR_API_URL and API_KEY in the sample code with the actual recognition endpoint URL and API key of the OCR API you are using.
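The data URL produced by readAsDataURL() has the form data:<MIME type>;base64,<payload>. Some OCR APIs expect only the base64 payload rather than the whole data URL, so it can be useful to split the two. A minimal sketch, where parseDataUrl is a hypothetical helper that is not in the original code:

```javascript
// Hypothetical helper: splits a data URL such as "data:image/png;base64,iVBOR..."
// into its MIME type and base64 payload.
// Returns null for anything that is not in this simple "type;base64,payload" form.
function parseDataUrl(dataUrl) {
    if (typeof dataUrl !== 'string') {
        return null;
    }
    const match = /^data:([^;,]+);base64,(.*)$/.exec(dataUrl);
    if (!match) {
        return null;
    }
    return { mimeType: match[1], base64: match[2] };
}
```

Whether to send the full data URL or only the base64 part depends on the OCR API, so check its request format before choosing.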

Step 4: Send the recognition request and process the recognition results

In this step, we will send a recognition request to the recognition endpoint of the OCR API and update the front-end interface based on the returned recognition results.

  • Send a POST request using the Fetch API or AJAX:
    Send a POST request to the recognition endpoint of the OCR API using the Fetch API or AJAX. The request must carry authentication, such as a Bearer token or an API key in a request header, so that the recognition endpoint can authenticate your request.

  • Include the API key in the request header for authentication:
    Depending on the requirements of the OCR API you choose, include an API key or other authentication information in the request header to authenticate the request.

  • Parse the response from the OCR API and update the front-end interface:
    After obtaining the response from the OCR API, parse the recognition results and update the recognition results to the corresponding area in the front-end interface. Depending on the response structure of the OCR API, it may be necessary to parse the returned JSON data, extract the recognized text content and display it on the interface. If no results are recognized or an error occurs in the response, you can provide an appropriate error message or display a default text.
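The request described in the first two points can be assembled separately from the code that sends it, which makes the headers and body easy to inspect. This is a sketch under assumptions: buildOcrRequest is a hypothetical helper, and the Bearer scheme and the image field name follow this article's examples rather than any specific OCR API.

```javascript
// Hypothetical helper: builds the two arguments for fetch(url, options).
// The Bearer scheme and the { image: ... } JSON body are assumptions taken from
// this article's examples; a real OCR API may expect a different header or field name.
function buildOcrRequest(apiUrl, apiKey, base64Image) {
    return {
        url: apiUrl,
        options: {
            method: 'POST',
            headers: {
                'Authorization': 'Bearer ' + apiKey,
                'Content-Type': 'application/json'
            },
            body: JSON.stringify({ image: base64Image })
        }
    };
}
```

The result would then be passed on as fetch(request.url, request.options).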

Next, we demonstrate how to use the Fetch API to send OCR recognition requests and update the front-end interface:

// Send the OCR recognition request
fetch('https://ocr-api.com/recognize', {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer your_token_here',
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({
    image: 'your_image_data_here'
  })
})
  .then(response => response.json()) // Parse the response body as JSON
  .then(data => {
    // Process the recognition result and update the front-end interface
    const recognizedText = data.recognizedText; // Assumes the result is in the recognizedText field

    // Update the recognition area of the front-end interface
    // ('recognitionArea' is a placeholder id; the page in Step 2 uses 'result')
    const recognitionArea = document.getElementById('recognitionArea');
    recognitionArea.textContent = recognizedText;
  })
  .catch(error => {
    // Handle error conditions and provide appropriate feedback
    console.error('OCR recognition request error:', error);
    // Display an error message or set default text on the interface
    const recognitionArea = document.getElementById('recognitionArea');
    recognitionArea.textContent = 'Recognition failed, please try again.';
  });

Please modify the URL, request headers, authentication method, and the id of the element to update so that the code matches the OCR API and page you are using.
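Because the name of the field that holds the text differs between OCR APIs, the extraction step can be isolated in one small function. A minimal sketch: extractRecognizedText is a hypothetical helper, and the default field names recognizedText and text are simply the ones assumed in this article's examples.

```javascript
// Hypothetical helper: pulls the recognized text out of a parsed JSON response.
// The default field names are assumptions taken from this article's examples.
function extractRecognizedText(data, fieldNames = ['recognizedText', 'text']) {
    if (data === null || typeof data !== 'object') {
        return null;
    }
    for (const name of fieldNames) {
        const value = data[name];
        // Accept only non-empty strings as a recognition result
        if (typeof value === 'string' && value.trim() !== '') {
            return value;
        }
    }
    return null;
}
```

A null return value can then be mapped to a default message such as “The text could not be recognized.”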

Step 5: Improve the code and add comments

In the process of implementing the OCR image and text recognition function on the front end, there are still some details that need to be paid attention to and optimized. Appropriate comments can also be added to facilitate the reading and understanding of the code.

  • Add error handling and prompt information:
    In the code, you can add appropriate error handling to handle possible error situations, such as upload failure, empty recognition results, etc. You can display error messages on the interface or use the console to print error messages to help debugging and troubleshooting.

  • Comment the code to understand the implementation details of each step:
    Add comments to the code explaining the implementation details and purpose of each key step. This will help you and others understand the functionality and logic of the code.

  • Code formatting and naming conventions:
    Formatting and naming conventions help code readability and maintainability. Make sure your code is indented correctly, use consistent naming conventions, and follow best practices and language conventions.

  • Code optimization and scalability:
    Consider the performance and scalability of your code. You can optimize your code to improve execution efficiency, and adopt a modular design and organize your code for easy expansion and maintenance.

After completing these details and optimizations, your front-end OCR image and text recognition function will be more complete and reliable.
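As one possible way to combine the error handling and modular design points above, the request logic can be wrapped in a small object. This is a sketch under assumptions, not the original implementation: createOcrClient and its fetchImpl parameter are hypothetical, and the header and body format follow this article's examples.

```javascript
// Hypothetical factory: bundles the endpoint, API key, and fetch implementation.
// Passing a stub as fetchImpl allows the client to be tested without a network.
function createOcrClient({ apiUrl, apiKey, fetchImpl = (...args) => fetch(...args) }) {
    async function recognize(base64Image) {
        const response = await fetchImpl(apiUrl, {
            method: 'POST',
            headers: {
                'Authorization': 'Bearer ' + apiKey,
                'Content-Type': 'application/json'
            },
            body: JSON.stringify({ image: base64Image })
        });
        // fetch rejects only on network failures, so HTTP error statuses are checked explicitly
        if (!response.ok) {
            throw new Error('OCR request failed with HTTP status ' + response.status);
        }
        return response.json();
    }
    return { recognize };
}
```

A caller would use client.recognize(base64Image).then(...).catch(...) and show a message in the result area when the promise rejects.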

Conclusion

With the steps and sample code in this article, you can implement the OCR image and text recognition function on the front end. It is useful in many scenarios, such as scanning documents, image search, and automated data entry. I hope this article helps you and inspires you to further develop and optimize this basic function.

I personally use Tencent Cloud’s OCR.

Appendix

  • Sample code
<!DOCTYPE html>
<html>
<head>
    <title>OCR image and text recognition</title>
</head>
<body>
    <h1>OCR image and text recognition</h1>
    <input type="file" id="imageFile" accept="image/*" />
    <br />
    <button onclick="uploadImage()">Upload image</button>
    <br />
    <h2>Recognition results:</h2>
    <div id="result"></div>

    <script>
        function uploadImage() {
            const fileInput = document.getElementById('imageFile');
            const selectedFile = fileInput.files[0];
            const resultDiv = document.getElementById('result');

            // Nothing to upload if the user has not selected a file yet
            if (!selectedFile) {
                resultDiv.textContent = 'Please select an image first.';
                return;
            }

            const reader = new FileReader();
            reader.onload = function(event) {
                const fileData = event.target.result;

                // TODO: Send the image file to the recognition endpoint of the OCR API
                const apiUrl = 'OCR_API_URL';
                const apiKey = 'API_KEY';

                const formData = new FormData();
                formData.append('image', fileData);

                // Send the POST request using the Fetch API
                fetch(apiUrl, {
                    method: 'POST',
                    headers: {
                        'Authorization': 'Bearer ' + apiKey
                    },
                    body: formData
                })
                .then(response => response.json())
                .then(data => {
                    // Show the recognized text if the response contains it
                    if (data && data.text) {
                        resultDiv.textContent = data.text;
                    } else {
                        resultDiv.textContent = 'The text could not be recognized.';
                    }
                })
                .catch(error => {
                    console.error('Recognition request error:', error);
                });
            };
            reader.readAsDataURL(selectedFile);
        }
    </script>
</body>
</html>

In this example, remember to replace OCR_API_URL and API_KEY with the recognition endpoint URL and API key of the OCR API you actually use.

This sample code runs the uploadImage function when the upload button is clicked. The function uses a FileReader object to read the contents of the selected image file and then sends the image data to the recognition endpoint of the OCR API.

Once the response from the OCR API is obtained, the code parses the returned JSON data and updates the recognition results to the results area on the page.

  • Links to the official documentation of the referenced OCR APIs

Since there are multiple OCR APIs to choose from, each API provider has its own official documentation. The following are official documentation links for several common OCR API providers. You can further explore and obtain relevant information based on the OCR API you choose:

  1. Google Cloud API
  2. Microsoft OCR Service
  3. Tencent OCR API

These links provide access to the official documentation of the OCR API. You can find more detailed information in the relevant documents, including API usage, supported functions, request and response formats, etc.