Implementing camera OCR (optical character recognition) with OpenCV

Introduction:

As computer vision has advanced, OCR (optical character recognition) has matured. OCR identifies text in images and converts it into editable text, which is useful in many application scenarios. This article shows how to use the OpenCV library, together with the Tesseract engine for the recognition step, to implement OCR on camera input.

Steps:

1. Install the OpenCV library
First, install the OpenCV library. In a Python environment it can be installed with pip. Enter the following command on the command line:

pip install opencv-python

2. Capture camera data
Capturing the video stream from a camera is easy using the OpenCV library. In Python, you can use the following code to open the camera and read the video stream:

import cv2
  
cap = cv2.VideoCapture(0) # Use the default camera
while True:
    ret, frame = cap.read() # Read a frame of image
    if not ret:
        break
    cv2.imshow('frame', frame)
    if cv2.waitKey(1) == ord('q'): # Press the q key to exit
        break
cap.release()
cv2.destroyAllWindows()

3. Image preprocessing
Before OCR, the image needs to be preprocessed to improve recognition accuracy. Common preprocessing operations include grayscale conversion, binarization, noise reduction, and dilation/erosion. The following is a sample code showing how to perform grayscale conversion and binarization on a captured frame:

gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY) # Grayscale
_, binary = cv2.threshold(gray, 150, 255, cv2.THRESH_BINARY_INV) # Binarization

4. Text positioning
Before OCR, the text areas in the image need to be located. OpenCV provides several algorithms that can be used for this; for example, MSER (maximally stable extremal regions) can detect candidate text areas. Here is a sample code showing how to use the MSER algorithm to locate text:

import cv2
import pytesseract
from PIL import Image

# Set the path to Tesseract
pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files\Tesseract-OCR\tesseract.exe' # Modify according to your Tesseract installation path
  
# read image
img = cv2.imread('test.jpg')
  
# Convert to grayscale image
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
  
# Use the MSER algorithm to detect candidate text regions
mser = cv2.MSER_create()
regions, bboxes = mser.detectRegions(gray) # Returns the point sets and their bounding boxes

# Traverse all detected regions
for box in bboxes:
    # Get the bounding box of the region as plain Python integers
    x, y, w, h = map(int, box)
    # Draw the bounding box on the original image
    cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 1)
  
# show image
cv2.imshow('img', img)
cv2.waitKey(0)
cv2.destroyAllWindows()
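
MSER usually returns many regions that are not text, so the bounding boxes are often filtered before OCR. The following is a minimal sketch of such a filter. The function name filter_text_like_boxes and all threshold values (min_area, max_area_ratio, max_aspect) are assumptions chosen for illustration and would need tuning for real images.

```python
def filter_text_like_boxes(boxes, image_shape, min_area=30,
                           max_area_ratio=0.5, max_aspect=10.0):
    """Keep (x, y, w, h) boxes whose size and shape are plausible for text.

    Hypothetical helper; every threshold is an illustrative assumption.
    """
    img_h, img_w = image_shape[:2]
    kept = []
    for x, y, w, h in boxes:
        area = w * h
        if area < min_area:
            continue  # Too small: most likely noise
        if area > max_area_ratio * img_w * img_h:
            continue  # Too large: most likely background
        aspect = max(w, h) / max(1, min(w, h))
        if aspect > max_aspect:
            continue  # Too elongated: most likely a line or border
        kept.append((int(x), int(y), int(w), int(h)))
    return kept


boxes = [(10, 10, 20, 30), (0, 0, 2, 2), (0, 0, 640, 480), (5, 5, 200, 3)]
print(filter_text_like_boxes(boxes, (480, 640)))  # [(10, 10, 20, 30)]
```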

5. OCR recognition
Finally, use an OCR library to recognize the characters in the located text area. Recognition can be done with the Tesseract OCR engine, called here through the pytesseract wrapper (pip install pytesseract); the Tesseract engine itself must be installed separately. The following is a sample code showing how to use Tesseract for OCR:

# Run OCR on the binarized image from step 3
text = pytesseract.image_to_string(binary, lang='eng')
print(text)
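
To recognize only the located text areas rather than the whole image, each bounding box can be cropped and passed to the recognizer separately. The following is a minimal sketch under stated assumptions: the function name ocr_regions, the padding parameter, and the injectable recognize callable are hypothetical. By default it calls pytesseract.image_to_string; the demonstration uses a stand-in recognizer so the cropping logic can be tried without Tesseract installed.

```python
import numpy as np


def ocr_regions(image, boxes, recognize=None, padding=2):
    """Crop each (x, y, w, h) box from image and run `recognize` on the crop.

    Hypothetical helper; `padding` adds a small margin around each box.
    """
    if recognize is None:
        # Imported lazily so the cropping logic works without Tesseract.
        import pytesseract
        recognize = lambda crop: pytesseract.image_to_string(crop, lang='eng')
    img_h, img_w = image.shape[:2]
    results = []
    for x, y, w, h in boxes:
        # Clamp the padded box to the image borders.
        x0, y0 = max(0, x - padding), max(0, y - padding)
        x1, y1 = min(img_w, x + w + padding), min(img_h, y + h + padding)
        crop = image[y0:y1, x0:x1]
        if crop.size == 0:
            continue  # Box lies completely outside the image
        results.append(((x, y, w, h), recognize(crop).strip()))
    return results


# Stand-in recognizer (hypothetical): reports the crop size instead of text.
fake = lambda crop: f"{crop.shape[1]}x{crop.shape[0]}"
page = np.zeros((50, 100), dtype=np.uint8)
print(ocr_regions(page, [(10, 10, 20, 10), (95, 45, 20, 20)], recognize=fake))
# [((10, 10, 20, 10), '24x14'), ((95, 45, 20, 20), '7x7')]
```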

Summary:

Using OpenCV to implement camera OCR requires image preprocessing, text positioning, and OCR recognition. Through reasonable preprocessing and parameter adjustment, the accuracy of OCR can be improved.

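Complete code

The complete script below uses several custom helper functions to detect a document in the camera feed, correct its perspective, and binarize it. The resulting binary image ref can then be passed to Tesseract as in step 5: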

# -*- coding: utf-8 -*-
# @Time : 2023/10/23 10:27
# @Author :Muzi
# @File: Camera OCR.py
# @Software: PyCharm
# Import libraries
import numpy as np
import cv2


def cv_show(name, img):
    cv2.imshow(name, img)
    cv2.waitKey(120)


def order_points(pts):
    # A total of 4 coordinate points
    rect = np.zeros((4, 2), dtype="float32")

    # Find the corresponding coordinates 0123 in order: upper left, upper right, lower right, lower left
    # Calculate the upper left and lower right
    s = pts.sum(axis=1)
    rect[0] = pts[np.argmin(s)]
    rect[2] = pts[np.argmax(s)]

    # Calculate upper right and lower left
    diff = np.diff(pts, axis=1)
    rect[1] = pts[np.argmin(diff)]
    rect[3] = pts[np.argmax(diff)]

    return rect


def four_point_transform(image, pts):
    # Get the input coordinate point
    rect = order_points(pts)
    (tl, tr, br, bl) = rect

    # Calculate the input w and h values
    widthA = np.sqrt(((br[0] - bl[0]) ** 2) + ((br[1] - bl[1]) ** 2))
    widthB = np.sqrt(((tr[0] - tl[0]) ** 2) + ((tr[1] - tl[1]) ** 2))
    maxWidth = max(int(widthA), int(widthB))

    heightA = np.sqrt(((tr[0] - br[0]) ** 2) + ((tr[1] - br[1]) ** 2))
    heightB = np.sqrt(((tl[0] - bl[0]) ** 2) + ((tl[1] - bl[1]) ** 2))
    maxHeight = max(int(heightA), int(heightB))

    #Corresponding coordinate position after transformation
    dst = np.array([
        [0, 0],
        [maxWidth - 1, 0],
        [maxWidth - 1, maxHeight - 1],
        [0, maxHeight - 1]], dtype="float32")

    # Calculate transformation matrix
    M = cv2.getPerspectiveTransform(rect, dst)
    warped = cv2.warpPerspective(image, M, (maxWidth, maxHeight))

    #Return the transformed result
    return warped


def resize(image, width=None, height=None, inter=cv2.INTER_AREA):
    dim=None
    (h, w) = image.shape[:2]
    if width is None and height is None:
        return image
    if width is None:
        r = height / float(h)
        dim = (int(w * r), height)
    else:
        r = width / float(w)
        dim = (width, int(h * r))
    resized = cv2.resize(image, dim, interpolation=inter)
    return resized


# Open the camera

cap = cv2.VideoCapture(0) # Use the default camera
if not cap.isOpened(): # Failed to open
    print("Cannot open camera")
    exit()

while True:
    flag = 0 # Set to 1 once a document is detected in the current frame
    ret, image = cap.read() # If the frame is read correctly, ret is True
    if not ret: # If reading fails, exit the loop before using the frame
        print("Unable to read camera")
        break
    orig = image.copy() # Keep an unmodified copy for the perspective transformation
    cv_show("image", image)

    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY) #Image processing-convert to grayscale image
    # Preprocessing
    gray = cv2.GaussianBlur(gray, (5, 5), 0) # Gaussian filter
    edged = cv2.Canny(gray, 75, 200)

    # Contour detection ([-2] selects the contour list in both OpenCV 3.x and 4.x)
    cnts = cv2.findContours(edged.copy(), cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)[-2]

    cnts = sorted(cnts, key=cv2.contourArea, reverse=True)[:3]
    image_contours = cv2.drawContours(image, cnts, -1, (0, 255, 0), 2)
    cv_show("image_contours", image_contours)

    # Traverse contours
    for c in cnts:
        # Calculate contour approximation
        peri = cv2.arcLength(c, True)
        # c is the input point set
        # epsilon (here 0.05 * peri) is the maximum distance from the original contour to the approximate contour, which is an accuracy parameter
        # True means the contour is closed
        approx = cv2.approxPolyDP(c, 0.05 * peri, True) # Contour approximation
        area = cv2.contourArea(approx)
        # Accept the contour if it has 4 vertices and is large enough
        if area > 20000 and len(approx) == 4:
            screenCnt = approx
            flag = 1
            print(peri, area)
            print('Document detected')
            break

    if flag == 1:
        # Show results
        # print("STEP 2: Get outline")
        image_contours = cv2.drawContours(image, [screenCnt], 0, (0, 255, 0), 2)
        cv_show("image", image_contours)

        # Perspective transformation
        warped = four_point_transform(orig, screenCnt.reshape(4, 2))
        cv_show("warped", warped)

        # Binary processing
        warped = cv2.cvtColor(warped, cv2.COLOR_BGR2GRAY)
        ref = cv2.threshold(warped, 220, 255, cv2.THRESH_BINARY)[1]
        cv_show("ref", ref)
    key_pressed = cv2.waitKey(100)
    if key_pressed == 27: # If the Esc key is pressed, exit the loop
        break

cap.release() # Release the capturer
cv2.destroyAllWindows() # Close the image window

Results display:
