Introduction:
As computer vision has advanced, OCR (optical character recognition) has become increasingly mature. OCR recognizes the text in an image and converts it into editable text, which is useful in many application scenarios. This article shows how to implement camera-based OCR with the OpenCV library, using the Tesseract engine for the recognition step.
Steps:
1. Install the OpenCV library
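First, install the OpenCV library. In a Python environment it can be installed with pip by running the following command on the command line:
pip install opencv-python
The OCR step (step 5) additionally uses the pytesseract wrapper, installed with pip install pytesseract, which in turn requires the Tesseract OCR engine itself to be installed separately.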
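Because the pipeline depends on both Python packages and an external Tesseract executable, it can help to check the environment before running any camera code. The sketch below only uses the standard library; the function name check_ocr_environment is hypothetical, and it assumes Tesseract is found on the PATH (if it is installed elsewhere, tesseract_cmd must be set as shown in step 4).

```python
import importlib.util
import shutil


def check_ocr_environment():
    """Report whether the pieces used in this article appear to be installed (hypothetical helper)."""
    return {
        # Python bindings installed by `pip install opencv-python`
        "cv2": importlib.util.find_spec("cv2") is not None,
        # Python wrapper for Tesseract, installed by `pip install pytesseract`
        "pytesseract": importlib.util.find_spec("pytesseract") is not None,
        # The Tesseract engine itself; assumption: it is on the PATH
        "tesseract_binary": shutil.which("tesseract") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_ocr_environment().items():
        print(f"{name}: {'found' if ok else 'missing'}")
```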
2. Capture camera data
OpenCV makes it easy to capture the video stream from a camera. The following Python code opens the default camera and displays the stream frame by frame until the q key is pressed:
import cv2

cap = cv2.VideoCapture(0)  # Use the default camera
while True:
    ret, frame = cap.read()  # Read one frame
    if not ret:
        break
    cv2.imshow('frame', frame)
    if cv2.waitKey(1) == ord('q'):  # Press the q key to exit
        break
cap.release()
cv2.destroyAllWindows()
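The exit check in the capture loop compares the value returned by cv2.waitKey directly with ord('q'). A common, slightly more defensive variant masks the value to its low byte first, because on some platforms and GUI backends the returned key code can carry extra high bits. The helper below is a hypothetical sketch of that check (should_quit is an illustrative name, not part of OpenCV); inside the loop it would be used as `if should_quit(cv2.waitKey(1)): break`.

```python
def should_quit(key_code, quit_key="q"):
    """Return True if a cv2.waitKey() result corresponds to quit_key (hypothetical helper)."""
    # cv2.waitKey returns -1 when no key was pressed within the delay
    if key_code == -1:
        return False
    # Keep only the low byte, since some backends set higher bits
    return (key_code & 0xFF) == ord(quit_key)
```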
3. Image preprocessing
Before OCR, the image needs to be preprocessed to improve recognition accuracy. Common preprocessing operations include grayscale conversion, binarization, noise reduction, and dilation/erosion. The following sample code shows how to convert a frame to grayscale and binarize it:
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)  # Grayscale
_, binary = cv2.threshold(gray, 150, 255, cv2.THRESH_BINARY_INV)  # Binarization
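To make the two preprocessing calls concrete, here is a NumPy-only sketch of what they compute. It assumes the BGR channel order OpenCV uses for camera frames and the luma weights OpenCV documents for color-to-gray conversion (Y = 0.299 R + 0.587 G + 0.114 B). OpenCV itself uses fixed-point arithmetic, so its output may differ from this floating-point version by one gray level on some pixels. The names to_gray and threshold_binary_inv are hypothetical.

```python
import numpy as np


def to_gray(bgr):
    """Approximate cv2.cvtColor(img, cv2.COLOR_BGR2GRAY) for a uint8 BGR image."""
    b = bgr[..., 0].astype(np.float64)
    g = bgr[..., 1].astype(np.float64)
    r = bgr[..., 2].astype(np.float64)
    # Weighted sum of the channels, rounded back to 8-bit gray levels
    y = 0.299 * r + 0.587 * g + 0.114 * b
    return np.clip(np.rint(y), 0, 255).astype(np.uint8)


def threshold_binary_inv(gray, thresh, maxval=255):
    """Mimic cv2.THRESH_BINARY_INV: 0 where pixel > thresh, otherwise maxval."""
    # Dark pixels (typically text) become white, bright background becomes black
    return np.where(gray > thresh, 0, maxval).astype(np.uint8)
```

With the threshold of 150 from the snippet above, a pure red pixel (gray level 76) ends up white and a pure white pixel ends up black.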
4. Text localization
Before OCR, the text regions in the image need to be located. OpenCV provides algorithms that can do this; for example, the MSER (Maximally Stable Extremal Regions) algorithm can detect candidate text regions. The following sample code uses MSER to locate text:
import cv2
import pytesseract

# Set the path to Tesseract (used by the OCR step; modify according to your Tesseract installation path)
pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files\Tesseract-OCR\tesseract.exe'

# Read the image
img = cv2.imread('test.jpg')
# Convert to a grayscale image
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
# Use the MSER algorithm to detect candidate text regions
mser = cv2.MSER_create()
regions, bboxes = mser.detectRegions(gray)
# Draw the bounding box (x, y, w, h) of each detected region on the original image
for (x, y, w, h) in bboxes:
    cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 1)
# Show the image
cv2.imshow('img', img)
cv2.waitKey(0)
cv2.destroyAllWindows()
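MSER typically returns many overlapping, nested boxes for the same character, plus boxes that are clearly not text. One simple way to tidy them up, sketched below as an assumption rather than part of the original method, is to keep boxes within a plausible size and aspect-ratio range and drop any box that lies entirely inside a larger kept box. The function name filter_text_boxes and all of its threshold values are hypothetical and would need tuning for your camera resolution and font size.

```python
def filter_text_boxes(bboxes, min_area=30, max_area=5000, max_aspect=5.0):
    """Filter (x, y, w, h) boxes from MSER (hypothetical heuristic, not from the article)."""
    candidates = []
    for (x, y, w, h) in bboxes:
        x, y, w, h = int(x), int(y), int(w), int(h)
        area = w * h
        aspect = max(w, h) / max(1, min(w, h))
        # Keep roughly character-sized, not extremely elongated boxes
        if min_area <= area <= max_area and aspect <= max_aspect:
            candidates.append((x, y, w, h))
    # Largest first, so nested MSER regions collapse into their outer box
    candidates.sort(key=lambda b: b[2] * b[3], reverse=True)
    kept = []
    for (x, y, w, h) in candidates:
        inside = any(
            kx <= x and ky <= y and x + w <= kx + kw and y + h <= ky + kh
            for (kx, ky, kw, kh) in kept
        )
        if not inside:
            kept.append((x, y, w, h))
    return kept
```

With OpenCV's MSER detector (`cv2.MSER_create()`), whose `detectRegions` method returns the regions together with their (x, y, w, h) bounding boxes, the helper could be applied as `regions, bboxes = cv2.MSER_create().detectRegions(gray)` followed by `boxes = filter_text_boxes(bboxes)`.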
5. OCR recognition
Finally, an OCR engine performs character recognition on the preprocessed image or the located text regions. Here the Tesseract OCR engine is used through the pytesseract wrapper. The following sample code shows how to run Tesseract OCR:
import pytesseract

# Perform OCR on the binarized image from step 3
text = pytesseract.image_to_string(binary, lang='eng')
print(text)
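The OCR call above passes the whole binarized image to Tesseract. To recognize the individual regions found in step 4 instead, each bounding box can be cropped, with a few pixels of padding so character edges are not clipped, and passed to Tesseract separately. The helper below is a hypothetical sketch (crop_with_padding and its default padding of 4 pixels are assumptions). Each crop could then be handed to `pytesseract.image_to_string(crop, lang='eng', config='--psm 7')`, where page segmentation mode 7 tells Tesseract to treat the image as a single line of text. Tesseract's documentation also recommends dark text on a light background for version 4.x, so crops taken from the inverted (THRESH_BINARY_INV) image may need to be inverted back, for example with `cv2.bitwise_not`.

```python
import numpy as np


def crop_with_padding(image, box, pad=4):
    """Crop an (x, y, w, h) box plus padding, clamped to the image borders (hypothetical helper)."""
    x, y, w, h = (int(v) for v in box)
    img_h, img_w = image.shape[:2]
    # Expand the box by `pad` pixels on every side without leaving the image
    x0 = max(0, x - pad)
    y0 = max(0, y - pad)
    x1 = min(img_w, x + w + pad)
    y1 = min(img_h, y + h + pad)
    # NumPy images are indexed [row, column], i.e. [y, x]
    return image[y0:y1, x0:x1]
```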
Summary:
Implementing camera OCR with OpenCV involves three stages: image preprocessing, text localization, and OCR recognition. Careful preprocessing and parameter tuning (for example, the threshold value used for binarization) can improve OCR accuracy.
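One parameter that otherwise needs manual adjustment is the fixed binarization threshold (150 in step 3, 220 in the complete code). Otsu's method chooses a threshold automatically by maximizing the between-class variance of the gray-level histogram, and OpenCV exposes it through `cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)`. The NumPy sketch below re-implements the idea to show what that flag computes; it is an illustration rather than OpenCV's exact code, and when several thresholds tie it may pick a different but equivalent value. The name otsu_threshold is hypothetical.

```python
import numpy as np


def otsu_threshold(gray):
    """Return t maximizing between-class variance; pixels <= t form class 0 (hypothetical helper)."""
    levels = np.arange(256, dtype=np.int64)
    hist = np.bincount(gray.ravel(), minlength=256).astype(np.int64)
    total = hist.sum()
    w0 = np.cumsum(hist)                # pixel count of class 0 for each t
    w1 = total - w0                     # pixel count of class 1
    s0 = np.cumsum(hist * levels)       # sum of gray levels in class 0
    s_total = s0[-1]
    valid = (w0 > 0) & (w1 > 0)         # both classes must be non-empty
    sigma = np.zeros(256)
    m0 = s0[valid] / w0[valid]
    m1 = (s_total - s0[valid]) / w1[valid]
    # Between-class variance up to a constant factor of 1 / total**2
    sigma[valid] = w0[valid] * w1[valid] * (m0 - m1) ** 2
    # Returns 0 if the image contains a single gray level
    return int(np.argmax(sigma))
```

Pixels above the returned value would then be set to the maximum value, matching cv2.THRESH_BINARY semantics.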
Complete code:
The complete program below uses several custom helper functions to detect a document in the camera feed, correct its perspective, and binarize it:
# -*- coding: utf-8 -*-
# @Time : 2023/10/23 10:27
# @Author : Muzi
# @File : Camera OCR.py
# @Software: PyCharm

# Import toolkits
import sys

import numpy as np
import cv2


def cv_show(name, img):
    cv2.imshow(name, img)
    cv2.waitKey(120)


def order_points(pts):
    # A total of 4 coordinate points
    rect = np.zeros((4, 2), dtype="float32")
    # Order the points 0, 1, 2, 3 as: top-left, top-right, bottom-right, bottom-left
    # Top-left has the smallest x + y, bottom-right the largest
    s = pts.sum(axis=1)
    rect[0] = pts[np.argmin(s)]
    rect[2] = pts[np.argmax(s)]
    # Top-right has the smallest y - x, bottom-left the largest
    diff = np.diff(pts, axis=1)
    rect[1] = pts[np.argmin(diff)]
    rect[3] = pts[np.argmax(diff)]
    return rect


def four_point_transform(image, pts):
    # Get the ordered input corner points
    rect = order_points(pts)
    (tl, tr, br, bl) = rect
    # Compute the width and height of the output image
    widthA = np.sqrt(((br[0] - bl[0]) ** 2) + ((br[1] - bl[1]) ** 2))
    widthB = np.sqrt(((tr[0] - tl[0]) ** 2) + ((tr[1] - tl[1]) ** 2))
    maxWidth = max(int(widthA), int(widthB))
    heightA = np.sqrt(((tr[0] - br[0]) ** 2) + ((tr[1] - br[1]) ** 2))
    heightB = np.sqrt(((tl[0] - bl[0]) ** 2) + ((tl[1] - bl[1]) ** 2))
    maxHeight = max(int(heightA), int(heightB))
    # Corresponding corner positions after the transformation
    dst = np.array([
        [0, 0],
        [maxWidth - 1, 0],
        [maxWidth - 1, maxHeight - 1],
        [0, maxHeight - 1]], dtype="float32")
    # Compute the transformation matrix and warp the image
    M = cv2.getPerspectiveTransform(rect, dst)
    warped = cv2.warpPerspective(image, M, (maxWidth, maxHeight))
    # Return the transformed result
    return warped


def resize(image, width=None, height=None, inter=cv2.INTER_AREA):
    dim = None
    (h, w) = image.shape[:2]
    if width is None and height is None:
        return image
    if width is None:
        r = height / float(h)
        dim = (int(w * r), height)
    else:
        r = width / float(w)
        dim = (width, int(h * r))
    resized = cv2.resize(image, dim, interpolation=inter)
    return resized


# Read input from the default camera
cap = cv2.VideoCapture(0)
# Make sure the camera can be opened
if not cap.isOpened():
    print("Cannot open camera")
    sys.exit()

while True:
    flag = 0  # Whether a document is detected in the current frame
    ret, image = cap.read()  # ret is True if the frame was read correctly
    if not ret:  # If reading fails, exit the loop
        print("Unable to read camera")
        break
    orig = image.copy()  # Copy only after confirming the frame is valid
    # cv_show("image", image)
    # Preprocessing: grayscale, Gaussian filter, Canny edge detection
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    gray = cv2.GaussianBlur(gray, (5, 5), 0)
    edged = cv2.Canny(gray, 75, 200)
    # Contour detection ([-2] selects the contour list in both OpenCV 3.x and 4.x)
    cnts = cv2.findContours(edged.copy(), cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)[-2]
    cnts = sorted(cnts, key=cv2.contourArea, reverse=True)[:3]
    image_contours = cv2.drawContours(image, cnts, -1, (0, 255, 0), 2)
    cv_show("image_contours", image_contours)
    # Traverse the contours
    for c in cnts:
        # Contour approximation
        peri = cv2.arcLength(c, True)
        # c is the input point set; epsilon (0.05 * peri) is the maximum distance
        # from the original contour to the approximation; True means closed
        approx = cv2.approxPolyDP(c, 0.05 * peri, True)
        area = cv2.contourArea(approx)
        # Keep a large contour that approximates to 4 corner points
        if area > 20000 and len(approx) == 4:
            screenCnt = approx
            flag = 1
            print(peri, area)
            print('Document detected')
            break
    if flag == 1:
        # Show the detected outline
        image_contours = cv2.drawContours(image, [screenCnt], 0, (0, 255, 0), 2)
        cv_show("image", image_contours)
        # Perspective transformation
        warped = four_point_transform(orig, screenCnt.reshape(4, 2))
        cv_show("warped", warped)
        # Binary processing
        warped = cv2.cvtColor(warped, cv2.COLOR_BGR2GRAY)
        ref = cv2.threshold(warped, 220, 255, cv2.THRESH_BINARY)[1]
        cv_show("ref", ref)
    key_pressed = cv2.waitKey(100)
    if key_pressed == 27:  # If the Esc key is pressed, exit the loop
        break

cap.release()  # Release the capturer
cv2.destroyAllWindows()  # Close the image windows
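The document filter in the loop keeps a contour only if its polygon approximation has four corners and an area above 20000 pixels. The sketch below re-creates that test without OpenCV to show what is being measured: polygon_area uses the shoelace formula, which for a simple (non-self-intersecting) polygon gives the same unsigned area that cv2.contourArea reports. Both function names are hypothetical, and the 20000 threshold is simply the value from the program above; a suitable value depends on the camera resolution and how close the document is held.

```python
import numpy as np


def polygon_area(points):
    """Unsigned polygon area via the shoelace formula (hypothetical helper)."""
    pts = np.asarray(points, dtype=np.float64).reshape(-1, 2)
    x, y = pts[:, 0], pts[:, 1]
    # 0.5 * |sum(x_i * y_{i+1} - x_{i+1} * y_i)| over all edges
    return 0.5 * abs(np.dot(x, np.roll(y, -1)) - np.dot(y, np.roll(x, -1)))


def is_document_candidate(approx, min_area=20000):
    """Mirror the loop's test: 4 corners and an area above min_area."""
    # approxPolyDP returns shape (N, 1, 2); flatten to (N, 2)
    pts = np.asarray(approx).reshape(-1, 2)
    return bool(len(pts) == 4 and polygon_area(pts) > min_area)
```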