A step-by-step guide to using a monocular camera (OpenCV + Python)

Table of Contents

1. Monocular application prospects

2. Turn on the camera

3. Set the resolution

4. Take pictures with the camera

5. Record video

6. Practical application of monocular combined with OpenCV

1. Monocular application prospects

Monocular vision, meaning vision from a single camera, is widely combined with deep learning and is a popular research direction in computer vision and machine learning. Its main applications include:

Depth Estimation: Monocular depth estimation uses a single camera to infer the depth of objects in the scene. Deep learning models, such as convolutional neural networks (CNN) and recurrent neural networks (RNN), have made significant progress in this field. These models predict a depth value for each pixel of the input image, approximating the depth information that a stereo camera would provide.
SLAM (Simultaneous Localization and Mapping): Monocular SLAM refers to using a single camera to simultaneously locate the camera’s position and build a map of the scene. Deep learning can be used to improve key steps in SLAM such as visual feature extraction, motion estimation, and map construction.
Object detection and tracking: Monocular cameras can be used to detect and track objects in a scene. Deep learning models, such as YOLO (You Only Look Once) and Faster R-CNN, have been widely used in target detection and tracking tasks. This is especially important for areas such as autonomous driving, intelligent surveillance and drones.
Semantic Segmentation: Monocular image semantic segmentation refers to the task of labeling each pixel in an image as belonging to a specific category. Deep learning models can achieve high-precision image segmentation and are used to identify roads, pedestrians, vehicles, etc.
Human pose estimation: The monocular camera can be used to estimate the pose of the human body in the scene, including joint positions and bone structure. Deep learning models have made great progress in pose estimation and can be used in sports analysis, virtual reality and human-computer interaction.
Image generation and super-resolution: Deep learning models, such as generative adversarial networks (GAN) and convolutional neural networks (CNN), can be used for image generation and super-resolution. These techniques can be used for image restoration, style transfer, and image quality enhancement.
Autonomous driving: Monocular vision plays a key role in autonomous driving. It is used to detect roads, vehicles, pedestrians and obstacles, and its output feeds decision-making and path planning.
Virtual Reality: Monocular vision is used in virtual reality applications such as head tracking, hand tracking and environment reconstruction in headsets.

2. Turn on the camera

OpenCV provides the VideoCapture class, which creates a “camera” object. The argument 0 selects the first camera (usually the computer’s built-in camera); if there are two cameras, the second one is opened with VideoCapture(1).

Inside a while loop, call the read() method of the camera object to grab the camera image frame by frame. It returns a success flag and the frame itself.

cv2.imshow displays the current frame; cv2.waitKey(1) waits 1 ms for a key press. If q is pressed during that time, the while loop exits.

# -*- coding: utf-8 -*-
import cv2
 
cap = cv2.VideoCapture(0) # 0 represents the first camera
while(1):
    # get a frame
    ret, frame = cap.read()
    # show a frame
    cv2.imshow("capture", frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break
cap.release()
cv2.destroyAllWindows()

3. Set the resolution

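Sometimes you need to specify the camera resolution, for example 1920×1080.

cap.set(3,1920) sets the frame width to 1920 and cap.set(4,1080) sets the frame height to 1080. The property IDs 3 and 4 are also available as the named constants cv2.CAP_PROP_FRAME_WIDTH and cv2.CAP_PROP_FRAME_HEIGHT.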

# -*- coding: utf-8 -*-
import cv2
 
cap = cv2.VideoCapture(0)
# Set the resolution first: width 1920, height 1080
cap.set(3,1920)
cap.set(4,1080)
while(1):
    # get a frame
    ret, frame = cap.read()
    # show a frame
    cv2.imshow("capture", frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break
cap.release()
cv2.destroyAllWindows()

The version below also checks whether a frame was actually read:

# -*- coding: utf-8 -*-
import cv2
 
cap = cv2.VideoCapture(0)
# Set the resolution first: width 1920, height 1080
cap.set(3,1920)
cap.set(4,1080)
while(1):
    # get a frame
    ret, frame = cap.read()
    if ret:
        # show a frame
        cv2.imshow("capture", frame)
        if cv2.waitKey(1) & 0xFF == ord('q'):
            break
    else:
        print("Image data acquisition failed!!")
        break
cap.release()
cv2.destroyAllWindows()

4. Take pictures with the camera

Taking a picture means saving a frame: cv2.imwrite writes the current camera frame to an image file. In the code below, pressing the s key saves the picture.

cap.set(3,1920)

cap.set(4,1080)

3 is the property ID for the width of the video frame, so the first call sets the image width.
4 is the property ID for the height of the video frame, so the second call sets the image height.

# -*- coding: utf-8 -*-
import cv2
 
cap = cv2.VideoCapture(0)
# Set the resolution first: width 1920, height 1080
cap.set(3,1920)
cap.set(4,1080)
# Image count starts from 1
img_count = 1
 
while(1):
    # get a frame
    ret, frame = cap.read()
    if ret:
        # show a frame
        cv2.imshow("capture", frame)
        # Wait for the key event to occur, wait 1ms
        key = cv2.waitKey(1)
        if key == ord('q'):
            # If the key is q, it means quit to exit the program.
            print("The program exited normally..")
            break
        elif key == ord('s'):
            ## If the s key is pressed, save the image
            # Write the picture and name the picture as picture serial number.png
            cv2.imwrite("{}.png".format(img_count), frame)
            print("Save the picture with the name {}.png".format(img_count))
            # Increase the picture number by 1
            img_count += 1
 
    else:
        print("Image data acquisition failed!!")
        break
cap.release()
cv2.destroyAllWindows()

5. Record video

Images are saved with cv2.imwrite(). To save video, create a VideoWriter object and pass in four parameters:

Output file name, such as ‘output.avi’
FourCC code that selects the codec
Frame rate (FPS)
Frame size (width, height) of the saved video

# -*- coding: utf-8 -*-
import cv2
 
cap = cv2.VideoCapture(0)
# Define encoding method and create VideoWriter object
fourcc = cv2.VideoWriter_fourcc(*'MJPG')
outfile = cv2.VideoWriter('output.avi', fourcc, 25., (640, 480))
 
while(cap.isOpened()):
    ret, frame = cap.read()
    if ret:
        outfile.write(frame) #Write file
        cv2.imshow('frame', frame)
        if cv2.waitKey(1) == ord('q'):
            break
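    else:
        break
# Release the writer so the file is finalized, then the camera and windows
outfile.release()
cap.release()
cv2.destroyAllWindows()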

fourcc = cv2.VideoWriter_fourcc(*'MJPG') defines the video codec; here the MJPG (Motion JPEG) codec is used. VideoWriter_fourcc converts the four characters into the FourCC code that VideoWriter expects.

outfile = cv2.VideoWriter('output.avi', fourcc, 25., (640, 480)) creates a video writer for the file ‘output.avi’ with the given codec (MJPG), a frame rate of 25 frames per second, and a frame size of 640×480 pixels. The frames passed to write() are expected to have this same size; frames of a different size may not be written correctly.

cv2.waitKey(1) waits for keyboard input. If the user presses the ‘q’ key, the loop exits.
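
The examples in this article compare the result of cv2.waitKey with ord('q') in two ways, with and without & 0xFF. On some platforms the returned integer can carry extra bits above the low byte, so masking is generally the more portable form. A minimal sketch of that comparison follows; the helper name is_key is hypothetical.

```python
def is_key(key_code, char):
    """Return True if the code returned by cv2.waitKey corresponds to char.

    is_key is a hypothetical helper name.
    """
    if key_code < 0:
        # waitKey returns -1 when no key was pressed within the delay
        return False
    # Keep only the low 8 bits, which hold the character code
    return (key_code & 0xFF) == ord(char)
```

In the loops above, the test would read: if is_key(cv2.waitKey(1), 'q'): break.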

6. Practical application of monocular combined with OpenCV

Image capture and display: Use a monocular camera to capture live images, and then use OpenCV to display the images. This is the most basic use and can be used for monitoring, live image display and debugging.
Image processing and filtering: OpenCV provides various image processing and filtering techniques, such as blurring, edge detection, color space conversion, etc. These techniques can be used for image enhancement, noise removal, and feature extraction.
Target detection and tracking: OpenCV includes functions for target detection and tracking, which can be used for monitoring, autonomous driving, security and robot navigation.
Face detection and recognition: OpenCV provides face detection and recognition functions for various applications, including face unlocking, face recognition access control systems, and expression analysis.
Document Scanning and OCR: Documents can be captured using a monocular camera, and then OpenCV can be used for document scanning and optical character recognition (OCR) to extract text from the image.
Virtual and Augmented Reality: Monocular cameras are used in virtual reality and augmented reality applications, including head tracking, hand tracking, object recognition, and environment reconstruction.
Deep Learning: OpenCV includes the dnn module, which can run inference with pretrained deep learning models for tasks such as image classification, object detection, image segmentation and depth estimation. Monocular cameras combined with deep learning can be used for a variety of vision tasks.
Machine Vision: Monocular cameras combined with OpenCV are used for machine vision tasks such as parts inspection, assembly line inspection, quality control and industrial automation.
Autonomous driving: Monocular cameras are used in autonomous driving systems, including lane keeping, traffic sign detection and obstacle detection.
Medical image analysis: OpenCV is used to analyze medical images such as X-ray and MRI scans, and images from a monocular camera can be used for tasks such as skin lesion detection.
Environmental monitoring: Monocular cameras combined with OpenCV are used to monitor environmental conditions, such as weather, air quality and natural disasters.