Competition: Gesture detection and recognition algorithm based on machine vision

0 Preface

This is part of a series of high-quality competition projects. The one I want to share today is:

Gesture detection and recognition algorithm based on deep learning

This project is relatively new and well suited as a competition topic. The senior highly recommends it!

More information, project sharing:

1 Results

Without further ado, let's take a look at the results the senior achieved.

2 Technical principles

2.1 Hand detection

Mainstream gesture segmentation methods fall into two categories: static gesture segmentation and dynamic gesture segmentation.

  • Static gesture segmentation method: segments a single image using the difference between the hand and the background.

  • Dynamic gesture segmentation method: uses information from a sequence of video frames for segmentation.

2.1.1 Gesture detection method based on skin color space

Skin color is the most obvious feature distinguishing hands from the background. Hand colors fall within a relatively narrow range and cluster well. Skin color-based segmentation is also fast and is invariant to rotation, partial occlusion, and changes in hand pose. For these reasons, segmenting gestures by skin color in various color spaces is currently the most commonly used method.

The main approaches to skin color segmentation are parametric models, non-parametric models, and explicit skin color clustering. Parametric models fit a Gaussian distribution to skin colors, while non-parametric models estimate the skin color range from a skin color histogram built from training data. Explicit skin color clustering defines the boundaries of skin color directly in a specific color space; broadly speaking, it is a static skin color filter. For example, Khan proposed an adaptive skin color model based on the detected face.
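To make the non-parametric (histogram) idea concrete, here is a minimal sketch, assuming labeled training pixels are already available as (Cr, Cb) pairs. The bin count, the helper names `build_histograms` and `skin_probability`, and the decision rule P(skin | Cr, Cb) = skin count / total count in a bin are illustrative assumptions, not the method of any work cited here.

```python
import numpy as np

BINS = 32  # assumed bins per channel: 256 / 32 = 8 intensity levels per bin


def build_histograms(crcb, labels, bins=BINS):
    """Count skin and non-skin training pixels in a 2D Cr-Cb histogram.

    crcb: (N, 2) uint8 array of (Cr, Cb) values; labels: (N,) bool, True = skin.
    """
    idx = (crcb.astype(np.int64) * bins) // 256  # map 0..255 to bin 0..bins-1
    skin = np.zeros((bins, bins))
    non_skin = np.zeros((bins, bins))
    np.add.at(skin, (idx[labels, 0], idx[labels, 1]), 1)
    np.add.at(non_skin, (idx[~labels, 0], idx[~labels, 1]), 1)
    return skin, non_skin


def skin_probability(crcb, skin, non_skin, bins=BINS):
    """Estimate P(skin | Cr, Cb) per pixel from the counts (0 for bins never seen)."""
    idx = (crcb.astype(np.int64) * bins) // 256
    s = skin[idx[:, 0], idx[:, 1]]
    n = non_skin[idx[:, 0], idx[:, 1]]
    total = s + n
    return np.divide(s, total, out=np.zeros_like(s), where=total > 0)
```

A pixel can then be labeled skin when its probability exceeds a chosen threshold such as 0.5; the threshold and the bin size trade off false detections against missed skin.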

Skin color is a low-level feature that is cheap to compute. Perceptually uniform color spaces such as CIELAB and CIELUV have been used for skin color detection, and orthogonal color spaces such as YCbCr, YCgCr, YIQ, and YUV are also used for skin color segmentation. For example, Julian et al. work in the YCrCb color space and build a Gaussian model on the Cr and Cb components for segmentation. The drawback of skin color segmentation is its high false detection rate, so external interference must be reduced and segmentation accuracy improved through operations such as color correction and image normalization.
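The exact formulation used by Julian et al. is not given here, so the following is only a generic sketch of a single Gaussian skin model in the CrCb plane: fit a mean and covariance to labeled skin pixels, then score new pixels by exp(-d²/2), where d is the Mahalanobis distance. The function names and the 0.5 threshold are assumptions for illustration.

```python
import numpy as np


def fit_gaussian(crcb_skin):
    """Fit the mean vector and covariance matrix of skin pixels in (Cr, Cb) space."""
    x = crcb_skin.astype(np.float64)
    mean = x.mean(axis=0)
    cov = np.cov(x, rowvar=False)  # 2x2 covariance of Cr and Cb
    return mean, cov


def skin_likelihood(crcb, mean, cov):
    """Unnormalized Gaussian likelihood exp(-0.5 * d^2), d = Mahalanobis distance."""
    diff = crcb.astype(np.float64) - mean
    inv = np.linalg.inv(cov)
    d2 = np.einsum("ni,ij,nj->n", diff, inv, diff)  # per-pixel squared distance
    return np.exp(-0.5 * d2)


def segment(crcb, mean, cov, threshold=0.5):
    """Label a pixel as skin when its likelihood reaches the (assumed) threshold."""
    return skin_likelihood(crcb, mean, cov) >= threshold
```

Unlike a fixed rectangular Cr/Cb range, the covariance lets the decision region follow the elliptical, possibly tilted shape of the skin cluster.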

Hand detection by thresholding the Cr and Cb channels in the YCrCb color space can be implemented as follows:

import cv2
import numpy as np

# Skin color detection 2: in YCrCb space, keep pixels with 140 < Cr < 175 and 100 < Cb < 120
imname = "hand.jpg"  # path to the input image (placeholder)
img = cv2.imread(imname, cv2.IMREAD_COLOR)
ycrcb = cv2.cvtColor(img, cv2.COLOR_BGR2YCrCb)  # Convert the image to the YCrCb color space
(y, cr, cb) = cv2.split(ycrcb)  # Split into the Y, Cr and Cb channel images

skin2 = np.zeros(cr.shape, dtype=np.uint8)  # All-zero mask with the same size as the source image
(h, w) = cr.shape  # Height and width of the source image

# Traverse the image: if Cr and Cb are both in range, set the mask pixel to 255, otherwise to 0
for i in range(0, h):
    for j in range(0, w):
        if (cr[i][j] > 140) and (cr[i][j] < 175) and (cb[i][j] > 100) and (cb[i][j] < 120):
            skin2[i][j] = 255
        else:
            skin2[i][j] = 0

cv2.imshow(imname, img)
cv2.imshow(imname + " Skin2 Cr + Cb", skin2)
cv2.waitKey(0)  # Keep the windows open until a key is pressed
cv2.destroyAllWindows()
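The per-pixel Python loop above is easy to read but slow on full-resolution frames. A vectorized NumPy sketch of the same strict range test is shown here; the function name `skin_mask_crcb` is illustrative.

```python
import numpy as np


def skin_mask_crcb(cr, cb, cr_range=(140, 175), cb_range=(100, 120)):
    """Vectorized Cr/Cb test: 255 where both channels lie strictly inside the ranges."""
    inside = (
        (cr > cr_range[0]) & (cr < cr_range[1])
        & (cb > cb_range[0]) & (cb < cb_range[1])
    )
    return np.where(inside, 255, 0).astype(np.uint8)
```

With OpenCV, `cv2.inRange` on the YCrCb image gives the same mask if the bounds are written inclusively, for example lower `np.array([0, 141, 101], dtype=np.uint8)` and upper `np.array([255, 174, 119], dtype=np.uint8)`, since `inRange` includes both bounds.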

Detection result:

2.1.2 Motion-based gesture detection method

Motion-based gesture segmentation methods separate the moving foreground from the static background. The main approaches are the background difference method, the inter-frame difference method, and optical flow.

The inter-frame difference method subtracts adjacent frames of the video stream and applies a threshold to distinguish foreground from background, thereby extracting the target object. It is simple in principle and fast to compute. However, when the target's color is similar to the background, the detected target is incomplete, and a stationary target cannot be detected at all.
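A minimal sketch of inter-frame differencing on two grayscale frames, using only NumPy. The threshold of 25 gray levels and the function name are assumptions; in practice the threshold is tuned to the camera noise.

```python
import numpy as np


def frame_difference(prev_gray, curr_gray, threshold=25):
    """Foreground mask from two consecutive grayscale frames.

    Pixels whose absolute intensity change exceeds `threshold` are set to 255.
    """
    # Cast to a signed type so the subtraction cannot wrap around like uint8 would
    diff = np.abs(curr_gray.astype(np.int16) - prev_gray.astype(np.int16))
    return np.where(diff > threshold, 255, 0).astype(np.uint8)
```

Passing the same frame twice yields an empty mask, which illustrates the weakness noted above: a hand that stops moving disappears from the result.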

The background difference method requires building a background image; subtracting the background image from the current frame separates the foreground from the background. Background difference is widely used in object detection, with variants based on a single Gaussian model, a double Gaussian model, kernel density estimation, and so on. It extracts complete targets well but is strongly affected by changes in the environment, so a stable and reliable background model and an effective background update method are needed.
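As an illustration of the single Gaussian variant with a background update, here is a simplified per-pixel sketch in NumPy. It is not OpenCV's MOG2 or KNN subtractor (those are used in the code that follows); the learning rate `alpha`, the threshold factor `k`, and the initial variance are assumed values.

```python
import numpy as np


class SingleGaussianBackground:
    """Per-pixel single Gaussian background model with a running update (simplified sketch)."""

    def __init__(self, first_frame, alpha=0.05, k=2.5, init_var=225.0):
        self.mean = first_frame.astype(np.float64)       # per-pixel background mean
        self.var = np.full(first_frame.shape, init_var)  # per-pixel variance (assumed start value)
        self.alpha = alpha  # learning rate of the background update (assumed)
        self.k = k          # foreground if |x - mean| > k * std (assumed)

    def apply(self, frame):
        """Return a 0/255 foreground mask and update the model at background pixels."""
        x = frame.astype(np.float64)
        diff = x - self.mean
        foreground = np.abs(diff) > self.k * np.sqrt(self.var)
        bg = ~foreground
        # Update only pixels that match the background, so a still hand is not absorbed at once
        self.mean[bg] += self.alpha * diff[bg]
        self.var[bg] = (1 - self.alpha) * self.var[bg] + self.alpha * diff[bg] ** 2
        return np.where(foreground, 255, 0).astype(np.uint8)
```

Raising `alpha` makes the model adapt faster to lighting changes but also lets slow-moving hands blend into the background sooner.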

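import cv2 as cv

# 1. Read the camera
cap = cv.VideoCapture(0)
# 2. Background subtraction (create the subtractor once, then apply it to every frame)
fgbg1 = cv.createBackgroundSubtractorMOG2(detectShadows=True)
fgbg2 = cv.createBackgroundSubtractorKNN(detectShadows=True)
while True:
    ret, frame = cap.read()
    if not ret:
        break
    # fgmask = fgbg1.apply(frame)
    fgmask = fgbg2.apply(frame)  # Either of the two methods can be used
    res = cv.bitwise_and(frame, frame, mask=fgmask)  # Keep only the foreground pixels
    # 3. Convert the foreground image to grayscale, apply Gaussian denoising, then binarize (Otsu)
    gray = cv.cvtColor(res, cv.COLOR_BGR2GRAY)
    blur = cv.GaussianBlur(gray, (11, 11), 0)
    ret, binary = cv.threshold(blur, 0, 255, cv.THRESH_BINARY | cv.THRESH_OTSU)
    # 4. Select the hand ROI (rows 50-600, columns 400-700) and draw its contours
    gesture = binary[50:600, 400:700]
    contours, hierarchy = cv.findContours(gesture, cv.RETR_EXTERNAL, cv.CHAIN_APPROX_SIMPLE)  # OpenCV 4.x
    for i in range(len(contours)):
        # offset maps ROI coordinates back to full-frame coordinates
        cv.drawContours(frame, contours, i, (0, 0, 255), -1, offset=(400, 50))
    cv.imshow("gesture", frame)
    if cv.waitKey(30) & 0xFF == 27:  # Press Esc to quit
        break
cap.release()
cv.destroyAllWindows()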

2.1.3 Edge-based gesture detection method

The edge-based gesture segmentation method uses edge detection operators to compute the contours in an image. Commonly used first-order edge detection operators include the Roberts, Prewitt, Sobel, and Canny operators; second-order operators include the Marr-Hildreth and Laplacian operators. These operators find the edges of the hand in the image. However, edge detection is sensitive to noise, so its accuracy is often not high.
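The code example that follows in this section uses second-order Laplacian kernels. For comparison, here is a minimal NumPy sketch of a first-order operator, the Sobel gradient magnitude; the helper names and the edge-replication padding are choices made for this illustration.

```python
import numpy as np

SOBEL_X = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=np.float64)
SOBEL_Y = SOBEL_X.T  # responds to vertical intensity changes (horizontal edges)


def filter3x3(image, kernel):
    """3x3 cross-correlation with edge replication; output has the input's size.

    Correlation flips the sign relative to convolution, which does not affect the magnitude.
    """
    padded = np.pad(image.astype(np.float64), 1, mode="edge")
    h, w = image.shape
    out = np.zeros((h, w))
    for dy in range(3):
        for dx in range(3):
            out += kernel[dy, dx] * padded[dy:dy + h, dx:dx + w]
    return out


def sobel_magnitude(image):
    """Gradient magnitude sqrt(gx^2 + gy^2) of a grayscale image."""
    gx = filter3x3(image, SOBEL_X)
    gy = filter3x3(image, SOBEL_Y)
    return np.hypot(gx, gy)
```

Because the Sobel kernels average across three rows or columns, they are somewhat less sensitive to noise than the Laplacian, though thresholding the magnitude is still needed to obtain a clean hand outline.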

Edge detection code example:

import numpy as np
from PIL import Image
import matplotlib.pyplot as plt
import matplotlib.cm as cm
import scipy.signal as signal  # Import the signal module of scipy

# Laplacian operator (4-neighborhood)
suanzi1 = np.array([[0, 1, 0],
                    [1,-4, 1],
                    [0, 1, 0]])

# Extended Laplacian operator (8-neighborhood)
suanzi2 = np.array([[1, 1, 1],
                    [1,-8, 1],
                    [1, 1, 1]])

# Open the image and convert it to grayscale
image = Image.open("pika.jpg").convert("L")
image_array = np.array(image)

# Use signal.convolve2d to compute the convolution
image_suanzi1 = signal.convolve2d(image_array, suanzi1, mode="same")
image_suanzi2 = signal.convolve2d(image_array, suanzi2, mode="same")

# Laplacian responses can be negative: take the absolute value, then scale to 0~255
image_suanzi1 = (np.abs(image_suanzi1) / float(np.abs(image_suanzi1).max())) * 255
image_suanzi2 = (np.abs(image_suanzi2) / float(np.abs(image_suanzi2).max())) * 255

# To see the edges clearly, set every pixel brighter than the mean gray level to 255 (white)
image_suanzi1[image_suanzi1 > image_suanzi1.mean()] = 255
image_suanzi2[image_suanzi2 > image_suanzi2.mean()] = 255

# Show the original image and the two edge maps side by side
plt.subplot(1, 3, 1)
plt.imshow(image_array, cmap=cm.gray)
plt.axis("off")
plt.subplot(1, 3, 2)
plt.imshow(image_suanzi1, cmap=cm.gray)
plt.axis("off")
plt.subplot(1, 3, 3)
plt.imshow(image_suanzi2, cmap=cm.gray)
plt.axis("off")
plt.show()
2.1.4 Template-based gesture detection method

The template-based gesture segmentation method requires building a gesture template database that stores templates of different gestures in different scenes. The distance between an image patch and each template in the database is computed, and a sliding window repeats this computation over the entire image to find the best-matching template and its location in the image. Template matching is robust to the environment and to noise, but the database must cover hands of various shapes, sizes, positions, and angles, and because the same computation has to be repeated across the whole image, real-time performance is poor.
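The sliding-window search can be sketched in a few lines of NumPy. Here the "gesture database" is a hypothetical dict mapping gesture names to grayscale templates, and the sum of squared differences (SSD) is one possible distance, chosen as an assumption; other distances such as normalized cross-correlation are also common.

```python
import numpy as np


def match_template_ssd(image, template):
    """Slide `template` over `image`; return ((row, col), ssd) of the best match."""
    img = image.astype(np.float64)
    tpl = template.astype(np.float64)
    th, tw = tpl.shape
    h, w = img.shape
    best, best_pos = np.inf, (0, 0)
    # Every window position is visited, which is why this approach is slow
    for r in range(h - th + 1):
        for c in range(w - tw + 1):
            d = np.sum((img[r:r + th, c:c + tw] - tpl) ** 2)
            if d < best:
                best, best_pos = d, (r, c)
    return best_pos, best


def best_gesture(image, templates):
    """Match every template of a (hypothetical) gesture database; return (name, position)."""
    results = {name: match_template_ssd(image, t) for name, t in templates.items()}
    name = min(results, key=lambda n: results[n][1])
    return name, results[name][0]
```

The nested loops make the cost grow with image size times template size times the number of templates. OpenCV's `cv2.matchTemplate` with `cv2.TM_SQDIFF` performs the same per-template search far faster, but the database still has to cover every scale and angle.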

2.1.5 Gesture detection method based on machine learning

Bayesian networks, cluster analysis, Gaussian classifiers, and similar methods are also used for skin color-based segmentation. Random forest is an ensemble classifier that is easy to train and highly accurate, and it is used in both segmentation and gesture recognition. One approach builds a skin color classification model in which a random forest classifies each pixel; the segmentation results obtained this way were found to be more accurate than those of the methods above.
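A minimal sketch of per-pixel random forest classification with scikit-learn's `RandomForestClassifier`. The per-pixel features (three color channels, for example Y, Cr, Cb), the number of trees, and the helper names are assumptions; the features used in the work described above are not specified here.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier


def train_skin_forest(pixels, labels, n_trees=50, seed=0):
    """Train on (N, 3) per-pixel color features; labels are 1 = skin, 0 = background."""
    clf = RandomForestClassifier(n_estimators=n_trees, random_state=seed)
    clf.fit(pixels, labels)
    return clf


def segment_image(clf, image):
    """Classify every pixel of an (H, W, 3) image and return a 0/255 uint8 mask."""
    h, w, c = image.shape
    pred = clf.predict(image.reshape(-1, c))  # one prediction per pixel
    return (pred.reshape(h, w) * 255).astype(np.uint8)
```

Classifying every pixel independently is simple but costly for large frames; in practice the image is often downscaled first or only pixels inside a candidate region are classified.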

3 Hand recognition

Deep learning has a clear accuracy advantage in image recognition, and deep convolutional networks perform very well on gesture recognition.

3.1 SSD Network

SSD is a convolutional neural network proposed in 2016 that achieves good results in object detection. Like FCN, SSD makes its final predictions from feature maps at several scales and performs detection on each of them: large feature maps detect small objects, and small feature maps detect large objects. This pyramid of feature maps provides multi-scale detection. For each candidate box, the network predicts class scores for the object inside it and adjusts the box's scale and position to fit the object's shape.
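The multi-scale idea can be made concrete with SSD's default ("prior") boxes. The sketch below follows the scheme of the original SSD paper: scales spaced linearly from s_min = 0.2 to s_max = 0.9, aspect ratios {1, 2, 3, 1/2, 1/3}, and an extra box of scale sqrt(s_k * s_(k+1)) for ratio 1. Extrapolating the scale for the last map and the decoding variances (0.1, 0.2) are assumptions taken from common implementations, and none of this is the author's improved network.

```python
import math


def ssd_scales(m, s_min=0.2, s_max=0.9):
    """Linearly spaced default-box scales for m >= 2 feature maps, plus one extrapolated value."""
    step = (s_max - s_min) / (m - 1)
    return [s_min + step * k for k in range(m + 1)]  # index m is extrapolated (assumption)


def default_boxes(feature_sizes, aspect_ratios=(1.0, 2.0, 3.0, 0.5, 1.0 / 3.0)):
    """Return normalized (cx, cy, w, h) default boxes for square feature maps of the given sizes."""
    scales = ssd_scales(len(feature_sizes))
    boxes = []
    for k, f in enumerate(feature_sizes):
        s, s_next = scales[k], scales[k + 1]
        for i in range(f):
            for j in range(f):
                cx, cy = (j + 0.5) / f, (i + 0.5) / f  # center of cell (i, j)
                for ar in aspect_ratios:
                    boxes.append((cx, cy, s * math.sqrt(ar), s / math.sqrt(ar)))
                extra = math.sqrt(s * s_next)  # additional box for aspect ratio 1
                boxes.append((cx, cy, extra, extra))
    return boxes


def decode(default_box, offsets, variances=(0.1, 0.2)):
    """Apply predicted offsets (dx, dy, dw, dh) to a default box (variances assumed)."""
    cx, cy, w, h = default_box
    dx, dy, dw, dh = offsets
    return (cx + dx * variances[0] * w,
            cy + dy * variances[0] * h,
            w * math.exp(dw * variances[1]),
            h * math.exp(dh * variances[1]))
```

Small early maps use small scales, so their boxes fit small hands, while the coarse late maps carry large boxes; the network's regression outputs are the offsets that `decode` applies.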

3.2 Dataset

Dataset collected by our laboratory:

The dataset contains 48 videos recorded with Google Glass, showing indoor and outdoor interactions between people from a first-person perspective. It contains 4 gesture categories: the wearer's own left and right hands, and the other person's left and right hands. It provides high-quality pixel-level segmentation annotations as well as bounding-box annotations for detection. The hands in the videos are unconstrained and perform activities such as building blocks, playing chess, and doing puzzles.

Students who need the dataset can contact the senior to obtain it.

3.3 Final improved network structure

In the end, the overall results were quite good:
