Object tracking with OpenCV

This article will show how to use some basic functions in OpenCV to perform complex object tracking tasks.

OpenCV is a great tool for working with images and videos. Whether you want to give your photos a black-and-white look from the 90s, or perform complex math, OpenCV is there for you.

If you are interested in computer vision, knowledge of OpenCV is a must. The library contains more than 2500 optimized algorithms that can be used to perform various tasks. It is used by industry giants such as Google, Microsoft, and IBM, and is widely used by research groups. The library supports several languages, including Java, C++, and Python.

Object tracking

Object tracking is the process of locating a moving object in a video. Consider the example of a football game.

You have a live feed of the game in progress, and it’s your job to keep track of where the ball is at all times. This task may seem simple to a person, but it is surprisingly hard for even the smartest machines.

As you probably know, computers only understand numbers. A computer doesn’t know what an image is; it only sees the pixel values associated with it. Two images that look identical to the human eye may not look the same to a computer, because even small changes in pixels create differences between pictures. This is why object tracking is considered one of the most complex tasks in computer vision. Complex, but not impossible.

Object tracking can be performed using machine learning as well as deep learning based methods.

Deep learning methods are very general and give better results on complex tasks, but they require large amounts of training data. Classical ML-based methods, on the other hand, are much simpler, but less general.

In this article, we will use ML-based methods along with various computer vision techniques that we will discuss later in the article.

This technology is widely used in surveillance, security, traffic monitoring, robot vision, video communication, and other fields. Object tracking also has several use cases such as crowd counting, self-driving cars, and face detection. Can you think of more examples where object tracking can be used in everyday life?

Since there are so many real-life applications, there is continuous research going on in this field to achieve higher accuracy and make the models more robust.

For this article, we will use this video (https://drive.google.com/file/d/1N6NcFpveLQLc_DnFjfuMMvfuCMTAJRFm/view?usp=sharing).

As you will see, there is a red ball moving in a maze, and our task is to detect the position of the ball and find its centroid. There is also a lot of noise in the background (the crowd), which makes the task even more challenging.

1.

First, we import the required libraries we will be using.

import numpy as np
import cv2

2.

We’ll define a function to resize images so they’re large enough to fit on our screen. This step is completely optional, feel free to skip it.

def resize(img):
    return cv2.resize(img, (512, 512))  # args: input image, (output_width, output_height)
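Note that forcing the frame to 512x512 changes its aspect ratio. If that matters for your use case, a small variation (a hypothetical helper, not part of the original code) keeps the aspect ratio instead:

def resize_keep_aspect(img, width=512):
    # Hypothetical alternative: scale to a fixed width while preserving the aspect ratio.
    h, w = img.shape[:2]
    return cv2.resize(img, (width, int(h * width / w)))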

3.

As you probably know, video is made up of frames. A frame is just one of many still images that make up an entire moving picture. The next step is to read these frames using the VideoCapture() function in OpenCV, and using a while loop, we can see the frames moving.

You can adjust the speed of the video with cv2.waitKey(x) which pauses the screen for x milliseconds.

cap = cv2.VideoCapture(vid_file_path)
ret, frame = cap.read()

while ret:
    cv2.imshow("frame", resize(frame))
    key = cv2.waitKey(1)
    if key == ord('q'):
        break
    ret, frame = cap.read()  # grab the next frame; ret becomes False at the end of the video

cap.release()
cv2.destroyAllWindows()
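If you want playback at roughly the original speed instead of a fixed 1 ms delay, one option (a sketch; it assumes the video file reports its frame rate correctly) is to derive the delay from the FPS:

fps = cap.get(cv2.CAP_PROP_FPS)            # frames per second reported by the file; may be 0 if unknown
delay = int(1000 / fps) if fps > 0 else 1  # milliseconds to wait per frame
key = cv2.waitKey(delay)                   # use this in place of cv2.waitKey(1) above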

4.

Now it’s time to perform some thresholding and preprocessing. OpenCV reads images in BGR format, so we will convert the color space from BGR to HSV.

Why HSV and not BGR or any other format?

We use the HSV color format because it separates color (hue) from intensity, making it less sensitive to small changes in external lighting. Therefore, it will provide a more accurate mask and thus a better result.

After converting the color space, we filter for the red color and create a mask.

For this video, the red of the ball falls in the HSV range [0, 230, 170] to [255, 255, 220]. (These bounds are specific to this footage; you may need to tune them for your own video.)
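If you want to sanity-check HSV values yourself, you can convert a single pixel. Note that OpenCV stores hue in the 0-179 range for 8-bit images, so an upper H bound of 255 effectively leaves hue unconstrained:

red_bgr = np.uint8([[[0, 0, 255]]])              # a 1x1 "image" of pure red in BGR
print(cv2.cvtColor(red_bgr, cv2.COLOR_BGR2HSV))  # [[[  0 255 255]]] -> H=0, S=255, V=255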

cap = cv2.VideoCapture(vid_file_path)

ret, frame = cap.read()
l_b = np.array([0, 230, 170])    # lower HSV bound for red
u_b = np.array([255, 255, 220])  # upper HSV bound for red

while ret:
    hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
    mask = cv2.inRange(hsv, l_b, u_b)  # white wherever the pixel falls inside the bounds

    cv2.imshow("frame", resize(frame))
    cv2.imshow("mask", mask)

    key = cv2.waitKey(1)
    if key == ord('q'):
        break
    ret, frame = cap.read()

cap.release()
cv2.destroyAllWindows()

5.

So far we’ve created a masked image of the frame and filtered out most of the noise. The next step is to find the boundary of the ball. For this, we will use contour detection.

The contour is nothing but the boundary around our ball. Thankfully, we don’t have to find these boundaries ourselves: OpenCV provides a findContours() function that we can use for our purposes. It takes a mask image and returns an array of contours.

More info on contours here: https://docs.opencv.org/4.5.2/d4/d73/tutorial_py_contours_begin.html

Ideally, in our case, the number of contours should be 1, since we only have one ball, but because some people in the crowd are wearing red hats, we will get more than one. Can you think of some way to further reduce this noise?
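One common answer (a sketch, not the only option; it assumes `mask` from the step above) is to clean the mask with a morphological opening, i.e. an erosion followed by a dilation, which removes small speckles such as the red hats:

kernel = np.ones((5, 5), np.uint8)                     # the kernel size is a tunable assumption
mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)  # erode then dilate to drop small blobs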

To solve this problem, we will use another OpenCV function, cv2.contourArea(). In the mask image, the ball covers the largest area, so its contour has the largest area too. Therefore, we will pick the contour with the largest area.
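As an aside, Python’s built-in max() can express the same largest-area search in one line:

max_contour = max(contours, key=cv2.contourArea)  # equivalent to the loop in the code below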

We now have the contour of the ball and could draw it directly using the cv2.drawContours() function. But for detection tasks, what we generally do is draw a tight bounding rectangle to indicate that the object has been detected.

For this, we will use the cv2.boundingRect() function. It returns the coordinates of the rectangle, and the cv2.rectangle() function then draws it for us.

cap = cv2.VideoCapture(vid_file_path)

ret, frame = cap.read()
l_b = np.array([0, 230, 170])    # lower HSV bound for red
u_b = np.array([255, 255, 220])  # upper HSV bound for red

while ret:
    hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
    mask = cv2.inRange(hsv, l_b, u_b)

    contours, _ = cv2.findContours(mask, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)

    if contours:
        # keep the contour with the largest area (the ball)
        max_contour = contours[0]
        for contour in contours:
            if cv2.contourArea(contour) > cv2.contourArea(max_contour):
                max_contour = contour

        approx = cv2.approxPolyDP(max_contour, 0.01 * cv2.arcLength(max_contour, True), True)
        x, y, w, h = cv2.boundingRect(approx)
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 4)

    cv2.imshow("frame", resize(frame))
    cv2.imshow("mask", mask)

    key = cv2.waitKey(1)
    if key == ord('q'):
        break
    ret, frame = cap.read()

cap.release()
cv2.destroyAllWindows()

6.

Also, we can detect the center of mass of the ball at the same time. For this, we will use cv2.moments(), which computes weighted averages of the pixel intensities within a contour; from these moments we can extract useful information about a blob, such as its area and centroid.

Make sure to convert the image to binary format before using this function. You can learn more about moments here: https://docs.opencv.org/3.4/d0/d49/tutorial_moments.html.
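Concretely, the centroid comes from the first-order moments: cx = m10/m00 and cy = m01/m00. A minimal sketch (assuming `contour` holds the largest contour found in the previous step; cv2.minEnclosingCircle is one optional way to also estimate a radius):

M = cv2.moments(contour)
if M['m00'] != 0:                                 # m00 is the blob area; guard against empty blobs
    cx = int(M['m10'] / M['m00'])                 # centroid x
    cy = int(M['m01'] / M['m00'])                 # centroid y
(x, y), radius = cv2.minEnclosingCircle(contour)  # optional: center and radius of the enclosing circle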

cap = cv2.VideoCapture(vid_file_path)

ret, frame = cap.read()
l_b = np.array([0, 230, 170])    # lower HSV bound for red
u_b = np.array([255, 255, 220])  # upper HSV bound for red

while ret:
    hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
    mask = cv2.inRange(hsv, l_b, u_b)

    contours, _ = cv2.findContours(mask, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)

    if contours:
        # keep the contour with the largest area (the ball)
        max_contour = contours[0]
        for contour in contours:
            if cv2.contourArea(contour) > cv2.contourArea(max_contour):
                max_contour = contour

        approx = cv2.approxPolyDP(max_contour, 0.01 * cv2.arcLength(max_contour, True), True)
        x, y, w, h = cv2.boundingRect(approx)
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 4)

        M = cv2.moments(max_contour)
        if M['m00'] != 0:                  # guard against division by zero
            cx = int(M['m10'] / M['m00'])  # centroid x
            cy = int(M['m01'] / M['m00'])  # centroid y
            cv2.circle(frame, (cx, cy), 3, (255, 0, 0), -1)

    cv2.imshow("frame", resize(frame))
    cv2.imshow("mask", mask)

    key = cv2.waitKey(1)
    if key == ord('q'):
        break
    ret, frame = cap.read()

cap.release()
cv2.destroyAllWindows()