Depth Map

Article directory

  • Depth map
  • What is a depth map
  • How to obtain a depth map
    • Sensor methods such as lidar or structured light
      • Lidar
      • RGB-D camera
    • Calculating depth from binocular or multi-camera disparity
    • Depth estimation using deep learning models
  • Depth map application scenarios
  • Further reading

Depth map

What is a depth map

A depth map is a grayscale image in which each pixel encodes the distance from the camera to the corresponding point in the scene. It is a common image representation in computer vision for describing the three-dimensional structure of a scene.

[Image: an example depth map showing a cone and a mask; redder regions are closer to the camera]

A simple way to read the picture: the redder a region, the closer it is to the observer (i.e. the camera). The cone is the closest object to us, so it appears the reddest. The mask is placed at an angle, so its bottom is closer to us, and the color at the bottom is therefore redder than at the top.
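As a minimal sketch (with made-up numbers, not data from the article), a depth map is simply a 2D array of distances, which can be normalized to an 8-bit grayscale image for display:

```python
import numpy as np

# A tiny synthetic depth map: a 5x5 scene whose center is
# closest to the camera (distances in meters).
yy, xx = np.mgrid[0:5, 0:5]
depth_m = 1.0 + 0.2 * np.hypot(yy - 2, xx - 2)  # distance per pixel

# Normalize to 0..255 for viewing as a grayscale image; here we
# choose the convention that near pixels are bright, far pixels dark.
gray = 255 * (depth_m.max() - depth_m) / (depth_m.max() - depth_m.min())
gray = gray.astype(np.uint8)

print(gray[2, 2], gray[0, 0])  # center (nearest) is 255, corner (farthest) is 0
```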

How to obtain a depth map

The development of depth maps can be traced back to the 1960s. Initially, depth images were annotated manually or inferred from prior knowledge. As computer vision technology has advanced, depth-image acquisition methods and algorithms have been continuously refined.

There are many ways to obtain depth maps. Common methods include:

  1. Obtain depth information through sensors such as lidar or structured light, then convert it into a depth image.

  2. Compute depth from the disparity between binocular or multi-camera views, then convert it into a depth image.

  3. Analyze the image using prior knowledge or a model and infer the depth of each pixel.

Sensor methods such as lidar or structured light

Sensors such as lidar or structured light yield absolute depth: their data come from physical measurements, with the real distance computed via time of flight (TOF). In a continuous image sequence, the depths are therefore absolute and share the same reference.

Lidar

This method is also called the TOF (Time of Flight) method. It uses the time difference between emitting and receiving a laser/radar pulse, combined with the speed of light, to compute the distance the signal traveled in that time, and thus the distance from the emitter to different objects.
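The calculation reduces to distance = speed of light × round-trip time / 2, since the pulse travels to the object and back. A minimal sketch:

```python
# Time-of-flight: distance = speed_of_light * round_trip_time / 2.
C = 299_792_458.0  # speed of light in m/s

def tof_distance_m(round_trip_s: float) -> float:
    """Distance to the target given the emit-to-receive time difference."""
    return C * round_trip_s / 2.0

# A pulse that returns after roughly 66.7 nanoseconds hit an object ~10 m away.
print(tof_distance_m(2 * 10.0 / C))  # ~10.0 m
```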

RGB-D camera

In addition to radar, lidar (RGB-D) cameras can achieve similar effects. Lidar can also be combined with structured light to obtain more accurate depth values.

Calculating depth from binocular or multi-camera disparity

We can mimic the way human eyes work by using two cameras, an approach called stereo vision.

In the stereo setup, P is a point in space and Z is its depth; Ol and Or are the optical centers of the left and right cameras, separated by the baseline b, and f is the camera focal length. Let x and x′ be the horizontal positions of P's projections on the left and right image planes.

By similar triangles:

Z = f · b / (x − x′)

where the difference x − x′ is called the disparity. The focal length f and the baseline b are known, so the depth of a point is inversely proportional to its disparity; computing the disparity for every pixel yields a depth map of the whole image.
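A worked example of the similar-triangles relation, with illustrative numbers (the focal length in pixels and the baseline in meters are assumptions, not values from the article):

```python
def stereo_depth(f_px: float, baseline_m: float,
                 x_left_px: float, x_right_px: float) -> float:
    """Depth from similar triangles: Z = f * b / disparity."""
    disparity = x_left_px - x_right_px
    return f_px * baseline_m / disparity

# Assumed values: 700 px focal length, 0.12 m baseline, 20 px disparity.
Z = stereo_depth(700.0, 0.12, 320.0, 300.0)
print(Z)  # ~4.2 m; halving the disparity would double the depth
```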

OpenCV also provides related functions for this calculation (for example cv2.StereoBM_create). With them we can roughly compute the depth in a picture.

You may ask: the depth images above are all in color, so why has this one turned black and white? In fact, grayscale is what a raw depth map looks like. Earlier we used OpenCV's applyColorMap function to color-map the grayscale image so the result looks more vivid.

Here is an example to illustrate the role of applyColorMap:

import cv2 as cv
import numpy as np

def callback_trackbar(pos):
    # Required trackbar callback; the position is polled in the loop instead.
    pass

def ColorMap_demo():
    img = cv.imread("lena.jpg", cv.IMREAD_GRAYSCALE)
    if img is None:
        print("load image fail!")
        return
    windowname = "applyColorMap"
    cv.namedWindow(windowname, cv.WINDOW_AUTOSIZE)
    pos = 0
    # OpenCV ships 22 built-in color maps; trackbar position 0
    # shows the original grayscale image.
    cv.createTrackbar("Type", windowname, pos, 22, callback_trackbar)
    while True:
        pos = cv.getTrackbarPos("Type", windowname)
        imgdst = np.copy(img)
        if pos != 0:
            imgdst = cv.applyColorMap(img, pos - 1)
        cv.imshow(windowname, imgdst)
        if cv.waitKey(10) == 27:  # Esc to quit
            break
    cv.destroyAllWindows()

if __name__ == "__main__":
    ColorMap_demo()


The depth obtained from a binocular or multi-camera rig is also absolute, because the principle is to compute distance via similar triangles between cameras at fixed, known positions. In a continuous image sequence, the depths are therefore absolute and share the same reference.

Depth estimation using deep learning models

A typical deep learning method for estimating depth from an RGB image works as follows:

  1. Take the depth component from the output of an RGB-D camera to obtain a real (ground-truth) depth map.
  2. Feed only the RGB image to the model and let it generate a corresponding depth estimate.
  3. Compute the difference between the model's depth estimate and the real depth map to obtain the estimation error.
  4. Set the network's optimization goal to reducing the error between the estimated and real depth.
  5. After sufficient training, the network can estimate a depth map from an RGB image alone.
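The five steps above can be sketched as a training loop. This is an illustrative skeleton only: a deliberately trivial linear "model" fitted by gradient descent stands in for a real network, and the "ground-truth" depth is synthetic.

```python
import numpy as np

rng = np.random.default_rng(42)
x = rng.random((8, 8, 3)).mean(axis=2)  # per-pixel feature from a stand-in RGB image
gt_depth = 1.0 * x + 2.0                # step 1: "ground-truth" depth (synthetic)

w, b, lr = 0.0, 0.0, 0.5                # trivial model: depth = w * x + b

for _ in range(1000):
    pred = w * x + b                    # step 2: model's depth estimate
    err = pred - gt_depth               # step 3: estimation error
    # step 4: gradient descent on the mean squared error
    w -= lr * 2.0 * (err * x).mean()
    b -= lr * 2.0 * err.mean()

# step 5: after training, the mean absolute error is small
mae = np.abs(w * x + b - gt_depth).mean()
```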

This refers to using a model to estimate the depth of objects in a single picture. What it produces is the relative depth difference between pixels within that picture; in a continuous sequence, however, the estimates from two different frames are not necessarily related. For example, if the mask above were a video sequence, the estimated depth of the mask's left eye in the first frame might be 100 and that of its right eye 110, while in the second frame the left eye might come out as 1000 and the right eye as 1010.

So when a deep learning model estimates the depth of the same region in two frames, the absolute values have no shared reference: for the left eye, the depths estimated in the first and second frames are not even on the same scale.

However, the relative depth between different positions within a single frame is meaningful: no matter which frame you look at, the depth difference between the left eye and the right eye is always 10.
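The mask example can be checked numerically (using the illustrative numbers from the text):

```python
import numpy as np

# Estimated depths of [left eye, right eye] in two consecutive frames.
frame1 = np.array([100.0, 110.0])
frame2 = np.array([1000.0, 1010.0])

# Absolute values are on different scales across frames...
print(frame1[0], frame2[0])  # 100.0 1000.0

# ...but the within-frame relative difference is stable.
print(frame1[1] - frame1[0], frame2[1] - frame2[0])  # 10.0 10.0
```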

This follows from the way such models are trained. Monocular depth prediction is essentially a fitting/regression task, and it is an ill-posed mathematical problem: a single 2D picture lacks the information needed to fully recover the 3D structure of the scene.

Depth map application scenarios

  1. 3D reconstruction: Depth maps can be used to create 3D models such as buildings, sculptures, human bodies, etc.

  2. Virtual Reality: Depth maps can be used to create virtual reality environments such as games, training simulators, and more.

  3. Autonomous Driving: Depth maps can be used to help self-driving cars identify roads, obstacles, and other vehicles.

  4. Medical Imaging: Depth maps can be used in medical imaging, such as X-rays, CT scans, and MRIs.

  5. Layer separation: determine the front-to-back relationships between objects in an image to obtain foreground/background layer information.

Take autonomous driving as an example: lidar captures information about the surrounding environment, which is used to sense the distance between various objects and the vehicle body.

Further reading

Saliency Map