For three-dimensional vision, it is essential to clearly understand the correspondence between the world coordinate system and the pixel coordinate system. This article uses Python for the experiments.
For the mathematical derivation of the camera's intrinsic and extrinsic parameters, please see my previous blogs "[AI Mathematics] Intrinsic Parameters of Camera Imaging" and "[AI Mathematics] Extrinsic Parameters of Camera Imaging".
Experiment begins
First, write down the camera's intrinsic and extrinsic parameters:
```python
intri = [[1111.0,    0.0, 400.0],
         [   0.0, 1111.0, 400.0],
         [   0.0,    0.0,   1.0]]

extri = [[-9.9990e-01,  4.1922e-03, -1.3346e-02, -5.3798e-02],
         [-1.3989e-02, -2.9966e-01,  9.5394e-01,  3.8455e+00],
         [-4.6566e-10,  9.5404e-01,  2.9969e-01,  1.2081e+00],
         [ 0.0,         0.0,         0.0,         1.0]]
```
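As a quick sanity check (my own addition, not part of the original experiment), the upper-left 3x3 block of extri should be close to a proper rotation matrix, i.e. R Rᵀ ≈ I and det(R) ≈ 1:

```python
import numpy as np

extri = np.array([[-9.9990e-01,  4.1922e-03, -1.3346e-02, -5.3798e-02],
                  [-1.3989e-02, -2.9966e-01,  9.5394e-01,  3.8455e+00],
                  [-4.6566e-10,  9.5404e-01,  2.9969e-01,  1.2081e+00],
                  [ 0.0,         0.0,         0.0,         1.0]])
R = extri[:3, :3]

print(np.round(R @ R.T, 4))  # should be close to the identity matrix
print(np.linalg.det(R))      # should be close to 1.0
```

The small deviations from the identity come only from the limited number of digits the matrix entries are given with.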
Take a point in the world coordinate system, say p1 = [0, 0, 0], project it into the pixel coordinate system, and observe the projected coordinates.
First, convert it to the camera coordinate system (only the extrinsic matrix extri is needed for this step):
```python
import numpy as np

extri = np.array([[-9.9990e-01,  4.1922e-03, -1.3346e-02, -5.3798e-02],
                  [-1.3989e-02, -2.9966e-01,  9.5394e-01,  3.8455e+00],
                  [-4.6566e-10,  9.5404e-01,  2.9969e-01,  1.2081e+00],
                  [ 0.0,         0.0,         0.0,         1.0]])
p1 = np.array([0, 0, 0])

R = extri[:3, :3]           # rotation
T = extri[:3, -1]           # translation
cam_p1 = np.dot(p1 - T, R)
print(cam_p1)
```
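Equivalently (a small cross-check of my own, not from the original post), the same transform can be written in homogeneous coordinates: extri here is a camera-to-world (c2w) matrix, so world → camera is its inverse, and `(p - T) @ R` is exactly `R.T @ (p - T)`, the rotation-and-translation part of `inv(extri)`:

```python
import numpy as np

extri = np.array([[-9.9990e-01,  4.1922e-03, -1.3346e-02, -5.3798e-02],
                  [-1.3989e-02, -2.9966e-01,  9.5394e-01,  3.8455e+00],
                  [-4.6566e-10,  9.5404e-01,  2.9969e-01,  1.2081e+00],
                  [ 0.0,         0.0,         0.0,         1.0]])
p1 = np.array([0.0, 0.0, 0.0])

# Route 1: (p - T) @ R, as in the snippet above
cam_a = np.dot(p1 - extri[:3, -1], extri[:3, :3])

# Route 2: homogeneous coordinates through the inverted c2w matrix
p1_h = np.append(p1, 1.0)                  # [x, y, z, 1]
cam_b = (np.linalg.inv(extri) @ p1_h)[:3]

print(cam_a)
print(cam_b)  # same vector up to numerical error
```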
Then, we transform the camera coordinate system to the pixel coordinate system:
```python
intri = [[1111.0,    0.0, 400.0],
         [   0.0, 1111.0, 400.0],
         [   0.0,    0.0,   1.0]]

# Formula:
#               x                             y
# x_im = f_x * --- + offset_x    y_im = f_y * --- + offset_y
#               z                             z
screen_p1 = np.array([-cam_p1[0] * intri[0][0] / cam_p1[2] + intri[0][2],
                      cam_p1[1] * intri[1][1] / cam_p1[2] + intri[1][2]])
print(screen_p1)  # [400.00057322 400.00211168]
```
It can be seen that the origin of the world coordinate system, [0, 0, 0], is projected onto the screen at [400, 400] by this set of intrinsic and extrinsic matrices. Since the image size is exactly [800, 800], this is exactly the center.
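The signs in the projection formula come from the camera convention used by this data: x right, y up, z pointing backward (as in NeRF/OpenGL), so a point in front of the camera has negative z. As a sketch of my own (not part of the original experiment), flipping the y and z axes converts the camera-space point to the usual OpenCV convention (x right, y down, z forward), after which the projection is a plain multiplication by the intrinsic matrix:

```python
import numpy as np

intri = np.array([[1111.0, 0.0, 400.0], [0.0, 1111.0, 400.0], [0.0, 0.0, 1.0]])
extri = np.array([[-9.9990e-01,  4.1922e-03, -1.3346e-02, -5.3798e-02],
                  [-1.3989e-02, -2.9966e-01,  9.5394e-01,  3.8455e+00],
                  [-4.6566e-10,  9.5404e-01,  2.9969e-01,  1.2081e+00],
                  [ 0.0,         0.0,         0.0,         1.0]])

# camera-space point, recomputed here so the snippet is self-contained
cam_p1 = np.dot(np.zeros(3) - extri[:3, -1], extri[:3, :3])

# flip y and z: x-right/y-up/z-backward  ->  OpenCV x-right/y-down/z-forward
cam_cv = np.diag([1.0, -1.0, -1.0]) @ cam_p1
uvw = intri @ cam_cv
print(uvw[:2] / uvw[2])  # same pixel as the element-wise formula above
```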
Rewriting the above experiment in an object-oriented style makes it more convenient to call:
```python
import numpy as np


class CamTrans:
    def __init__(self, intri, extri, h=None, w=None):
        self.intri = intri
        self.extri = extri
        self.h = int(2 * intri[1][2]) if h is None else h
        self.w = int(2 * intri[0][2]) if w is None else w
        self.R = extri[:3, :3]
        self.T = extri[:3, -1]
        self.focal_x = intri[0][0]
        self.focal_y = intri[1][1]
        self.offset_x = intri[0][2]
        self.offset_y = intri[1][2]

    def world2cam(self, p):
        return np.dot(p - self.T, self.R)

    def cam2screen(self, p):
        """
                      x                             y
        x_im = f_x * --- + offset_x    y_im = f_y * --- + offset_y
                      z                             z
        """
        x, y, z = p
        return [-x * self.focal_x / z + self.offset_x,
                y * self.focal_y / z + self.offset_y]

    def world2screen(self, p):
        return self.cam2screen(self.world2cam(p))


if __name__ == "__main__":
    p = [0, 0, 0]  # origin of the world coordinate system
    intri = [[1111.0, 0.0, 400.0],
             [0.0, 1111.0, 400.0],
             [0.0, 0.0, 1.0]]
    extri = np.array([[-9.9990e-01,  4.1922e-03, -1.3346e-02, -5.3798e-02],
                      [-1.3989e-02, -2.9966e-01,  9.5394e-01,  3.8455e+00],
                      [-4.6566e-10,  9.5404e-01,  2.9969e-01,  1.2081e+00],
                      [ 0.0,         0.0,         0.0,         1.0]])
    # another camera
    extri1 = np.array([[-3.0373e-01, -8.6047e-01,  4.0907e-01,  1.6490e+00],
                       [ 9.5276e-01, -2.7431e-01,  1.3041e-01,  5.2569e-01],
                       [-7.4506e-09,  4.2936e-01,  9.0313e-01,  3.6407e+00],
                       [ 0.0,         0.0,         0.0,         1.0]])
    ct = CamTrans(intri, extri)
    print(ct.world2screen(p))
    ct1 = CamTrans(intri, extri1)
    print(ct1.world2screen(p))
```
Through this experiment, we found that the origin of the world coordinate system projects to the center pixel (400, 400) in both cameras, which verifies that our coordinate projection is correct~
Ray projection experiment
With the two matrices intri and extri, we can also cast rays from pixel coordinates into world coordinates, as shown in the figure:
A ray is determined by connecting the optical center to a specific pixel. The experiment below verifies that points sampled on the same ray project back to the same pixel coordinates.
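For a single pixel, the ray can also be written down directly, without building a full meshgrid. The helper below is my own sketch, using the same conventions as the NeRF-style `get_rays` code that follows: the ray origin is the camera position, and the direction is the pixel's camera-space direction rotated into the world frame.

```python
import numpy as np


def pixel_ray(u, v, K, c2w):
    """Ray through pixel (u, v): origin = optical center, direction = the
    camera-space pixel direction rotated into the world frame."""
    d_cam = np.array([(u - K[0][2]) / K[0][0],   # x: right
                      -(v - K[1][2]) / K[1][1],  # y: up
                      -1.0])                     # z: backward (camera looks along -z)
    d_world = c2w[:3, :3] @ d_cam                # rotate into the world frame
    o_world = c2w[:3, -1]                        # camera position in world coordinates
    return o_world, d_world


K = [[1111.0, 0.0, 400.0], [0.0, 1111.0, 400.0], [0.0, 0.0, 1.0]]
extri = np.array([[-9.9990e-01,  4.1922e-03, -1.3346e-02, -5.3798e-02],
                  [-1.3989e-02, -2.9966e-01,  9.5394e-01,  3.8455e+00],
                  [-4.6566e-10,  9.5404e-01,  2.9969e-01,  1.2081e+00],
                  [ 0.0,         0.0,         0.0,         1.0]])

o, d = pixel_ray(400, 400, K, extri)
print(o)  # the camera position
print(d)  # for the central pixel, this is minus the third column of R
```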
```python
import numpy as np
import torch


def get_rays(h, w, K, c2w):
    i, j = torch.meshgrid(torch.linspace(0, w - 1, w),
                          torch.linspace(0, h - 1, h))  # pytorch's meshgrid has indexing='ij'
    i = i.t()
    j = j.t()
    # directions
    dirs = torch.stack([(i - K[0][2]) / K[0][0],
                        -(j - K[1][2]) / K[1][1],
                        -torch.ones_like(i)], -1)
    # camera to world
    # Rotate ray directions from the camera frame to the world frame
    rays_d = torch.sum(dirs[..., None, :] * c2w[:3, :3], -1)  # dot product, equals to: [c2w.dot(dir) for dir in dirs]
    # Translate the camera frame's origin to the world frame. It is the origin of all rays.
    rays_o = c2w[:3, -1].expand(rays_d.shape)
    return rays_o, rays_d


def _main():
    intri = [[1111.0, 0.0, 400.0],
             [0.0, 1111.0, 400.0],
             [0.0, 0.0, 1.0]]
    extri = np.array([[-9.9990e-01,  4.1922e-03, -1.3346e-02, -5.3798e-02],
                      [-1.3989e-02, -2.9966e-01,  9.5394e-01,  3.8455e+00],
                      [-4.6566e-10,  9.5404e-01,  2.9969e-01,  1.2081e+00],
                      [ 0.0,         0.0,         0.0,         1.0]])
    h, w = 800, 800
    K = intri
    pose = torch.from_numpy(extri)  # the c2w matrix as a tensor
    rays_o, rays_d = get_rays(h, w, K, pose)  # all rays: rays_o means origins, rays_d means directions
    ray_o = rays_o[400][400]  # choose the (400, 400)th ray
    ray_d = rays_d[400][400]
    z_vals = torch.tensor([2., 6.])  # pick 2 z-values
    pts = ray_o[..., None, :] + ray_d[..., None, :] * z_vals[..., :, None]  # sample 2 points from the ray
    print(pts)
    ct = CamTrans(intri, extri)  # the CamTrans class from the previous section
    print(ct.world2screen(pts[0].numpy()))
    print(ct.world2screen(pts[1].numpy()))
    # [399.99682432604334, 400.010187052666]
    # [399.99608371858295, 399.992518381194]


if __name__ == "__main__":
    _main()
```
The experiment shows that points sampled on the same ray back-project to (almost exactly) the same two-dimensional point.