For three-dimensional vision, it is essential to clearly understand the correspondence between the world coordinate system and the pixel coordinate system. This article uses Python for the experiments.
For the mathematical derivation of the camera's intrinsic and extrinsic parameters, please see my previous blogs "[AI Mathematics] Intrinsic Parameters of Camera Imaging" and "[AI Mathematics] Extrinsic Parameters of Camera Imaging".
Experiment begins
First, write down the camera's intrinsic and extrinsic parameters:
```python
intri = [[1111.0,    0.0, 400.0],
         [   0.0, 1111.0, 400.0],
         [   0.0,    0.0,   1.0]]

extri = [[-9.9990e-01,  4.1922e-03, -1.3346e-02, -5.3798e-02],
         [-1.3989e-02, -2.9966e-01,  9.5394e-01,  3.8455e+00],
         [-4.6566e-10,  9.5404e-01,  2.9969e-01,  1.2081e+00],
         [ 0.0,         0.0,         0.0,         1.0]]
```
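As a quick sanity check (my own addition, not part of the original experiment), the upper-left 3x3 block of extri should be close to a proper rotation matrix, i.e. R Rᵀ ≈ I and det(R) ≈ 1:

```python
import numpy as np

extri = np.array([[-9.9990e-01,  4.1922e-03, -1.3346e-02, -5.3798e-02],
                  [-1.3989e-02, -2.9966e-01,  9.5394e-01,  3.8455e+00],
                  [-4.6566e-10,  9.5404e-01,  2.9969e-01,  1.2081e+00],
                  [ 0.0,         0.0,         0.0,         1.0]])
R = extri[:3, :3]

print(np.round(R @ R.T, 4))  # should be close to the identity matrix
print(np.linalg.det(R))      # should be close to 1.0
```

The small deviations from the identity come only from the limited number of digits the matrix entries are given with.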
Take a point in the world coordinate system, say p1 = [0, 0, 0], project it into the pixel coordinate system, and observe the projected coordinates.
First, convert it to the camera coordinate system (only the extrinsic matrix extri is needed for this step):
```python
import numpy as np

extri = np.array([[-9.9990e-01,  4.1922e-03, -1.3346e-02, -5.3798e-02],
                  [-1.3989e-02, -2.9966e-01,  9.5394e-01,  3.8455e+00],
                  [-4.6566e-10,  9.5404e-01,  2.9969e-01,  1.2081e+00],
                  [ 0.0,         0.0,         0.0,         1.0]])
p1 = np.array([0, 0, 0])

R = extri[:3, :3]           # rotation
T = extri[:3, -1]           # translation
cam_p1 = np.dot(p1 - T, R)
print(cam_p1)
```
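Equivalently (a small cross-check of my own, not from the original post), the same transform can be written in homogeneous coordinates: extri here is a camera-to-world (c2w) matrix, so world → camera is its inverse, and `(p - T) @ R` is exactly `R.T @ (p - T)`, the rotation-and-translation part of `inv(extri)`:

```python
import numpy as np

extri = np.array([[-9.9990e-01,  4.1922e-03, -1.3346e-02, -5.3798e-02],
                  [-1.3989e-02, -2.9966e-01,  9.5394e-01,  3.8455e+00],
                  [-4.6566e-10,  9.5404e-01,  2.9969e-01,  1.2081e+00],
                  [ 0.0,         0.0,         0.0,         1.0]])
p1 = np.array([0.0, 0.0, 0.0])

# Route 1: (p - T) @ R, as in the snippet above
cam_a = np.dot(p1 - extri[:3, -1], extri[:3, :3])

# Route 2: homogeneous coordinates through the inverted c2w matrix
p1_h = np.append(p1, 1.0)                  # [x, y, z, 1]
cam_b = (np.linalg.inv(extri) @ p1_h)[:3]

print(cam_a)
print(cam_b)  # same vector up to numerical error
```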
Then, we transform the camera coordinate system to the pixel coordinate system:
```python
intri = [[1111.0,    0.0, 400.0],
         [   0.0, 1111.0, 400.0],
         [   0.0,    0.0,   1.0]]

# Formula:
#               x                             y
# x_im = f_x * --- + offset_x    y_im = f_y * --- + offset_y
#               z                             z
screen_p1 = np.array([-cam_p1[0] * intri[0][0] / cam_p1[2] + intri[0][2],
                      cam_p1[1] * intri[1][1] / cam_p1[2] + intri[1][2]])
print(screen_p1)  # [400.00057322 400.00211168]
```
It can be seen that the origin of the world coordinate system, [0, 0, 0], is projected onto the screen at [400, 400] by this set of intrinsic and extrinsic matrices. Since the image size is exactly [800, 800], this is exactly the center.
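The signs in the projection formula come from the camera convention used by this data: x right, y up, z pointing backward (as in NeRF/OpenGL), so a point in front of the camera has negative z. As a sketch of my own (not part of the original experiment), flipping the y and z axes converts the camera-space point to the usual OpenCV convention (x right, y down, z forward), after which the projection is a plain multiplication by the intrinsic matrix:

```python
import numpy as np

intri = np.array([[1111.0, 0.0, 400.0], [0.0, 1111.0, 400.0], [0.0, 0.0, 1.0]])
extri = np.array([[-9.9990e-01,  4.1922e-03, -1.3346e-02, -5.3798e-02],
                  [-1.3989e-02, -2.9966e-01,  9.5394e-01,  3.8455e+00],
                  [-4.6566e-10,  9.5404e-01,  2.9969e-01,  1.2081e+00],
                  [ 0.0,         0.0,         0.0,         1.0]])

# camera-space point, recomputed here so the snippet is self-contained
cam_p1 = np.dot(np.zeros(3) - extri[:3, -1], extri[:3, :3])

# flip y and z: x-right/y-up/z-backward  ->  OpenCV x-right/y-down/z-forward
cam_cv = np.diag([1.0, -1.0, -1.0]) @ cam_p1
uvw = intri @ cam_cv
print(uvw[:2] / uvw[2])  # same pixel as the element-wise formula above
```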
Rewriting the above experiment in an object-oriented style makes it more convenient to call:
```python
import numpy as np


class CamTrans:
    def __init__(self, intri, extri, h=None, w=None):
        self.intri = intri
        self.extri = extri
        self.h = int(2 * intri[1][2]) if h is None else h
        self.w = int(2 * intri[0][2]) if w is None else w
        self.R = extri[:3, :3]
        self.T = extri[:3, -1]
        self.focal_x = intri[0][0]
        self.focal_y = intri[1][1]
        self.offset_x = intri[0][2]
        self.offset_y = intri[1][2]

    def world2cam(self, p):
        return np.dot(p - self.T, self.R)

    def cam2screen(self, p):
        """
                      x                             y
        x_im = f_x * --- + offset_x    y_im = f_y * --- + offset_y
                      z                             z
        """
        x, y, z = p
        return [-x * self.focal_x / z + self.offset_x,
                y * self.focal_y / z + self.offset_y]

    def world2screen(self, p):
        return self.cam2screen(self.world2cam(p))


if __name__ == "__main__":
    p = [0, 0, 0]  # origin of the world coordinate system
    intri = [[1111.0, 0.0, 400.0],
             [0.0, 1111.0, 400.0],
             [0.0, 0.0, 1.0]]
    extri = np.array([[-9.9990e-01,  4.1922e-03, -1.3346e-02, -5.3798e-02],
                      [-1.3989e-02, -2.9966e-01,  9.5394e-01,  3.8455e+00],
                      [-4.6566e-10,  9.5404e-01,  2.9969e-01,  1.2081e+00],
                      [ 0.0,         0.0,         0.0,         1.0]])
    # another camera
    extri1 = np.array([[-3.0373e-01, -8.6047e-01,  4.0907e-01,  1.6490e+00],
                       [ 9.5276e-01, -2.7431e-01,  1.3041e-01,  5.2569e-01],
                       [-7.4506e-09,  4.2936e-01,  9.0313e-01,  3.6407e+00],
                       [ 0.0,         0.0,         0.0,         1.0]])
    ct = CamTrans(intri, extri)
    print(ct.world2screen(p))
    ct1 = CamTrans(intri, extri1)
    print(ct1.world2screen(p))
```
Through this experiment, we found that the origin of the world coordinate system projects to the center pixel (400, 400) in both cameras, which verifies that our coordinate projection is correct~
Ray projection experiment
With the two matrices intri and extri, we can also cast rays from pixel coordinates into world coordinates, as shown in the figure:
A ray is determined by connecting the optical center to a specific pixel. The experiment below verifies that points sampled on the same ray project back to the same pixel coordinates.
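For a single pixel, the ray can also be written down directly, without building a full meshgrid. The helper below is my own sketch, using the same conventions as the NeRF-style `get_rays` code that follows: the ray origin is the camera position, and the direction is the pixel's camera-space direction rotated into the world frame.

```python
import numpy as np


def pixel_ray(u, v, K, c2w):
    """Ray through pixel (u, v): origin = optical center, direction = the
    camera-space pixel direction rotated into the world frame."""
    d_cam = np.array([(u - K[0][2]) / K[0][0],   # x: right
                      -(v - K[1][2]) / K[1][1],  # y: up
                      -1.0])                     # z: backward (camera looks along -z)
    d_world = c2w[:3, :3] @ d_cam                # rotate into the world frame
    o_world = c2w[:3, -1]                        # camera position in world coordinates
    return o_world, d_world


K = [[1111.0, 0.0, 400.0], [0.0, 1111.0, 400.0], [0.0, 0.0, 1.0]]
extri = np.array([[-9.9990e-01,  4.1922e-03, -1.3346e-02, -5.3798e-02],
                  [-1.3989e-02, -2.9966e-01,  9.5394e-01,  3.8455e+00],
                  [-4.6566e-10,  9.5404e-01,  2.9969e-01,  1.2081e+00],
                  [ 0.0,         0.0,         0.0,         1.0]])

o, d = pixel_ray(400, 400, K, extri)
print(o)  # the camera position
print(d)  # for the central pixel, this is minus the third column of R
```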
```python
import numpy as np
import torch


def get_rays(h, w, K, c2w):
    i, j = torch.meshgrid(torch.linspace(0, w - 1, w),
                          torch.linspace(0, h - 1, h))  # pytorch's meshgrid has indexing='ij'
    i = i.t()
    j = j.t()
    # directions
    dirs = torch.stack([(i - K[0][2]) / K[0][0],
                        -(j - K[1][2]) / K[1][1],
                        -torch.ones_like(i)], -1)
    # camera to world
    # Rotate ray directions from the camera frame to the world frame
    rays_d = torch.sum(dirs[..., None, :] * c2w[:3, :3], -1)  # dot product, equals to: [c2w.dot(dir) for dir in dirs]
    # Translate the camera frame's origin to the world frame. It is the origin of all rays.
    rays_o = c2w[:3, -1].expand(rays_d.shape)
    return rays_o, rays_d


def _main():
    intri = [[1111.0, 0.0, 400.0],
             [0.0, 1111.0, 400.0],
             [0.0, 0.0, 1.0]]
    extri = np.array([[-9.9990e-01,  4.1922e-03, -1.3346e-02, -5.3798e-02],
                      [-1.3989e-02, -2.9966e-01,  9.5394e-01,  3.8455e+00],
                      [-4.6566e-10,  9.5404e-01,  2.9969e-01,  1.2081e+00],
                      [ 0.0,         0.0,         0.0,         1.0]])
    h, w = 800, 800
    K = intri
    pose = torch.from_numpy(extri)  # the c2w matrix as a tensor
    rays_o, rays_d = get_rays(h, w, K, pose)  # all rays: rays_o means origins, rays_d means directions
    ray_o = rays_o[400][400]  # choose the (400, 400)th ray
    ray_d = rays_d[400][400]
    z_vals = torch.tensor([2., 6.])  # pick 2 z-values
    pts = ray_o[..., None, :] + ray_d[..., None, :] * z_vals[..., :, None]  # sample 2 points from the ray
    print(pts)
    ct = CamTrans(intri, extri)  # the CamTrans class from the previous section
    print(ct.world2screen(pts[0].numpy()))
    print(ct.world2screen(pts[1].numpy()))
    # [399.99682432604334, 400.010187052666]
    # [399.99608371858295, 399.992518381194]


if __name__ == "__main__":
    _main()
```
The experiment shows that points sampled on the same ray back-project to (almost exactly) the same two-dimensional point.