cvt.camera

Pose File Formats

Please see the resources page on pose file formats to learn more about the expected/supported formats commonly used in this library.

A suite of common camera utilities.

This module includes several functions for manipulating and extracting information from camera intrinsics and extrinsics, as well as converting between specific formats.

build_o3d_traj(poses, K, width, height)

Converts an array of camera poses and an intrinsics matrix into Open3D trajectory format.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| poses | ndarray | Nx4x4 array of camera poses in world-to-camera format. | required |
| K | ndarray | 3x3 camera intrinsics matrix. | required |
| width | int | Expected image width. | required |
| height | int | Expected image height. | required |

Returns:

| Type | Description |
| --- | --- |
| PinholeCameraTrajectory | Open3D camera trajectory model. |

Source code in src/cvt/camera.py
def build_o3d_traj(poses: np.ndarray, K: np.ndarray, width: int, height: int) -> o3d.camera.PinholeCameraTrajectory:
    """Converts an array of camera poses and an intrinsics matrix into Open3D trajectory format.

    Parameters:
        poses: Nx4x4 array of camera poses in world-to-camera format.
        K: 3x3 camera intrinsics matrix.
        width: Expected image width.
        height: Expected image height.

    Returns:
        Open3D camera trajectory model.
    """
    trajectory = o3d.camera.PinholeCameraTrajectory()
    for pose in poses:
        camera_params = o3d.camera.PinholeCameraParameters()
        camera_params.intrinsic = o3d.camera.PinholeCameraIntrinsic(width, height, K[0,0], K[1,1], K[0,2], K[1,2])
        camera_params.extrinsic = pose
        trajectory.parameters += [camera_params]
    return trajectory
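
A minimal usage sketch (the poses, intrinsics, and image size below are made-up placeholders, not values from this library):

```python
import numpy as np
from cvt.camera import build_o3d_traj

# Placeholder inputs: ten identity world-to-camera poses and a generic
# VGA pinhole intrinsics matrix.
poses = np.stack([np.eye(4) for _ in range(10)])
K = np.array([[525.0,   0.0, 320.0],
              [  0.0, 525.0, 240.0],
              [  0.0,   0.0,   1.0]])

traj = build_o3d_traj(poses, K, width=640, height=480)
print(len(traj.parameters))  # 10
```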

camera_center_(pose)

Computes the center of a camera in world coordinates.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| pose | ndarray | The extrinsics matrix (4x4) of a given camera in world-to-camera format. | required |

Returns:

| Type | Description |
| --- | --- |
| ndarray | The homogeneous camera center vector (4x1) in world coordinates. |

Source code in src/cvt/camera.py
def camera_center_(pose: np.ndarray) -> np.ndarray:
    """Computes the center of a camera in world coordinates.

    Parameters:
        pose: The extrinsics matrix (4x4) of a given camera in world-to-camera format.

    Returns:
        The homogeneous camera center vector (4x1) in world coordinates.
    """
    C = null_space(pose[:3,:4])
    C /= C[3,:]

    return C
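
A quick hypothetical check: for a world-to-camera pose [R | t] with R = I and t = (1, 2, 3), the camera center is -R.T @ t = (-1, -2, -3):

```python
import numpy as np
from cvt.camera import camera_center_

# Hypothetical world-to-camera pose with identity rotation.
pose = np.eye(4)
pose[:3, 3] = [1.0, 2.0, 3.0]

C = camera_center_(pose)  # homogeneous 4x1 vector, last entry 1
print(C[:3, 0])  # [-1. -2. -3.]
```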

compute_baselines_(poses)

Computes the minimum and maximum baseline between a reference camera pose (poses[0]) and a cluster of supporting camera poses (poses[1:]).

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| poses | ndarray | Array of camera poses (Nx4x4). | required |

Returns:

| Type | Description |
| --- | --- |
| Tuple[float, float] | A tuple of baselines where the first element is the minimum baseline and the second element is the maximum baseline. |

Source code in src/cvt/camera.py
def compute_baselines_(poses: np.ndarray) -> Tuple[float, float]:
    """Computes the minimum and maximum baseline between a reference camera pose (poses[0]) and a cluster of supporting camera poses (poses[1:]).

    Parameters:
        poses: Array of camera poses (Nx4x4)

    Returns:
        A tuple of baselines where the first element is the minimum baseline and the second element is the maximum baseline.
    """
    ref_pose = poses[0]
    ref_camera_center = null_space(ref_pose[:3,:])
    ref_camera_center = ref_camera_center[:3,0] / ref_camera_center[3,0]
    min_baseline = np.inf
    max_baseline = 0.0

    num_views = len(poses)
    for i in range(1, num_views):
        sup_pose = poses[i]
        sup_camera_center = null_space(sup_pose[:3,:])
        sup_camera_center = sup_camera_center[:3,0] / sup_camera_center[3,0]
        b = np.linalg.norm(ref_camera_center - sup_camera_center)
        if b < min_baseline:
            min_baseline = b
        if b > max_baseline:
            max_baseline = b
    return min_baseline, max_baseline
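
A hypothetical example with three cameras whose centers sit on the x-axis (with R = I, a world-to-camera pose has t = -center):

```python
import numpy as np
from cvt.camera import compute_baselines_

# Reference camera at the origin; supports 1 and 3 units away along x.
def w2c(center):
    P = np.eye(4)
    P[:3, 3] = -np.asarray(center, dtype=float)
    return P

poses = np.stack([w2c([0, 0, 0]), w2c([1, 0, 0]), w2c([3, 0, 0])])
print(compute_baselines_(poses))  # (1.0, 3.0)
```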

crop_intrinsics_(K, crop_row, crop_col)

Adjusts the intrinsics matrix principal point coordinates to correspond to image cropping.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| K | ndarray | 3x3 camera intrinsics matrix. | required |
| crop_row | int | Offset for cy (corresponding to cropping 'crop_row' rows from the top of the image). | required |
| crop_col | int | Offset for cx (corresponding to cropping 'crop_col' columns from the left of the image). | required |

Returns:

| Type | Description |
| --- | --- |
| ndarray | The updated 3x3 intrinsics matrix. |

Source code in src/cvt/camera.py
def crop_intrinsics_(K: np.ndarray, crop_row: int, crop_col: int) -> np.ndarray:
    """Adjusts intrinsics matrix principle point coordinates corresponding to image cropping.

    Parameters:
        K: 3x3 camera intrinsics matrix.
        crop_row: Offset for cy (corresponding to cropping the left and right side of an image by 'crop_row').
        crop_col: Offset for cx (corresponding to cropping the top and bottom side of an image by 'crop_col').

    Returns:
        The updated 3x3 intrinsics matrix
    """

    K[0,2] -= crop_col
    K[1,2] -= crop_row
    return K
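
For example, removing 10 rows from the top and 20 columns from the left of the image shifts cy by 10 and cx by 20 (the intrinsics values here are placeholders; note that the function modifies K in place):

```python
import numpy as np
from cvt.camera import crop_intrinsics_

K = np.array([[525.0,   0.0, 320.0],
              [  0.0, 525.0, 240.0],
              [  0.0,   0.0,   1.0]])
K = crop_intrinsics_(K, crop_row=10, crop_col=20)
print(K[0, 2], K[1, 2])  # 300.0 230.0
```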

intrinsic_pyramid(K, levels)

Computes a camera intrinsics pyramid corresponding to several levels of image downsampling.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| K | tensor | [Batch x 3 x 3] camera intrinsics matrix. | required |
| levels | int | The number of pyramid levels to compute. | required |

Returns:

| Type | Description |
| --- | --- |
| tensor | The intrinsics pyramid of shape [Batch x L x 3 x 3]. |

Source code in src/cvt/camera.py
def intrinsic_pyramid(K: torch.tensor, levels: int) -> torch.tensor:
    """Computes camera intrinsics pyramid corresonding to several levels of image downsampling.

    Parameters:
        K: [Batch x 3 x 3] camera intrinsics matrix.
        levels: The number of pyramid levels to compute.

    Returns:
        The intrinsics pyramid of shape [Batch x L x 3 x 3].
    """
    batch_size, _, _ = K.shape
    K_pyr = torch.zeros((batch_size, levels, 3, 3)).to(K)

    for l in range(levels):
        if l==0:
            k = K
        else:
            k = torch.clone(K) / (2**l)
            k[:,2,2] = 1.0
        K_pyr[:,l] = k

    return K_pyr
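
A hypothetical batch of one intrinsics matrix and three pyramid levels; level l scales fx, fy, cx, and cy by 1/2**l:

```python
import torch
from cvt.camera import intrinsic_pyramid

K = torch.tensor([[[525.0,   0.0, 320.0],
                   [  0.0, 525.0, 240.0],
                   [  0.0,   0.0,   1.0]]])
K_pyr = intrinsic_pyramid(K, levels=3)
print(K_pyr.shape)        # torch.Size([1, 3, 3, 3])
print(K_pyr[0, 2, 0, 0])  # tensor(131.2500), i.e. 525 / 2**2
```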

intrinsic_pyramid_(K, levels)

Computes a camera intrinsics pyramid corresponding to several levels of image downsampling.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| K | ndarray | [3 x 3] camera intrinsics matrix. | required |
| levels | int | The number of pyramid levels to compute. | required |

Returns:

| Type | Description |
| --- | --- |
| ndarray | The intrinsics pyramid of shape [L x 3 x 3]. |

Source code in src/cvt/camera.py
def intrinsic_pyramid_(K: np.ndarray, levels: int) -> np.ndarray:
    """Computes camera intrinsics pyramid corresonding to several levels of image downsampling.

    Parameters:
        K: [3 x 3] camera intrinsics matrix.
        levels: The number of pyramid levels to compute.

    Returns:
        The intrinsics pyramid of shape [L x 3 x 3].
    """
    K_pyr = np.zeros((levels, 3, 3))

    for l in range(levels):
        if l==0:
            k = K
        else:
            k = np.copy(K) / (2**l)
            k[2,2] = 1.0
        K_pyr[l] = k

    return K_pyr

relative_transform_(poses_1, poses_2)

Computes the approximate relative transformation between two camera trajectories.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| poses_1 | ndarray | Array of the first set of cameras (Nx4x4). | required |
| poses_2 | ndarray | Array of the second set of cameras (Nx4x4). | required |

Returns:

| Type | Description |
| --- | --- |
| ndarray | The relative transformation matrix (4x4) between the two trajectories. |

Source code in src/cvt/camera.py
def relative_transform_(poses_1: np.ndarray, poses_2: np.ndarray) -> np.ndarray:
    """Computes the approximate relative transformation between two sets of camera trajectories.

    Args:
        poses_1: Array of the first set of cameras (Nx4x4).
        poses_2: Array of the second set of cameras (Nx4x4).

    Returns:
        The relative transformation matrix (4x4) between the two trajectories.
    """
    centers_1 = np.squeeze(np.array([ camera_center(p) for p in poses_1 ]), axis=2)
    centers_2 = np.squeeze(np.array([ camera_center(p) for p in poses_2 ]), axis=2)

    ### determine scale
    # grab first camera pair
    c1_0 = centers_1[0][:3]
    c2_0 = centers_2[0][:3]

    # grab last camera pair
    c1_1 = centers_1[-1][:3]
    c2_1 = centers_2[-1][:3]

    # calculate the baseline between both sets of cameras
    baseline_1 = np.linalg.norm(c1_0 - c1_1)
    baseline_2 = np.linalg.norm(c2_0 - c2_1)

    # compute the scale based on the baseline ratio
    scale = baseline_2/baseline_1

    ### determine 1->2 Rotation 
    b1 = np.array([c[:3] for c in centers_1])
    b2 = np.array([c[:3] for c in centers_2])
    R = Rotation.align_vectors(b2,b1)[0].as_matrix()
    R = scale * R

    ### create transformation matrix
    M = np.eye(4)
    M[:3,:3] = R

    ### determine 1->2 Translation
    t = np.array([ c2-(M@c1) for c1,c2 in zip(centers_1,centers_2) ])
    t = np.mean(t, axis=0)

    ### add translation
    M[:3,3] = t[:3]

    return M
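
A hypothetical sanity check: if the second trajectory is the first scaled by a factor of 2 about the world origin, the recovered matrix should be diag(2, 2, 2, 1):

```python
import numpy as np
from cvt.camera import relative_transform_

# World-to-camera poses with identity rotation: t = -center.
def w2c(center):
    P = np.eye(4)
    P[:3, 3] = -np.asarray(center, dtype=float)
    return P

centers = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 1]], dtype=float)
poses_1 = np.stack([w2c(c) for c in centers])
poses_2 = np.stack([w2c(2.0 * c) for c in centers])

M = relative_transform_(poses_1, poses_2)
print(np.round(M, 3))  # diag(2, 2, 2, 1): scale 2, no rotation or translation
```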

scale_intrinsics(K, scale)

Adjusts the intrinsics matrix focal length and principal point to correspond to image scaling.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| K | tensor | Batchx3x3 camera intrinsics matrices. | required |
| scale | float | Scale amount (i.e. scale=0.5 corresponds to scaling an image by a factor of 1/2). | required |

Returns:

| Type | Description |
| --- | --- |
| tensor | The updated Batchx3x3 intrinsics matrices. |

Source code in src/cvt/camera.py
def scale_intrinsics(K: torch.tensor, scale: float) -> torch.tensor:
    """Adjust intrinsics matrix focal length and principle point corresponding to image scaling.

    Parameters:
        K: Batchx3x3 camera intrinsics matrices.
        scale: Scale amount (i.e. scale=0.5 corresponds to scaling an image by a factor of 1/2).

    Returns:
        The updated Batchx3x3 intrinsics matrices.
    """
    K_clone = torch.clone(K)
    K_clone[:,:2,:] = K_clone[:,:2,:] * scale

    return K_clone

scale_intrinsics_(K, scale)

Adjusts the intrinsics matrix focal length and principal point to correspond to image scaling.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| K | ndarray | 3x3 camera intrinsics matrix. | required |
| scale | float | Scale amount (i.e. scale=0.5 corresponds to scaling an image by a factor of 1/2). | required |

Returns:

| Type | Description |
| --- | --- |
| ndarray | The updated 3x3 intrinsics matrix. |

Source code in src/cvt/camera.py
def scale_intrinsics_(K: np.ndarray, scale: float) -> np.ndarray :
    """Adjust intrinsics matrix focal length and principle point corresponding to image scaling.

    Parameters:
        K: 3x3 camera intrinsics matrix.
        scale: Scale amount (i.e. scale=0.5 corresponds to scaling an image by a factor of 1/2).


    Returns:
        The updated 3x3 intrinsics matrix.
    """
    K_copy = K.copy()
    K_copy[:2, :] = K_copy[:2, :] * scale

    return K_copy
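
For example, halving the image resolution halves the focal lengths and principal point (placeholder intrinsics; the NumPy variant copies K rather than modifying it in place):

```python
import numpy as np
from cvt.camera import scale_intrinsics_

K = np.array([[525.0,   0.0, 320.0],
              [  0.0, 525.0, 240.0],
              [  0.0,   0.0,   1.0]])
K_half = scale_intrinsics_(K, scale=0.5)
print(K_half[0, 0], K_half[0, 2])  # 262.5 160.0
```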

sfm_to_trajectory(cams, log_file)

Converts a set of cameras from SFM format to Trajectory File format.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| cams | ndarray | Array of camera extrinsics (Nx4x4) to be converted. | required |
| log_file | str | Output path to the *.log file that is to be created. | required |
Source code in src/cvt/camera.py
def sfm_to_trajectory(cams: np.ndarray, log_file: str) -> None:
    """Convert a set of cameras from SFM format to Trajectory File format.

    Parameters:
        cams: Array of camera extrinsics (Nx4x4) to be converted.
        log_file: Output path to the *.log file that is to be created.
    """
    num_cams = len(cams)

    with open(log_file, 'w') as f:
        for i,cam in enumerate(cams):
            # write camera to output_file
            f.write("{} {} 0\n".format(str(i),str(i)))
            for row in cam:
                for c in row:
                    f.write("{} ".format(str(c)))
                f.write("\n")

    return
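
A minimal hypothetical export ("trajectory.log" is a placeholder path):

```python
import numpy as np
from cvt.camera import sfm_to_trajectory

# Write two identity extrinsics to a .log trajectory file.
cams = np.stack([np.eye(4), np.eye(4)])
sfm_to_trajectory(cams, "trajectory.log")
```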

to_opengl_pose(pose)

Converts a camera pose to the OpenGL convention: (right-up-back) axes in camera-to-world format.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| pose | tensor | The world-to-camera pose in (right-down-forward) convention. | required |

Returns:

| Type | Description |
| --- | --- |
| tensor | Camera pose in OpenGL format. |

Source code in src/cvt/camera.py
def to_opengl_pose(pose: torch.tensor) -> torch.tensor:
    """OpenGL pose: (right-up-back) (cam-to-world)

    Parameters:
        pose: The world-to-camera pose in (right-down-forward) convention.

    Returns:
        Camera pose in OpenGL format.
    """
    pose = torch.linalg.inv(pose)
    pose[:3,1] *= -1
    pose[:3,2] *= -1
    return pose

to_opengl_pose_(pose)

Converts a camera pose to the OpenGL convention: (right-up-back) axes in camera-to-world format.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| pose | ndarray | The world-to-camera pose in (right-down-forward) convention. | required |

Returns:

| Type | Description |
| --- | --- |
| ndarray | Camera pose in OpenGL format. |

Source code in src/cvt/camera.py
def to_opengl_pose_(pose: np.ndarray) -> np.ndarray:
    """OpenGL pose: (right-up-back) (cam-to-world)

    Parameters:
        pose: The world-to-camera pose in (right-down-forward) convention.

    Returns:
        Camera pose in OpenGL format.
    """
    pose = np.linalg.inv(pose)
    pose[:3,1] *= -1
    pose[:3,2] *= -1
    return pose
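
A hypothetical identity pose makes the axis flips visible: inversion yields the camera-to-world matrix, then the y and z columns are negated:

```python
import numpy as np
from cvt.camera import to_opengl_pose_

pose = np.eye(4)  # placeholder world-to-camera pose
gl_pose = to_opengl_pose_(pose)
print(np.diag(gl_pose))  # [ 1. -1. -1.  1.]
```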

trajectory_to_sfm(log_file, camera_path, intrinsics)

Converts a set of cameras from Trajectory File format to SFM format.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| log_file | str | Input *.log file that stores the trajectory information. | required |
| camera_path | str | Output path where the SFM camera files will be written. | required |
| intrinsics | ndarray | Array of intrinsics matrices (Nx3x3) for each camera. | required |
Source code in src/cvt/camera.py
def trajectory_to_sfm(log_file: str, camera_path: str, intrinsics: np.ndarray) -> None:
    """Convert a set of cameras from Trajectory File format to SFM format.

    Args:
        log_file: Input *.log file that stores the trajectory information.
        camera_path: Output path where the SFM camera files will be written.
        intrinsics: Array of intrinsics matrices (Nx3x3) for each camera.
    """
    with open(log_file, 'r') as f:
        lines = f.readlines()
        num_lines = len(lines)
        i = 0

        while i <= num_lines - 5:  # each trajectory entry spans 5 lines: header + 4 matrix rows
            view_num = int(lines[i].strip().split(" ")[0])

            cam = np.zeros((2,4,4))
            cam[0,0,:] = np.asarray(lines[i+1].strip().split(" "), dtype=float)
            cam[0,1,:] = np.asarray(lines[i+2].strip().split(" "), dtype=float)
            cam[0,2,:] = np.asarray(lines[i+3].strip().split(" "), dtype=float)
            cam[0,3,:] = np.asarray(lines[i+4].strip().split(" "), dtype=float)
            cam[0,:,:] = np.linalg.inv(cam[0,:,:])

            cam_file = "{:08d}_cam.txt".format(view_num)
            cam_path = os.path.join(camera_path, cam_file)

            cam[1,:3,:3] = intrinsics[view_num]

            write_cam(cam_path, cam)
            i = i+5
    return
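
A hypothetical round trip of the file written above ("trajectory.log" and "cams/" are placeholder paths; the output directory is assumed to exist, with one 3x3 intrinsics matrix supplied per camera):

```python
import numpy as np
from cvt.camera import trajectory_to_sfm

intrinsics = np.stack([np.eye(3), np.eye(3)])
trajectory_to_sfm("trajectory.log", "cams/", intrinsics)
```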

y_axis_rotation(P, theta)

Applies a rotation about the y-axis to the given camera extrinsics matrix.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| P | ndarray | Initial extrinsics camera matrix. | required |
| theta | float | Angle (in radians) to rotate the camera. | required |

Returns:

| Type | Description |
| --- | --- |
| ndarray | The rotated extrinsics matrix for the camera. |

Source code in src/cvt/camera.py
def y_axis_rotation(P: np.ndarray, theta: float) -> np.ndarray:
    """Applies a rotation to the given camera extrinsics matrix along the y-axis.

    Parameters:
        P: Initial extrinsics camera matrix.
        theta: Angle (in radians) to rotate the camera.

    Returns:
        The rotated extrinsics matrix for the camera.
    """
    R = np.eye(4)
    R[0,0] = math.cos(theta)
    R[0,2] = math.sin(theta)
    R[2,0] = -(math.sin(theta))
    R[2,2] = math.cos(theta)

    P_rot = R @ P

    return P_rot
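
For example, a hypothetical 90-degree rotation applied to an identity extrinsics matrix produces the expected rotation block (mapping +x to -z and +z to +x):

```python
import math
import numpy as np
from cvt.camera import y_axis_rotation

P = np.eye(4)  # placeholder extrinsics
P_rot = y_axis_rotation(P, theta=math.pi / 2)
print(np.round(P_rot[:3, :3]))
# [[ 0.  0.  1.]
#  [ 0.  1.  0.]
#  [-1.  0.  0.]]
```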

z_planes_from_disp(Z, b, f, delta)

Computes the near and far Z planes corresponding to 'delta' disparity steps between two cameras.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| Z | tensor | Z buffer storing D depth plane hypotheses [B x C x D x H x W] (shape resembles a typical PSV). | required |
| b | tensor | The baseline between cameras [B]. | required |
| f | tensor | The focal length of the camera [B]. | required |
| delta | float | The disparity delta for the near and far planes. | required |

Returns:

| Type | Description |
| --- | --- |
| Tuple[tensor, tensor] | The tuple of near and far Z planes corresponding to 'delta' disparity steps. |

Source code in src/cvt/camera.py
def z_planes_from_disp(Z: torch.tensor, b: torch.tensor, f: torch.tensor, delta: float) -> Tuple[torch.tensor, torch.tensor]:
    """Computes the near and far Z planes corresponding to 'delta' disparity steps between two cameras.

    Parameters:
        Z: Z buffer storing D depth plane hypotheses [B x C x D x H x W] (shape resembles a typical PSV).
        b: The baseline between cameras [B].
        f: The focal length of the camera [B].
        delta: The disparity delta for the near and far planes.

    Returns:
        The tuple of near and far Z planes corresponding to 'delta' disparity steps.
    """
    B, C, D, H, W = Z.shape
    b = b.reshape(B,1,1,1,1).repeat(1,C,D,H,W)
    f = f.reshape(B,1,1,1,1).repeat(1,C,D,H,W)

    near = Z*(b*f/((b*f) + (delta*Z)))
    far = Z*(b*f/((b*f) - (delta*Z)))

    return (near, far)
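
The expressions follow from the stereo relation d = b*f/Z between disparity and depth: stepping the disparity by ±delta gives Z_near = b*f/(d + delta) = Z*b*f/(b*f + delta*Z) and Z_far = Z*b*f/(b*f - delta*Z). A small hypothetical check with b = f = 1:

```python
import torch
from cvt.camera import z_planes_from_disp

# Z = 1 corresponds to disparity d = b*f/Z = 1, so a half-step
# (delta = 0.5) gives near = 1/1.5 and far = 1/0.5.
Z = torch.ones(1, 1, 1, 1, 1)
b = torch.ones(1)
f = torch.ones(1)
near, far = z_planes_from_disp(Z, b, f, delta=0.5)
print(near.item(), far.item())  # 0.666... 2.0
```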