Notes for An Invitation to 3-D Vision.
A geometric model of image formation
Goal: model how points in 3-D space are projected onto a 2-D image plane.
Process:
- coordinate transformations between the camera frame and the world frame
- projection of 3-D coordinates onto 2-D image coordinates
- coordinate transformations between possible choices of image coordinate frame
Inverting this chain is camera calibration.
An ideal perspective camera
Recall that the coordinates $X=[X,Y,Z]^T$ of a point $p$ relative to the camera frame are given by a rigid-body transformation $g=(R,T)$ of its coordinates $X_0$ relative to the world frame:
$$X = RX_0 + T$$
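As a minimal sketch of this change of coordinates, the snippet below applies a rotation followed by a translation to a world point. The pose (a 90-degree rotation about the Z-axis and a shift of 5 along Z) and the helper name `rigid_transform` are illustrative assumptions, not values from the notes.

```python
def rigid_transform(R, T, X0):
    """Map world coordinates X0 to camera coordinates: rotate by R, then add T."""
    return [sum(R[i][j] * X0[j] for j in range(3)) + T[i] for i in range(3)]

# Assumed example pose: rotation by 90 degrees about the Z-axis, shift of 5 along Z.
R = [[0.0, -1.0, 0.0],
     [1.0,  0.0, 0.0],
     [0.0,  0.0, 1.0]]
T = [0.0, 0.0, 5.0]

X0 = [1.0, 0.0, 0.0]           # point expressed in the world frame
X = rigid_transform(R, T, X0)  # same point expressed in the camera frame
print(X)
```

The world X-axis is rotated onto the camera Y-axis, and the translation places the point at depth 5 in front of the camera.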
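Adopting the frontal pinhole camera model with focal length $f$, the point $X$ is projected onto the image plane at the point
$$x = \begin{bmatrix} x \\ y \end{bmatrix} = \frac{f}{Z} \begin{bmatrix} X \\ Y \end{bmatrix}$$
$Z$: the depth of the point $p$.
In homogeneous coordinates, this relationship can be written as
$$Z \begin{bmatrix} x \\ y \\ 1 \end{bmatrix} = \begin{bmatrix} f & 0 & 0 & 0 \\ 0 & f & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix} \begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix}$$
which is equivalent to
$$Zx = \begin{bmatrix} f & 0 & 0 & 0 \\ 0 & f & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix} X$$
where $x=[x,y,1]^T$ and $X=[X,Y,Z,1]^T$ now denote homogeneous coordinates.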
The coordinate $Z$ (or the depth of the point $p$) is usually unknown, so we simply write it as an arbitrary positive scalar $\lambda \in \Bbb R_+$.
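The following sketch illustrates why the depth can be treated as an arbitrary positive scale: two points on the same ray through the optical center land on the same image point, and only the scale factor differs. It assumes a frontal pinhole model with a focal length `f`; the value of `f`, the sample points, and the helper name `project_pinhole` are illustrative choices.

```python
def project_pinhole(X, f):
    """Frontal pinhole projection: returns the image point (x, y) and the depth Z."""
    Xc, Yc, Zc = X
    if Zc <= 0:
        raise ValueError("point must lie in front of the camera (Z > 0)")
    return (f * Xc / Zc, f * Yc / Zc), Zc

f = 2.0                   # assumed focal length
X_near = [1.0, 2.0, 4.0]  # point in the camera frame
X_far = [2.0, 4.0, 8.0]   # point on the same ray, twice as far away

(x, y), lam_near = project_pinhole(X_near, f)
(x_far, y_far), lam_far = project_pinhole(X_far, f)

# Homogeneous form: lam * [x, y, 1] equals [f*X, f*Y, Z].
lhs = [lam_near * x, lam_near * y, lam_near * 1.0]
rhs = [f * X_near[0], f * X_near[1], X_near[2]]
```

Both points project to the same `(x, y)`, while the scale factors are 4 and 8. The image point alone therefore does not determine the depth.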
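Define two matrices, where $f$ is the focal length:
$$K_f = \begin{bmatrix} f & 0 & 0 \\ 0 & f & 0 \\ 0 & 0 & 1 \end{bmatrix}, \quad \Pi_0 = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix}$$
$\Pi_0$: standard (or canonical) projection matrix.
The overall geometric model for an ideal camera can be described as
$$\lambda x = K_f \Pi_0 X = K_f \Pi_0 g X_0$$
or in matrix form,
$$\lambda \begin{bmatrix} x \\ y \\ 1 \end{bmatrix} = \begin{bmatrix} f & 0 & 0 \\ 0 & f & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix} \begin{bmatrix} R & T \\ 0 & 1 \end{bmatrix} \begin{bmatrix} X_0 \\ Y_0 \\ Z_0 \\ 1 \end{bmatrix}$$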
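The chain of matrices in the ideal camera model can be checked numerically with a small sketch. It assumes the focal-length scaling is `diag(f, f, 1)` and that the pose is stored as a 4x4 homogeneous matrix. The names `K_f`, `Pi_0`, `g`, the helper `matmul`, and all numeric values are illustrative.

```python
def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

f = 2.0  # assumed focal length
K_f = [[f, 0, 0], [0, f, 0], [0, 0, 1]]             # focal-length scaling
Pi_0 = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0]]   # canonical projection

# Assumed pose g = (R, T) as a 4x4 homogeneous matrix:
# rotation by 90 degrees about the Z-axis, translation of 5 along Z.
g = [[0, -1, 0, 0],
     [1,  0, 0, 0],
     [0,  0, 1, 5],
     [0,  0, 0, 1]]

X0 = [[1.0], [0.0], [0.0], [1.0]]  # world point, homogeneous column vector

lam_x = matmul(matmul(matmul(K_f, Pi_0), g), X0)  # equals lambda * [x, y, 1]^T
lam = lam_x[2][0]                                 # third entry is the depth
x, y = lam_x[0][0] / lam, lam_x[1][0] / lam       # ideal image coordinates
```

Dividing by the third entry recovers the image point. Here the depth is 5 and the point projects to `(0, 0.4)`.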
Camera with intrinsic parameters
Goal: specify the relationship between the retinal plane coordinate frame and the pixel array.
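Actual image coordinates $x', y'$ (in pixels), obtained from the ideal image coordinates $x, y$:
$$\begin{bmatrix} x' \\ y' \\ 1 \end{bmatrix} = \begin{bmatrix} s_x & s_\theta & o_x \\ 0 & s_y & o_y \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x \\ y \\ 1 \end{bmatrix}$$
$(o_x,o_y)$: coordinates (in pixels) of the principal point (where the $z$-axis intersects the image plane) relative to the image reference frame.
$s_x, s_y$: scale factors along the $x$ and $y$ directions. When $s_x=s_y$, each pixel is square.
$s_\theta$: skew factor, proportional to $\cot(\theta)$, where $\theta$ is the angle between the image axes $x_s,y_s$.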
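The transformation matrix
$$K_s = \begin{bmatrix} s_x & s_\theta & o_x \\ 0 & s_y & o_y \\ 0 & 0 & 1 \end{bmatrix}$$
maps ideal image coordinates to pixel coordinates.
The intrinsic parameter matrix (or calibration matrix) refers to
$$K = K_s K_f = \begin{bmatrix} s_x & s_\theta & o_x \\ 0 & s_y & o_y \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} f & 0 & 0 \\ 0 & f & 0 \\ 0 & 0 & 1 \end{bmatrix} = \begin{bmatrix} f s_x & f s_\theta & o_x \\ 0 & f s_y & o_y \\ 0 & 0 & 1 \end{bmatrix}$$
where $f$ is the focal length.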
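A short sketch of how the intrinsic parameters turn a camera-frame point into pixel coordinates. It assumes the calibration matrix is the product of the pixel transformation (scale, skew, principal point) and a focal-length scaling `diag(f, f, 1)`. The function names and all parameter values are hypothetical.

```python
def intrinsic_matrix(f, s_x, s_y, s_theta, o_x, o_y):
    """Calibration matrix under the stated assumption, written out entry by entry."""
    return [[f * s_x, f * s_theta, o_x],
            [0.0,     f * s_y,     o_y],
            [0.0,     0.0,         1.0]]

def to_pixels(K, X):
    """Project a camera-frame point X = [X, Y, Z] to pixel coordinates (x', y')."""
    u, v, w = (sum(K[i][j] * X[j] for j in range(3)) for i in range(3))
    return u / w, v / w  # divide out the unknown scale (the depth)

# Assumed values: square pixels, no skew, principal point at (320, 240).
K = intrinsic_matrix(f=2.0, s_x=100.0, s_y=100.0, s_theta=0.0, o_x=320.0, o_y=240.0)
x_pix, y_pix = to_pixels(K, [1.0, 2.0, 4.0])
```

With these values the ideal image point is `(0.5, 1.0)`. Scaling by 100 pixels per unit and shifting by the principal point gives the pixel location `(370, 340)`.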