Image Formation

Notes for An Invitation to 3-D Vision.

A geometric model of image formation

Goal: map points in 3-D space to their images in a 2-D image plane.

Process:

coordinate transformations between the camera frame and the world frame
projection of 3-D coordinates onto 2-D image coordinates
coordinate transformations between possible choices of image coordinate frame

Inverting this chain (recovering the transformation parameters from image measurements) is the task of camera calibration.

An ideal perspective camera

Let $X_0=[X_0,Y_0,Z_0]^T$ be the coordinates of a point $p$ relative to the world frame. Recall that the coordinates $X=[X,Y,Z]^T$ of the same point $p$ relative to the camera frame are given by a rigid-body transformation $g=(R,T)$ of $X_0$:

$X=RX_0+T \ \in \Bbb R^3.$
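As a minimal sketch of this change of frame, the snippet below applies $X=RX_0+T$ in plain Python. The helper name `rigid_transform` and the pose used (a 90-degree rotation about the $Z$ axis followed by a translation along $Z$) are illustrative assumptions, not values from the notes.

```python
import math

def rigid_transform(R, T, X0):
    """Return X = R * X0 + T for a 3x3 rotation R (list of rows) and 3-vectors T, X0."""
    return [sum(r * x for r, x in zip(row, X0)) + t for row, t in zip(R, T)]

# Hypothetical pose: rotate 90 degrees about the Z axis, then shift 5 units along Z.
theta = math.pi / 2
R = [[math.cos(theta), -math.sin(theta), 0.0],
     [math.sin(theta),  math.cos(theta), 0.0],
     [0.0,              0.0,             1.0]]
T = [0.0, 0.0, 5.0]

X0 = [1.0, 0.0, 0.0]            # point in world-frame coordinates
X = rigid_transform(R, T, X0)   # same point in camera-frame coordinates
print(X)                        # approximately [0, 1, 5], up to rounding
```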

Adopting the frontal pinhole camera model with focal length $f$, we see that the point $X$ is projected onto the image plane at the point

$x=\begin{bmatrix}x\\y\end{bmatrix}=\frac{f}{Z}\begin{bmatrix}X\\Y\end{bmatrix}.$

$Z$: the depth of the point $p$.
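The projection formula can be sketched directly in code. The function name `project_pinhole` and the sample numbers are hypothetical; the guard on $Z$ reflects the assumption that the point lies in front of the camera.

```python
def project_pinhole(X, f):
    """Project a camera-frame point X = [X, Y, Z] to image-plane coordinates (x, y)."""
    Xc, Yc, Zc = X
    if Zc <= 0:
        # The frontal pinhole model only makes sense for points with positive depth.
        raise ValueError("point must lie in front of the camera (Z > 0)")
    return (f * Xc / Zc, f * Yc / Zc)

# Illustrative values: a point at depth 4 seen with focal length 2.
x, y = project_pinhole([2.0, 1.0, 4.0], f=2.0)
print(x, y)  # 1.0 0.5
```

Doubling the depth $Z$ while keeping $X$ and $Y$ fixed halves both image coordinates, which is the familiar perspective foreshortening.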

In homogeneous coordinates, this relationship can be written as

$Z\begin{bmatrix}x\\y\\1\end{bmatrix}=\begin{bmatrix}f&0&0&0\\0&f&0&0\\0&0&1&0\end{bmatrix}\begin{bmatrix}X\\Y\\Z\\1\end{bmatrix},$

which, with $x=[x,y,1]^T$ and $X=[X,Y,Z,1]^T$ now denoting homogeneous coordinates, is equivalent to

$Zx=\begin{bmatrix}f&0&0&0\\0&f&0&0\\0&0&1&0\end{bmatrix}X.$

The coordinate $Z$ (or the depth of the point $p$) is usually unknown, so we simply write it as an arbitrary positive scalar $\lambda \in \Bbb R_+$.

Define two matrices

$K_f \dot{=}\begin{bmatrix}f&0&0\\0&f&0\\0&0&1\end{bmatrix}\in \Bbb R^{3\times 3},\ \Pi_0\dot{=}\begin{bmatrix}1&0&0&0\\0&1&0&0\\0&0&1&0\end{bmatrix} \in \Bbb R^{3\times 4}.$

$\Pi_0$: standard (or canonical) projection matrix.

The overall geometric model for an ideal camera can be described as

$\lambda\begin{bmatrix}x\\y\\1\end{bmatrix}=\begin{bmatrix}f&0&0\\0&f&0\\0&0&1\end{bmatrix}\begin{bmatrix}1&0&0&0\\0&1&0&0\\0&0&1&0\end{bmatrix}\begin{bmatrix}R&T\\0&1\end{bmatrix}\begin{bmatrix}X_0\\Y_0\\Z_0\\1\end{bmatrix},$

or in matrix form,

$\lambda x=K_f\Pi_0X=K_f\Pi_0 gX_0.$
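The chain $\lambda x=K_f\Pi_0 gX_0$ can be checked numerically. The sketch below uses plain Python lists as matrices; the helper `matmul`, the focal length, the pose $g$ (identity rotation, translation of 5 along $Z$) and the world point are all assumed values chosen for illustration.

```python
def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

f = 2.0  # hypothetical focal length
K_f = [[f, 0, 0],
       [0, f, 0],
       [0, 0, 1]]
Pi_0 = [[1, 0, 0, 0],
        [0, 1, 0, 0],
        [0, 0, 1, 0]]
# g = [[R, T], [0, 1]] with R = identity and T = [0, 0, 5] (assumed pose).
g = [[1, 0, 0, 0],
     [0, 1, 0, 0],
     [0, 0, 1, 5],
     [0, 0, 0, 1]]

X0_h = [[1.0], [2.0], [5.0], [1.0]]  # world point in homogeneous coordinates (column)

P = matmul(K_f, matmul(Pi_0, g))                 # 3x4 matrix K_f * Pi_0 * g
lam_x = [row[0] for row in matmul(P, X0_h)]      # lambda * [x, y, 1]
lam = lam_x[2]                                   # the scale lambda equals the depth Z
x_h = [v / lam for v in lam_x]                   # homogeneous image point [x, y, 1]
print(lam, x_h)  # 10.0 [0.2, 0.4, 1.0]
```

Dividing by the third component recovers the image point, so the unknown scale $\lambda$ never has to be supplied separately.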

Camera with intrinsic parameters

Goal: specify the relationship between the retinal plane coordinate frame and the pixel array.

Actual image coordinates

$x'\dot{=}\begin{bmatrix}x'\\y'\\1\end{bmatrix}=\begin{bmatrix}s_x&s_\theta&o_x\\0&s_y&o_y\\0&0&1\end{bmatrix}\begin{bmatrix}x\\y\\1\end{bmatrix},$

$(o_x,o_y)$: coordinates (in pixels) of the principal point (where the $z$-axis intersects the image plane) relative to the image reference frame.

$s_x, s_y$: scale factors. When $s_x=s_y$, each pixel is square.

$s_\theta$: skew factor, proportional to $\cot(\theta)$, where $\theta$ is the angle between the image axes $x_s,y_s$.

The transformation matrix

$K_s\dot{=}\begin{bmatrix}s_x&s_\theta&o_x\\0&s_y&o_y\\0&0&1\end{bmatrix} \in \Bbb R^{3\times 3}.$

The intrinsic parameter matrix (or calibration matrix) is the product

$K\dot{=}K_sK_f=\begin{bmatrix}s_x&s_\theta&o_x\\0&s_y&o_y\\0&0&1\end{bmatrix}\begin{bmatrix}f&0&0\\0&f&0\\0&0&1\end{bmatrix}=\begin{bmatrix}fs_x&fs_\theta&o_x\\0&fs_y&o_y\\0&0&1\end{bmatrix}.$
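To illustrate how $K$ maps a camera-frame point to pixel coordinates, here is a small sketch in plain Python. All numeric values (focal length, scale factors, principal point, test point) are hypothetical, and the skew is assumed to be zero.

```python
def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

# Hypothetical intrinsics: square pixels, no skew, principal point at (320, 240).
f = 0.5
s_x, s_y, s_theta = 800.0, 800.0, 0.0
o_x, o_y = 320.0, 240.0

K_s = [[s_x, s_theta, o_x],
       [0.0, s_y,     o_y],
       [0.0, 0.0,     1.0]]
K_f = [[f,   0.0, 0.0],
       [0.0, f,   0.0],
       [0.0, 0.0, 1.0]]

K = matmul(K_s, K_f)  # [[f*s_x, f*s_theta, o_x], [0, f*s_y, o_y], [0, 0, 1]]

# Since Pi_0 simply drops the trailing 1, lambda * x' = K * [X, Y, Z] for a camera-frame point.
X = [0.2, -0.1, 2.0]
lam_x = [sum(k * c for k, c in zip(row, X)) for row in K]
u, v = lam_x[0] / lam_x[2], lam_x[1] / lam_x[2]  # pixel coordinates x', y'
print(K)
print(u, v)  # approximately 360 and 220
```

A point on the optical axis ($X=Y=0$) lands exactly on the principal point $(o_x,o_y)$, which is a quick sanity check for any calibration matrix.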