Representation of a Three-Dimensional Moving Scene

Notes for An Invitation to 3-D Vision.

3-D Euclidean space

Definition 1 Vector. Omitted.

Definition 2 Cross product. Given two vectors $u,v \in \Bbb R^3$, their cross product is

$u\times v \ \dot{=}\begin{bmatrix}u_2v_3-u_3v_2\\u_3v_1-u_1v_3\\u_1v_2-u_2v_1\end{bmatrix} \in \Bbb R^3.$

Some properties:

$\langle u\times v,u \rangle=\langle u\times v,v \rangle=0,\ u\times v=-v\times u$

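orthogonal: $u \times v$ is orthogonal to both $u$ and $v$
the order of the factors defines an orientation: swapping $u$ and $v$ flips the sign of the product
right-hand rule: for $e_1 \ \dot{=} [1,0,0]^T$ and $e_2 \ \dot{=} [0,1,0]^T$, we have $e_1 \times e_2= [0,0,1]^T\ \dot{=}\ e_3$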

If we fix $u$, the cross product can be represented by a map from $\Bbb R^3$ to $\Bbb R^3$: $v \mapsto u \times v.$ This map is linear in $v$ and can be represented by a skew-symmetric matrix $\hat{u} \in \Bbb R^{3\times 3}.$

$\hat{u} \ \dot{=} \begin{bmatrix} 0&-u_3&u_2\\ u_3&0&-u_1\\-u_2&u_1&0\end{bmatrix}\in \Bbb R^{3\times 3}.$

Lemma 3 Skew-symmetric matrix. A matrix $M \in \Bbb R^{3 \times 3}$ is skew-symmetric if and only if $M=\hat{u}$ for some $u \in \Bbb R^3$.
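The hat map above can be checked numerically. The following is a minimal sketch in plain Python; the helper names `hat`, `cross`, and `matvec` and the sample vectors are chosen here for illustration and are not part of the notes.

```python
def hat(u):
    """Return the skew-symmetric matrix u^ of a 3-vector u (nested lists)."""
    u1, u2, u3 = u
    return [[0.0, -u3, u2],
            [u3, 0.0, -u1],
            [-u2, u1, 0.0]]

def cross(u, v):
    """Cross product of two 3-vectors, following Definition 2."""
    return [u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0]]

def matvec(M, v):
    """Multiply a 3x3 matrix by a 3-vector."""
    return [sum(M[i][j] * v[j] for j in range(3)) for i in range(3)]

# Sample vectors (arbitrary values, assumed for the demonstration).
u = [1.0, 2.0, 3.0]
v = [-4.0, 5.0, 0.5]

lhs = matvec(hat(u), v)   # u^ v
rhs = cross(u, v)         # u x v

# Skew-symmetry: the transpose of u^ equals its negation.
U = hat(u)
is_skew = all(U[i][j] == -U[j][i] for i in range(3) for j in range(3))
```

Here `lhs` and `rhs` agree, and `is_skew` is `True`, which matches the claim that $v \mapsto u \times v$ is represented by the skew-symmetric matrix $\hat{u}$.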

Rigid-body motion

It is sufficient to specify the motion of one point on the object and the motion of three coordinate axes attached to that point, because the distance between any two points on the object does not change over time as the object moves.

Euclidean transformation: a map that preserves the distance between any two points.

Definition 4 Rigid-body motion or special Euclidean transformation. A map $g:\Bbb R^3 \to \Bbb R^3$ is a rigid-body motion or special Euclidean transformation if it preserves the norm (or inner product) and the cross product of any two vectors.

A rigid-body motion also preserves the triple product, which corresponds to the volume of the parallelepiped spanned by three vectors.
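A short check of why the triple product is preserved, as a sketch: write $g_*$ for the action that $g$ induces on vectors (a notation assumed here, not defined in these notes). Then for any $u,v,w \in \Bbb R^3$,

$\langle g_*(u),\ g_*(v)\times g_*(w)\rangle=\langle g_*(u),\ g_*(v\times w)\rangle=\langle u,\ v\times w\rangle,$

where the first equality uses preservation of the cross product and the second uses preservation of the inner product.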

Two coordinate frames

world (reference) frame: fixed
object (body) coordinate frame: this right-handed orthonormal frame is attached to some point on the rigid body

The configuration of the camera

translational part $T$: the vector between the origin of the world frame and that of the camera frame
rotational part $R$: the orientation of the camera frame $C$, with coordinate axes $(x,y,z)$, relative to the fixed world frame $W$ with coordinate axes $(X,Y,Z)$

Rotational motion and its representation

Orthogonal matrix representation of rotations

A rotation matrix about the Z-axis:

$R_z(\theta)=\begin{bmatrix}\cos\theta&-\sin\theta&0\\ \sin\theta&\cos\theta&0\\0&0&1\end{bmatrix}.$

The inverse transformation of a rotation is also a rotation:

$R_{cw}=R^{-1}_{wc}=R^{T}_{wc}$
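The relation between inverse and transpose can be verified numerically for $R_z(\theta)$. This is a minimal sketch in plain Python; the helper names (`rot_z`, `transpose`, `matmul`, `det3`) and the 30-degree angle are assumptions made for the example.

```python
import math

def rot_z(theta):
    """Rotation about the Z-axis by theta radians, as a 3x3 nested list."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0],
            [s, c, 0.0],
            [0.0, 0.0, 1.0]]

def transpose(M):
    """Transpose of a square matrix stored as nested lists."""
    return [list(row) for row in zip(*M)]

def matmul(A, B):
    """Product of two 3x3 matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def det3(M):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (M[0][0] * (M[1][1] * M[2][2] - M[1][2] * M[2][1])
            - M[0][1] * (M[1][0] * M[2][2] - M[1][2] * M[2][0])
            + M[0][2] * (M[1][0] * M[2][1] - M[1][1] * M[2][0]))

theta = math.radians(30.0)     # arbitrary angle chosen for the demo
R_wc = rot_z(theta)
R_cw = transpose(R_wc)         # candidate inverse: the transpose

product = matmul(R_cw, R_wc)   # should be the identity up to rounding
determinant = det3(R_wc)       # should be +1 up to rounding
```

Up to floating-point rounding, `product` is the identity and `determinant` is 1. The transpose also coincides with `rot_z(-theta)`, which is the rotation that undoes the original one.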

Canonical exponential coordinates for rotations

A rotation matrix has 9 entries, but the space of rotation matrices has dimension only 3, so 6 of the 9 parameters are redundant.
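One way to see the count, as a sketch with $r_1,r_2,r_3$ denoting the columns of $R$ (names introduced here for illustration): the orthogonality condition $R^TR=I$ reads

$r_i^Tr_j=\delta_{ij},\quad 1\le i\le j\le 3,$

which gives 6 independent scalar equations (three unit-norm conditions and three mutual-orthogonality conditions) on the entries of $R$, leaving $9-6=3$ free parameters.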

(to be continued…)

Rigid-body motion and its representations

$X_w=R_{wc}X_c+T_{wc}$

We denote the full rigid-body motion by $g_{wc}=(R_{wc},T_{wc})$, or simply $g=(R,T)$.
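The transformation $X_w=R_{wc}X_c+T_{wc}$ and its inverse can be sketched in a few lines of plain Python. The function names and the numeric values below are illustrative assumptions. The inverse formula $X_c=R_{wc}^T(X_w-T_{wc})$ follows from solving the equation for $X_c$ and using $R^{-1}=R^T$.

```python
import math

def rot_z(theta):
    """Rotation about the Z-axis by theta radians, as a 3x3 nested list."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0],
            [s, c, 0.0],
            [0.0, 0.0, 1.0]]

def camera_to_world(R_wc, T_wc, X_c):
    """Apply g_wc = (R_wc, T_wc): X_w = R_wc X_c + T_wc."""
    return [sum(R_wc[i][j] * X_c[j] for j in range(3)) + T_wc[i]
            for i in range(3)]

def world_to_camera(R_wc, T_wc, X_w):
    """Apply the inverse motion: X_c = R_wc^T (X_w - T_wc)."""
    d = [X_w[i] - T_wc[i] for i in range(3)]
    # Multiplying by the transpose means summing over the row index of R_wc.
    return [sum(R_wc[j][i] * d[j] for j in range(3)) for i in range(3)]

# Illustrative values (not taken from the notes).
R_wc = rot_z(math.pi / 2)
T_wc = [1.0, 2.0, 3.0]
X_c = [1.0, 0.0, 0.0]

X_w = camera_to_world(R_wc, T_wc, X_c)        # close to [1, 3, 3]
X_c_back = world_to_camera(R_wc, T_wc, X_w)   # recovers X_c up to rounding
```

The round trip returning to `X_c` is a quick sanity check that the pair $(R^T,-R^TT)$ describes the inverse motion.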

Homogeneous coordinates

Notice that the coordinate transformation for a full rigid-body motion is not linear but affine. We may convert an affine transformation to a linear one by using homogeneous coordinates.

$\bar{X} \dot{=}\begin{bmatrix}X\\1 \end{bmatrix} \in \Bbb R^4.$

Rewriting the affine transformation in linear form gives

$\bar{X}_w=\begin{bmatrix}X_w\\1 \end{bmatrix}=\begin{bmatrix}R_{wc}&T_{wc}\\0&1 \end{bmatrix}\begin{bmatrix}X_c\\1 \end{bmatrix} \dot{=}\ \bar{g}_{wc}\bar{X}_c,$

where the $4 \times 4$ matrix $\bar{g}_{wc} \in \Bbb R^{4 \times 4}$ is called the homogeneous representation of the rigid-body motion $g_{wc}$. In general, if $g=(R,T)$, its homogeneous representation is

$\bar{g}=\begin{bmatrix}R&T\\0&1 \end{bmatrix} \in \Bbb R^{4 \times 4}.$
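The homogeneous representation can be exercised with a small plain-Python sketch. The helper names (`homogeneous`, `matmul`, `apply`) and the sample motion are assumptions for illustration. The example checks that the $4\times 4$ matrix reproduces the affine form $RX+T$, and that applying the motion twice equals applying the matrix product once.

```python
def homogeneous(R, T):
    """Build the 4x4 matrix [[R, T], [0, 1]] from a 3x3 R and a 3-vector T."""
    return [list(R[i]) + [T[i]] for i in range(3)] + [[0.0, 0.0, 0.0, 1.0]]

def matmul(A, B):
    """Product of two matrices stored as nested lists."""
    n, m, p = len(A), len(B), len(B[0])
    return [[sum(A[i][k] * B[k][j] for k in range(m)) for j in range(p)]
            for i in range(n)]

def apply(g_bar, X):
    """Apply a 4x4 homogeneous motion to a 3-D point X."""
    X_bar = list(X) + [1.0]                      # homogeneous coordinates
    Y_bar = [sum(g_bar[i][j] * X_bar[j] for j in range(4)) for i in range(4)]
    return Y_bar[:3]                             # drop the trailing 1

# Sample motion: R is R_z(pi/2) written with exact entries; T is arbitrary.
R = [[0.0, -1.0, 0.0],
     [1.0, 0.0, 0.0],
     [0.0, 0.0, 1.0]]
T = [1.0, 2.0, 3.0]
X_c = [1.0, 0.0, 0.0]

g_bar = homogeneous(R, T)
X_w = apply(g_bar, X_c)                          # linear form
affine = [sum(R[i][j] * X_c[j] for j in range(3)) + T[i] for i in range(3)]

# Composing motions becomes a plain matrix product.
twice_by_product = apply(matmul(g_bar, g_bar), X_c)
twice_by_steps = apply(g_bar, apply(g_bar, X_c))
```

With these values, `X_w` and `affine` both equal `[1.0, 3.0, 3.0]`, and the two ways of applying the motion twice agree. Turning composition into matrix multiplication is the main practical benefit of the homogeneous form.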

The rigid-body motion $(R,T)$ between the world frame and the camera frame is sometimes referred to as the extrinsic calibration parameters.

Canonical exponential coordinates for rigid-body motions

(to be continued…)

Coordinate and velocity transformations

(to be continued…)