Rotation and Transformations in 3D

3D Coordinate Frames and Rotations

A coordinate frame in 3D space is defined by a set of three orthonormal basis vectors, which can be collected into a vectrix:

\[ \underrightarrow{\mathcal{A}} = \big[\begin{matrix} \overrightarrow{x}_A & \overrightarrow{y}_A & \overrightarrow{z}_A \end{matrix}\big] \]

A 3D vector \(\overrightarrow{a}\) can be represented as a linear combination of the basis vectors of \(\underrightarrow{\mathcal{A}}\):

\[ \overrightarrow{a} = {}^\mathcal{A}a_x \cdot \overrightarrow{x}_A + {}^\mathcal{A}a_y \cdot \overrightarrow{y}_A + {}^\mathcal{A}a_z \cdot \overrightarrow{z}_A = \underrightarrow{\mathcal{A}}\,\mathbf{a}, \qquad \mathbf{a} = \left[\begin{array}{c} {}^\mathcal{A}a_x\\ {}^\mathcal{A}a_y \\ {}^\mathcal{A}a_z \end{array}\right] \]

The column vector \(\mathbf{a}\) holds the coordinates of \(\overrightarrow{a}\) in frame \(\underrightarrow{\mathcal{A}}\). The difference between a vector and its coordinates is this: a vector has both length and direction and does not depend on any coordinate frame, whereas its coordinates are always tied to a particular frame. The same vector therefore has different coordinates in different coordinate frames.

Consider two coordinate frames in 3D space, \(\underrightarrow{\mathcal{A}}\) and \(\underrightarrow{\mathcal{B}}\). The same vector \(\overrightarrow{a}\) can be represented in either frame:

\[ \overrightarrow{a} = \underrightarrow{\mathcal{A}}\left[\begin{array}{c} {}^\mathcal{A}a_x\\ {}^\mathcal{A}a_y \\ {}^\mathcal{A}a_z \end{array}\right] =\underrightarrow{\mathcal{B}}\left[\begin{array}{c} {}^\mathcal{B}a_x\\ {}^\mathcal{B}a_y \\ {}^\mathcal{B}a_z \end{array}\right] \]

Now let us define a rotation matrix:

\[ R_{AB}=\underrightarrow{\mathcal{A}}^\top \underrightarrow{\mathcal{B}} = \begin{bmatrix} \overrightarrow{x}_A \cdot \overrightarrow{x}_B & \overrightarrow{x}_A \cdot \overrightarrow{y}_B & \overrightarrow{x}_A \cdot \overrightarrow{z}_B \\ \overrightarrow{y}_A \cdot \overrightarrow{x}_B & \overrightarrow{y}_A \cdot \overrightarrow{y}_B & \overrightarrow{y}_A \cdot \overrightarrow{z}_B \\ \overrightarrow{z}_A \cdot \overrightarrow{x}_B & \overrightarrow{z}_A \cdot \overrightarrow{y}_B & \overrightarrow{z}_A \cdot \overrightarrow{z}_B \end{bmatrix} \]

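Applying this rotation matrix to the coordinates of \(\overrightarrow{a}\) in frame \(\underrightarrow{\mathcal{B}}\), and using the fact that \(\underrightarrow{\mathcal{A}}^\top \underrightarrow{\mathcal{A}}\) is the identity for an orthonormal frame, we get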

\[ R_{AB} \left[\begin{array}{c} {}^\mathcal{B}a_x\\ {}^\mathcal{B}a_y \\ {}^\mathcal{B}a_z \end{array}\right]=\underrightarrow{\mathcal{A}}^\top \underrightarrow{\mathcal{B}} \left[\begin{array}{c} {}^\mathcal{B}a_x\\ {}^\mathcal{B}a_y \\ {}^\mathcal{B}a_z \end{array}\right] =\underrightarrow{\mathcal{A}}^\top \underrightarrow{\mathcal{A}}\left[\begin{array}{c} {}^\mathcal{A}a_x\\ {}^\mathcal{A}a_y \\ {}^\mathcal{A}a_z \end{array}\right] =\left[\begin{array}{c} {}^\mathcal{A}a_x\\ {}^\mathcal{A}a_y \\ {}^\mathcal{A}a_z \end{array}\right] \]

Thus \(R_{AB}\) transforms the coordinates of a vector from coordinate frame \(\underrightarrow{\mathcal{B}}\) to coordinate frame \(\underrightarrow{\mathcal{A}}\). Alternatively, we can think of \(R_{AB}\) as transforming (rotating) the coordinate frame \(\underrightarrow{\mathcal{A}}\) into \(\underrightarrow{\mathcal{B}}\), as in the figure below.

[Figure: Rotation]
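The definition of \(R_{AB}\) as a matrix of dot products can be checked numerically. The following is a minimal NumPy sketch, not a reference implementation. The helper name `rotation_from_bases` and the two example frames are hypothetical and chosen only for illustration; both bases are assumed to be given as columns expressed in one common reference frame.

```python
import numpy as np

def rotation_from_bases(basis_a, basis_b):
    """Return R_AB, whose (i, j) entry is the dot product of the
    i-th basis vector of A with the j-th basis vector of B.

    basis_a, basis_b: 3x3 arrays whose COLUMNS are the basis vectors,
    both expressed in the same reference frame (an assumption of this sketch).
    """
    return basis_a.T @ basis_b

# Hypothetical frames: A coincides with the reference frame,
# B is A rotated by 90 degrees about the z axis.
basis_a = np.eye(3)
basis_b = np.array([[0.0, -1.0, 0.0],
                    [1.0,  0.0, 0.0],
                    [0.0,  0.0, 1.0]])

R_ab = rotation_from_bases(basis_a, basis_b)

# Coordinates (1, 0, 0) in B describe B's x axis, which is A's y axis.
a_in_b = np.array([1.0, 0.0, 0.0])
a_in_a = R_ab @ a_in_b  # coordinates of the same vector in A
```

Both coordinate triples describe the same geometric vector, so `basis_a @ a_in_a` and `basis_b @ a_in_b` agree.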

Each column of \(R_{AB}\) is one basis vector of coordinate frame \(\underrightarrow{\mathcal{B}}\) expressed in coordinate frame \(\underrightarrow{\mathcal{A}}\). To further illustrate that \(R_{AB}\) can be read as transforming the coordinate frame \(\underrightarrow{\mathcal{A}}\) into \(\underrightarrow{\mathcal{B}}\), let us consider a practical example:

[Figure: Rotation]

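Here, taking frame \(A\) to be the original frame \(O\) rotated by \(45^\circ\) counterclockwise about the \(z\) axis, we know that:

\[ R_{OA}= \begin{bmatrix} \frac{\sqrt{2}}{2} & -\frac{\sqrt{2}}{2} & 0\\ \frac{\sqrt{2}}{2} & \frac{\sqrt{2}}{2} & 0\\ 0 & 0 & 1 \end{bmatrix} \]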

And we have

\[ {}^O P_2 = R_{OA}\, {}^O P_1 = R_{OA}\, {}^A P_2 \]

[Figure: Rotation]

For the left path, \(R_{OA}\) rotates the original coordinate system \(O\) into coordinate system \(A\) (an extrinsic rotation), carrying \(P_1\) to \(P_2\). For the right path, it converts the coordinates of a point in coordinate system \(A\) into the original coordinate system.
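The two readings of the same matrix can be compared numerically. The sketch below assumes a \(45^\circ\) counterclockwise rotation about the \(z\) axis and a hypothetical point \(P_1 = (1, 0, 0)\); neither value is taken from the figure.

```python
import numpy as np

theta = np.pi / 4  # assumed: 45 degrees counterclockwise about z
c, s = np.cos(theta), np.sin(theta)
R_oa = np.array([[c,  -s,  0.0],
                 [s,   c,  0.0],
                 [0.0, 0.0, 1.0]])

p1_in_o = np.array([1.0, 0.0, 0.0])  # hypothetical point P1, coordinates in O

# Left path (active reading): rotate P1 along with the frame to get P2,
# with the result still expressed in O.
p2_in_o_active = R_oa @ p1_in_o

# Right path (passive reading): P2 has the same coordinates in A that
# P1 has in O, and R_OA converts those coordinates from A to O.
p2_in_a = p1_in_o.copy()
p2_in_o_passive = R_oa @ p2_in_a
```

Both paths produce the same coordinates for \(P_2\) in \(O\), because the same matrix multiplies the same numbers; only the interpretation differs.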

Coordinate system conversion

Let us consider a practical example. Suppose we have a world coordinate frame \(\underrightarrow{\mathcal{W}}\), an OpenCV camera coordinate frame \(\underrightarrow{\mathcal{C}}\), and an OpenGL camera coordinate frame \(\underrightarrow{\mathcal{C}^\prime}\).

[Figure: Rotation]

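Chaining the rotations gives \(R_{wc^\prime} = R_{wc} R_{cc^\prime}\). The OpenCV camera frame has \(x\) pointing right, \(y\) down, and \(z\) forward, while the OpenGL camera frame has \(x\) pointing right, \(y\) up, and \(z\) backward, so the two differ only by a flip of the \(y\) and \(z\) axes: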

\[ R_{cc^\prime} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & -1 \\ \end{bmatrix} \]
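As a rough illustration of the chain \(R_{wc^\prime} = R_{wc} R_{cc^\prime}\), the NumPy sketch below converts a camera rotation from the OpenCV to the OpenGL convention. The function name `opencv_rotation_to_opengl` and the example rotation `R_wc` are hypothetical.

```python
import numpy as np

# R_cc': axes of the OpenGL camera frame expressed in the OpenCV camera frame.
R_CC_PRIME = np.diag([1.0, -1.0, -1.0])

def opencv_rotation_to_opengl(R_wc):
    """Return R_wc' = R_wc @ R_cc' for a 3x3 camera-to-world rotation."""
    return R_wc @ R_CC_PRIME

# Hypothetical camera-to-world rotation (90 degrees about the world z axis).
R_wc = np.array([[0.0, -1.0, 0.0],
                 [1.0,  0.0, 0.0],
                 [0.0,  0.0, 1.0]])
R_wc_gl = opencv_rotation_to_opengl(R_wc)

# A direction one unit along the OpenCV viewing axis (+z) ...
d_cv = np.array([0.0, 0.0, 1.0])
# ... has these coordinates in the OpenGL camera frame (R_c'c = R_cc'^T).
d_gl = R_CC_PRIME.T @ d_cv
```

Here `d_gl` comes out along \(-z\), matching the OpenGL convention that the camera looks down its negative \(z\) axis, and both conventions give the same direction in world coordinates.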

Camera pose and extrinsic

The extrinsic matrix, i.e., the world-to-camera matrix (w2c), transforms points from world coordinates into camera coordinates (so it should be written as \(T_{cw}\)):

\[ P_c = \begin{bmatrix} R_{cw} & t_c \end{bmatrix} \begin{bmatrix} P_w \\ 1 \end{bmatrix} \]

The camera pose, i.e., the camera-to-world matrix (c2w), is the inverse of the extrinsic matrix and transforms points from camera coordinates into world coordinates (so it should be written as \(T_{wc}\)). Its translation column \(C_w\) is the camera center in world coordinates:

\[ P_w = \begin{bmatrix} R_{wc} & C_w \end{bmatrix} \begin{bmatrix} P_c \\ 1 \end{bmatrix} \]

We have:

\[ \begin{aligned} {\left[\begin{array}{c|c} R_{cw} & t_c \\ \hline \mathbf{0} & 1 \end{array}\right] } & =\left[\begin{array}{c|c} R_{wc} & \mathrm{C}_w \\ \hline \mathbf{0} & 1 \end{array}\right]^{-1} \\ & =\left[\left[\begin{array}{c|c} \mathrm{I} & \mathrm{C}_w \\ \hline \mathbf{0} & 1 \end{array}\right]\left[\begin{array}{c|c} R_{wc} & \mathbf{0} \\ \hline \mathbf{0} & 1 \end{array}\right]\right]^{-1} \\ & =\left[\begin{array}{c|c} R_{wc} & \mathbf{0} \\ \hline \mathbf{0} & 1 \end{array}\right]^{-1}\left[\begin{array}{c|c} \mathrm{I} & \mathrm{C}_w \\ \hline \mathbf{0} & 1 \end{array}\right]^{-1} \\ & =\left[\begin{array}{c|c} R_{wc}^{\top} & \mathbf{0} \\ \hline \mathbf{0} & 1 \end{array}\right]\left[\begin{array}{c|c} \mathrm{I} & -\mathrm{C}_w \\ \hline \mathbf{0} & 1 \end{array}\right] \\ & =\left[\begin{array}{c|c} R_{wc}^{\top} & -R_{wc}^{\top} \mathrm{C}_w \\ \hline \mathbf{0} & 1 \end{array}\right] \end{aligned} \]

Thus, we know:

\[ \begin{aligned} R_{cw} &= R_{wc}^\top\\ t_c &= -R_{wc}^\top \mathrm{C}_w\\ &= -R_{cw} \mathrm{C}_w \end{aligned} \]
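The closed-form inverse can be compared against a general matrix inverse. This is a minimal sketch under the assumption that poses are stored as 4x4 homogeneous NumPy arrays; the function name `pose_to_extrinsic` and the example pose are hypothetical.

```python
import numpy as np

def pose_to_extrinsic(T_wc):
    """Invert a 4x4 camera-to-world pose using R_cw = R_wc^T and
    t_c = -R_wc^T C_w, without calling a general matrix inverse."""
    R_wc = T_wc[:3, :3]
    C_w = T_wc[:3, 3]          # camera center in world coordinates
    T_cw = np.eye(4)
    T_cw[:3, :3] = R_wc.T
    T_cw[:3, 3] = -R_wc.T @ C_w
    return T_cw

# Hypothetical pose: rotated 90 degrees about z, centered at (1, 2, 3).
T_wc = np.eye(4)
T_wc[:3, :3] = np.array([[0.0, -1.0, 0.0],
                         [1.0,  0.0, 0.0],
                         [0.0,  0.0, 1.0]])
T_wc[:3, 3] = [1.0, 2.0, 3.0]

T_cw = pose_to_extrinsic(T_wc)

# The camera center must land on the origin of the camera frame.
center_in_camera = T_cw @ np.array([1.0, 2.0, 3.0, 1.0])
```

The transpose-based form avoids a general inverse and keeps the rotation block exactly orthonormal, which is one reason it is commonly preferred.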

Orthographic projection

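The following matrix transforms the orthographic view volume to the canonical view volume (NDC), where \(l, r, b, t, n, f\) are the left, right, bottom, top, near, and far planes of the view volume: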

\[ M_{\text{orth}} = \begin{bmatrix} \frac{2}{r-l} & 0 & 0 & -\frac{r+l}{r-l} \\ 0 & \frac{2}{t-b} & 0 & -\frac{t+b}{t-b} \\ 0 & 0 & \frac{2}{n-f} & -\frac{n+f}{n-f} \\ 0 & 0 & 0 & 1 \\ \end{bmatrix} \]
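A short NumPy sketch of this matrix follows. The function name `orthographic_matrix` and the numeric bounds are hypothetical; the bounds assume a camera looking down the negative \(z\) axis, so \(n\) and \(f\) are both negative with \(n > f\).

```python
import numpy as np

def orthographic_matrix(l, r, b, t, n, f):
    """Build M_orth, mapping the box [l, r] x [b, t] x [f, n] to [-1, 1]^3."""
    return np.array([
        [2.0 / (r - l), 0.0, 0.0, -(r + l) / (r - l)],
        [0.0, 2.0 / (t - b), 0.0, -(t + b) / (t - b)],
        [0.0, 0.0, 2.0 / (n - f), -(n + f) / (n - f)],
        [0.0, 0.0, 0.0, 1.0],
    ])

# Hypothetical view volume (assumption: looking down -z, so n > f).
l, r, b, t, n, f = -2.0, 4.0, -1.0, 3.0, -1.0, -10.0
M_orth = orthographic_matrix(l, r, b, t, n, f)

# Opposite corners of the view volume map to opposite corners of the cube.
far_corner = M_orth @ np.array([l, b, f, 1.0])
near_corner = M_orth @ np.array([r, t, n, 1.0])
```

With these values the left-bottom-far corner maps to \((-1, -1, -1)\) and the right-top-near corner maps to \((1, 1, 1)\).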

Perspective projection

In perspective projection, a 3D point inside a truncated pyramid (the view frustum, in eye coordinates; left) is mapped to a canonical volume (right).

[Figure: perspective]

The perspective matrix can be written as:

\[ P = \begin{bmatrix} n & 0 & 0 & 0 \\ 0 & n & 0 & 0 \\ 0 & 0 & n+f & -fn \\ 0 & 0 & 1 & 0 \\ \end{bmatrix} \]

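The first, second, and fourth rows simply implement the perspective equation. The third row ensures that points on the near and far planes keep their \(z\) values: substituting \(z = n\) or \(z = f\) into \((n+f) - \frac{fn}{z}\) returns \(n\) or \(f\) respectively. See below, where the last step divides by the fourth (homogeneous) component \(z\):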

\[ P \begin{bmatrix} x\\ y\\ z\\ 1 \end{bmatrix} = \begin{bmatrix} nx\\ ny\\ (n+f)z-fn\\ z \end{bmatrix} = \begin{bmatrix} \frac{nx}{z}\\ \frac{ny}{z}\\ (n+f)-\frac{fn}{z}\\ 1 \end{bmatrix} \]

To get the full projection matrix that maps from the truncated pyramid frustum to NDC, we compose the two matrices:

\[ P^{\prime} = M_{\text{orth}}P = \begin{bmatrix} \frac{2n}{r-l} & 0 & \frac{l+r}{l-r} & 0 \\ 0 & \frac{2n}{t-b} & \frac{b+t}{b-t} & 0 \\ 0 & 0 & \frac{n+f}{n-f} & \frac{2fn}{f-n} \\ 0 & 0 & 1 & 0 \\ \end{bmatrix} \]
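The product can be checked numerically against the closed form. The sketch below is self-contained and uses hypothetical function names and frustum bounds, again assuming a camera looking down the negative \(z\) axis so that \(n > f\).

```python
import numpy as np

def orthographic_matrix(l, r, b, t, n, f):
    """M_orth: maps the box [l, r] x [b, t] x [f, n] to [-1, 1]^3."""
    return np.array([
        [2.0 / (r - l), 0.0, 0.0, -(r + l) / (r - l)],
        [0.0, 2.0 / (t - b), 0.0, -(t + b) / (t - b)],
        [0.0, 0.0, 2.0 / (n - f), -(n + f) / (n - f)],
        [0.0, 0.0, 0.0, 1.0],
    ])

def perspective_matrix(n, f):
    """P: squashes the frustum into the orthographic view volume."""
    return np.array([
        [n,   0.0, 0.0,   0.0],
        [0.0, n,   0.0,   0.0],
        [0.0, 0.0, n + f, -f * n],
        [0.0, 0.0, 1.0,   0.0],
    ])

def project_to_ndc(matrix, point):
    """Apply a 4x4 projection and divide by the homogeneous component."""
    clip = matrix @ np.append(point, 1.0)
    return clip[:3] / clip[3]

# Hypothetical frustum bounds (assumption: n and f negative, n > f).
l, r, b, t, n, f = -2.0, 4.0, -1.0, 3.0, -1.0, -10.0
P_full = orthographic_matrix(l, r, b, t, n, f) @ perspective_matrix(n, f)

# Closed form of M_orth P, entry by entry.
P_closed = np.array([
    [2 * n / (r - l), 0.0, (l + r) / (l - r), 0.0],
    [0.0, 2 * n / (t - b), (b + t) / (b - t), 0.0],
    [0.0, 0.0, (n + f) / (n - f), 2 * f * n / (f - n)],
    [0.0, 0.0, 1.0, 0.0],
])

near_corner_ndc = project_to_ndc(P_full, np.array([r, t, n]))
far_axis_ndc = project_to_ndc(P_full, np.array([0.0, 0.0, f]))
```

With these bounds, the right-top corner of the near plane lands on \((1, 1, 1)\), and a point on the far plane has NDC depth \(-1\).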