3D Coordinate Frames and Rotations
A coordinate frame in 3D space is defined by a set of three orthonormal basis vectors (mutually orthogonal and of unit length). It can be written as a vectrix:
\[
\underrightarrow{\mathcal{A}} = \begin{bmatrix} \overrightarrow{x}_A & \overrightarrow{y}_A & \overrightarrow{z}_A \end{bmatrix}
\]
A 3D vector \(\overrightarrow{a}\) can be represented as a linear combination of the basis vectors under \(\underrightarrow{\mathcal{A}}\):
\[
\overrightarrow{a} = {}^\mathcal{A}a_x \cdot \overrightarrow{x}_A + {}^\mathcal{A}a_y \cdot \overrightarrow{y}_A + {}^\mathcal{A}a_z \cdot \overrightarrow{z}_A = \underrightarrow{\mathcal{A}} \begin{bmatrix} {}^\mathcal{A}a_x \\ {}^\mathcal{A}a_y \\ {}^\mathcal{A}a_z \end{bmatrix}
\]
The column vector \(\mathbf{a}\) holds the coordinates of \(\overrightarrow{a}\) in frame \(\underrightarrow{\mathcal{A}}\). A vector and its coordinates differ as follows:
- Vector (\(\overrightarrow{a}\)): Has both length and direction and is invariant to coordinate frames.
- Coordinates (\(\mathbf{a}\)): Are associated with a specific coordinate frame, meaning the same vector will have different coordinates under different frames.
Let us consider an example with two coordinate frames in 3D space, \(\mathcal{A}\) and \(\mathcal{B}\). A vector \(\overrightarrow{a}\) can thus be represented as:
\[
\overrightarrow{a} = \underrightarrow{\mathcal{A}} \begin{bmatrix} {}^\mathcal{A}a_x \\ {}^\mathcal{A}a_y \\ {}^\mathcal{A}a_z \end{bmatrix} = \underrightarrow{\mathcal{B}} \begin{bmatrix} {}^\mathcal{B}a_x \\ {}^\mathcal{B}a_y \\ {}^\mathcal{B}a_z \end{bmatrix}
\]
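As a quick numerical illustration (a sketch, not part of the derivation above), the snippet below stores the basis vectors of two frames as columns, expressed in a common reference frame, and computes the coordinates of one vector under each frame. The two frames and the vector are hypothetical values chosen for illustration.

```python
import numpy as np

# Basis vectors stored as columns, expressed in a common reference frame.
# Frame A coincides with the reference; frame B is A rotated by 90 degrees
# about z. Both frames are hypothetical examples.
A = np.eye(3)
B = np.array([[0.0, -1.0, 0.0],
              [1.0,  0.0, 0.0],
              [0.0,  0.0, 1.0]])

# The vector itself, written in the reference frame.
a = np.array([1.0, 2.0, 3.0])

# For an orthonormal basis, each coordinate is the dot product of the
# vector with the corresponding basis vector.
a_in_A = A.T @ a
a_in_B = B.T @ a

# Recombining the coordinates with the basis vectors gives the same vector.
assert np.allclose(A @ a_in_A, a)
assert np.allclose(B @ a_in_B, a)
```

The two coordinate triples differ even though they describe the same vector.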
Rotation Matrix Between Coordinate Frames
Now, let us define a rotation matrix that relates the two coordinate frames \(\mathcal{A}\) and \(\mathcal{B}\):
\[
R_{AB} = \underrightarrow{\mathcal{A}}^\top \underrightarrow{\mathcal{B}} =
\begin{bmatrix}
\overrightarrow{x}_A \cdot \overrightarrow{x}_B & \overrightarrow{x}_A \cdot \overrightarrow{y}_B & \overrightarrow{x}_A \cdot \overrightarrow{z}_B \\
\overrightarrow{y}_A \cdot \overrightarrow{x}_B & \overrightarrow{y}_A \cdot \overrightarrow{y}_B & \overrightarrow{y}_A \cdot \overrightarrow{z}_B \\
\overrightarrow{z}_A \cdot \overrightarrow{x}_B & \overrightarrow{z}_A \cdot \overrightarrow{y}_B & \overrightarrow{z}_A \cdot \overrightarrow{z}_B
\end{bmatrix}
\]
Given the rotation matrix \(R_{AB}\), the coordinates of vector \(\overrightarrow{a}\) in frame \(\mathcal{A}\) can be obtained from its coordinates in frame \(\mathcal{B}\) as follows:
\[
\begin{bmatrix}
{}^\mathcal{A}a_x \\
{}^\mathcal{A}a_y \\
{}^\mathcal{A}a_z
\end{bmatrix}
= R_{AB}
\begin{bmatrix}
{}^\mathcal{B}a_x \\
{}^\mathcal{B}a_y \\
{}^\mathcal{B}a_z
\end{bmatrix}
\]
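The definition can be checked numerically. The sketch below assumes both frames are given as 3x3 arrays whose columns are the basis vectors in a shared reference frame; the helpers `rot_x` and `rot_z` and the chosen angles are hypothetical and only serve to build two example frames.

```python
import numpy as np

def rot_x(deg):
    """Rotation by `deg` degrees about the x-axis."""
    t = np.deg2rad(deg)
    c, s = np.cos(t), np.sin(t)
    return np.array([[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]])

def rot_z(deg):
    """Rotation by `deg` degrees about the z-axis."""
    t = np.deg2rad(deg)
    c, s = np.cos(t), np.sin(t)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

# Columns are the basis vectors of each frame in a shared reference frame.
A = rot_x(90.0)
B = rot_z(30.0)

# Entry (i, j) is the dot product of A's i-th and B's j-th basis vector.
R_AB = A.T @ B

# Coordinates of some vector in frame B, converted to frame A.
a_B = np.array([1.0, 2.0, 3.0])
a_A = R_AB @ a_B

# Both coordinate sets must describe the same underlying vector.
assert np.allclose(A @ a_A, B @ a_B)
```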
An intuition
Each column of \(R_{AB}\) contains the projections of one basis vector of \(\underrightarrow{\mathcal{B}}\) onto the basis vectors of \(\underrightarrow{\mathcal{A}}\), i.e., the x, y, and z axes of coordinate frame \(\underrightarrow{\mathcal{B}}\) expressed in coordinate frame \(\underrightarrow{\mathcal{A}}\). Thus, we have:
\[
\begin{bmatrix}
{}^\mathcal{A}a_x \\
{}^\mathcal{A}a_y \\
{}^\mathcal{A}a_z
\end{bmatrix}
= R_{AB} \begin{bmatrix}
{}^\mathcal{B}a_x \\
{}^\mathcal{B}a_y \\
{}^\mathcal{B}a_z
\end{bmatrix}
=
\begin{bmatrix}
\overrightarrow{x}_A \cdot \overrightarrow{x}_B\\
\overrightarrow{y}_A \cdot \overrightarrow{x}_B\\
\overrightarrow{z}_A \cdot \overrightarrow{x}_B
\end{bmatrix} {}^\mathcal{B}a_x
+
\begin{bmatrix}
\overrightarrow{x}_A \cdot \overrightarrow{y}_B\\
\overrightarrow{y}_A \cdot \overrightarrow{y}_B\\
\overrightarrow{z}_A \cdot \overrightarrow{y}_B
\end{bmatrix} {}^\mathcal{B}a_y
+
\begin{bmatrix}
\overrightarrow{x}_A \cdot \overrightarrow{z}_B\\
\overrightarrow{y}_A \cdot \overrightarrow{z}_B\\
\overrightarrow{z}_A \cdot \overrightarrow{z}_B
\end{bmatrix} {}^\mathcal{B}a_z
\]
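The column view can be made concrete with a small sketch: the coordinates in frame A are the columns of \(R_{AB}\) weighted by the coordinates in frame B. The matrix and the coordinates below are hypothetical example values.

```python
import numpy as np

# Hypothetical R_AB: frame B is frame A rotated by 90 degrees about z, so
# the columns are B's x, y, z axes written in A's coordinates.
R_AB = np.array([[0.0, -1.0, 0.0],
                 [1.0,  0.0, 0.0],
                 [0.0,  0.0, 1.0]])

a_B = np.array([2.0, 3.0, 4.0])  # coordinates in frame B

# Weighted sum of the columns, one weight per B coordinate.
combo = (a_B[0] * R_AB[:, 0]
         + a_B[1] * R_AB[:, 1]
         + a_B[2] * R_AB[:, 2])

# The weighted sum is exactly the matrix-vector product.
assert np.allclose(combo, R_AB @ a_B)
```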
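So \(R_{AB}\) transforms the coordinates of a vector from coordinate frame \(\underrightarrow{\mathcal{B}}\) to coordinate frame \(\underrightarrow{\mathcal{A}}\).
We can verify this by applying the rotation matrix to the coordinates of \(\overrightarrow{a}\) in frame \(\underrightarrow{\mathcal{B}}\). Both \(\underrightarrow{\mathcal{B}}\) times the \(\mathcal{B}\)-coordinates and \(\underrightarrow{\mathcal{A}}\) times the \(\mathcal{A}\)-coordinates equal \(\overrightarrow{a}\), and \(\underrightarrow{\mathcal{A}}^\top \underrightarrow{\mathcal{A}}\) is the identity for an orthonormal basis, so: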
\[
R_{AB} \left[\begin{array}{c} {}^\mathcal{B}a_x\\ {}^\mathcal{B}a_y \\ {}^\mathcal{B}a_z \end{array}\right]=\underrightarrow{\mathcal{A}}^\top \underrightarrow{\mathcal{B}} \left[\begin{array}{c} {}^\mathcal{B}a_x\\ {}^\mathcal{B}a_y \\ {}^\mathcal{B}a_z \end{array}\right]
=\underrightarrow{\mathcal{A}}^\top \underrightarrow{\mathcal{A}}\left[\begin{array}{c} {}^\mathcal{A}a_x\\ {}^\mathcal{A}a_y \\ {}^\mathcal{A}a_z \end{array}\right]
=\left[\begin{array}{c} {}^\mathcal{A}a_x\\ {}^\mathcal{A}a_y \\ {}^\mathcal{A}a_z \end{array}\right]
\]
Thus, \(R_{AB}\) transforms the coordinates of a vector from coordinate frame \(\underrightarrow{\mathcal{B}}\) to coordinate frame \(\underrightarrow{\mathcal{A}}\). Alternatively, we can think of \(R_{AB}\) as the rotation that takes coordinate frame \(\underrightarrow{\mathcal{A}}\) to \(\underrightarrow{\mathcal{B}}\), as in the figure below.
The column vectors of \(R_{AB}\) express each basis vector of coordinate frame \(\underrightarrow{\mathcal{B}}\) in coordinate frame \(\underrightarrow{\mathcal{A}}\). To further illustrate that \(R_{AB}\) transforms the coordinate frame \(\underrightarrow{\mathcal{A}}\) into \(\underrightarrow{\mathcal{B}}\), let us consider a practical example:
Here, taking frame \(A\) to be frame \(O\) rotated by \(45^\circ\) counter-clockwise about the z-axis, we know that:
\[
R_{OA}=
\begin{bmatrix}
\frac{\sqrt{2}}{2} & -\frac{\sqrt{2}}{2} & 0\\
\frac{\sqrt{2}}{2} & \frac{\sqrt{2}}{2} & 0\\
0 & 0 & 1
\end{bmatrix}
\]
And we have
\[
{}^O P_2 = R_{OA} {}^O P_1 = R_{OA} {}^A P_2
\]
The left path rotates the original coordinate system \(O\) into coordinate system \(A\) (an extrinsic rotation). The right path converts the coordinates of a point in coordinate system \(A\) into coordinates in the original coordinate system.
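The two readings can be compared numerically. The sketch below assumes that frame \(A\) is frame \(O\) rotated by 45 degrees counter-clockwise about the z-axis (the figure is not reproduced here, so the direction is an assumption), and the point \(P_1\) is a hypothetical example.

```python
import numpy as np

# Assumed setup: frame A is frame O rotated 45 degrees counter-clockwise
# about the z-axis.
theta = np.deg2rad(45.0)
c, s = np.cos(theta), np.sin(theta)
R_OA = np.array([[c, -s, 0.0],
                 [s,  c, 0.0],
                 [0.0, 0.0, 1.0]])

P1_in_O = np.array([1.0, 0.0, 0.0])  # hypothetical point P1, in O coordinates

# Left path: rotate P1 inside frame O to obtain P2.
P2_in_O = R_OA @ P1_in_O

# Right path: P2 has the same coordinates in A as P1 has in O, and R_OA
# converts those A coordinates into O coordinates.
P2_in_A = P1_in_O.copy()
assert np.allclose(R_OA @ P2_in_A, P2_in_O)

# The transpose goes the other way, from O coordinates back to A coordinates.
assert np.allclose(R_OA.T @ P2_in_O, P2_in_A)
```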
Coordinate system conversion
Let us consider a practical example with a world coordinate frame \(\underrightarrow{\mathcal{W}}\), an OpenCV camera coordinate frame \(\underrightarrow{\mathcal{C}}\), and an OpenGL camera coordinate frame \(\underrightarrow{\mathcal{C}^\prime}\).
We know that \(R_{wc^\prime} = R_{wc} R_{cc^\prime}\). An OpenCV camera has x pointing right, y down, and z forward, while an OpenGL camera has x pointing right, y up, and z backward, so the two frames differ by a flip of the y and z axes:
\[
R_{cc^\prime} =
\begin{bmatrix}
1 & 0 & 0 \\
0 & -1 & 0 \\
0 & 0 & -1 \\
\end{bmatrix}
\]
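A minimal sketch of this conversion for the rotation part only. The function name `opencv_to_opengl_rotation` and the example pose are hypothetical; the axis conventions assumed are OpenCV (x right, y down, z forward) and OpenGL (x right, y up, z backward).

```python
import numpy as np

# Columns are the OpenGL camera axes expressed in the OpenCV camera frame.
R_cc_prime = np.diag([1.0, -1.0, -1.0])

def opencv_to_opengl_rotation(R_wc):
    """Turn a camera-to-world rotation for an OpenCV camera into the
    camera-to-world rotation of the equivalent OpenGL camera."""
    return R_wc @ R_cc_prime

# Hypothetical pose: the OpenCV camera axes are aligned with the world axes.
R_wc = np.eye(3)
R_wc_gl = opencv_to_opengl_rotation(R_wc)

# Both cameras must look in the same world direction: +z in the OpenCV
# convention, -z in the OpenGL convention.
forward_cv = R_wc @ np.array([0.0, 0.0, 1.0])
forward_gl = R_wc_gl @ np.array([0.0, 0.0, -1.0])
assert np.allclose(forward_cv, forward_gl)
```

Because \(R_{cc^\prime}\) is its own inverse, the same multiplication converts an OpenGL rotation back to the OpenCV convention.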
Camera pose and extrinsic
The extrinsic matrix, i.e., the world-to-camera matrix (w2c), transforms points from world coordinates into camera coordinates, so in the notation above it is written as \(T_{cw}\):
\[
P_c = \begin{bmatrix} R_{cw} & t_c \end{bmatrix} \begin{bmatrix} P_w \\ 1 \end{bmatrix}
\]
The camera pose, i.e., the camera-to-world matrix (c2w), is the inverse of the extrinsic matrix. It transforms points from camera coordinates into world coordinates, so it is written as \(T_{wc}\), where \(C_w\) is the camera center in world coordinates:
\[
P_w = \begin{bmatrix} R_{wc} & C_w \end{bmatrix} \begin{bmatrix} P_c \\ 1 \end{bmatrix}
\]
We have:
\[
\begin{aligned}
{\left[\begin{array}{c|c}
\mathrm{R}_{cw} & \boldsymbol{t}_c \\
\hline \mathbf{0} & 1
\end{array}\right] } & =\left[\begin{array}{c|c}
\mathrm{R}_{\mathrm{wc}} & \mathrm{C}_w \\
\hline \mathbf{0} & 1
\end{array}\right]^{-1} \\
& =\left[\left[\begin{array}{c|c}
\mathrm{I} & \mathrm{C}_w \\
\hline \mathbf{0} & 1
\end{array}\right]\left[\begin{array}{c|c}
\mathrm{R}_{\mathrm{wc}} & 0 \\
\hline \mathbf{0} & 1
\end{array}\right]\right]^{-1} \\
& =\left[\begin{array}{c|c}
\mathrm{R}_{\mathrm{wc}} & 0 \\
\hline \mathbf{0} & 1
\end{array}\right]^{-1}\left[\begin{array}{c|c}
\mathrm{I} & \mathrm{C}_w \\
\hline \mathbf{0} & 1
\end{array}\right]^{-1} \\
& =\left[\begin{array}{c|c}
\mathbf{R}_{wc}^{\top} & 0 \\
\hline \mathbf{0} & 1
\end{array}\right]\left[\begin{array}{c|c}
\mathrm{I} & -\mathrm{C}_w \\
\hline \mathbf{0} & 1
\end{array}\right] \\
& =\left[\begin{array}{c|c}
\mathbf{R}_{wc}^{\top} & -\mathbf{R}_{wc}^{\top} \mathrm{C}_w \\
\hline \mathbf{0} & 1
\end{array}\right]
\end{aligned}
\]
Thus, we know:
\[
\begin{aligned}
R_{cw} &= R_{wc}^\top\\
t_c &= -R_{wc}^\top \mathrm{C}_w\\
&= -R_{cw} \mathrm{C}_w
\end{aligned}
\]
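The relation between pose and extrinsics can be sketched in a few lines. The helper names `c2w_to_w2c` and `to_homogeneous` and the example pose are hypothetical; the check against `np.linalg.inv` only confirms the closed form derived above.

```python
import numpy as np

def c2w_to_w2c(R_wc, C_w):
    """Invert a camera pose (R_wc, C_w) into extrinsics (R_cw, t_c)."""
    R_cw = R_wc.T          # the inverse of a rotation is its transpose
    t_c = -R_cw @ C_w      # t_c = -R_wc^T C_w
    return R_cw, t_c

def to_homogeneous(R, t):
    """Pack a rotation and a translation into a 4x4 matrix."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T

# Hypothetical pose: camera rotated 90 degrees about y, centred at (1, 2, 3).
R_wc = np.array([[0.0, 0.0, 1.0],
                 [0.0, 1.0, 0.0],
                 [-1.0, 0.0, 0.0]])
C_w = np.array([1.0, 2.0, 3.0])

R_cw, t_c = c2w_to_w2c(R_wc, C_w)
T_wc = to_homogeneous(R_wc, C_w)
T_cw = to_homogeneous(R_cw, t_c)

# The closed form agrees with a general 4x4 matrix inverse.
assert np.allclose(T_cw, np.linalg.inv(T_wc))

# The camera centre maps to the origin of the camera frame.
assert np.allclose(R_cw @ C_w + t_c, np.zeros(3))
```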
Orthographic projection
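The orthographic projection matrix transforms the orthographic view volume, bounded by the left, right, bottom, top, near, and far planes \(l, r, b, t, n, f\), to the canonical view volume (NDC):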
\[
M_{\text{orth}} =
\begin{bmatrix}
\frac{2}{r-l} & 0 & 0 & -\frac{r+l}{r-l} \\
0 & \frac{2}{t-b} & 0 & -\frac{t+b}{t-b} \\
0 & 0 & \frac{2}{n-f} & -\frac{n+f}{n-f} \\
0 & 0 & 0 & 1 \\
\end{bmatrix}
\]
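A small sketch of this matrix, checking that opposite corners of the view volume land on opposite corners of the canonical cube. The function name `orthographic` and the bounds are hypothetical. The example assumes the camera looks down the negative z-axis, so `n` and `f` are negative with `n > f`; this convention is an assumption, not something stated above.

```python
import numpy as np

def orthographic(l, r, b, t, n, f):
    """Map the box [l, r] x [b, t] x [f, n] to the cube [-1, 1]^3."""
    return np.array([
        [2.0 / (r - l), 0.0, 0.0, -(r + l) / (r - l)],
        [0.0, 2.0 / (t - b), 0.0, -(t + b) / (t - b)],
        [0.0, 0.0, 2.0 / (n - f), -(n + f) / (n - f)],
        [0.0, 0.0, 0.0, 1.0],
    ])

# Hypothetical bounds; n and f are negative z values (assumed convention).
l, r, b, t, n, f = -1.0, 3.0, -2.0, 2.0, -1.0, -10.0
M_orth = orthographic(l, r, b, t, n, f)

# Opposite corners of the view volume, in homogeneous coordinates.
far_corner = M_orth @ np.array([l, b, f, 1.0])
near_corner = M_orth @ np.array([r, t, n, 1.0])
```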
Perspective projection
In perspective projection, a 3D point inside a truncated pyramid (the view frustum, in eye coordinates; left) is mapped to the canonical view volume (right).
The perspective matrix can be written as:
\[
P =
\begin{bmatrix}
n & 0 & 0 & 0 \\
0 & n & 0 & 0 \\
0 & 0 & n+f & -fn \\
0 & 0 & 1 & 0 \\
\end{bmatrix}
\]
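The first, second, and fourth rows implement the perspective division: after dividing by the fourth (homogeneous) component \(z\), \(x\) and \(y\) are scaled by \(n/z\). The third row ensures that points on the near and far planes keep their \(z\) values, since \((n+f) - \frac{fn}{z}\) equals \(n\) at \(z = n\) and \(f\) at \(z = f\). See below: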
\[
P
\begin{bmatrix}
x\\
y\\
z\\
1
\end{bmatrix}
=
\begin{bmatrix}
nx\\
ny\\
(n+f)z-fn\\
z
\end{bmatrix}
=
\begin{bmatrix}
\frac{nx}{z}\\
\frac{ny}{z}\\
(n+f)-\frac{fn}{z}\\
1
\end{bmatrix}
\]
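The behaviour of the third row can be checked with a short sketch. The helpers `perspective` and `apply_homogeneous` and the sample points are hypothetical; the first row is taken as \((n, 0, 0, 0)\), consistent with the product \(nx\) shown above, and `n` and `f` are assumed to be negative z values.

```python
import numpy as np

def perspective(n, f):
    """Perspective matrix that squeezes the frustum into a box."""
    return np.array([
        [n, 0.0, 0.0, 0.0],
        [0.0, n, 0.0, 0.0],
        [0.0, 0.0, n + f, -f * n],
        [0.0, 0.0, 1.0, 0.0],
    ])

def apply_homogeneous(M, p):
    """Apply a 4x4 matrix to a 3D point and divide by the w component."""
    q = M @ np.append(p, 1.0)
    return q[:3] / q[3]

# Assumed convention: the camera looks down -z, so n and f are negative.
n, f = -1.0, -10.0
P = perspective(n, f)

# A point on the near plane is left unchanged.
near_pt = apply_homogeneous(P, np.array([2.0, 1.0, n]))

# A point on the far plane keeps its z, while x and y shrink by n / f.
far_pt = apply_homogeneous(P, np.array([2.0, 1.0, f]))
```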
To get the full projection matrix that maps the truncated pyramid frustum to NDC, we compose the perspective matrix with the orthographic matrix:
\[
P^{\prime}
=
M_{\text{orth}}P
=
\begin{bmatrix}
\frac{2n}{r-l} & 0 & \frac{l+r}{l-r} & 0 \\
0 & \frac{2n}{t-b} & \frac{b+t}{b-t} & 0 \\
0 & 0 & \frac{n+f}{n-f} & \frac{2fn}{f-n} \\
0 & 0 & 1 & 0 \\
\end{bmatrix}
\]
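As a sanity check on the product, the sketch below compares \(M_{\text{orth}}P\) with the closed form and maps two frustum corners to NDC. All function names and bounds are hypothetical; `n` and `f` are assumed to be negative z values, and the perspective matrix uses \((n, 0, 0, 0)\) as its first row.

```python
import numpy as np

def orthographic(l, r, b, t, n, f):
    return np.array([
        [2.0 / (r - l), 0.0, 0.0, -(r + l) / (r - l)],
        [0.0, 2.0 / (t - b), 0.0, -(t + b) / (t - b)],
        [0.0, 0.0, 2.0 / (n - f), -(n + f) / (n - f)],
        [0.0, 0.0, 0.0, 1.0],
    ])

def perspective(n, f):
    return np.array([
        [n, 0.0, 0.0, 0.0],
        [0.0, n, 0.0, 0.0],
        [0.0, 0.0, n + f, -f * n],
        [0.0, 0.0, 1.0, 0.0],
    ])

def full_projection(l, r, b, t, n, f):
    """Closed form of M_orth @ P."""
    return np.array([
        [2.0 * n / (r - l), 0.0, (l + r) / (l - r), 0.0],
        [0.0, 2.0 * n / (t - b), (b + t) / (b - t), 0.0],
        [0.0, 0.0, (n + f) / (n - f), 2.0 * f * n / (f - n)],
        [0.0, 0.0, 1.0, 0.0],
    ])

def to_ndc(M, p):
    """Apply a 4x4 matrix to a 3D point and divide by the w component."""
    q = M @ np.append(p, 1.0)
    return q[:3] / q[3]

# Hypothetical, deliberately off-centre bounds on the near plane.
l, r, b, t, n, f = -1.0, 3.0, -2.0, 2.0, -1.0, -10.0
P_full = full_projection(l, r, b, t, n, f)

# The closed form matches the explicit matrix product.
assert np.allclose(P_full, orthographic(l, r, b, t, n, f) @ perspective(n, f))

# Near top-right corner, and far bottom-left corner (the frustum
# cross-section scales by f / n at the far plane).
near_corner = to_ndc(P_full, np.array([r, t, n]))
far_corner = to_ndc(P_full, np.array([l * f / n, b * f / n, f]))
```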