3D Coordinate Frames and Rotations
A coordinate frame in 3D space is defined by a set of three orthonormal basis vectors, which can be collected into a vectrix:
\[
\underrightarrow{\mathcal{A}} = \big[\begin{matrix} \overrightarrow{x}_A & \overrightarrow{y}_A & \overrightarrow{z}_A \end{matrix}\big]
\]
A 3D vector \(\overrightarrow{a}\) can be represented as a linear combination of the basis vectors of \(\underrightarrow{\mathcal{A}}\):
\[
\overrightarrow{a} = {}^\mathcal{A}a_x \cdot \overrightarrow{x}_A + {}^\mathcal{A}a_y \cdot \overrightarrow{y}_A + {}^\mathcal{A}a_z \cdot \overrightarrow{z}_A = \underrightarrow{\mathcal{A}}\,\mathbf{a}
\]
The column vector \(\mathbf{a} = [{}^\mathcal{A}a_x, {}^\mathcal{A}a_y, {}^\mathcal{A}a_z]^\top\) holds the coordinates of \(\overrightarrow{a}\) in frame \(\underrightarrow{\mathcal{A}}\). A vector and its coordinates are different things: a vector has a length and a direction and does not depend on any coordinate frame, whereas coordinates are always tied to a particular frame. The same vector therefore has different coordinates in different coordinate frames.
Now consider two coordinate frames in 3D space, \(\underrightarrow{\mathcal{A}}\) and \(\underrightarrow{\mathcal{B}}\). The same vector \(\overrightarrow{a}\) can be represented in either frame:
\[
\overrightarrow{a} = \underrightarrow{\mathcal{A}}\left[\begin{array}{c} {}^\mathcal{A}a_x\\ {}^\mathcal{A}a_y \\ {}^\mathcal{A}a_z \end{array}\right]
=\underrightarrow{\mathcal{B}}\left[\begin{array}{c} {}^\mathcal{B}a_x\\ {}^\mathcal{B}a_y \\ {}^\mathcal{B}a_z \end{array}\right]
\]
Now let us define a rotation matrix
\[
R_{AB}=\underrightarrow{\mathcal{A}}^\top \underrightarrow{\mathcal{B}} =
\begin{bmatrix}
\overrightarrow{x}_A \cdot \overrightarrow{x}_B & \overrightarrow{x}_A \cdot \overrightarrow{y}_B & \overrightarrow{x}_A \cdot \overrightarrow{z}_B \\
%
\overrightarrow{y}_A \cdot \overrightarrow{x}_B & \overrightarrow{y}_A \cdot \overrightarrow{y}_B & \overrightarrow{y}_A \cdot \overrightarrow{z}_B \\
%
\overrightarrow{z}_A \cdot \overrightarrow{x}_B & \overrightarrow{z}_A \cdot \overrightarrow{y}_B & \overrightarrow{z}_A \cdot \overrightarrow{z}_B
\end{bmatrix}
\]
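Applying this rotation matrix to the coordinates of \(\overrightarrow{a}\) in frame \(\underrightarrow{\mathcal{B}}\), and using \(\underrightarrow{\mathcal{A}}^\top \underrightarrow{\mathcal{A}} = I\) for orthonormal basis vectors, we obtain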
\[
R_{AB} \left[\begin{array}{c} {}^\mathcal{B}a_x\\ {}^\mathcal{B}a_y \\ {}^\mathcal{B}a_z \end{array}\right]=\underrightarrow{\mathcal{A}}^\top \underrightarrow{\mathcal{B}} \left[\begin{array}{c} {}^\mathcal{B}a_x\\ {}^\mathcal{B}a_y \\ {}^\mathcal{B}a_z \end{array}\right]
=\underrightarrow{\mathcal{A}}^\top \underrightarrow{\mathcal{A}}\left[\begin{array}{c} {}^\mathcal{A}a_x\\ {}^\mathcal{A}a_y \\ {}^\mathcal{A}a_z \end{array}\right]
=\left[\begin{array}{c} {}^\mathcal{A}a_x\\ {}^\mathcal{A}a_y \\ {}^\mathcal{A}a_z \end{array}\right]
\]
Thus, \(R_{AB}\) transforms the coordinates of a vector from coordinate frame \(\underrightarrow{\mathcal{B}}\) to coordinate frame \(\underrightarrow{\mathcal{A}}\). Alternatively, we can think of \(R_{AB}\) as transforming the coordinate frame \(\underrightarrow{\mathcal{A}}\) into \(\underrightarrow{\mathcal{B}}\), as in the figure below.
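The relation between the dot-product definition and the coordinate change can be checked numerically. The following is a minimal sketch, assuming NumPy and two illustrative frames whose basis vectors are written in a common reference frame; the names `A`, `B`, `a_ref` and the chosen angle are assumptions made for the example only.

```python
import numpy as np

# Basis vectors of each frame, written in a common reference frame.
# Row i of `A` is the i-th basis vector of frame A (same for `B`).
A = np.eye(3)
theta = np.deg2rad(30.0)  # illustrative angle: B is A rotated about z
B = np.array([
    [np.cos(theta), np.sin(theta), 0.0],    # x_B
    [-np.sin(theta), np.cos(theta), 0.0],   # y_B
    [0.0, 0.0, 1.0],                        # z_B
])

# R_AB[i, j] = (i-th basis vector of A) . (j-th basis vector of B)
R_AB = A @ B.T

# Coordinates of a vector in frame B (illustrative values).
a_B = np.array([1.0, 2.0, 3.0])

# The geometric vector itself: a linear combination of B's basis vectors.
a_ref = B.T @ a_B

# Coordinates of the same vector in frame A, computed two ways.
a_A_via_R = R_AB @ a_B      # apply the rotation matrix
a_A_direct = A @ a_ref      # project the vector onto A's basis vectors

print(np.allclose(a_A_via_R, a_A_direct))
```

Each column of `R_AB` holds the coordinates of one basis vector of frame B expressed in frame A, which is why the same matrix can be read either as a coordinate change or as a rotation of the frame.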
The column vectors of \(R_{AB}\) are the basis vectors of coordinate frame \(\underrightarrow{\mathcal{B}}\) expressed in coordinate frame \(\underrightarrow{\mathcal{A}}\). To further illustrate that \(R_{AB}\) transforms the coordinate frame \(\underrightarrow{\mathcal{A}}\) into \(\underrightarrow{\mathcal{B}}\), let us consider a practical example:
Here, taking frame \(A\) to be frame \(O\) rotated by \(45^\circ\) counterclockwise about the \(z\)-axis, we have:
\[
R_{OA}=
\begin{bmatrix}
\frac{\sqrt{2}}{2} & -\frac{\sqrt{2}}{2} & 0\\
\frac{\sqrt{2}}{2} & \frac{\sqrt{2}}{2} & 0\\
0 & 0 & 1
\end{bmatrix}
\]
And we have
\[
{}^O P_2 = R_{OA}\, {}^O P_1 = R_{OA}\, {}^A P_2
\]
The left path means rotating the original coordinate system into coordinate system \(A\), which carries \(P_1\) to \(P_2\) (an extrinsic rotation). The right path means converting the coordinates of a point given in coordinate system \(A\) into the original coordinate system.
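Both readings can be reproduced in a few lines. This is a small sketch assuming NumPy, a 45° counterclockwise rotation about the z-axis, and an illustrative point `P1_O`; these values are assumptions for the example.

```python
import numpy as np

theta = np.deg2rad(45.0)  # assumed rotation angle about the z-axis
c, s = np.cos(theta), np.sin(theta)
R_OA = np.array([
    [c, -s, 0.0],
    [s, c, 0.0],
    [0.0, 0.0, 1.0],
])

P1_O = np.array([1.0, 0.0, 0.0])  # point P1 in frame O (illustrative)

# Left path: rotate P1 inside frame O to obtain P2.
P2_O = R_OA @ P1_O

# Right path: P2 has the same coordinates in frame A as P1 has in frame O,
# and R_OA converts those A-coordinates into O-coordinates.
P2_A = P1_O.copy()
P2_O_from_A = R_OA @ P2_A

# Going the other way (O-coordinates to A-coordinates) uses the transpose.
P2_A_check = R_OA.T @ P2_O

print(P2_O, P2_A_check)
```

The two paths produce the same numbers because they apply the same matrix; only the interpretation of the input differs.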
Coordinate system conversion
Let us consider a practical example. Suppose we have a world coordinate frame \(\underrightarrow{\mathcal{W}}\), an OpenCV camera coordinate frame \(\underrightarrow{\mathcal{C}}\) (x right, y down, z forward), and an OpenGL camera coordinate frame \(\underrightarrow{\mathcal{C}^\prime}\) (x right, y up, z backward).
Rotations compose as \(R_{wc^\prime} = R_{wc} R_{cc^\prime}\). Since the two camera frames share the x-axis and have opposite y- and z-axes, we have
\[
R_{cc^\prime} =
\begin{bmatrix}
1 & 0 & 0 \\
0 & -1 & 0 \\
0 & 0 & -1 \\
\end{bmatrix}
\]
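As a sketch of how this is typically applied, the snippet below converts a 4x4 camera-to-world pose from OpenCV axes to OpenGL axes by right-multiplying with the axis flip. It assumes NumPy; the function name `opencv_c2w_to_opengl_c2w` and the sample pose are hypothetical and chosen only for illustration.

```python
import numpy as np

# Columns are the OpenGL camera axes expressed in the OpenCV camera frame.
R_cc_prime = np.diag([1.0, -1.0, -1.0])

def opencv_c2w_to_opengl_c2w(T_wc):
    """Convert a 4x4 camera-to-world pose from OpenCV axes to OpenGL axes."""
    flip = np.eye(4)
    flip[:3, :3] = R_cc_prime
    # R_wc' = R_wc @ R_cc'; the camera center (last column) is unchanged.
    return T_wc @ flip

# Illustrative pose: identity rotation, camera center at (0.5, -1, 2).
T_wc = np.eye(4)
T_wc[:3, 3] = [0.5, -1.0, 2.0]

T_wc_gl = opencv_c2w_to_opengl_c2w(T_wc)
print(T_wc_gl)
```

Because the flip matrix is its own inverse, applying the same function again converts back from OpenGL axes to OpenCV axes.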
Camera pose and extrinsic
The extrinsic matrix, i.e., the world-to-camera matrix (w2c), transforms points from world coordinates into camera coordinates, so it should be written as \(T_{cw}\):
\[
P_c = \begin{bmatrix} R_{cw} & t_c \end{bmatrix} \begin{bmatrix} P_w \\ 1 \end{bmatrix}
\]
The camera pose, i.e., the camera-to-world matrix (c2w), is the inverse of the extrinsic matrix. It transforms points from camera coordinates into world coordinates, so it should be written as \(T_{wc}\). Its translation column \(C_w\) is the camera center in world coordinates:
\[
P_w = \begin{bmatrix} R_{wc} & C_w \end{bmatrix} \begin{bmatrix} P_c \\ 1 \end{bmatrix}
\]
We have:
\[
\begin{aligned}
{\left[\begin{array}{c|c}
\mathrm{R}_{cw} & \boldsymbol{t}_c \\
\hline \mathbf{0} & 1
\end{array}\right] } & =\left[\begin{array}{c|c}
\mathrm{R}_{\mathrm{wc}} & \mathrm{C}_w \\
\hline \mathbf{0} & 1
\end{array}\right]^{-1} \\
& =\left[\left[\begin{array}{c|c}
\mathrm{I} & \mathrm{C}_w \\
\hline \mathbf{0} & 1
\end{array}\right]\left[\begin{array}{c|c}
\mathrm{R}_{\mathrm{wc}} & 0 \\
\hline \mathbf{0} & 1
\end{array}\right]\right]^{-1} \\
& =\left[\begin{array}{c|c}
\mathrm{R}_{\mathrm{wc}} & 0 \\
\hline \mathbf{0} & 1
\end{array}\right]^{-1}\left[\begin{array}{c|c}
\mathrm{I} & \mathrm{C}_w \\
\hline \mathbf{0} & 1
\end{array}\right]^{-1} \\
& =\left[\begin{array}{c|c}
\mathbf{R}_{wc}^{\top} & 0 \\
\hline \mathbf{0} & 1
\end{array}\right]\left[\begin{array}{c|c}
\mathrm{I} & -\mathrm{C}_w \\
\hline \mathbf{0} & 1
\end{array}\right] \\
& =\left[\begin{array}{c|c}
\mathbf{R}_{wc}^{\top} & -\mathbf{R}_{wc}^{\top} \mathrm{C}_w \\
\hline \mathbf{0} & 1
\end{array}\right]
\end{aligned}
\]
Thus, we know:
\[
\begin{aligned}
R_{cw} &= R_{wc}^\top\\
t_c &= -R_{wc}^\top \mathrm{C}_w\\
&= -R_{cw} \mathrm{C}_w
\end{aligned}
\]
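The closed-form inverse can be compared against a general matrix inverse. Below is a minimal sketch assuming NumPy; the helper name `pose_to_extrinsic` and the sample pose are hypothetical.

```python
import numpy as np

def pose_to_extrinsic(T_wc):
    """Invert a 4x4 camera-to-world pose using R^T and -R^T C."""
    R_wc = T_wc[:3, :3]
    C_w = T_wc[:3, 3]           # camera center in world coordinates
    T_cw = np.eye(4)
    T_cw[:3, :3] = R_wc.T       # R_cw = R_wc^T
    T_cw[:3, 3] = -R_wc.T @ C_w  # t_c = -R_wc^T C_w
    return T_cw

# Illustrative pose: 90 degree rotation about y, camera center at (1, 2, 3).
theta = np.deg2rad(90.0)
c, s = np.cos(theta), np.sin(theta)
T_wc = np.eye(4)
T_wc[:3, :3] = [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]
T_wc[:3, 3] = [1.0, 2.0, 3.0]

T_cw = pose_to_extrinsic(T_wc)

# The closed form agrees with the general inverse.
print(np.allclose(T_cw, np.linalg.inv(T_wc)))
```

The closed form avoids a general 4x4 inversion and makes explicit that the extrinsic translation is not the negated camera center, but the negated camera center rotated into the camera frame.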
Orthographic projection
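The orthographic projection matrix transforms the orthographic view volume, bounded by the left, right, bottom, top, near, and far planes \(l, r, b, t, n, f\), to the canonical view volume (NDC):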
\[
M_{\text{orth}} =
\begin{bmatrix}
\frac{2}{r-l} & 0 & 0 & -\frac{r+l}{r-l} \\
0 & \frac{2}{t-b} & 0 & -\frac{t+b}{t-b} \\
0 & 0 & \frac{2}{n-f} & -\frac{n+f}{n-f} \\
0 & 0 & 0 & 1 \\
\end{bmatrix}
\]
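A quick way to check this matrix is to confirm that opposite corners of the view volume land on opposite corners of the NDC cube. The sketch below assumes NumPy and assumes that \(n\) and \(f\) are the z-coordinates of the near and far planes for a camera looking down the negative z-axis (so \(n > f\)); the bounds are illustrative values.

```python
import numpy as np

def orthographic_matrix(l, r, b, t, n, f):
    """Map the box [l, r] x [b, t] x [f, n] to the cube [-1, 1]^3."""
    return np.array([
        [2.0 / (r - l), 0.0, 0.0, -(r + l) / (r - l)],
        [0.0, 2.0 / (t - b), 0.0, -(t + b) / (t - b)],
        [0.0, 0.0, 2.0 / (n - f), -(n + f) / (n - f)],
        [0.0, 0.0, 0.0, 1.0],
    ])

# Illustrative bounds; n and f are z-coordinates with the camera looking down -z.
l, r, b, t, n, f = -2.0, 4.0, -1.0, 3.0, -1.0, -10.0
M_orth = orthographic_matrix(l, r, b, t, n, f)

corner_min = M_orth @ np.array([l, b, f, 1.0])  # left, bottom, far
corner_max = M_orth @ np.array([r, t, n, 1.0])  # right, top, near
print(corner_min[:3], corner_max[:3])
```

With this convention the near plane maps to \(z = 1\) and the far plane to \(z = -1\).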
Perspective projection
In perspective projection, a 3D point inside a truncated pyramid frustum, given in eye coordinates (left), is mapped to a canonical view volume (right).
The perspective matrix can be written:
\[
P =
\begin{bmatrix}
n & 0 & 0 & 0 \\
0 & n & 0 & 0 \\
0 & 0 & n+f & -fn \\
0 & 0 & 1 & 0 \\
\end{bmatrix}
\]
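The first, second, and fourth rows implement the perspective division by \(z\). The third row is chosen so that points on the near plane (\(z = n\)) and on the far plane (\(z = f\)) keep their \(z\) value. This can be seen below, where the last step divides by the homogeneous coordinate \(w = z\):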
\[
P
\begin{bmatrix}
x\\
y\\
z\\
1
\end{bmatrix}
=
\begin{bmatrix}
nx\\
ny\\
(n+f)z-fn\\
z
\end{bmatrix}
=
\begin{bmatrix}
\frac{nx}{z}\\
\frac{ny}{z}\\
(n+f)-\frac{fn}{z}\\
1
\end{bmatrix}
\]
To obtain the full projection matrix that maps the truncated pyramid frustum to NDC, we compose the perspective matrix with the orthographic matrix:
\[
P^{\prime}
=
M_{\text{orth}}P
=
\begin{bmatrix}
\frac{2n}{r-l} & 0 & \frac{l+r}{l-r} & 0 \\
0 & \frac{2n}{t-b} & \frac{b+t}{b-t} & 0 \\
0 & 0 & \frac{n+f}{n-f} & \frac{2fn}{f-n} \\
0 & 0 & 1 & 0 \\
\end{bmatrix}
\]
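The product and its closed form can be verified numerically. This is a minimal sketch assuming NumPy and assuming \(n\) and \(f\) are the z-coordinates of the near and far planes with the camera looking down the negative z-axis; the helper names and the bounds are hypothetical.

```python
import numpy as np

def orthographic_matrix(l, r, b, t, n, f):
    return np.array([
        [2.0 / (r - l), 0.0, 0.0, -(r + l) / (r - l)],
        [0.0, 2.0 / (t - b), 0.0, -(t + b) / (t - b)],
        [0.0, 0.0, 2.0 / (n - f), -(n + f) / (n - f)],
        [0.0, 0.0, 0.0, 1.0],
    ])

def perspective_matrix(n, f):
    # Squeezes the frustum into a box; keeps z on the near and far planes.
    return np.array([
        [n, 0.0, 0.0, 0.0],
        [0.0, n, 0.0, 0.0],
        [0.0, 0.0, n + f, -f * n],
        [0.0, 0.0, 1.0, 0.0],
    ])

def full_projection_closed_form(l, r, b, t, n, f):
    return np.array([
        [2.0 * n / (r - l), 0.0, (l + r) / (l - r), 0.0],
        [0.0, 2.0 * n / (t - b), (b + t) / (b - t), 0.0],
        [0.0, 0.0, (n + f) / (n - f), 2.0 * f * n / (f - n)],
        [0.0, 0.0, 1.0, 0.0],
    ])

def to_ndc(M, point):
    """Apply a 4x4 projective matrix to a 3D point and divide by w."""
    h = M @ np.append(point, 1.0)
    return h[:3] / h[3]

# Illustrative bounds on the near plane; n and f are negative z-coordinates.
l, r, b, t, n, f = -2.0, 4.0, -1.0, 3.0, -1.0, -10.0
P_full = orthographic_matrix(l, r, b, t, n, f) @ perspective_matrix(n, f)

near_corner = to_ndc(P_full, [r, t, n])                  # right, top, near
far_corner = to_ndc(P_full, [r * f / n, t * f / n, f])   # same ray, far plane
print(near_corner, far_corner)
```

Both corners lie on the same ray through the eye, so they share the same NDC x and y and differ only in depth.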