Learning-based SLAM
Droid-SLAM
Key idea: Using Raft to produce optical flow for updating the depth maps and camera poses.
\[
\rvp_{ij}^\star = \rvr_{ij} + \rvp_{ij}
\]
Given the intial relative pose \(\rmG_{i, j}^{(k)}\) at iteration \(k\) and depth \(\rvd_i^{(k)}\), we can obtain its corresponding point in the other frame by:
\[
\rvp_{ij}^{(k)} = \rmG_{i, j}^{(k)} \circ \Pi_c^{-1}(\rvp_i^{(k)}, \rvd_i^{(k)})
\]
We can use this corresponding point to look-up the cost volume \(C_{ij}\in\mathbb{R}^{H\times W\times H\times W}\) to get a cost volume slice \(\mathbb{R}^{H\times W\times (r+1)^2}\). We can also get an estimated optical flow \(\rvr_{ij}^{(k)} = \rvp_{ij}^{(k)} - \rvp_i^{(k)}\). Those two will be used as the input too ConvGRU to obtain the revised optical flow \(\rvr_{ij}^{(k+1)}\). Then, we can use this to update the point:
\[
\rvp_{i,j}^{\star, (k+1)} = \rvr_{ij}^{(k+1)} + \rvp_{ij}^{(k)}
\]
Then, it will be dense bundle adjustment that uses this updated correponding point to form the residual and update the camera poses and depth maps.
Dense Bundle Adjustment
Here we want an updated pose \(\rmG^\prime\) and depth map \(\rvd^\prime\) that minimizes the following energy:
\[
\rmE(\rmG^\prime, \rvd^\prime) = \sum_{(i,j)\in \epsilon} \| \rvp^\star_{i, j} - \Pi_c (\rmG^\prime \circ \Pi_c^{-1}(\rvp_i, \rvd_i^\prime))\|^2_{\Sigma_{ij}} \quad \Sigma_{ij} = \mathrm{diag}\rvw_{ij}
\]
Note here we can omit the \(k\) for simplicity. \(\|\cdot\|^2_{\Sigma}\) is Mahalanobis distance, and \(\rvw_{ij}\) is confidence weight output by ConvGRU. We can use Gauss-Newton to solve this optimization problem:
\[
\rmJ^T\rmJ \Delta \rvx = -\rmJ^T\rvr
\]
where \(\rmJ\) is the Jacobian of the residual \(\rvr\) with respect to the state \(\rvx = [\xi, \rvd]\). Here \(\xi\) is lie algebra representation of the pose \(\rmG\). \(\rvr\) is the residual:
\[
\rvr = \rvp^\star - \Pi_c (\rmG \circ \Pi_c^{-1}(\rvp, \rvd)) = \rvp^\star - \rvp^\prime
\]
We could solve \(\rmJ^T\rmJ \Delta \rvx = -\rmJ^T\rvr\) with Schur complement:
\[
\begin{bmatrix}
\rmB & \rmE \\
\rmE^T & \rmC
\end{bmatrix}
\begin{bmatrix}
\Delta \xi \\
\Delta \rvd
\end{bmatrix}
=
\begin{bmatrix}
\rvv \\
\rvw
\end{bmatrix}
\]
where \(\rmB = \rmJ_\xi^T\rmJ_\xi\), \(\rmE = \rmJ_\xi^T\rmJ_\rvd\), \(\rmC = \rmJ_\rvd^T\rmJ_\rvd\), \(\rvv = -\rmJ_\xi^T\rvr\), \(\rvw = -\rmJ_\rvd^T\rvr\). We can solve \(\Delta \xi\) first, then \(\Delta \rvd\):
\[
\begin{aligned}
\Delta \xi &= [\rmB - \rmE\rmC^{-1}\rmE^T]^{-1}(\rvv - \rmE\rmC^{-1}\rvw)\\
\Delta \rvd &= \rmC^{-1}(\rvw - \rmE^T\Delta \xi)
\end{aligned}
\]
Note that here \(C\) is a diagonal matrix that is easy to invert (In the original paper, they mention using a dampeing factor \(\lambda\) here). Once we solve \(\Delta \xi\) and \(\Delta \rvd\), we can update the pose and depth map:
\[
\begin{aligned}
\rmG^\prime &= \exp(\Delta \xi) \circ \rmG\\
\rvd^\prime &= \rvd + \Delta \rvd
\end{aligned}
\]
Jacobian derivation
Here we represent 3D points using homogeneous coordinates \(\rmX = (X, Y, Z, W)^\top\). Now we firstly derive the projection function \(\Pi_c\) and its inverse:
\[
\begin{aligned}
\Pi_c(\rmX) &= \begin{bmatrix}
f_x \frac{X}{Z} + c_x \\
f_y \frac{Y}{Z} + c_y
\end{bmatrix} &
\Pi_c^{-1}(\rvp, d) &= \begin{bmatrix}
\frac{p_x - c_x}{f_x} d \\
\frac{p_y - c_y}{f_y} d \\
d
\end{bmatrix}
\end{aligned}
\]
The jacobian of the projection function and its inverse are:
\[
\begin{aligned}
\frac{\partial \Pi_c(\rmX)}{\partial \rmX} &= \begin{bmatrix}
f_x \frac{1}{Z} & 0 & -f_x \frac{X}{Z^2} & 0 \\
0 & f_y \frac{1}{Z} & -f_y \frac{Y}{Z^2} & 0
\end{bmatrix} &
\frac{\partial \Pi_c^{-1}(\rvp, d)}{\partial d} &= \begin{bmatrix}
0 \\
0 \\
0 \\
1
\end{bmatrix}
\end{aligned}
\]
Now we want to derive Jacobian of the 3D point transformation. By using the local parameterization \(e^{\boldsymbol{\xi}}\circ \rmG\), we write the transformation as:
\[
\rmX^\prime = \exp(\boldsymbol{\xi}_j) \cdot \rmG_j \cdot \rmG_i^{-1} \cdot \exp(-\boldsymbol{\xi}_i) \cdot \rmX
\]
Here since \(\rmX^\prime\) is in frame \(j\), we need to use adjoint operator to map \(\xi_i\) to frame \(j\):
\[
\rmX^\prime = \exp(\boldsymbol{\xi}_j) \cdot \exp(-\text{Adj}_{\rmG_j \rmG_i^{-1}} \boldsymbol{\xi}_i) \cdot \rmG_j \cdot \rmG_i^{-1} \cdot \rmX
\]
and then solve the Jacobian of the 3D transformation:
\[
\begin{aligned}
\frac{\partial \rmX^\prime}{\partial \boldsymbol{\xi}_j} =& \begin{bmatrix}
W^\prime & 0 & 0 & 0 & Z^\prime & -Y^\prime \\
0 & W^\prime & 0 & -Z^\prime & 0 & X^\prime \\
0 & 0 & W^\prime & Y^\prime & -X^\prime & 0 \\
0 & 0 & 0 & 0 & 0 & 0
\end{bmatrix} \\
\frac{\partial \rmX^\prime}{\partial \boldsymbol{\xi}_i} = -&\begin{bmatrix}
W^\prime & 0 & 0 & 0 & Z^\prime & -Y^\prime \\
0 & W^\prime & 0 & -Z^\prime & 0 & X^\prime \\
0 & 0 & W^\prime & Y^\prime & -X^\prime & 0 \\
0 & 0 & 0 & 0 & 0 & 0
\end{bmatrix} \cdot \text{Adj}_{\rmG_j \rmG_i^{-1}}
\end{aligned}
\]
Here the adjoint operator is defined as:
\[
\text{Adj}_{\rmG} = \begin{bmatrix}
\rmR & -\rvt^\wedge \rmR \\
0 & \rmR
\end{bmatrix}
\]
where \(\rmR\) is the rotation matrix of \(\rmG\) and \(\rvt\) is the translation vector of \(\rmG\). \(\rvt^\wedge\) is the skew-symmetric matrix of \(\rvt\).
OK, now by using the chain rule, we can compute the Jacobians of \(\rvp^\prime\) with respect to the pose variables and depth:
\[
\begin{aligned}
\frac{\partial \rvp^\prime}{\partial \boldsymbol{\xi}_j} &= \frac{\partial \Pi_c(\rmX^\prime)}{\partial \rmX^\prime} \frac{\partial \rmX^\prime}{\partial \boldsymbol{\xi}_j} \in \mathbb{R}^{2\times 6} &
\frac{\partial \rvp^\prime}{\partial \boldsymbol{\xi}_i} &= \frac{\partial \Pi_c(\rmX^\prime)}{\partial \rmX^\prime} \frac{\partial \rmX^\prime}{\partial \boldsymbol{\xi}_i} \in \mathbb{R}^{2\times 6} \\
\end{aligned}
\]
and
\[
\begin{aligned}
\frac{\partial \rvp^\prime}{\partial d} &= \frac{\partial \Pi_c(\rmX^\prime)}{\partial \rmX^\prime} \frac{\partial \rmX^\prime}{\partial \Pi_c^{-1}(\rvp, d)} \frac{\partial \Pi_c^{-1}(\rvp, d)}{\partial d}\\
&= \frac{\partial \Pi_c(\rmX^\prime)}{\partial \rmX^\prime} (\rmG_j \cdot \rmG_i^{-1}) \begin{pmatrix}
0 \\ 0 \\ 0 \\ 1
\end{pmatrix} \in \mathbb{R}^{2\times 1}\\
&=\frac{\partial \Pi_c(\rmX^\prime)}{\partial \rmX^\prime} \begin{pmatrix}
t_x \\ t_y \\ t_z \\ 1
\end{pmatrix} \in \mathbb{R}^{2\times 1}
\end{aligned}
\]
where \(\rvt = (t_x, t_y, t_z)\) is the translation vector of \(\rmG_j \cdot \rmG_i^{-1}\).
Now if we re-consider the origianl residual \(\rvr = \rvp^\star - \rvp^\prime\), we can write the Jacobian \(J_\xi\) and \(J_\rvd\) as:
\[
\begin{aligned}
\rmJ_\xi &= \begin{bmatrix}
-\frac{\partial \rvp^\prime}{\partial \boldsymbol{\xi}_j} & \frac{\partial \rvp^\prime}{\partial \boldsymbol{\xi}_i}
\end{bmatrix} \in \mathbb{R}^{2\times 12} \\
\rmJ_\rvd &= -\frac{\partial \rvp^\prime}{\partial d} \in \mathbb{R}^{2\times 1}
\end{aligned}
\]
OK, now everything is done!