Traditional Methods

[CVPR'15] DynamicFusion

Dense Non-rigid Warp Field

For each point \(v_c\in \mathbf{S}\) in canonical space, \(\mathbf{T}_{lc}=\mathcal{W}(v_c)\) transforms that point from canonical space into the current frame. The warp function \(\mathcal{W}: \mathbf{S} \rightarrow \mathbf{SE3}\) is defined using dual quaternion blending (DQB):

\[ \mathcal{W}(v_c) = SE3(\mathbf{DQB}(v_c)) \]

where \(\mathbf{DQB}\) is the normalized weighted sum of the nodes' unit dual quaternions:

\[ \mathbf{DQB}(v_c) = \frac{\sum_{k\in N(v_c)}\mathbf{w}_k(v_c)\hat{\mathbf{q}}_{kc}}{\|\sum_{k\in N(v_c)}\mathbf{w}_k(v_c)\hat{\mathbf{q}}_{kc}\|} \]

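Here \(N(x)\) denotes the k nearest transformation nodes of a point \(x\), and \(\hat{\mathbf{q}}_{kc}\) is the unit dual quaternion encoding the transformation of node \(k\). The state of the warp field \(\mathcal{W}_t\) at time \(t\) is defined by the values of a set of \(n\) deformation nodes: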

\[ \mathcal{N}_{\mathrm{warp}}^t = \{\mathbf{dg}_v, \mathbf{dg}_{se3}, \mathbf{dg}_w\} \]

Each node \(i \in \{1,\dots,n\}\) has a position \(\mathbf{dg}_v^i\) in canonical space, a transformation \(\mathbf{T}_{ic}=\mathbf{dg}_{se3}^i\), and a radial basis weight \(\mathbf{dg}_{w}^i\) that controls the extent of the node's influence:

\[ \mathbf{w}_i(v_c) = \exp\left(-\|\mathbf{dg}_v^i - v_c\|^2 / \left(2(\mathbf{dg}_{w}^i)^2\right)\right) \]
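A minimal NumPy sketch of the node weights, assuming the Gaussian form \(\exp(-\|\mathbf{dg}_v^i - v_c\|^2 / (2(\mathbf{dg}_w^i)^2))\) from the DynamicFusion paper, a brute-force nearest-node search, and a hypothetical default of k = 4. The name `node_weights` is illustrative only; a real system would query a spatial index instead of sorting all distances.

```python
import numpy as np


def node_weights(v_c, node_pos, node_radius, k=4):
    """Return the indices of the k nearest nodes N(v_c) and their weights w_i(v_c)."""
    # Euclidean distance from the canonical point to every node position dg_v^i
    d = np.linalg.norm(node_pos - v_c, axis=1)
    # Brute-force k-nearest-node search (a k-d tree would be used at scale)
    nn = np.argsort(d)[:k]
    # Gaussian radial basis weight with per-node radius dg_w^i
    w = np.exp(-d[nn] ** 2 / (2.0 * node_radius[nn] ** 2))
    return nn, w


nodes = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [5.0, 0.0, 0.0]])
radii = np.ones(3)
nn, w = node_weights(np.zeros(3), nodes, radii, k=2)
```

Here the point coincides with node 0 (weight 1) and sits one radius from node 1 (weight \(e^{-1/2}\)); node 2 is not among the two nearest.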

We can factor out any rigid-body transformation common to all points in the volume (e.g., camera motion) as an explicit transform \(\mathbf{T}_{lw}\) and compose it with the volumetric warp function:

\[ \mathcal{W}(v_c) = \mathbf{T}_{lw} SE3(\mathbf{DQB}(v_c)) \]
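The blending and the composition with \(\mathbf{T}_{lw}\) can be sketched in NumPy as below. This is an illustrative sketch, not the DynamicFusion implementation: each node transform is assumed to be given as a unit rotation quaternion \((w, x, y, z)\) plus a translation, all helper names are hypothetical, and the sign flip that keeps every rotation quaternion in one hemisphere is a common practical detail of dual quaternion blending rather than something stated above. Dividing by the norm of the blended real part is enough to recover the rotation and translation.

```python
import numpy as np


def qmul(a, b):
    # Hamilton product of quaternions stored as (w, x, y, z)
    w1, x1, y1, z1 = a
    w2, x2, y2, z2 = b
    return np.array([
        w1 * w2 - x1 * x2 - y1 * y2 - z1 * z2,
        w1 * x2 + x1 * w2 + y1 * z2 - z1 * y2,
        w1 * y2 - x1 * z2 + y1 * w2 + z1 * x2,
        w1 * z2 + x1 * y2 - y1 * x2 + z1 * w2,
    ])


def qconj(q):
    return np.array([q[0], -q[1], -q[2], -q[3]])


def to_dual_quat(q_rot, t):
    # Unit dual quaternion (real, dual) for "rotate by q_rot, then translate by t"
    real = np.asarray(q_rot, dtype=float) / np.linalg.norm(q_rot)
    dual = 0.5 * qmul(np.array([0.0, *t]), real)
    return real, dual


def quat_to_matrix(q):
    w, x, y, z = q
    return np.array([
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ])


def warp(weights, dual_quats, T_lw=None):
    """T_lw @ SE3(DQB(v_c)) for the given node weights and node dual quaternions."""
    ref = dual_quats[0][0]
    real, dual = np.zeros(4), np.zeros(4)
    for w, (r, d) in zip(weights, dual_quats):
        s = 1.0 if np.dot(r, ref) >= 0.0 else -1.0  # q and -q encode the same rotation
        real += s * w * r
        dual += s * w * d
    n = np.linalg.norm(real)
    T = np.eye(4)
    T[:3, :3] = quat_to_matrix(real / n)
    # Translation t = 2 * vec(dual * conj(real)) / |real|^2
    T[:3, 3] = 2.0 * qmul(dual, qconj(real))[1:] / n ** 2
    return T if T_lw is None else T_lw @ T
```

For example, blending two pure translations of 1 and 3 along x with equal weights yields a translation of 2 along x, as expected from a weighted average.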

Dense Non-rigid Surface Fusion

A TSDF volume

\[ \mathcal{V}: \mathcal{V}(x) \rightarrow [v(x)\in\mathbb{R}, w(x)\in\mathbb{R}] \]

is used to represent the surface in canonical space. Here \(v(x)\) is the weighted average of the projective TSDF values from all previous observations, and \(w(x)\) is the sum of all associated weights.

Given the live depth map \(D_t\), each voxel center \(x_c\in \mathbf{S}\) is warped into the live frame:

\[ (x_t^\top, 1)^\top = \mathcal{W}_t(x_c)(x_c^\top, 1)^\top \]

The projective signed distance at \(x_c\) is then computed as

\[ \mathbf{psdf}(x_c) = \left[\mathbf{K}^{-1}D_t(u_c)[u_c^\top, 1]^\top\right]_z - [x_t]_z \]

where \(u_c = \pi(\mathbf{K}x_t)\) is the pixel into which the voxel center projects, and the distance is measured along the optical (z) axis, denoted \([\cdot]_z\).
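A per-voxel sketch of the projective signed distance in NumPy, assuming \(\mathcal{W}_t(x_c)\) is given as a 4×4 matrix, the depth image is indexed as `depth[row, col]`, and the projection uses nearest-pixel lookup. Returning `None` for voxels behind the camera, outside the image, or on a missing depth value is an illustrative choice; the function name only mirrors the notation above.

```python
import numpy as np


def psdf(x_c, T, K, depth):
    # Warp the canonical voxel center into the live frame: x_t = W_t(x_c) x_c
    x_t = (T @ np.append(x_c, 1.0))[:3]
    if x_t[2] <= 0:
        return None  # behind the camera
    # u_c = pi(K x_t), rounded to the nearest pixel
    p = K @ x_t
    col, row = int(round(p[0] / p[2])), int(round(p[1] / p[2]))
    h, w = depth.shape
    if not (0 <= col < w and 0 <= row < h) or depth[row, col] <= 0:
        return None  # no valid observation for this voxel
    # Back-project the observed depth and compare along the optical (z) axis
    observed = np.linalg.inv(K) @ (depth[row, col] * np.array([col, row, 1.0]))
    return observed[2] - x_t[2]


K = np.array([[100.0, 0.0, 2.0], [0.0, 100.0, 2.0], [0.0, 0.0, 1.0]])
depth = np.full((5, 5), 2.0)
```

A voxel in front of the observed surface gets a positive value and one behind it a negative value, which is what the \(-\tau\) test in the update rule relies on.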

For each voxel, the TSDF volume is updated via:

\[ \mathcal{V}(x)_t = \begin{cases} [v^\prime(x), w^\prime(x)]^\top & \text{if } \mathbf{psdf}(\mathbf{dc}(x)) > -\tau\\ \mathcal{V}(x)_{t-1} & \text{otherwise} \end{cases} \]

where \(\mathbf{dc}(\cdot)\) transforms a discrete voxel point into the continuous TSDF domain and \(\tau>0\) is the truncation distance. The updated value and weight are

\[ \begin{aligned} v^\prime(x) &= \frac{v(x)_{t-1}w(x)_{t-1}+\min(\rho, \tau)w(x)}{w(x)_{t-1}+w(x)}\\ \rho &= \mathbf{psdf}(\mathbf{dc}(x))\\ w^\prime(x) &= \min(w(x)_{t-1}+w(x), w_{\max}) \end{aligned} \]

Here the weight \(w(x)\) of the new observation accounts for the uncertainty associated with the warp function at \(x_c\):

\[ w(x) \propto \frac{1}{k} \sum_{i\in N(x_c)}\|\mathbf{dg}_v^i-x_c \|_2 \]
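The voxel update and its weight can be sketched in Python for a single voxel. Two assumptions are worth stating loudly: the weight is computed as the mean distance from \(x_c\) to the positions \(\mathbf{dg}_v^i\) of its k nearest nodes, and the unspecified proportionality constant is exposed as a hypothetical `scale` parameter. Both function names are illustrative.

```python
import numpy as np


def fusion_weight(x_c, node_pos, k=4, scale=1.0):
    # Mean distance from the voxel center to its k nearest node positions dg_v^i;
    # `scale` stands in for the unspecified proportionality constant (assumption)
    d = np.sort(np.linalg.norm(node_pos - x_c, axis=1))[:k]
    return scale * float(d.mean())


def update_voxel(v_prev, w_prev, rho, w_new, tau, w_max):
    # Leave voxels far behind the observed surface (psdf <= -tau) unchanged,
    # and guard against a zero total weight
    if rho <= -tau or w_prev + w_new <= 0:
        return v_prev, w_prev
    rho_t = min(rho, tau)  # truncate distances in front of the surface
    v = (v_prev * w_prev + rho_t * w_new) / (w_prev + w_new)
    w = min(w_prev + w_new, w_max)  # cap the accumulated weight at w_max
    return v, w
```

Capping the accumulated weight at \(w_{\max}\) keeps the running average responsive to new observations instead of freezing after many frames.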

Optimization (Estimating the Warp-field State \(\mathcal{W}_t\))

Dense Non-Rigid ICP Data-term

The current zero level set of the canonical TSDF volume \(\mathcal{V}\) is extracted via marching cubes and stored as a polygon mesh with point-normal pairs in the canonical frame: \(\hat{\mathcal{V}}_c=\{V_c, N_c\}\).

The mesh is then warped into the live frame, \(\hat{\mathcal{V}}_w = \mathcal{W}_t \hat{\mathcal{V}}_c\), and rendered into the current live view.
Rendering resolves visibility: each pixel \(u\) corresponds to one visible canonical point \(\mathbf{v}(u)\) with normal \(\mathbf{n}(u)\).
Warping these points and normals from canonical space to the live frame gives the surface predictions:

\[ \begin{aligned} \hat{v}_u &= \tilde{T}^u \mathbf{v}(u) = \mathcal{W}(\mathbf{v}(u)) \mathbf{v}(u)\\ \hat{n}_u &= \tilde{T}^u \mathbf{n}(u) = \mathcal{W}(\mathbf{v}(u)) \mathbf{n}(u) \end{aligned} \]

The observed surface point \(\mathbf{vl}(u)\) in the live frame is obtained by back-projecting the depth map:

\[ \mathbf{vl}(u) = \mathbf{K}^{-1} D_t(u) [u^\top, 1]^\top \]
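Back-projecting a whole depth map at once takes a few lines of vectorized NumPy. This sketch assumes pixel coordinates \(u = (\text{col}, \text{row})\) and a pinhole intrinsic matrix \(\mathbf{K}\); `backproject` is a hypothetical helper name.

```python
import numpy as np


def backproject(depth, K):
    """vl(u) = K^{-1} D_t(u) [u^T, 1]^T for every pixel, returned as an (h, w, 3) array."""
    h, w = depth.shape
    cols, rows = np.meshgrid(np.arange(w), np.arange(h))  # each of shape (h, w)
    pix = np.stack([cols, rows, np.ones_like(cols)], axis=-1).astype(float)
    rays = pix @ np.linalg.inv(K).T  # K^{-1} [u, 1] for every pixel
    return rays * depth[..., None]   # scale each ray by its measured depth


K = np.array([[100.0, 0.0, 1.0], [0.0, 100.0, 1.0], [0.0, 0.0, 1.0]])
vl = backproject(np.full((3, 3), 2.0), K)
```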

With the predictions \(\hat{v}_u, \hat{n}_u\) and the observations \(\mathbf{vl}\), the data term is a robust point-to-plane error:

\[ \mathbf{Data} (\mathcal{W}, \mathcal{V}, D_t) = \sum_{u\in\Omega} \psi_{\mathbf{data}} \left(\hat{n}_u^\top(\hat{v}_u - \mathbf{vl}(\tilde{u}))\right) \]

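Here \(\tilde{u} = \pi(\mathbf{K}\hat{v}_u)\) is the pixel onto which the prediction projects (projective data association), and \(\psi_{\mathbf{data}}\) is a robust Tukey penalty, written here in its influence-function form (the derivative of the Tukey biweight), so residuals larger than \(c\) have no influence:

\[ \psi_{\mathbf{data}}(r_i) = \begin{cases} r_i \left(1 - (\frac{r_i}{c})^2\right)^2 & \text{if } |r_i| \leq c\\ 0 & \text{if } |r_i| > c \end{cases} \]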
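Given correspondences already found by projective association, the point-to-plane residuals and the Tukey influence function can be evaluated in NumPy as follows; the function names and the threshold used in the check are hypothetical.

```python
import numpy as np


def point_to_plane(v_hat, n_hat, vl):
    # r_u = n_hat_u^T (v_hat_u - vl(u~)) for each row of the (N, 3) arrays
    return np.einsum("ij,ij->i", n_hat, v_hat - vl)


def tukey_psi(r, c):
    # r (1 - (r/c)^2)^2 for |r| <= c, and 0 beyond the threshold c
    r = np.asarray(r, dtype=float)
    return np.where(np.abs(r) <= c, r * (1.0 - (r / c) ** 2) ** 2, 0.0)


r = point_to_plane(np.array([[0.0, 0.0, 1.0]]),
                   np.array([[0.0, 0.0, 1.0]]),
                   np.array([[0.5, 0.0, 1.1]]))
```

Only the offset along the normal counts: the 0.5 shift in x above lies in the tangent plane and does not change the residual. In an iteratively reweighted least-squares solver, \(\psi(r)/r\) acts as the per-residual weight, so outliers beyond \(c\) drop out entirely.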

Warp-field Regularization

We need to estimate the deformation not only of the currently visible surface but over all of the space within \(\mathbf{S}\). The regularization term therefore sums over all pairwise connected nodes:

\[ \mathbf{Reg}(\mathcal{W}, \mathcal{E}) = \sum_{i=1}^n \sum_{j\in \mathcal{E}(i)} \alpha_{ij} \psi_{\mathbf{reg}} (\mathbf{T}_{ic} \mathbf{dg}_v^j-\mathbf{T}_{jc} \mathbf{dg}_v^j) \]

where \(\alpha_{ij} = \max(\mathbf{dg}_w^i, \mathbf{dg}_w^j)\), \(\psi_{\mathbf{reg}}\) is a robust Huber penalty, and \(\mathcal{E}\) denotes the regularization graph topology (the set of connected edges). In DynamicFusion, \(\mathcal{E}\) is built from k-nearest-neighbor connections: given the current set of deformation nodes \(\mathcal{N}_{\mathrm{warp}}\), we construct a hierarchy of regularization nodes \(\mathcal{N}_{\mathrm{reg}}=\{\mathbf{r}_v, \mathbf{r}_{se3}, \mathbf{r}_w\}\), and the graph topology is formed by adding edges from each node of the hierarchy to its k nearest nodes in the next coarser level.
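The regularizer can be evaluated with a short NumPy loop over edges. This sketch follows the as-rigid-as-possible form used in the DynamicFusion paper, in which both transforms act on the same node position, \(\mathbf{T}_{ic}\mathbf{dg}_v^j - \mathbf{T}_{jc}\mathbf{dg}_v^j\), and applies a Huber penalty to the norm of that difference; the threshold `delta` and the function names are hypothetical.

```python
import numpy as np


def huber(r, delta):
    # Quadratic near zero, linear beyond delta
    a = abs(r)
    return 0.5 * r * r if a <= delta else delta * (a - 0.5 * delta)


def reg_energy(T, pos, radius, edges, delta=0.01):
    """T: (n, 4, 4) node transforms T_ic; pos: (n, 3) positions dg_v; radius: (n,) dg_w."""
    energy = 0.0
    for i, j in edges:
        alpha = max(radius[i], radius[j])
        p_j = np.append(pos[j], 1.0)
        # Disagreement between neighbor i's and node j's transform at node j's position
        diff = (T[i] @ p_j - T[j] @ p_j)[:3]
        energy += alpha * huber(float(np.linalg.norm(diff)), delta)
    return energy
```

Identical transforms on connected nodes cost nothing, so the term only penalizes relative, non-rigid motion between neighbors.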

Extending the Warp-field

Inserting new deformation nodes into \(\mathcal{N}_{\mathrm{warp}}^t\)

After performing non-rigid TSDF fusion, we extract the surface estimate as the polygon mesh \(\hat{\mathcal{V}}_c\). For each vertex \(v_c \in \hat{\mathcal{V}}_c\), we check how well the current warp function covers the extracted geometry. A vertex is considered unsupported when

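\[ \min_{k\in N(v_c)} \frac{\|\mathbf{dg}_v^k-v_c\|}{\mathbf{dg}_w^k} \geq 1 \]

that is, \(v_c\) lies at least one node radius \(\mathbf{dg}_w^k\) away from each of its nearest nodes.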
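The support test vectorizes directly in NumPy. This sketch uses brute-force distances and a hypothetical default k; `unsupported_mask` is an illustrative name.

```python
import numpy as np


def unsupported_mask(verts, node_pos, node_radius, k=4):
    """Boolean mask of vertices with min_k ||dg_v^k - v_c|| / dg_w^k >= 1."""
    # (V, n) matrix of vertex-to-node distances
    d = np.linalg.norm(verts[:, None, :] - node_pos[None, :, :], axis=2)
    nn = np.argsort(d, axis=1)[:, :k]  # k nearest nodes per vertex
    ratio = np.take_along_axis(d, nn, axis=1) / node_radius[nn]
    return ratio.min(axis=1) >= 1.0


mask = unsupported_mask(np.array([[0.1, 0.0, 0.0], [2.0, 0.0, 0.0]]),
                        np.array([[0.0, 0.0, 0.0]]),
                        np.array([0.5]), k=1)
```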

The set of unsupported vertices is subsampled to a set of new node positions \(\tilde{\mathbf{dg}}_v\) that are at least \(\epsilon\) apart, so \(\epsilon\) defines the resolution of the motion field.
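The exact subsampling scheme is not specified here. A simple greedy pass that keeps a point only if it is at least \(\epsilon\) from every point kept so far satisfies the spacing constraint, as sketched below (quadratic time, for illustration only; the function name is hypothetical).

```python
import numpy as np


def subsample_min_dist(points, eps):
    """Greedily keep points that are at least eps away from all previously kept points."""
    kept = []
    for p in np.asarray(points, dtype=float):
        if all(np.linalg.norm(p - q) >= eps for q in kept):
            kept.append(p)
    return np.array(kept).reshape(-1, 3)


new_nodes = subsample_min_dist([[0.0, 0.0, 0.0], [0.01, 0.0, 0.0], [0.1, 0.0, 0.0]], 0.05)
```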

The transformation of each new node is initialized through \(\mathbf{DQB}\) with the current warp function: \(\tilde{\mathbf{dg}}_{se3} = \mathcal{W}_t(\tilde{\mathbf{dg}}_v)\).

The set of deformation nodes for the current time is then updated as:

\[ \mathcal{N}_{\mathrm{warp}}^t = \mathcal{N}_{\mathrm{warp}}^{t-1}\cup \{\tilde{\mathbf{dg}}_v, \tilde{\mathbf{dg}}_{se3}, \tilde{\mathbf{dg}}_w\} \]

Updating the Regularization Graph \(\mathcal{E}\)

Given the newly updated set of deformation nodes, we construct an \(L\geq 1\) level regularization graph node hierarchy. Level \(l = 0\) is simply \(\mathcal{N}_{\mathrm{warp}}\). Each coarser level \(l\) is constructed by subsampling the warp-field node positions \(\mathbf{dg}_{v}\) with radius \(\epsilon\beta^l\), where \(\beta > 1\), and, as before, each new node's transformation is initialized through \(\mathbf{DQB}\) with the current warp function.

Regularization edges are then added from each node in a finer level to its k nearest nodes in the next (coarser) level.
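The hierarchy and its edges can be sketched in NumPy as below, reusing a greedy minimum-distance rule as an assumed subsampling scheme. Levels are indexed \(l = 0, \dots, L\), with level 0 being the warp-field node positions; each coarser level subsamples \(\mathbf{dg}_v\) with radius \(\epsilon\beta^l\), and edges go from each node to its k nearest nodes one level up. Initializing the regularization-node transforms through DQB is omitted, and all names are hypothetical.

```python
import numpy as np


def subsample_min_dist(points, radius):
    # Greedy spacing rule (assumed): keep points at least `radius` from all kept points
    kept = []
    for p in points:
        if all(np.linalg.norm(p - q) >= radius for q in kept):
            kept.append(p)
    return np.array(kept).reshape(-1, 3)


def build_hierarchy(node_pos, eps, beta, L, k=4):
    """Return node positions per level and edges (level, fine index, coarse index)."""
    levels = [np.asarray(node_pos, dtype=float)]
    for l in range(1, L + 1):
        # Coarser levels subsample the warp-field nodes dg_v with radius eps * beta**l
        levels.append(subsample_min_dist(levels[0], eps * beta ** l))
    edges = []
    for l in range(L):
        fine, coarse = levels[l], levels[l + 1]
        for i, p in enumerate(fine):
            d = np.linalg.norm(coarse - p, axis=1)
            for j in np.argsort(d)[:k]:  # k nearest nodes in the next coarser level
                edges.append((l, i, int(j)))
    return levels, edges


nodes = [[0.0, 0.0, 0.0], [0.1, 0.0, 0.0], [0.25, 0.0, 0.0], [0.3, 0.0, 0.0]]
levels, edges = build_hierarchy(nodes, eps=0.1, beta=2.0, L=1, k=1)
```

Because \(\beta > 1\), the sampling radius grows geometrically, so each level is sparser than the one below it.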
