
Cameras

RGB cameras

CCD cameras (global shutter)

Charge-coupled device (CCD) camera:

(figure: CCD sensor architecture with a single output node and off-chip ADC)

A CCD has only one output node: the charge collected in each pixel is transferred from pixel to pixel, potentially thousands of times, to that single node, where the signal is digitized by an off-chip analog-to-digital converter (ADC). This sequential charge transfer to a single output severely limits frame rate as the pixel count grows.

CMOS cameras (rolling shutter)

Complementary metal-oxide-semiconductor (CMOS) camera:

(figure: CMOS sensor architecture with one ADC per column)

CMOS sensors use thousands of on-chip signal processors and associated ADCs: with one ADC per column, every pixel in a given row is digitized simultaneously, so a full row is read out at once.
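To make this architectural contrast concrete, here is a minimal back-of-the-envelope sketch in Python comparing the CCD's pixel-serial readout with CMOS column-parallel readout; the sensor resolution and conversion rates are illustrative assumptions, not datasheet values:

```python
# Rough readout-time comparison for a hypothetical 2000 x 2000 sensor.
# The clock rates below are illustrative assumptions, not datasheet values.

rows, cols = 2000, 2000

# CCD: every pixel is shifted to one output node and digitized serially,
# so readout time scales with the total pixel count.
ccd_pixel_rate = 20e6                       # assumed 20 MHz single-ADC pixel clock
ccd_readout = rows * cols / ccd_pixel_rate  # seconds per frame

# CMOS: one ADC per column digitizes a whole row at once,
# so readout time scales with the number of rows, not pixels.
cmos_row_rate = 100e3                       # assumed 100 kHz row-conversion rate
cmos_readout = rows / cmos_row_rate         # seconds per frame

print(f"CCD : {ccd_readout*1e3:.1f} ms/frame (~{1/ccd_readout:.0f} fps)")
print(f"CMOS: {cmos_readout*1e3:.1f} ms/frame (~{1/cmos_readout:.0f} fps)")
```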

Global vs. rolling shutter

(figure: global vs. rolling shutter exposure timing)

A CCD uses a global shutter, in which every pixel in the image is exposed over the same time interval. In contrast, a CMOS sensor digitizes one row of pixels while the remaining rows are still being exposed. This is known as rolling-shutter mode, because the row currently being digitized “rolls” across the sensor from top to bottom. Rolling-shutter mode exposes each row for the same duration, but at a different point in time.
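A small sketch of this row timing (the row time and exposure duration are assumptions chosen for illustration):

```python
# Rolling-shutter timing sketch: every row gets the same exposure
# duration, but rows start (and end) exposing at staggered times.

n_rows = 1080
t_row = 10e-6   # assumed time to digitize one row (10 us)
t_exp = 5e-3    # assumed exposure duration per row (5 ms)

for r in (0, n_rows // 2, n_rows - 1):
    start = r * t_row    # exposure start is offset row by row
    end = start + t_exp  # exposure length is identical for all rows
    print(f"row {r:4d}: exposes {start*1e3:6.2f} ms -> {end*1e3:6.2f} ms")
```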

Depth cameras

Depth from focus/defocus

Gaussian lens law

(figure: thin-lens geometry for the Gaussian lens law)

The Gaussian lens law can be written as:

\[ \frac{1}{f} = \frac{1}{i} + \frac{1}{o}, \]

where \(f\) is the focal length, \(i\) the image distance, and \(o\) the object distance.
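As a quick worked example with illustrative values, a lens with \(f = 50\) mm focused on an object at \(o = 1000\) mm forms the image at

\[ \frac{1}{i} = \frac{1}{f} - \frac{1}{o} = \frac{1}{50} - \frac{1}{1000} = \frac{19}{1000} \;\Rightarrow\; i \approx 52.6 \text{ mm}. \]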

Now suppose the sensor sits at a distance \(s\) from the lens rather than at the image distance \(i\) (i.e., the image is defocused). A scene point then spreads into a blur circle whose diameter \(b\) depends on \(s\) and on the aperture diameter \(D\):

\[ b = D\left|1-\frac{s}{i}\right| \]

The farther the sensor is from the image plane, the larger the blur circle \(b\) (right). The smaller the aperture diameter \(D\), the smaller the blur circle (left).
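A minimal sketch of these two effects under the thin-lens model above; the focal length, distances, and aperture diameter are made-up values for illustration:

```python
# Blur-circle sketch under the thin-lens model above.

def blur_diameter(f, o, s, D):
    """Blur-circle diameter b = D * |1 - s/i| for sensor distance s."""
    i = 1.0 / (1.0 / f - 1.0 / o)  # image distance from the lens law
    return D * abs(1.0 - s / i)

f, o, D = 50.0, 1000.0, 10.0  # mm: focal length, object distance, aperture
for s in (52.0, 52.6, 54.0):  # sensor positions around i ~= 52.6 mm
    print(f"s = {s:5.1f} mm -> b = {blur_diameter(f, o, s, D):.3f} mm")

# Halving the aperture halves the blur circle at the same defocus:
print(f"D/2 at s = 54.0 mm -> b = {blur_diameter(f, o, 54.0, D / 2):.3f} mm")
```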

(figure: blur circle vs. aperture size, left, and sensor position, right)

Point spread function (PSF)

The PSF is usually modeled as a Gaussian with \(\sigma \approx b/2\):

\[ h(x, y) = \frac{1}{2\pi\sigma^2} e^{-\frac{(x^2+y^2)}{2\sigma^2}} \]

Thus defocus is linear and shift-invariant, and can therefore be expressed as a convolution. (Note that this assumes the surface is flat, so the blur is uniform across the patch.)
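A small sketch of defocus as convolution with this Gaussian PSF, assuming numpy and scipy are available; the blur diameter is an illustrative value:

```python
# Defocus-as-convolution sketch using the Gaussian PSF above.
import numpy as np
from scipy.ndimage import gaussian_filter

b = 4.0          # assumed blur-circle diameter in pixels
sigma = b / 2.0  # Gaussian PSF width from the text

# A sharp test image: a single bright point on a dark background.
img = np.zeros((64, 64))
img[32, 32] = 1.0

# Because defocus is linear and shift-invariant, it is exactly a
# convolution with h(x, y); gaussian_filter applies that convolution.
blurred = gaussian_filter(img, sigma=sigma)
print(blurred[32, 32])  # peak is spread out; total intensity ~preserved
```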

Depth from focus

For each small image patch, we can find the sensor position \(s\) at which the patch is best focused. At best focus \(i = s\), so the Gaussian lens law gives the depth directly:

\[ o = \frac{sf}{s-f} \]

(figures: depth from focus on an image stack captured at different sensor positions)

To find the best-focused patch, we can measure the amount of high-frequency content within each small patch (since defocus acts as a low-pass filter), for example with the modified Laplacian:

\[ \nabla_M^2 f = \left|\frac{\partial^2 f}{\partial x^2}\right| + \left|\frac{\partial^2 f}{\partial y^2}\right| \]

However, this yields only \(N\) discrete depth values, where \(N\) is the number of sensor positions (images in the focal stack). Gaussian interpolation of the focus measure around its peak recovers depth between the discrete positions.
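Putting the pieces together, here is a sketch of the whole depth-from-focus pipeline on a synthetic focal stack; the optics, the sensor positions, and the way the stack is faked with Gaussian blurs are all assumptions for illustration:

```python
# Depth-from-focus sketch on a synthetic focal stack.
import numpy as np
from scipy.ndimage import gaussian_filter, correlate1d

f = 50.0                                # mm, focal length (assumed)
positions = np.linspace(52.0, 54.0, 9)  # assumed sensor positions s (mm)
true_s = 52.6                           # best focus (hidden from the algorithm)

rng = np.random.default_rng(0)
sharp = rng.random((64, 64))            # stand-in for a textured scene patch
stack = [gaussian_filter(sharp, sigma=5.0 * abs(s - true_s))
         for s in positions]            # blur grows away from best focus

def focus_measure(img):
    """Modified Laplacian summed over the patch: |f_xx| + |f_yy|."""
    fxx = correlate1d(img, [1.0, -2.0, 1.0], axis=1)
    fyy = correlate1d(img, [1.0, -2.0, 1.0], axis=0)
    return (np.abs(fxx) + np.abs(fyy)).sum()

m = np.array([focus_measure(img) for img in stack])

# Gaussian interpolation around the discrete peak: fit a parabola to
# log(m) at the peak and its two neighbours to get a sub-step position.
k = int(np.clip(np.argmax(m), 1, len(m) - 2))
lm = np.log(m[k - 1 : k + 2])
offset = 0.5 * (lm[0] - lm[2]) / (lm[0] - 2.0 * lm[1] + lm[2])
s_best = positions[k] + offset * (positions[1] - positions[0])

o = s_best * f / (s_best - f)           # depth from the Gaussian lens law
print(f"best focus s = {s_best:.2f} mm -> depth o = {o:.0f} mm")
```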

Depth from defocus

Stereo

Structured light

TOF

iTOF

Lidar & Radar

dTOF
