Three Focus-Related Problems in 2D Photographs

First, we need to focus before taking a shot.

Second, there's a trade-off between the depth of field and motion blur.

Third, lens designs are complex due to optical aberrations.

Because light field cameras capture the full light field rather than a flat 2D photo, we can change the depth of field and the viewing direction & angle after the shot. We can also change the viewpoint (e.g. the center of the pinhole in the pinhole camera model) and simulate different lens settings (tilt/rotation, aperture, etc).

A lot of these capabilities are more common to see these days in computational photography, where computer vision algorithms running on our smartphones estimate the depth in the scene and then software simulates these depth effects. The light field camera introduced industry to these ideas first, with a hardware-first approach rather than a software-based approach.

2D Photographs vs. 4D Light Fields

A photograph, as we've looked at, really gives us the irradiance at every pixel on the sensor plane. Every pixel on the sensor adds up all the light that arrives there, over many different angles. The light field instead records the radiance along each individual ray, which is four-dimensional.

Here's a light field flowing into the camera. We see the lens, focal plane, and sensor plane. We can parametrize a ray that flows in by its position x (where it crosses the focal plane) and the position u at which it passes through the lens. On the right we have a ray-space diagram; on the left we have a ray-trace diagram. Every ray has to go through a unique x position and a unique u position. The full set of rays arriving on the sensor plane is the Cartesian product of every x position with every u position.

What does a 2D photograph record? If we consider the light arriving at one pixel, all of that light comes from one point in the world (a fixed x position) but passes through every part of the lens (varying u position). In the ray-space diagram, each pixel is therefore a vertical line, so the number of pixels is equal to the number of vertical lines.
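
A hedged sketch of what this means in symbols, simplified to one spatial dimension x and one lens dimension u: the value a pixel at position x records is the light field integrated over the whole lens aperture, which is exactly one vertical line in the ray-space diagram.

```latex
% Pixel value = light field integrated over the lens aperture (1D sketch):
% E(x) is the recorded irradiance at sensor position x,
% L(x, u) is the radiance of the ray through lens position u and sensor position x.
E(x) = \int_{\text{lens}} L(x, u) \, \mathrm{d}u
```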

A Plenoptic camera samples the light field. Each beam represents one pixel on the resulting image.
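
As a rough, purely hypothetical example of what this sampling means for resolution (the sensor size and microlens size below are made-up numbers, not from these notes):

```latex
% Hypothetical 4000 x 4000 pixel sensor with 14 x 14 pixels under each microlens:
\underbrace{\tfrac{4000}{14} \times \tfrac{4000}{14}}_{\approx\, 285 \times 285 \text{ spatial samples } (x,\,y)}
\times
\underbrace{14 \times 14}_{\text{angular samples } (u,\,v)}
= 4000 \times 4000 \text{ recorded rays}
```

Every sensor pixel still records exactly one ray; the pixel budget is just split between spatial samples and angular samples.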

The microlenses go right on the back of the cover glass, between the glass and the sensor.

Here's the raw data that comes off of a light field sensor. There's a little bit of a textured pattern, so let's see what that looks like.

If we zoom in to the pixel level, we see microstructure in the pixel patterns: an array of small disks packed together in a honeycomb pattern. We're getting one disk image under every microlens, and the disk images don't all have to be the same color.

Let's say we want to take one pixel, under one disk image, and figure out which ray in the world gave us that color (transporting it to the x-y-u-v space). The microlens location inside the image field of view gives us the (x, y) coordinate - the position the ray must have passed through before it hit the light field sensor. The pixel location within the microlens image gives us the (u, v) coordinate - where the ray passed through the lens. Every microlens on the sensor surface effectively captures a tiny photo (14x14 pixels, for example) - a photo of the inside of the camera body.
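
A minimal decoding sketch in that spirit, assuming an idealized sensor where each microlens image lands on an axis-aligned s x s pixel block on a square grid (real sensors use the hexagonal packing above and need per-camera calibration); decode_light_field and its array layout are assumptions for illustration, not an actual light field camera API:

```python
import numpy as np

def decode_light_field(raw, s=14):
    """Rearrange a raw plenoptic image into a 4D light field L[y, x, v, u].

    Idealized sketch: each microlens image is assumed to be an s x s block on a
    square grid (real sensors use hexagonal packing and need calibration data).
    """
    h, w = raw.shape
    ny, nx = h // s, w // s                    # microlens count = spatial resolution
    blocks = raw[:ny * s, :nx * s].reshape(ny, s, nx, s)
    # Axes after reshape: (microlens row y, pixel row v, microlens col x, pixel col u)
    return blocks.transpose(0, 2, 1, 3)        # -> L[y, x, v, u]

# Example with fake raw data: 140x140 pixels -> 10x10 spatial, 14x14 angular samples.
raw = np.random.rand(140, 140)
L = decode_light_field(raw, s=14)
print(L.shape)                                 # (10, 10, 14, 14)
```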

How Does Computational Refocusing Work?

Let's remind ourselves how physical refocusing works.

As we move the camera sensor away from the lens, we're going to focus on objects closer to the camera; as we move the camera sensor towards the lens, we're going to focus on objects farther from the camera. The depth of field is shown as a trapezoidal range (depth of field gets larger as we get further in the world).
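
The standard thin-lens relation (a reminder, not new material here) is what ties those two motions together:

```latex
% Thin-lens equation: f = focal length, d_o = object (focus) distance,
% d_i = lens-to-sensor distance.
\frac{1}{f} = \frac{1}{d_o} + \frac{1}{d_i}
% Increasing d_i (sensor farther from the lens) forces d_o to shrink,
% i.e. the camera focuses on closer objects - matching the statement above.
```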

To focus on a different depth, what do we do? We imagine, in software, that the sensor sat at a different position (e.g. the lens-to-sensor distance was larger). Then, for each ray (each recorded radiance value), we compute where it would have landed given the new position of the sensor, and sum up the rays arriving at each virtual pixel.
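
A minimal sketch of that projection, assuming the 4D array L[y, x, v, u] produced by the hypothetical decode_light_field above: each (v, u) slice is a sub-aperture image, and refocusing amounts to shifting each sub-aperture image in proportion to its offset from the aperture center and averaging (the shift-and-add formulation; alpha below is the ratio of the virtual sensor distance to the real one):

```python
import numpy as np

def refocus(L, alpha):
    """Shift-and-add refocusing sketch.

    L     : 4D light field, shape (ny, nx, s, s), indexed as L[y, x, v, u]
    alpha : virtual sensor distance / actual sensor distance
            (alpha = 1 reproduces the original focus; other values refocus).
    """
    ny, nx, s, _ = L.shape
    out = np.zeros((ny, nx))
    center = (s - 1) / 2.0
    for v in range(s):
        for u in range(s):
            # Shift each sub-aperture image in proportion to its offset from
            # the aperture center, scaled by (1 - 1/alpha).
            dy = int(round((v - center) * (1.0 - 1.0 / alpha)))
            dx = int(round((u - center) * (1.0 - 1.0 / alpha)))
            # Integer shifts via np.roll keep the sketch dependency-free;
            # a real implementation would interpolate sub-pixel shifts.
            out += np.roll(np.roll(L[:, :, v, u], dy, axis=0), dx, axis=1)
    return out / (s * s)

# photo_far  = refocus(L, alpha=1.2)   # virtual sensor farther from the lens
# photo_near = refocus(L, alpha=0.8)   # virtual sensor closer to the lens
```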

https://s3-us-west-2.amazonaws.com/secure.notion-static.com/6651feb9-9540-4e53-bd62-4a71e00e7432/Untitled.png