Microsoft Research First-person Hyperlapse Videos

Microsoft Researcher Johannes Kopf ascends Mount Shuksan in the North Cascades with a GoPro.

Standard video stabilization crops out pixels on the periphery to create consistent frame-to-frame smoothness. But when applied to greatly sped-up video, it fails to compensate for the wildly shaking camera motion.
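To see why naive speed-up makes things worse, here is a small illustrative sketch (not the paper's method): if camera shake is modeled as a random walk, keeping every Nth frame accumulates N steps of motion between consecutive output frames, so the inter-frame displacement a stabilizer must absorb grows well beyond its fixed crop margin.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated per-frame horizontal camera position (pixels),
# modeled as a random walk, like a jittery helmet cam.
jitter = np.cumsum(rng.normal(0.0, 2.0, size=800))

def frame_to_frame_motion(path):
    """Mean absolute displacement between consecutive frames."""
    return np.abs(np.diff(path)).mean()

speedup = 8
subsampled = jitter[::speedup]  # naive time-lapse: keep every 8th frame

# The subsampled sequence has much larger inter-frame shake,
# which a crop-based stabilizer cannot absorb.
print(frame_to_frame_motion(jitter), frame_to_frame_motion(subsampled))
```

Under this random-walk model the shake grows roughly with the square root of the speed-up factor; real first-person footage is typically even less forgiving.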

Hyperlapse reconstructs how a camera moves throughout a video, as well as its distance and angle in relation to what’s happening in each frame. Then it plots out a smoother camera path and stitches pixels from multiple video frames to rebuild the scene and expand the field of view.

As you might imagine, working with raw video involves crunching a fair amount of data; early versions of the system required a compute cluster running for several hours per video. Microsoft developed a series of new algorithms that led to a more efficient process without compromising image quality. The result is that Hyperlapse can now render a high-speed video in a fraction of the time, on a single PC.

The Interactive Visual Media Group focuses on computer vision, image processing, and statistical signal processing, specifically as they relate to enhancing images and video, 3D reconstruction, image-based modeling and rendering, and highly accurate correspondence algorithms that are commonly used to “stitch” together images.

From Microsoft Research.

We present a method for converting first-person videos, for example, captured with a helmet camera during activities such as rock climbing or bicycling, into hyper-lapse videos, i.e., time-lapse videos with a smoothly moving camera. At high speed-up rates, simple frame sub-sampling coupled with existing video stabilization methods does not work, because the erratic camera shake present in first-person videos is amplified by the speed-up.

Scene Reconstruction
Our algorithm first reconstructs the 3D input camera path as well as dense, per-frame proxy geometries. We then optimize a novel camera path for the output video (shown in red) that is smooth and passes near the input cameras while ensuring that the virtual camera looks in directions that can be rendered well from the input.
Next, we compute geometric proxies for each input frame. These allow us to render the frames from the novel viewpoints on the optimized path.
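The path optimization described above can be sketched as a least-squares trade-off: the output camera positions should stay near the reconstructed input cameras while having low curvature. The energy below is a simplified stand-in for the paper's actual objective (which also constrains viewing direction and rendering quality); the function name and weights are illustrative assumptions.

```python
import numpy as np

def smooth_path(positions, smoothness=100.0):
    """Toy path optimizer (NOT the paper's full energy): find p minimizing
        sum_i ||p_i - c_i||^2 + smoothness * sum_i ||p_{i-1} - 2 p_i + p_{i+1}||^2
    i.e. stay near the input camera positions c_i while keeping the
    second differences (curvature) small. Solved as one linear system."""
    positions = np.asarray(positions, dtype=float)
    n = len(positions)
    # Second-difference operator D, shape (n-2, n).
    D = np.zeros((n - 2, n))
    for i in range(n - 2):
        D[i, i], D[i, i + 1], D[i, i + 2] = 1.0, -2.0, 1.0
    # Normal equations of the quadratic energy: (I + s * D^T D) p = c.
    A = np.eye(n) + smoothness * (D.T @ D)
    return np.linalg.solve(A, positions)

# Usage: smooth a shaky 2D camera track.
rng = np.random.default_rng(1)
shaky = np.cumsum(rng.normal(0.0, 1.0, size=(200, 2)), axis=0)
smoothed = smooth_path(shaky)
```

A real implementation would use sparse solvers and add the look-direction and renderability terms the text mentions, but the quadratic structure is the same.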

Proxy Geometry

Stitched & Blended
Finally, we generate the novel smoothed, time-lapse video by rendering, stitching, and blending appropriately selected source frames for each output frame. We present a number of results for challenging videos that cannot be processed using traditional techniques.
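The final stitching-and-blending step can be illustrated with a minimal per-pixel weighted blend. This is a hedged sketch, not the paper's actual compositing: in the real system the weights would come from how well each source frame's proxy geometry renders a region, and the blending is more sophisticated (e.g. seam selection and gradient-domain blending).

```python
import numpy as np

def blend_frames(frames, weights):
    """Blend k candidate source frames into one output frame using
    per-frame, per-pixel weights (e.g. favoring frames that render
    a region well from the novel viewpoint). Weights are normalized
    per pixel so the result is a convex combination.

    frames:  array of shape (k, H, W, 3)
    weights: array of shape (k, H, W)
    """
    frames = np.asarray(frames, dtype=float)
    weights = np.asarray(weights, dtype=float)
    total = np.clip(weights.sum(axis=0, keepdims=True), 1e-8, None)
    w = weights / total
    return (frames * w[..., None]).sum(axis=0)

# Usage: blend two constant-color candidate frames with uneven weights.
a = np.full((4, 4, 3), 0.0)
b = np.full((4, 4, 3), 1.0)
out = blend_frames([a, b], [np.full((4, 4), 1.0), np.full((4, 4), 3.0)])
# Each output pixel is 0.25 * a + 0.75 * b.
```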

Please comment.