Animation Disney Filmmaking Technology

New Software Can Actually Edit Actors’ Facial Expressions

FaceDirector software can seamlessly blend several takes to create nuanced blends of emotions, potentially cutting down on the number of takes necessary in filming.

New software from Disney Research, developed in conjunction with the University of Surrey, may help cut down on the number of takes necessary, thereby saving time and money. FaceDirector blends images from several takes, making it possible to edit precise emotions onto actors’ faces.

Shooting a scene in a movie can necessitate dozens of takes, sometimes more. In Gone Girl, director David Fincher was said to average 50 takes per scene. For The Social Network, actors Rooney Mara and Jesse Eisenberg acted the opening scene 99 times (directed by Fincher again; apparently he’s notorious for this). Stanley Kubrick’s The Shining involved 127 takes of the infamous scene where Wendy backs up the stairs swinging a baseball bat at Jack, widely considered the most takes of any scene in film history.

“Producing a film can be very expensive, so the goal of this project was to try to make the process more efficient,” says Derek Bradley, a computer scientist at Disney Research in Zurich who helped develop the software.

Disney Research is an international group of research labs focused on the kinds of innovation that might be useful to Disney, with locations in Los Angeles, Pittsburgh, Boston and Zurich. Recent projects include a wall-climbing robot, an “augmented reality coloring book” where kids can color an image that becomes a moving 3D character on an app, and a vest for children that provides sensations like vibrations or the feeling of raindrops to correspond with storybook scenes. The team behind FaceDirector worked on the project for about a year, before presenting their research at the International Conference on Computer Vision in Santiago, Chile this past December.

Figuring out how to synchronize different takes was the project’s main goal and its biggest challenge. Actors might have their heads cocked at different angles from take to take, speak in different tones or pause at different times. To solve this, the team created a program that analyzes facial expressions and audio cues. Facial expressions are tracked by mapping facial landmarks, like the corners of the eyes and mouth. The program then determines which frames can be fit into each other, like puzzle pieces. Each puzzle piece has multiple mates, so a director or editor can then decide the best combination to create the desired facial expression.
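The article does not name the matching algorithm, so the following is only a minimal sketch of the general idea, using classic dynamic time warping as a stand-in. It assumes each frame of a take has already been reduced to a small feature vector (for example, landmark positions plus an audio measure); the feature layout and the function names `frame_distance` and `align_takes` are hypothetical, not part of FaceDirector.

```python
import math


def frame_distance(a, b):
    """Euclidean distance between two per-frame feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def align_takes(take_a, take_b):
    """Align two takes with dynamic time warping.

    Each take is a list of per-frame feature vectors. Returns a list of
    (index_in_a, index_in_b) pairs covering both takes from start to end.
    """
    n, m = len(take_a), len(take_b)
    inf = float("inf")
    # cost[i][j] = cheapest way to align the first i frames of A
    # with the first j frames of B.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_distance(take_a[i - 1], take_b[j - 1])
            cost[i][j] = d + min(
                cost[i - 1][j - 1],  # both takes advance one frame
                cost[i - 1][j],      # take A advances, take B holds
                cost[i][j - 1],      # take B advances, take A holds
            )

    # Walk back from the end to recover the frame-to-frame mapping.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min(
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        )
    path.reverse()
    return path


if __name__ == "__main__":
    # Toy one-dimensional features: in take B the actor pauses one frame longer.
    take_a = [[0.0], [1.0], [2.0]]
    take_b = [[0.0], [0.0], [1.0], [2.0]]
    print(align_takes(take_a, take_b))
```

In the toy data the first frame of take A is matched to both opening frames of take B, which is how a warping-style alignment absorbs a pause that occurs in only one take.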

To create material with which to experiment, the team brought in a group of students from Zurich University of the Arts. The students acted several takes of a made-up dialogue, each time doing different facial expressions—happy, angry, excited and so on. The team was then able to use the software to create any number of combinations of facial expressions that conveyed more nuanced emotions—sad and a bit angry, excited but fearful, and so on. They were able to blend several takes—say, a frightened and a neutral—to create rising and falling emotions.

The FaceDirector team isn’t sure how or when the software might become commercially available. The product still works best when used with scenes filmed while sitting in front of a static background. Moving actors and moving outdoor scenery (think swaying trees, passing cars) present more of a challenge for synchronization.

By Emily Matchar

From Disney Research

We present a method to continuously blend between multiple facial performances of an actor, which can contain different facial expressions or emotional states. As an example, given sad and angry video takes of a scene, our method empowers a movie director to specify arbitrary weighted combinations and smooth transitions between the two takes in post-production. Our contributions include (1) a robust nonlinear audio-visual synchronization technique that exploits complementary properties of audio and visual cues to automatically determine robust, dense spatio-temporal correspondences between takes, and (2) a seamless facial blending approach that provides the director full control to interpolate timing, facial expression, and local appearance, in order to generate novel performances after filming. In contrast to most previous works, our approach operates entirely in image space, avoiding the need of 3D facial reconstruction. We demonstrate that our method can synthesize visually believable performances with applications in emotion transition, performance correction, and timing control.
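As a rough illustration of what "weighted combinations and smooth transitions" could mean once two takes are aligned, the sketch below mixes corresponding pixels linearly and ramps the weight over time. It is a deliberate simplification: the paper describes a more involved image-space blend, and the names `blend_frames` and `transition_weights`, as well as the nested-list image format, are assumptions made for this example.

```python
def blend_frames(frame_a, frame_b, weight_b):
    """Linearly mix two aligned frames of identical size.

    Frames are nested lists of pixel intensities. weight_b = 0.0 returns
    take A unchanged, 1.0 returns take B, and values in between mix them.
    """
    if not 0.0 <= weight_b <= 1.0:
        raise ValueError("weight_b must lie in [0, 1]")
    return [
        [(1.0 - weight_b) * pa + weight_b * pb for pa, pb in zip(row_a, row_b)]
        for row_a, row_b in zip(frame_a, frame_b)
    ]


def transition_weights(num_frames):
    """Weights that ramp evenly from take A (0.0) to take B (1.0)."""
    if num_frames < 1:
        raise ValueError("num_frames must be at least 1")
    if num_frames == 1:
        return [0.0]
    return [k / (num_frames - 1) for k in range(num_frames)]


if __name__ == "__main__":
    sad = [[0.0, 10.0]]     # one-row toy "frame" from the sad take
    angry = [[10.0, 20.0]]  # matching frame from the angry take
    for w in transition_weights(5):
        print(w, blend_frames(sad, angry, w))
```

A director-style control would then amount to choosing the weight curve, for instance holding it at 0.25 for "mostly sad, a bit angry" or ramping it for a rising emotion.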


Download File “FaceDirector- Continuous Control of Facial Performance in Video-Paper”
[PDF, 13.22 MB]


Copyright Notice

The documents contained in these directories are included by the contributing authors as a means to ensure timely dissemination of scholarly and technical work on a non-commercial basis. Copyright and all rights therein are maintained by the authors or by other copyright holders, notwithstanding that they have offered their works here electronically. It is understood that all persons copying this information will adhere to the terms and constraints invoked by each author’s copyright. These works may not be reposted without the explicit permission of the copyright holder.



‘Fairy Lights’ Touchable Holograms using lasers

This is an amazing technology called ‘Fairy Lights’ that creates touchable holograms using lasers. Notice that the hologram is interactive: it can change state during and after the touch. No glasses or goggles are required. The possibilities of this for film, theater, video games and theme parks are nearly endless.

From IEEE Spectrum.

We’ve seen a few holographic technologies that have come close; they rely on optical tricks of one sort or another to make it seem like you’re seeing an image hovering in front of you.

There’s nothing wrong with such optical tricks (if you can get them to work), but the fantasy is to have true midair pixels that present no concerns about things like viewing angles. This technology does exist, and has for a while, in the form of laser-induced plasma displays that ionize air molecules to create glowing points of light. If lasers and plasma sound like a dangerous way to make a display, that’s because it is. But Japanese researchers have upped the speed of their lasers to create a laser plasma display that’s touchably safe.

Researchers from the University of Tsukuba, Utsunomiya University, Nagoya Institute of Technology, and the University of Tokyo have developed a “Fairy Lights” display system that uses femtosecond lasers instead. The result is a plasma display that’s safe to touch.

Each one of those dots (voxels) is being generated by a laser that’s pulsing in just a few tens of femtoseconds. A femtosecond is one millionth of one billionth of one second. The researchers found that a pulse duration that minuscule doesn’t result in any appreciable skin damage unless the laser is firing at that same spot at one shot per millisecond for a duration of 2,000 milliseconds. The Fairy Lights display keeps the exposure time well under that threshold:

Our system has the unique characteristic that the plasma is touchable. It was found that the contact between plasma and a finger causes a brighter light. This effect can be used as a cue of the contact. One possible control is touch interaction in which floating images change when touched by a user. The other is damage reduction. For safety, the plasma voxels are shut off within a single frame (17 ms = 1/60 s) when users touch the voxels. This is sufficiently less than the harmful exposure time (2,000 ms).

Even cooler, you can apparently feel the plasma as you touch it:

Shock waves are generated by plasma when a user touches the plasma voxels. The user feels an impulse on the finger as if the light has physical substance. The detailed investigation of the characteristics of this plasma-generated haptic sensation with sophisticated spatiotemporal control is beyond the scope of this paper.

As you can see from the pics and video, these displays are tiny: the workspace encompasses just eight cubic millimeters. The spatiotemporal resolution is relatively high, though, at up to 200,000 voxels per second, and the image framerate depends on how many voxels your image needs.
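The trade-off between image complexity and frame rate can be worked out directly from the quoted figure. The sketch below assumes, for simplicity, that every voxel in the image is redrawn once per frame and that there is no overhead between voxels; the function names are illustrative only.

```python
VOXELS_PER_SECOND = 200_000  # upper figure quoted in the article


def max_frame_rate(voxels_per_frame, voxels_per_second=VOXELS_PER_SECOND):
    """Highest frame rate (frames per second) for an image of the given size."""
    if voxels_per_frame <= 0:
        raise ValueError("voxels_per_frame must be positive")
    return voxels_per_second / voxels_per_frame


def max_voxels_per_frame(frame_rate, voxels_per_second=VOXELS_PER_SECOND):
    """Largest whole number of voxels that fits in one frame at a given rate."""
    if frame_rate <= 0:
        raise ValueError("frame_rate must be positive")
    return int(voxels_per_second // frame_rate)


if __name__ == "__main__":
    print(max_frame_rate(1000))      # frames per second for a 1,000-voxel image
    print(max_voxels_per_frame(60))  # voxel budget per frame at 60 frames per second
```

Under these assumptions a 1,000-voxel image could refresh 200 times per second, while holding 60 frames per second leaves a budget of roughly 3,300 voxels per frame.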

To become useful as the consumer product of our dreams, the display is going to need to scale up. The researchers suggest that it’s certainly possible to do this with different optical devices. We’re holding out for something that’s small enough to fit into a phone or wristwatch, and it’s not that crazy to look at this project and believe that such a gadget might not be so far away.

For more see Digital Nature Group

Cinematography Filmmaking Technology

Lytro Immerge for VR

From FXGuide.

Most of us know Lytro from its revolutionary stills camera which allowed for an image to be adjusted in post as never before – it allowed focus to be changed. It did this by capturing a Lightfield and it seemed to offer a glimpse into the future of cameras built on a cross of new technology and the exciting field of computational photography.

Why then did the camera fail? Heck, we sold ours about 8 months after buying it.


Lightfield technology did allow for the image to be adjusted in terms of depth or focus in post, but many soon found that this was just delaying a decision from on location. If you wanted to send someone a Lytro image you almost always just picked the focus and sent a flat .jpeg. The only alternative was to send them a file which required a special viewer. The problem with the latter was simple: someone else ‘finished’ taking your photo for you – you had no control. It was delaying an on-set focus decision to the point that you never decided at all! The problem with the former, i.e. rendering a jpeg, was that the actual image was not better than one could get from a good Canon or Nikon; actually it was a bit worse, as the optics for Lightfield could not outgun your trusty Canon 5D.

In summary: the problem was we did not have a reason to not want to lock down the image. Lightfield was a solution looking for a problem. We needed somewhere it made sense to not ‘lock down’ the image and keep it ‘alive’ for the end user.

Enter VR – it is the problem that Lightfield solves.

Currently much of the VR that is cutting edge is computer generated – the rigs that incorporate head movement can understand you are moving your head to the side and render the right pair of images for your eyes. While a live action capture will allow you to spin on the spot and see in all directions, a live action capture did not (until now) allow you to lean to one side to miss a slow motion bullet traveling right at you the way a CG scene could.

Live action was stereo and 360 but there was no parallax. If you wanted to see around a thing…you couldn’t. There are some key exceptions such as 8i which have managed to capture video from multiple cameras and then allow a live action playback with head tracking, parallax and the full six degrees of motion, thus becoming dramatically more immersive. However, 8i is a specialist rig which is effectively a concave wall or bank of cameras around someone, a few meters back from them. The new Immerge from Lytro is different – it is a ball of cameras on a stick.

Lytro Immerge seems to be the world’s first commercial professional Lightfield solution for cinematic VR, which will capture ‘video’ from many points of view at once and thereby provide a more lifelike presence for live action VR through six degrees of freedom. It is built from the ground up as a full workflow, camera, storage and even NUKE compositing to color grading pipeline. This allows the blending of live action and computer graphics (CG) using Lightfield data, although details on how you will render your CGI to match the Lightfield captured data are still unclear.

With this configurable capture and playback system, any of the appropriate display head rigs should support the new storytelling approach, since at the headgear end there is no new format: all the heavy lifting is done earlier in the pipeline.

How does it work?

The only solution for dynamic six degrees of freedom is to render the live action and CGI as needed, in response to the head unit’s render requests. In effect you have a render volume. Imagine a meter square box within which you can move your head freely. Once the data is captured the system can solve for any stereo pair anywhere in the 3D volume. Conceptually, this is not that different from what happens now for live action stereo. Most VR rigs capture images from a set of cameras and then resolve a ‘virtual’ stereo pair from the 360 overlapping imagery. It is hard to do, but if you think of the level 360 panorama view as a strip, like a 360 degree mini-cinema screen that sits around you as a level ribbon of continuous imagery, then you just need to find the right places to interpolate between camera views.
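To make "finding the right places to interpolate between camera views" a little more concrete, here is a toy sketch for a ring of evenly spaced cameras. For a requested viewing direction it picks the two neighbouring cameras and a linear mixing weight for each. This is an illustration only: real rigs solve a much harder image-warping problem, and the even spacing, the linear weights and the function name `ring_interpolation` are all assumptions.

```python
import math


def ring_interpolation(view_angle_deg, num_cameras):
    """Pick the two ring cameras that bracket a viewing direction.

    Cameras are assumed to sit at evenly spaced angles, with camera 0 at
    0 degrees. Returns ((left_index, left_weight), (right_index, right_weight)),
    where the two weights sum to 1.
    """
    if num_cameras < 2:
        raise ValueError("need at least two cameras to interpolate")
    spacing = 360.0 / num_cameras
    # Position along the ring, measured in units of camera spacing.
    position = (view_angle_deg % 360.0) / spacing
    base = math.floor(position)
    left = int(base) % num_cameras
    right = (left + 1) % num_cameras  # wraps around past the last camera
    right_weight = position - base
    return (left, 1.0 - right_weight), (right, right_weight)


if __name__ == "__main__":
    # Four cameras at 0, 90, 180 and 270 degrees; look halfway between 0 and 90.
    print(ring_interpolation(45.0, 4))
    # Looking at 315 degrees blends the last camera with camera 0.
    print(ring_interpolation(315.0, 4))
```

The wrap-around case is the reason for the modulo arithmetic: a view just short of 360 degrees has to blend the last camera with the first one.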


Of course, if the cameras had captured the world as a nodal pan there would be no stereo to see. But no camera rig does this – given the physical size of cameras all sitting in a circle… a camera to the left of another sees a slightly different view and that offset, that difference in parallax, is your stereo. So if solving off the horizontal offset around a ring is the secret to stereo VR live action, then the Lytro Immerge does this not just around the outside ring but anywhere in the cube volume. Instead of interpolating between camera views it builds up a vast set of views from its custom lenses and then virtualizes the correct view from anywhere.
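The link between camera offset and stereo can be put in numbers with the standard pinhole relation for two parallel cameras: the image shift of a point (its disparity) equals focal length times baseline divided by depth. The sketch below uses that textbook formula with made-up example values; none of the numbers come from the Lytro rig.

```python
def disparity_pixels(focal_length_px, baseline_m, depth_m):
    """Horizontal image shift, in pixels, of a point seen by two parallel
    pinhole cameras separated by baseline_m, with the point depth_m away."""
    if depth_m <= 0:
        raise ValueError("depth_m must be positive")
    return focal_length_px * baseline_m / depth_m


if __name__ == "__main__":
    focal = 1000.0    # hypothetical focal length in pixels
    baseline = 0.064  # hypothetical camera spacing in metres
    print(disparity_pixels(focal, baseline, 2.0))   # nearby object
    print(disparity_pixels(focal, baseline, 20.0))  # distant object
    print(disparity_pixels(focal, 0.0, 2.0))        # nodal pan: no offset
```

A zero baseline gives zero disparity at every depth, which is the nodal pan case described above: with no offset between views there is nothing for the eyes to fuse into depth.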

Actually it even goes further. You can move outside the ‘perfect’ volume, but at this point it will start to lack previously obstructed scene information. So if you look at some trees, and then move your head inside the volume, you can see perfectly around one to another. But if you move too far there will be some part of the back forest that was never captured and hence can’t be used or provided in the real time experience. In a sense you have an elegant fall off in fidelity as you ‘break the viewing cube’.

VR was already a lot of data, but once you move to Lightfield capture it is vastly more, which is why Lytro has developed a special server, which will feed into editing pipelines and tools such as NUKE and which can record and hold one hour of footage. The server has a touch-screen interface, designed to make professional cinematographers feel at home. PCmag reports that it allows for control over camera functions via a panel interface, and “even though the underlying capture technology differs from a cinema camera, the controls—ISO, shutter angle, focal length, and the like—remain the same.”

Doesn’t this seem like a lot of work just for head tracking?

The best way to explain this is to say, it must have seemed like a lot of work to make B/W films become color…but it added so much there was no going back. You could see someone in black and white and read a good performance, but in color there was a richer experience, closer to the real world we inhabit.

With six degrees of freedom, the world comes alive. Having seen prototype and experimental Lightfield VR experiences all I can say is that it does make a huge difference. A good example comes from an experimental piece done by Otoy. Working with USC-ICT and Dr Paul Debevec they made a rig that effectively scanned a room. Instead of rows and rows of cameras in a circle and stacked on top of one another virtually, the team created a vast data set for Lightfield generation by having the one camera swung around 360 at one height – then lifted up and swung around again, and again all with a robotic arm. This sweeping meant a series of circular camera data sets that in total added up to a ball of data.


Unlike the new Lytro approach, this works only on a static scene, a huge limitation compared to the Immerge, but still a valid data set. This ball of data is however conceptually similar to the ball of data that is at the core of the Lytro Immerge, but unlike the Lytro this was an experimental piece and as such was completed earlier this year. What is significant is just how different this experience is from a normal stereo VR experience. For example, even though the room is static, as you move your head the specular highlights change and you can much more accurately sense the nature of the materials being used. In a stereo rig, I was no better able to tell you what a bench top was made of than looking at a good quality still, but in a Lightfield you adjust your head, see the subtle spec shift and break up, and you are immediately informed as to what something might feel like. Again, spec highlights seem trivial, but they are one of the key things we use to read faces. And this brings us to the core of why the Lytro Immerge is so vastly important: people.

VR can be boring. It may be unpopular to say so but it is the truth. For all the whizz bang uber tech, it can lack story telling. Has anyone ever sent you a killer timelapse show reel? As a friend of mine once confessed, no matter how technically impressive, no matter how much you know it would have been really hard to make, after a short while you fast forward through the timelapse to the end of the video. VR is just like this. You want to sit still and watch it but it is not possible to hang in there for too long as it just gets dull – after you get the set up…amazing environment, wow…look around…wow, ok I am done now.

What would make the difference is story, and what we need for story is actors – acting. There is nothing stopping someone from filming VR now, and most VR is live action, but you can’t film actors talking and fighting, punching and laughing – and move your head to see more of what is happening – you can only look around, and then more often than not, look around in mono.

The new Lytro Immerge.

The new Lytro Immerge and the cameras that will follow it offer us professional kit that allows professional full immersive storytelling.

Right now an Oculus Rift DK2 is not actually that sharp to the eye. The image is OK, but the next generation of headset gear has vastly better screens and this will make the Lightfield technology even more important. Subtle but real spec changes are not relevant when you can’t make out a face that well due to low res screens, but the prototype new Sony, Oculus and Valve systems are going to scream out for such detail.

Sure they’ll be expensive, but then an original Sony F900 HDCAM was $75,000 when it came out and now my iPhone does better video. Initially, you might only even think about buying one if you had either a stack of confirmed paid work, or a major rental market to service, but hopefully the camera will validate the approach and provide a much needed professional solution for better stories.

How much and when?

No news on when the production units will actually ship, and many of the images released for the launch are actually concept renderings, but the company has one of the only track records for shipping actual Lightfield cameras, so expectations are high that it will pull the Immerge off technically and deliver.

In The Verge, Vrse co-founder and CTO Aaron Koblin commented that “light field technology is probably going to be at the core of most narrative VR.” When a prototype version comes out in the first quarter of 2016, it’ll cost “multiple hundreds of thousands of dollars” and is intended for rental.

Lytro CEO Jason Rosenthal says the new cameras actually contain “multiple hundreds” of cameras and sensors and went on to suggest that the company may upgrade the camera quarterly.

Animation Disney Technology VFX

Disney’s Augmented Reality Characters from Colored Drawings

Photo from the Verge.

A Disney Research team has developed technology that projects coloring book characters in 3D while you’re still working on coloring them. The process was detailed in a new paper called “Live Texturing of Augmented Reality Characters from Colored Drawings,” and it was presented at the IEEE International Symposium on Mixed and Augmented Reality on September 29th. That title’s a mouthful, but it’s descriptive: the live texturing technology allows users to watch as their characters stand and wobble on the page and take on color as they’re being colored in. You can see an example in the video above: the elephant’s pants are turning blue on the tablet screen just as they’re being filled on the page itself.

Coloring books capture the imagination of children and provide them with one of their earliest opportunities for creative expression. However, given the proliferation and popularity of digital devices, real-world activities like coloring can seem unexciting, and children become less engaged in them. Augmented reality holds unique potential to impact this situation by providing a bridge between real-world activities and digital enhancements. In this paper, we present an augmented reality coloring book App in which children color characters in a printed coloring book and inspect their work using a mobile device. The drawing is detected and tracked, and the video stream is augmented with an animated 3-D version of the character that is textured according to the child’s coloring. This is possible thanks to several novel technical contributions. We present a texturing process that applies the captured texture from a 2-D colored drawing to both the visible and occluded regions of a 3-D character in real time. We develop a deformable surface tracking method designed for colored drawings that uses a new outlier rejection algorithm for real-time tracking and surface deformation recovery. We present a content creation pipeline to efficiently create the 2-D and 3-D content. And, finally, we validate our work with two user studies that examine the quality of our texturing algorithm and the overall App experience.



Film Sound Technology

Lucasfilm, Industrial Light & Magic and Skywalker Sound Launch ILMxLAB

From SoundWorks Collection

Industrial Light & Magic (ILM) and parent company Lucasfilm, Ltd. announce the formation of ILM Experience Lab (ILMxLAB), a new division that will draw upon the talents of Lucasfilm, ILM and Skywalker Sound. ILMxLAB combines compelling storytelling, technological innovation and world-class production to create immersive entertainment experiences. For several years, the company has been investing in real-time graphics – building a foundation that allows ILMxLAB to deliver interactive imagery at a fidelity never seen before. As this new dimension in storytelling unfolds, ILMxLAB will develop virtual reality, augmented reality, real-time cinema, theme park entertainment and narrative-based experiences for future platforms.


For more, see the exclusive interview with Rob Bredow at FXGuide.