a blog by Bartek Drozdz

Tool Xmas Card – realtime video compositing in WebGL

This Christmas, at Tool we wanted to create a small interactive experience to share with our friends and clients. Since I had lately been experimenting with compositing WebGL objects on a video [1, 2], I thought this was a cool technique we could use.

The idea was simple enough: we would shoot a Christmas tree in a nicely decorated room and composite in a gift box that the user can interact with while watching the video. All of this is rendered with WebGL – the video runs in the background and the 3D interactive content on top, and both layers are matched in perspective and movement. To achieve this effect I had to use quite a few different pieces of software. Here’s a breakdown of what it took to build it:

Cinema4D. First of all, I needed to match the perspective of the camera in the footage with that of the camera in the 3D scene. There is no exact science to doing that, so the best way is to take a frame from the video, use it as a background in C4D and try to match it manually. It’s a trial-and-error technique and adjusting the details can be quite a challenge. Fortunately, I found a good book about matchmoving with some very useful tips… like the one about writing down what lens was used during filming (I forgot about that, of course… :)

Mocha. After matching comes tracking. At first I wanted to do a full camera solve, but it turned out to be quite complex and not necessary in this case, so I went with 2D tracking instead. A very good tool for this is the Mocha AE Plugin – it is easy to use, accurate and fast. Using 2D tracking means that we only track movement on the XY plane, and we do not account for any rotation of the camera. For handheld shots, where there is only slight camera movement, it is good enough. Of course this would never work for tracking or panning shots – those require a full camera solve. Once the tracking is done and tweaked, Mocha can export the tracking data in a text-based format. After that, all I needed was a simple Python script to turn this data into nicely formatted JSON.

Unity3D. Once the 3D model of the gift was in place and all the camera angles were matched in C4D, I exported the whole thing to Unity3D.
The main reason for this is that I wanted to take advantage of the Unity/WebGL exporter to get everything into WebGL. I also used Unity’s animation editor to create the movement of the box when the cap flies off and the nutcracker pops out. To do that, I added some functionality to J3D to support animations. It can be found in the v2 branch of J3D. One of the main changes I had to make in the engine was to switch from Euler angles to quaternions.

FFMpeg. In order to track video correctly and overlay elements with precision, I needed to know which frame the video is at at any point during playback. The easiest way is to take the current time and divide it by the duration of a single frame (e.g. 1/24 of a second). Unfortunately, it is not that simple! If you want an in-depth look at why it is so complicated, please read the excellent article by Zeh Fernando. Even though his article talks about Flash, the same thing applies to HTML5 video. Long story short, each video used in a tracked shot needs to have the frame number encoded into it. The best way to do this: encode the frame number as a binary marker somewhere in the video. To see how it looks, try playing one of the videos directly in your browser. See those white boxes at the bottom? That’s it!

Python/PIL. Adding the binary marker to the video would be painful if done manually. This is where Python comes in. Using a library called PIL (Python Imaging Library) I wrote a simple script that does the following:
  • decompose a video clip into a sequence of PNGs (using FFMPEG)
  • manipulate each image by adding the binary frame number at the bottom
  • encode the frames back into a video optimized for HTML5 (FFMPEG again)
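The marker logic behind the steps above can be sketched in plain Python. The box layout here – one row of black/white squares at the bottom of the frame, least significant bit first – is my assumption, as are all the names; the real script drew the boxes onto the PNG frames with PIL before re-encoding with FFmpeg:

```python
N_BITS = 16  # enough to number roughly 45 minutes of footage at 24 fps

def frame_to_bits(frame, n_bits=N_BITS):
    """Encode a frame number as a row of box values, least significant
    bit first: 1 means a white box, 0 means a black box."""
    return [(frame >> i) & 1 for i in range(n_bits)]

def bits_to_frame(bits):
    """Decode a row of box values back into the frame number."""
    return sum(bit << i for i, bit in enumerate(bits))

def luminance_to_bit(luminance, threshold=128):
    """Classify a sampled pixel (0-255) as a white (1) or black (0) box."""
    return 1 if luminance >= threshold else 0
```

On the playback side the same decoding runs against pixels read back from a canvas; the round trip is lossless, so `bits_to_frame(frame_to_bits(n)) == n` for any frame number that fits in `N_BITS` bits.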
On the Javascript side, I use a simple technique of copying part of the video into a canvas and reading the color of the pixels to determine which frame we are at. And of course I make sure to mask the marker.

Video encoding tip. One thing to consider when adding the frame number marker is that video playback is optimal when the pixel dimensions of the video are multiples of 16, for example 1024 x 576. Influxis posted a list of all the optimal video dimensions (again, Flash or HTML5 – the same rules apply). Now, if your video has optimized pixel dimensions, it would be a shame to add a few pixels for the binary marker and end up with a video that is no longer optimized. It’s usually better to draw the marker over the video. You will need to sacrifice a few pixels of the footage, but better playback performance will make up for that.

WebGL. With the Unity exporter, getting the scene to render in WebGL was simple. The one big thing left at this point was the custom shaders that would make the gift box blend well with the video. In fact, it was the most challenging part of the project! The shader I ended up using on the box has diffuse and specular lighting with a specular map, reflections with a reflection map, and a normal map to make the gift wrap look as realistic as possible. Finally, I added a bit of personalization – if you type your name after the # in the URL it will be rendered on the label on the box (example).

Sound. Last but not least: our friends at Plan8 added some great interactive sound FX that, as always, add a lot to the final effect.

It was fun to create this. I feel like this technique can definitely have some interesting uses. Of course, nowadays, anything that requires WebGL is treated as experimental, but once the majority of browsers can render it… What do you think?
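Two small checks from the article, sketched in Python with illustrative names: the naive time-to-frame mapping that the binary marker replaces, and the multiple-of-16 rule for playback-friendly dimensions.

```python
import math

def naive_frame(current_time, fps=24):
    """Derive a frame index from playback time, i.e. divide the current
    time by the duration of one frame (1/fps seconds). This is the naive
    approach: browsers report playback time too coarsely for it to be
    frame-accurate, which is why the in-picture marker is needed."""
    return math.floor(current_time * fps)

def is_playback_friendly(width, height):
    """Video decodes best when both dimensions are multiples of 16,
    e.g. 1024 x 576."""
    return width % 16 == 0 and height % 16 == 0
```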
  • Alexander on January 28th, 2013

    It looks nice. But isn’t this insanely more work and time to do this stuff in HTML5 vs doing it in Flash?

  • bartek drozdz on January 28th, 2013

    @Alexander how would it be any easier in Flash!?!

  • Alexander on January 29th, 2013

    Ok I am not an expert but here is my guess.
    Flash can work with video, stage3D(3D) and mouse. So setting up
    a 3D model over a video should not be such a big deal. Since you
    can manipulate the camera and perspective of the stage3D.
    Tracking mouse also is easy to do in flash.
    Also I guess you can get information about current video frame
    directly from flash video API without
    encoding this into the video.
    Also one big advantage is that Flash has Monocle profiling tool
    which gives you a way to run-time debug your 3D environment and
    test shaders and see directly the results.
    Maybe I am wrong, but it seems to me that doing this in Flash
    would take at least half the time and effort. I don’t want to
    criticize your work – this looks great. I just compare the time
    and the effort, and I am far from being an expert like you ;).

  • bartek drozdz on January 29th, 2013

    @Alexander ok, let’s take it point by point:

    > Flash can work with video, stage3D(3D) and mouse
    HTML5/Javascript can work with video, webgl and mouse in pretty much the same way

    > Since you can manipulate the camera and perspective of the stage3D.
    I assume you would use Away3D or similar in Flash to put together your 3D scene – because stage3D alone has no concept of camera or perspective. In fact Stage3D and WebGL are very similar APIs – compare http://help.adobe.com/en_US/FlashPlatform/reference/actionscript/3/flash/display3D/Context3D.html and http://www.khronos.org/registry/webgl/specs/latest/

    For WebGL, instead of Away3D, you can use Three.js to similar effect. I used J3D, which is an engine I wrote from scratch, but not specifically for this project. Also, writing custom shaders for WebGL using GLSL (a high-level language) seems easier than using AGAL (assembly code) in Flash, but maybe it’s just me

    > Tracking mouse also is easy to do in flash.
    It is as easy to do in Javascript

    > Also I guess you can get information about current video frame directly from flash video API
    no, you can’t rely on that – please read Zeh’s article (link above)

    > one big advantage is that Flash has Monocle profiling tool
    http://benvanik.github.com/WebGL-Inspector/

    finally, whether you use Flash or WebGL you still need to do tracking and matching perspective in some specialized software (I used Mocha and C4D, but other software can be used too)

  • Tomek on January 30th, 2013

    Nice to see you posting again Bartek :) Happy New Year!

  • bartek drozdz on January 30th, 2013

    Thanks dude! Happy New Year to you too! How’s Stockholm, is it cold enough to ice skate on the lake? :)
