Best approach to depth streaming via existing codec
- by Kevin
I'm working on a development system (and a game) aimed at games presented mostly from static third-person views. We produce our scenery with a mix of CG and photographic techniques; the background art is rendered offline by a production-grade renderer.
To allow the runtime imagery to interact correctly with the background art, I wrote a program that converts the depth output from Mental Ray into a texture, plus a pixel shader that draws a full-screen quad whose per-pixel Z comes from that texture.
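For context, the conversion step boils down to something like the following. This is a minimal sketch with a hypothetical helper, not our actual tool; it assumes the renderer outputs eye-space distance and that the runtime uses a standard D3D-style perspective projection whose near/far planes match the offline camera.

#include <cstdint>

// Hypothetical helper: quantize eye-space depth from the offline renderer
// into the 16-bit normalized post-projection depth the runtime expects.
// zNear/zFar must match the runtime camera's projection exactly.
uint16_t EyeDepthToZBuffer16(float zEye, float zNear, float zFar)
{
    // Standard D3D-style perspective depth: z' = (f/(f-n)) * (1 - n/z)
    float z = (zFar / (zFar - zNear)) * (1.0f - zNear / zEye);
    if (z < 0.0f) z = 0.0f;   // clamp values outside the frustum
    if (z > 1.0f) z = 1.0f;
    return static_cast<uint16_t>(z * 65535.0f + 0.5f);
}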
This technique is working out very well, but now we've decided that some of the camera-angle changes between scenes should be animated. The animation itself is straightforward to produce from our CG models, and we intend to encode it with an HD video codec such as H.264.
The problem is that to keep our runtime imagery composited correctly, the depth buffer has to be reloaded for every video frame. Uncompressed, that is a lot of bandwidth (at, say, 720p and 30 fps, raw 16-bit depth comes to roughly 55 MB/s), so the video's depth data will need to be compressed efficiently.
I've looked into methods for performing temporal compression of depth info and found an interesting research paper here:
http://web4.cs.ucl.ac.uk/staff/j.kautz/publications/depth-streaming.pdf
The method establishes a mapping between 16-bit depth values and YCbCr values. The mapping is tuned to the properties of existing video codecs in order to maximize precision of the decoded depths after the YCbCr has undergone video compression. It allows an existing, unmodified video codec to be used on the backend.
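For anyone who hasn't read it: as I understand the scheme, Y carries a coarse, near-linear depth term L, while Cb and Cr carry two triangle waves, Ha and Hb, offset from each other by a quarter period; the decoder uses L to decide which wave is currently in its linear region. Here's a rough C++ paraphrase of the encoding (my own reading, not the authors' code; the period constant is a tunable the paper fits to the codec's chroma quantization, and 0.5 is just a placeholder):

#include <cstdint>
#include <cmath>

struct DepthYCbCr { float L, Ha, Hb; };  // each in [0,1], one per plane

// Triangle-wave period in normalized depth units (placeholder value).
static const float kPeriod = 0.5f;

DepthYCbCr EncodeDepth(uint16_t d)
{
    const float p = kPeriod;
    float L = (d + 0.5f) / 65536.0f;             // coarse, near-linear term (Y)

    float Ha = std::fmod(L / (p * 0.5f), 2.0f);  // triangle wave (Cb)
    if (Ha > 1.0f) Ha = 2.0f - Ha;

    float Hb = std::fmod((L - p * 0.25f) / (p * 0.5f), 2.0f); // quarter-period shift (Cr)
    if (Hb < 0.0f) Hb += 2.0f;                   // fmod of a negative stays negative
    if (Hb > 1.0f) Hb = 2.0f - Hb;

    DepthYCbCr out = { L, Ha, Hb };
    return out;
}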
I'm looking at how to pull this off with the least possible work. (This design change was unplanned.) Our game engine itself is native C++, presently targeting Win32 and DirectX, although we've worked hard to keep the platform-dependent code segregated because we intend to do other ports.
We don't yet have motion-video facilities in the engine, but we'll ultimately need them anyway for cinematics. I was planning to use some off-the-shelf video solution we can plug into our engine, and haven't chosen one yet. This new requirement makes the choice harder: among other things, we'll now need to bypass colourspace conversion on one of the streams, and we'll need to play two streams simultaneously in lockstep, in some cases with audio on one of them (for the cinematics).
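To illustrate what I mean by lockstep, the kind of decoder interface I imagine needing looks something like this (names entirely hypothetical; the point is a pull model where the engine fetches frame N from both streams before presenting either, rather than letting each stream free-run):

#include <cstdint>

struct VideoFrame {
    const uint8_t* planes[3];   // raw Y, Cb, Cr (no colourspace conversion)
    int            strides[3];
    int            width, height;
};

class IVideoStream {
public:
    virtual ~IVideoStream() {}
    // Decode frame `index`; returns false past end of stream.
    virtual bool DecodeFrame(int index, VideoFrame* out) = 0;
};

void PresentNextFrame(IVideoStream& colour, IVideoStream& depth, int frame)
{
    VideoFrame colourFrame, depthFrame;
    if (colour.DecodeFrame(frame, &colourFrame) &&
        depth.DecodeFrame(frame, &depthFrame))
    {
        // Upload colourFrame as an ordinary video texture; run depthFrame
        // through the YCbCr-to-depth conversion before it touches the Z buffer.
    }
}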
I'm also wondering whether it's possible (or even useful) to do the conversion from YCbCr to depth in a pixel shader, or whether it's better to just do it on the CPU and load the resulting depth values into a locked texture separately. The conversion unfortunately involves branching logic per-pixel. (There are more naive mappings that don't need branching, but they produce inferior results.) It could be reduced to a table lookup, but with 2^24 possible 8-bit YCbCr triples at two bytes of depth each, the table would be 32MB.
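To make the branching concrete, here's the decode as I read the paper, simplified to assume clean inputs (the paper adds offset terms to make the segment selection robust to codec noise, which I've left out; the switch on m is the per-pixel branch in question):

#include <cstdint>
#include <cmath>

static const float kPeriod = 0.5f;  // must match the encoder's period

uint16_t DecodeDepth(float L, float Ha, float Hb)
{
    const float p = kPeriod;

    // Which quarter-period segment of the triangle waves are we in?
    int m = static_cast<int>(std::floor(4.0f * L / p)) % 4;

    // Coarse depth at the base of that segment.
    float L0 = std::floor(L / p) * p + (p / 4.0f) * m;

    // Fine correction from whichever wave is linear in this segment.
    float delta;
    switch (m) {
    case 0:  delta = (p / 2.0f) * Ha;          break;
    case 1:  delta = (p / 2.0f) * Hb;          break;
    case 2:  delta = (p / 2.0f) * (1.0f - Ha); break;
    default: delta = (p / 2.0f) * (1.0f - Hb); break;
    }

    float d = (L0 + delta) * 65536.0f - 0.5f;  // invert (d + 0.5) / 65536
    if (d < 0.0f) d = 0.0f;
    if (d > 65535.0f) d = 65535.0f;
    return static_cast<uint16_t>(d + 0.5f);
}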
Programming is second nature to me, but I'm not that experienced with pixel shaders and have zero knowledge of off-the-shelf video solutions.
I'd therefore be interested in advice from others who have dealt more with depth streaming, pixel shaders, and/or off-the-shelf codecs: how feasible is the proposed application, and which off-the-shelf video systems would best suit this use case?