Transaction Error

My favorite band (2NE1) is coming to my favorite city (Taipei) and I am on a quest to get a good ticket! Unfortunately I can’t just walk half a block to my neighborhood 7-11 and buy one on their iBon terminal like people who live there. Thanks to the miracle of the Internet though, this should be no problem, right?

  • Check ticket-sale website at exact announced sale opening of 10:58am Taiwan time.
  • Server near meltdown, but still responding after delays. Go Taiwan IT industry!
  • Go through seat selection & order process. Front section status: ON SALE
  • Cannot check out without having an account on the site.
  • Abort order, start account creation process.
  • Asking for my Taiwan Citizen Card ID number. Which I obviously do not have.
  • Go get my old expired Alien Resident Card which sometimes works for these things.
  • Realize that they also take passport numbers instead. Whew.
  • Awkwardly enter my U.S. mailing address and phone into the account creation form.
  • Repeat seat selection & order process. Front section status: ON SALE
  • Will-call button is unavailable. Can only receive tickets by mail. Remember I entered my U.S. mailing address. There is no way regular mail will get here in time. Abort order process.
  • Spend 15 minutes on Facebook and Skype frantically trying to reach anyone in Taiwan who has a mailing address I can use.
  • Realize I can just send it c/o my friend Drake at the Digimax corporate office.
  • Go to ticket site account management interface. Server still near meltdown. Update mailing address to Digimax.
  • Repeat seat selection & order process. Front section status: 80 seats left
  • Try to pay with American Express, since my other credit cards usually decline international transactions.
  • Find out they do not take American Express. (they don’t actually say this, but the form won’t accept a 15-digit credit card number).
  • Repeat seat selection & order process. Front section status: 23 seats left
  • Try every single VISA credit/debit card in my wallet one after another. Each time I get an error message that my web browser renders as gibberish (because the server is spitting out Big-5 Chinese text without the proper HTML meta tags – boo Taiwan IT industry).
  • Presume it’s some kind of transaction-declined error. This is expected; my VISA card issuers have a hard time understanding the concept that I may occasionally partake in an activity called “travel” that may involve buying things outside of my hometown.
  • Configure web browser to recognize improperly-tagged Big-5. Repeat seat selection & order process.
  • Find out what error message ACTUALLY says: “Sorry, the section you have selected is sold out.”
  • Check seat selection page again. Front sections status: SOLD OUT
  • Have a very emotional moment. (haha).
  • Check account management page again just to see if anything is there.
  • Account shows order history: one ticket. Processing status: “OK”
  • Really?!?! Dance in celebration.
  • Screenshot order page to get some hard evidence in case the thing fails later.
  • Post to Facebook (of course).
  • Wait by phone in case credit card company calls to verify transaction.
  • Realize credit card company may very well call my old deactivated land line in New York.

Still not sure if I got the ticket, but it was a fun adventure!

RPC for Twisted Python servers

Lesson learned: if you need an RPC protocol for Python/Twisted, don’t use AMP, use HTTP (with urllib if necessary for synchronous calls). AMP’s 65k message limit is really annoying to work within, and if you strace it you’ll see it doing tons of 1-byte reads/writes. HTTP overhead sucks, but I’d rather have that than the limitations of AMP.

edit: after a few more months of experience, this turns out to be wrong. AMP is actually not too bad in terms of the socket calls it makes, and you can work around the 65k message limit (at the cost of extra round-trips) by using a chunked result protocol. urllib, on the other hand, is pretty awful for synchronous HTTP. It is the source of those 1-byte reads/writes, because it doesn’t have a buffering mechanism between the socket and the Python file interface it is trying to implement.
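
For reference, here is roughly what the chunked-result workaround looks like. This is just a minimal sketch, not the actual protocol I use; the GetChunk command, the token bookkeeping, and the chunk size are all made up, but the AMP types are Twisted's stock ones:

    from twisted.protocols import amp

    CHUNK = 60000  # stay safely under AMP's 65,535-byte value limit

    class GetChunk(amp.Command):
        # One round-trip per slice of the oversized result.
        arguments = [('token', amp.String()),    # identifies the big result
                     ('offset', amp.Integer())]
        response = [('data', amp.String()),
                    ('more', amp.Boolean())]

    class ChunkServer(amp.AMP):
        results = {}  # token -> complete result bytes, filled in elsewhere

        @GetChunk.responder
        def get_chunk(self, token, offset):
            blob = self.results[token]
            return {'data': blob[offset:offset + CHUNK],
                    'more': offset + CHUNK < len(blob)}

    def fetch_all(proto, token, offset=0, parts=None):
        # Client side: keep pulling slices until the server says "no more",
        # then hand the reassembled result to the caller's Deferred chain.
        parts = [] if parts is None else parts
        d = proto.callRemote(GetChunk, token=token, offset=offset)
        def got(result):
            parts.append(result['data'])
            if result['more']:
                return fetch_all(proto, token, offset + CHUNK, parts)
            return b''.join(parts)
        return d.addCallback(got)

The extra round-trips are the cost; the win is that no single AMP value ever approaches the 65k ceiling.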

John Carmack on data storage for graphics

John Carmack’s annual QuakeCon addresses are gold mines of insight on 3D graphics and game development (2011 link).

One interesting point in this year’s talk was about how to structure files and I/O for data-intensive graphics. John came out very strongly in favor of what you might call an “mmap-in-place” approach. Today his code still uses heavily-compressed “pack” files that are read serially, but that’s mostly due to storage space limitations on game consoles. Going forward, he favors binary formats that are very close to the final in-memory representation of the data, so that you just mmap() the file and use it in place rather than running it through some kind of parser.
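
To make that concrete, here is a minimal sketch of the idea in Python, assuming a hypothetical asset format whose on-disk layout already matches the in-memory one: a 4-byte little-endian vertex count followed by a flat float32 xyz array.

    import mmap
    import struct

    import numpy as np

    def load_vertices(path):
        with open(path, 'rb') as f:
            buf = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
        (count,) = struct.unpack_from('<I', buf, 0)   # header: vertex count
        # No parse step and no second copy in RAM: this is a view straight
        # into the mapping, and the OS pages data in lazily as it's touched.
        verts = np.frombuffer(buf, dtype='<f4', count=count * 3, offset=4)
        return verts.reshape(count, 3)

The point is that nothing here scales with file size: the “load” is O(1), and you only pay for the pages you actually touch.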

This surprised me because most of my experiments with mmap have not shown big wins relative to conventional serial file I/O. (e.g. I once hacked an MP3 player to use mmap() rather than read(), and was surprised to find that it performed poorly and trashed the buffer cache). Modern operating systems are very good at loading and caching big linear files on disk, but not as good at handling lots of random paging on memory-mapped files. I couldn’t figure out why John thought mmap-in-place was the ideal solution, until it occurred to me that his use cases are qualitatively different from mine.

Let’s contrast the two I/O approaches in a few ways. First, does a given access actually need to hit the disk? If all of one’s data fits into RAM, then neither I/O system will require disk access. mmap() will be more efficient because there is only one copy of the data in memory rather than two, and program code can access it directly. This is actually a very important consideration for future-proofing. Any code that does complicated things to get around memory limitations should have a good “fast path” that kicks in once Moore’s Law makes those limitations obsolete. For example, a Linux kernel developer once remarked that any database that uses unbuffered disk I/O should include an option to fall back to regular buffered I/O, or else it will perform very poorly in cases where RAM is actually big enough to hold the entire working set. Note that in John’s game-engine world, game levels are specifically designed to always fit into the available memory on target platforms, so the engine is always going to use this “fast path.” Whereas in the offline rendering world, I can’t guarantee that my datasets will always fit in RAM, so mmap-in-place may end up causing more I/O than reading everything serially.

Second, consider the issues of disk access locality and latency. At first glance, it seems that serial I/O on a well-designed, compressed file format is ideal, because disk reads are large and linear, whereas mmap() I/O is inefficient because the access pattern is random. However, I believe John makes an unstated assumption that most of the bulky data consists of graphical details that can be loaded asynchronously, like high-resolution textures and models, and not “core” data structures that must be present in order for the engine to run. In this case, I/O latency and locality are less important. Also, I think John assumes the use of a clever virtual-memory scheme as in his MegaTexture system, which improves locality of access.

So, in a game engine where the working set usually fits into available RAM, and where data can be paged in asynchronously, mmap-in-place does make a lot of sense as a data storage architecture. But for offline applications where you don’t have enough RAM for everything, and where reads have to be synchronous, mmap may not be the ideal approach.

All of this has got me thinking in more detail about what the true disk/memory storage needs are for high-end offline rendering. We spend a lot of time developing clever tricks to minimize memory needs, like on-demand geometry tessellation (REYES/procedurals), mip-mapping, and brickmaps. Most of my rendering optimizations boil down to trying very hard to minimize how much geometry needs to be kept in RAM. It’s interesting to take a step back and think about how much of this work is really necessary. After all, RAM is getting ridiculously cheap. Optimizations to squeeze a scene into 4GB might be useless or even counterproductive when you’ve got 16GB. Is there some point at which we can just dump everything into a naive ray-tracer and forget about all of this annoying optimization work?

Mip-mapping and brickmaps have more or less completely solved the problem of texture memory access. By selecting mip-map tiles using screen-space metrics, we’ve gotten pretty close to optimal in terms of I/O and memory needs for 2D and 3D textures. The remaining problem is geometry. Smart culling and REYES do a fantastic job on camera-visible geometry; the harder case is ray-visible geometry. You can only fit so many million tessellated micropolygons in RAM, and given the poor locality and wide scope of ray paths, there isn’t a clear upper bound on what might need to be tessellated, as there is with straight camera-visible geometry.

You’ve also got the problem of modifications to geometry – clever ray-tracing data structures usually aren’t designed for cases where major pieces of the scene are transforming or deforming every frame. This is why ray tracing hasn’t completely taken over from REYES in production. Ray tracing is theoretically O(log N), but that’s only after you have built an acceleration data structure. In practice it’s more like O(N) because you still need to stream all that source geometry into the system to get it ready to be traced. As of today, this means storing your models on disk, then serially reading those files and translating them into a ray-tracer-friendly data structure in memory. For my current project, which isn’t all that geometry-heavy, this involves going through 100-200MB of data every frame. If we are ever going to do high-quality rendering at interactive frame rates, this will need to change. John’s talk suggests the interesting approach of encoding models into some kind of virtually-paged ray acceleration structure. Perhaps we could run a pre-pass on baked models and animations, converting them into some kind of special binary format that the renderer can mmap on demand.

Animating Mars Rovers

The Mars Rover’s six-wheeled “rocker-bogie” suspension system is an ingenious mechanical design that allows the vehicle to crawl nimbly over rocky terrain. Visualizing how this system moves has been one of my biggest challenges.

Let’s start with how the rover works in reality. Each of the six wheels has its own electric motor. When the rover wants to go somewhere, it uses those motors to turn the wheels forward. The many joints in the suspension and the rover body are pulled along by friction forces on the wheels, transmitted through the suspension structure. Disregarding side-to-side movement, turning, and slippage, there seem to be five degrees of freedom: two bogie angles, the rocker angle, plus pitch and bank on the rover body. (there is only one degree of freedom in the rocker because the left and right rockers are linked through a differential that ensures they move equally far in opposite directions). The motion of all five joints is determined entirely by the geometry of the wheels contacting the ground.

Unfortunately, current off-the-shelf animation software is pretty bad at modeling this system. I have considered two basic approaches: first, what I call a “static” approach, where you specify the XZ position and heading of the rover body, and then attempt to determine the rest of the degrees of freedom based on this. The static approach is not “path-dependent” – it solves for the joint positions on each frame independently, without considering forces or inertia. Second, the “dynamic” approach would actually simulate the full physical system evolving over time.

I tried the “dynamic” approach first, but ran into serious problems. Maya’s rigid body dynamics solver appears to use forward integration and therefore has trouble dealing with the very stiff forces necessary to hold the suspension system together (e.g. the differential bar). Furthermore, its collision model does not come close to generating realistic interaction with the ground. In order to model wheels, it is important to be able to specify friction independently on the lateral and longitudinal axes, but Maya only offers a single isotropic friction control. The new nCloth solver seemed like it would give better results, but nCloth does not handle rigid bodies yet (and using very stiff soft bodies makes the joints seem like they are made of Jell-O).

The “static” approach is somewhat more animator-friendly, because you can easily scrub through time and “puppeteer” the rover motion rather than having to run a full simulation first. (and Maya’s simulation baking feature is horribly broken, but that’s a rant for another day). Disregarding forces and inertia, and just solving for the constrained joints seems like a fairly simple geometry problem. Given the rover body position, you compute the XZ coordinates of the wheels, sample the terrain there to determine the wheel heights, then derive all of the suspension joint angles and rover body pitch/bank from the mechanical constraints. Unfortunately, this seems to be a nonlinear problem, because changing the joint angles also changes the wheel positions, which affects the height of the terrain underneath them. I think a fully “correct” solver would have to use an iterative approach: sample the wheels at zero joint deflection, compute the joint angles, then re-sample the terrain at the updated wheel positions, and continue iterating until the system settles down in a stable configuration.
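
In pseudo-Python, the settle loop would look something like this; terrain_height(), wheel_xz(), and solve_joints() are hypothetical stand-ins for the terrain sampler and the rig's constraint solver:

    def settle(body_x, body_z, heading, tol=1e-4, max_iters=20):
        # Five free joints: two bogie angles, rocker angle, body pitch, bank.
        joints = [0.0] * 5
        for _ in range(max_iters):
            # Wheel contact points depend on the current joint angles...
            xz = wheel_xz(body_x, body_z, heading, joints)  # 6 wheel contacts
            heights = [terrain_height(x, z) for (x, z) in xz]
            # ...and the joint angles depend on the terrain under the wheels.
            new_joints = solve_joints(heights)
            if max(abs(a - b) for a, b in zip(joints, new_joints)) < tol:
                break   # configuration has settled
            joints = new_joints
        return joints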

At this point I have built a rig based on Maya expressions that performs just one iteration. Essentially it makes the approximation that the terrain height underneath each wheel does not change as the joints rotate, which is true in the limit of small joint deflections. This is actually good enough to use for animation shots, with additional manual keyframe controls to fudge things where the rig fails.

Thinking about this some more, I bet I could develop a “second-order” rig that uses the basic rig as input, but then re-samples the terrain height at the wheel positions determined by the first-order rig. It’s like a poor-man’s iterative solver, done manually with Maya expressions…

Edit: Of course, I did go and read a bunch of papers from the robotics literature about how engineers actually simulate rocker-bogie systems. However, it seems like in all of these cases, they assume you already know the joint angles and wheel contact geometry, and are just solving for forces, or trying to figure out how to rotate the wheels to drive in a given direction. I could not find a paper that gives any guidance about how to simulate driving “from scratch” over arbitrarily complex terrain.

On one hand

I remember back when I needed more than one hand to count render time in minutes. Now if an HD frame takes more than 2 minutes I get antsy.

How do I keep render times short? 1) Use an absolute minimum amount of ray tracing. I cache ambient occlusion in point clouds for non-deforming objects (and also for many deforming objects – it really doesn’t look bad). 2) Stream all input to the renderer in a single pass. No writing RIB or other big intermediate files to disk.

Maya 2012 bugs

I do admire Autodesk’s work in porting Maya from its old (XForms-based?) interface to Qt. The transition is almost seamless, except for a few annoying UI bugs. These are minor bugs requiring a keystroke or two to work around, but many are in the “critical path” for working animators, so they seriously hurt usability.

  • the script editor button does nothing
  • focus does not return to original window after pressing “Enter” on numeric keyframe time/value fields
  • same issue for channel box fields
  • A/F “focus selected”/”focus all” keys randomly fail to work
  • import “resolve only clashing nodes” option does not work
  • play/stop toggle key occasionally fails to work, or waits several frames before taking effect
  • pop-up dialogs cannot be dismissed with “Enter” key unless first clicked to set focus (despite looking like they are focused)

Organizing Scenes 2

I implemented the new scene layout convention. It feels awkward, but it works. As a benefit, I can now in theory load more than one scene into interp at once, without any interference. This is now the third time I’ve implemented something akin to a module/package system in interp. It’s definitely crying out for one. I’m pretty sure I have all the code I need to implement it. I just need to make sure whatever solution I come up with works smoothly in the presence of parameter state, and that the toplevel syntax is logical and clean.

Organizing Scenes

With terrain mostly under control now, I spent some time trying to re-organize how scenes are put together and fed to interp.

Historically, I have built each scene as a single, large .mcp file that pulls in a few common definitions from a project-specific .mcp library. The common part is rather small, including only a handful of things like file locations and shader definitions. The rest of the scene consists of a large blob of interp code in the scene-specific .mcp file. This approach has the following features:

  • Advantage: scenes are largely independent of each other. Assets can be tweaked from scene to scene easily without interference.
  • Disadvantage: a large amount of interp code is duplicated across scenes. (e.g., instructions for how to render the main element, shadow passes, and atmosphere, and how to comp it all together). Updates to common assets are not propagated across the duplicates.
  • Disadvantage: scene files are hard to read. The ratio of “important” code to boilerplate is low.

For my current animation project, I’m trying a different approach, putting as much of the boilerplate code as possible into common files. Scenes are much smaller now, down from ~500 lines to ~200. Common elements reference the same sources. However, accomplishing scene-specific tweaks while keeping everything well-organized is a challenge.

The problem has to do with interp’s stateless design. I have always believed very strongly that a practical scene processor must be stateless, in order to allow parallelism and caching. Symbol definitions can only reference previously-declared symbols (or themselves), so everything is built in a bottom-up manner. This works fine with abstract math and algorithms, but production scenes are another matter. They contain lots of mutually-referencing definitions, some of which stay constant for an entire project, and some of which must vary on a scene-by-scene basis (ideally with a default value that can be picked up without writing any code). For example, the outer structure of the comp tree and main scene graph are usually constant for a project. However, these reference “inner” symbols, like the list of active objects and some lighting parameters, which can vary by scene. This prevents a straightforward bottom-up structure for the .mcp files, because the common “headers” at the top of the scene need to reference varying things declared in the main body of the file. (old-fashioned monolithic scene files are built bottom-up, but they must duplicate a ton of boilerplate because the scene-specific tweaks start very close to the “bottom”).

I’m also a strong believer in referential transparency and early binding. This means I won’t use preprocessor tricks that operate at the textual level (like C #defines). Definitions should not be updated “behind the back” of the interpreter. Interp does provide a way for specific symbols to “break” statelessness. These are called “parameters” and are used for changing the evaluation environment in a stack-like manner. The classic cases are the symbols for “time,” “width,” and “height” of the current rendered image, since those can change throughout a comp tree evaluation, and it’s very ugly to pass them as explicit parameters to every single function. However, I insist on keeping the use of parameters to an absolute minimum, since they interfere with caching and slow down the interpreter (due to the symbols being bound late). In fact, I consider any use of parameters beyond time and region-of-interest information to be highly questionable. There is no way I’d make every single scene-varying data item a parameter.

Another complication is my long-term plan that involves interp running as an interactive server process. Most scene description systems have serious trouble with interactive usage because they are designed to operate start-to-end as a batch process, like a C compiler. My long-term vision is for interp to read the library files and scenes for an entire project, keeping them all “live” in memory — or in some sort of object-oriented, version-controlled database, where today’s .mcp files are just a snapshot of the database contents — while artists manipulate symbol values interactively. Text-level preprocessor tricks will never work in this case.

I’m currently thinking that a solution might involve managing a dictionary of scene-specific “properties” (which are given default values in a project-wide parent dictionary, then optionally overridden one by one in a scene-specific child dictionary). The dictionary would, unfortunately, have to be a parameter in order to allow library functions to reference things defined below. Something like this:

library.mcp says:
       (def default-values (dict "prop1" 123.0 "prop2" 456.0))
       (defparam scene-values)
       (def scene-values default-values)
then library functions say:
       (def myfunc (+ (get-scene-value "prop1") (get-scene-value "prop2")))
and the scene looks like this:
       (execfile "/project/library.mcp")
       (set scene-values (dict default-values "prop2" 789.0))
       (def foo (myfunc))

Tiled image input

I went ahead and added tiled image input to interp, in order to make the ROAM terrain generator more efficient. It was a little easier than I thought – about a full day’s work, although a substantial portion of that was me adding tiling support to my custom EXR loader, which I ended up not using in favor of good old LibTIFF. (mainly because EXR does not support 16-bit or 8-bit integer channels, which I need for storing terrain heights and colors efficiently). Luckily I didn’t need to write a tiling utility, because PRMan ships with one called “tiffcopy.”
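
The caching side is nothing fancy; conceptually it is just an LRU map of resident tiles, something like the sketch below (this is not interp's actual implementation, and read_tile() stands in for whatever pulls one tile out of a tiled TIFF, e.g. LibTIFF's TIFFReadTile on the C side):

    from collections import OrderedDict

    class TileCache:
        def __init__(self, read_tile, max_tiles=256):
            self.read_tile = read_tile   # (tx, ty) -> one tile's pixels
            self.max_tiles = max_tiles
            self.tiles = OrderedDict()   # insertion order doubles as LRU order

        def get(self, tx, ty):
            key = (tx, ty)
            if key in self.tiles:
                self.tiles[key] = self.tiles.pop(key)   # move to MRU end
                return self.tiles[key]
            if len(self.tiles) >= self.max_tiles:
                self.tiles.popitem(last=False)           # evict LRU tile
            tile = self.read_tile(tx, ty)
            self.tiles[key] = tile
            return tile

Memory use is bounded by max_tiles no matter how big the source image is, which is the whole point.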

Combined with parallel processing, terrain generation times have gone down significantly, from about 60 seconds to about 15 seconds for a decent-quality full-res shot (shadingrate = 5). The tiled image input did not speed things up much by itself, but it greatly reduced memory usage, which allowed me to run more concurrent threads. My laptop has only 4GB of RAM and could only run 6 threads before; now it’s up to 8. That’s where most of the speed-up came from.

I realized that my ROAM generator has very bad locality in its texture lookups, which is hindering further texture-related efficiency gains. Unlike a REYES renderer, which marches across a finely-tessellated surface, the ROAM generator jumps all over the place: it starts with planet-scale triangles and alternates between them as it subdivides the world more and more finely. Delaying vertex evaluation and sorting batches in lat/lon space has helped, and it’s probably worth pushing further in this direction in the future. A sketch of that batching trick follows below.
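
The batching trick amounts to this kind of thing, with tile_of() and eval_vertex() as hypothetical helpers:

    pending = []

    def defer_lookup(vertex, lat, lon):
        # Queue the lookup instead of sampling the texture immediately.
        pending.append((tile_of(lat, lon), vertex, lat, lon))

    def flush_lookups():
        pending.sort(key=lambda item: item[0])   # group work by texture tile
        for _tile, vertex, lat, lon in pending:
            eval_vertex(vertex, lat, lon)        # coherent tile-cache hits
        del pending[:]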

I also haven’t gotten as far as estimating filter sizes and performing actual mip-mapping yet. Reducing the total number of texels touched is likely to give more speed-ups.