Intermittent Conjecture: Did Dory jump the shark?

I was fortunate enough to attend SIGGRAPH 86 and see the premiere of Luxo Jr. If you haven't seen it, I'd highly recommend you do. It's only two minutes long.

Luxo Jr. was an eye-opener to me for a number of reasons. First, and this may be hard to believe now, it was a technical milestone. At the time, the field of computer graphics was in the process of moving from 3D wireframes like this to something more realistic, and Pixar did a lot of the heavy lifting in that move.

There were a number of problems to be solved at the time. Some of them had to do with how to render an image of a mathematical model, for example:

How to draw exactly what should be visible (hidden-line and hidden-surface removal). If your model has a cube, an image of that model should only show the faces nearer to you and not the ones on the back -- or anything that's covered by nearer objects in the scene.
How to show more realistic textures than just flat polygons. At first blush you might think that, say, a house is just a few flat walls with windows cut out. But those walls won't just be flat surfaces. There might be brick, or siding. Even a concrete or stucco surface will have little irregularities. Drawing flat surfaces with uniform colors will convey the overall design, but it won't look like the real thing.
How to deal with atmospheric effects. In real life, there might be smoke or mist in the air. Even on a clear day distant objects will have more muted colors than nearby ones.
How to deal with shiny objects. Even in the best case, the math for figuring out how bright a particular point on a surface should be is harder for shiny surfaces. At worst, you have to deal with reflections of other objects, and reflections of reflections, and so on, something like this.
How to deal with transparent and translucent objects (which might also be shiny). Again, this ranges from harder math for the shading to figuring out how the rest of the scene appears when distorted by a curved surface.
How to deal with shadows. If one part of your model is between a light source and another part, that other part will, naturally enough, be darker.
A whole slew of subtler optical effects -- color bleeding, depth of field, motion blur, caustics and probably several others I don't remember. I recall one presenter at a conference half-joking that the whole field had devolved into finding a new optical subtlety and writing a paper about how to render it.

Even if you knew how to render a model accurately, there were thorny questions about modeling:

Real scenes contain a whole lot of objects. Look around next time you're outside -- or inside an average house, office, store or whatever. A realistic rendering will have to account, somehow, for every blade of grass, every leaf, every feather of every bird, every rock on a gravel path, and so forth. You don't necessarily have to create a separate object for every detail, but somehow you have to be prepared to render either a green grassy texture or blades of grass, depending on how closely you're looking. Keep in mind that at that time a typical mobile phone of today would have seemed like a supercomputer. (That may seem like hyperbole, but it's not. The ubiquitous SPARCstation 2, for example, ran at 40MHz with 128MB of RAM.)
Objects move. In reality, they obey the laws of physics. In an entertainment video, they might move in all sorts of non-physical ways, but anything that's supposed to look lifelike had better move more or less like a real live thing. Modeling the movement of a piece of clothing, or a full head of hair, or the surface of the ocean, or the flames in a fire, was good for multiple published papers in each case.

There were (and, I think, still are for the most part) two approaches to problems like these:

Grinding out exact solutions to the optics (for rendering) and physics (for modeling)
Finding Stuff That Works.

At the time, ray-tracing was the state of the art for bashing out the optics, though that would soon be superseded by radiosity -- which had the distinction of being even slower than ray-tracing -- and more sophisticated numerical approaches. Jim Kajiya laid out a general form for the problem to be solved and demoed an image that used Monte Carlo simulation to produce what he called "a great simulation of film grain" (see the end of this PDF of the paper). It was a technical tour de force, solving a good chunk of the rendering problems above with one integral equation, using techniques that had been used to model the atomic bomb a generation earlier, among other things. It was not, however, a very impressive demo unless you knew exactly what to look for.
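To get a feel for where that "film grain" comes from, here is a minimal sketch of Monte Carlo integration. It is an illustrative toy, not Kajiya's algorithm: it estimates just one simple lighting integral (the integral of cos(theta) over a hemisphere, whose exact value is pi) by averaging random samples. The function name and the sample counts are assumptions made up for the example.

```python
import math
import random

def estimate_irradiance(num_samples, rng):
    """Monte Carlo estimate of the integral of cos(theta) over a hemisphere.

    The exact answer is pi. Directions are sampled uniformly over the
    hemisphere, for which cos(theta) is uniformly distributed on [0, 1).
    """
    pdf = 1.0 / (2.0 * math.pi)  # uniform density over 2*pi steradians
    total = 0.0
    for _ in range(num_samples):
        cos_theta = rng.random()
        total += cos_theta / pdf  # integrand divided by sampling density
    return total / num_samples

rng = random.Random(1986)  # fixed seed so runs are repeatable

# Pretend each estimate is one pixel looking at the same evenly lit surface.
noisy_pixels = [estimate_irradiance(4, rng) for _ in range(8)]
smooth_pixels = [estimate_irradiance(100_000, rng) for _ in range(8)]
```

With only a handful of samples, neighboring "pixels" that should be identical come out visibly different, which is the grain. Adding samples smooths it out, but slowly: the error shrinks only with the square root of the sample count.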

Pixar took the other, entirely different approach*. They handled hidden surfaces through what came to be known as "polygon pushing" -- reducing everything to a model with flat sides that was close enough to the real thing. Flatter parts of surfaces could get by with fewer polygons than curvier parts. You could then sort those polygons to see which was closest to the eye at any particular point. Fast sorting algorithms had been around for decades. Sorting in three dimensions is harder, but it's still possible to do it relatively quickly, even on what was fairly ordinary hardware.
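The sort-by-depth idea can be sketched in a few lines. This is a deliberately simplified version (often called the painter's algorithm), not a reconstruction of Pixar's renderer: the "polygons" are axis-aligned rectangles facing the viewer, each at a single constant depth, and the image is a grid of characters. `Rect` and `render` are names invented for the example.

```python
from dataclasses import dataclass

@dataclass
class Rect:
    """Axis-aligned rectangle facing the viewer, at one constant depth."""
    x0: int
    y0: int
    x1: int        # exclusive
    y1: int        # exclusive
    depth: float   # distance from the eye; larger means farther away
    color: str     # a single character standing in for a pixel color

def render(rects, width, height, background="."):
    """Paint rectangles back to front so nearer ones overwrite farther ones."""
    image = [[background] * width for _ in range(height)]
    for r in sorted(rects, key=lambda r: r.depth, reverse=True):
        for y in range(max(r.y0, 0), min(r.y1, height)):
            for x in range(max(r.x0, 0), min(r.x1, width)):
                image[y][x] = r.color
    return ["".join(row) for row in image]

scene = [
    Rect(0, 0, 4, 3, depth=5.0, color="A"),  # far rectangle
    Rect(2, 1, 6, 4, depth=2.0, color="B"),  # near rectangle, overlaps A
]
picture = render(scene, width=6, height=4)
# picture is:
#   AAAA..
#   AABBBB
#   AABBBB
#   ..BBBB
```

Real polygons can tilt, intersect and overlap cyclically, so a single depth per polygon isn't enough in general. That's the "sorting in three dimensions is harder" part.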

They handled shadows through "shadow mapping", essentially calculating where shadows would fall on a surface and making that a property of the surface. You could figure out where the shadows would fall by looking at the scene from the point of view of the light source, using the same sorting algorithm as for hidden surfaces. You only had to re-do the shadow map when things moved, and much of a typical movie scene is background or otherwise not moving.
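Here is a toy version of the idea, under heavy assumptions: the world is a 2D cross-section, the light is directly overhead and shines straight down, and every surface is a horizontal slab given as a hypothetical `(x0, x1, height)` tuple. Looking at the scene "from the light" then just means recording, for each column, the topmost surface. Instead of sorting, this sketch keeps a running maximum per column, which gives the same answer in this simple setup.

```python
def build_shadow_map(slabs, width):
    """For each column, record the height of the topmost surface the light sees."""
    shadow_map = [float("-inf")] * width
    for x0, x1, height in slabs:
        for x in range(max(x0, 0), min(x1, width)):
            shadow_map[x] = max(shadow_map[x], height)
    return shadow_map

def in_shadow(shadow_map, x, height, bias=1e-6):
    """A point is shadowed if something above it is closer to the light.

    The small bias keeps a surface from shadowing itself because of rounding.
    """
    return height < shadow_map[x] - bias

# A floor at height 0 spanning columns 0..7, and a box top at height 3
# spanning columns 2..4.
slabs = [(0, 8, 0.0), (2, 5, 3.0)]
shadow_map = build_shadow_map(slabs, width=8)

floor_under_box = in_shadow(shadow_map, 3, 0.0)   # True: the box blocks the light
floor_in_open = in_shadow(shadow_map, 6, 0.0)     # False: nothing overhead
box_top = in_shadow(shadow_map, 3, 3.0)           # False: the box top itself is lit
```

The map only has to be rebuilt when a slab moves. Shading any number of points against it is just a lookup and a comparison.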

They handled textures with texture mapping and bump mapping, which treated the surfaces as flat but then modified the color or local orientation used in the actual shading calculations based on what exact part of the surface you were looking at. That's how the wooden floor in Luxo Jr. was done.
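A rough sketch of both tricks on a single flat surface, assuming simple diffuse (Lambert) shading: the texture map picks the color from the position `(u, v)` on the surface, and the bump map tilts the normal used in the lighting calculation without moving the surface at all. The checkerboard, the ripple height field and all the function names are invented for illustration; they aren't taken from Luxo Jr.

```python
import math

def checker(u, v, squares=4):
    """Texture map: brightness from a checkerboard over the unit square."""
    return 0.9 if (int(u * squares) + int(v * squares)) % 2 == 0 else 0.3

def bump_height(u, v):
    """Hypothetical height field: shallow ripples running along u."""
    return 0.05 * math.sin(2 * math.pi * 8 * u)

def bumped_normal(u, v, eps=1e-4):
    """Tilt the flat normal (0, 0, 1) according to the height field's slope."""
    dh_du = (bump_height(u + eps, v) - bump_height(u - eps, v)) / (2 * eps)
    dh_dv = (bump_height(u, v + eps) - bump_height(u, v - eps)) / (2 * eps)
    nx, ny, nz = -dh_du, -dh_dv, 1.0
    length = math.sqrt(nx * nx + ny * ny + nz * nz)
    return (nx / length, ny / length, nz / length)

def shade(u, v, light_dir):
    """Diffuse shading: texture color times the cosine between normal and light.

    light_dir is assumed to be a unit vector pointing toward the light.
    """
    n = bumped_normal(u, v)
    diffuse = max(0.0, sum(a * b for a, b in zip(n, light_dir)))
    return checker(u, v) * diffuse

overhead = (0.0, 0.0, 1.0)
on_crest = shade(1 / 32, 0.1, overhead)  # slope is zero here, so fully lit
on_slope = shade(0.0, 0.1, overhead)     # steep part of a ripple, so darker
```

Both sample points fall in the same light checker square, so any difference in brightness between them comes entirely from the perturbed normal. The geometry being shaded is still a flat plane.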

They also developed algorithms for modeling the movement of the lamps and their cords, but I'm less familiar with that. Overall they built up a library of rendering techniques, modeling techniques and models, some general-purpose and powerful, some specialized to particular tasks. Just as important, they built a framework to plug it all into harmoniously.

Kajiya's paper was a great example of the scientific approach, and it ended up underpinning a chunk of important work. It offered only an approximate solution, out of necessity, to the actual problem of putting pixels on the screen but it rigorously defined the exact problem to solve.

Pixar did engineering. They figured out what mattered and what didn't for the purposes of producing an image that would fool the eye in an entertaining video -- basically which shortcuts people would and wouldn't notice -- and applied their resources to solving the problems that mattered. They also developed software for managing a server farm doing the rendering and all kinds of tools to support the animators in making their magic.

I suppose I should take a moment to push back against a couple of stereotypes. It's tempting to write off "the scientific approach" as "of no practical value" or the engineering approach as "just a bunch of hacks". From what I can tell, though, it's hard to write a useful scientific paper in CS without knowing how to code, and it's hard to come up with a good practical hack without understanding what the full solution looks like. Both have been done, but most people who've made a difference have a healthy dose of both practical and theoretical knowledge and tend to move back and forth on the deep insight/cheap hack scale as the occasion demands or the mood strikes.

But all this technical discussion leaves out what made, and makes, Pixar truly special. The Pixar folks didn't just have formidable technical chops and great engineering sense. They told stories.

This was a conscious decision from the outset. John Lasseter and the rest of the team paid a lot of attention to the generation of animators before them, particularly the Disney studio.

If you're drawing every single frame of a picture by hand, even if you're using techniques like cel animation to re-use background drawings, you have to make every line count. The people who we now call the "traditional animators" developed a set of techniques, for example squash and stretch, to illustrate motion without detailing every single movement. They studied facial expressions in order to make their characters emote in a way we instinctively understand. They watched how people and animals moved in order to capture the essence of lifelike motion. They noticed that cute baby animals had (relatively) bigger heads than their adult counterparts, and made countless other observations that went into their work.

If you're just trying to figure out how to shade a model of a teapot by the conference submission deadline you probably won't pay much attention to these things, but the Pixar team did because their goal, from the beginning, was to tell stories with animation. This is crystal clear from the very start. The story in Luxo Jr. is pretty simple, but it's clearly a story, with characters with real emotions, even if those characters are metal desk lamps. In fact, that's the magic: Inanimate, computer-generated desk lamps brought to life -- literally animated.

Watching it at the time was one of those "I didn't realize you could do that" moments, not so much from the technical point of view, though it's technically quite good as well, but because after antiseptic wireframe video games and shiny special effects and endless discussions of ray-tracing vs. polygon pushing it didn't seem like storytelling had much at all to do with the field.

My co-workers and I went to dinner at a steakhouse in Dallas afterward. I remember talking about what portion of the real-life scene there could be modeled and rendered realistically with the resources available. Having seen a few papers presented on techniques for rendering transparent objects with curved surfaces I claimed that the wine glasses could be handled OK (not a foregone conclusion at that point). My boss dipped his thumb in steak juice and smudged it on the glass. "Render that". I muttered something about transparency mapping and such, and I might have been right, but the point was made.

With the tools we have these days, that smudge would be a minor obstacle. Computer-generated scenes still often have that too-clean look to them, but that's more a matter of choice. Computer imagery can handle grit and grime, but it's often easier to model without it. If it makes sense for the setting or character, it's there, but otherwise it's usually not. Also, I suspect, it's easier for an audience to make sense of a scene if the animated main characters look somewhat unnaturally clean and shiny while the trees off in the distance look realistic.

Which brings us to Finding Dory.

In my opinion it's not a bad film, but there's something missing. Technically, it continues Pixar's upward trend in awesomeness. The modeling for Hank the Septopus is so seamless you forget all about the huge amount of work that must have gone into it, from the motions of the tentacles to studying enough octopus behavior to make Hank move like a realistic cephalopod, to knowing enough old-school animation technique to make him expressive within those parameters. And there's plenty more where that came from.

There are a number of acceptable breaks from reality, starting with talking animals, and on to reading animals, truck-driving animals, aquatic animals spending unlikely amounts of time out of water, and even a plot-convenient echolocation ability that apparently doesn't use ultrasound and works through air as well as water -- not to mention navigating around bends in pipes while still conveying that there are bends at all. That's all fine. I mean, if you're OK with talking underwater animals, hard-boiled skepticism is pretty much out the window to start with.

The problem, unfortunately, is the storytelling.

I had to stop here for a bit, partly because, even if I'm a bit of a curmudgeon, I don't really relish the thought of criticizing Dory, Nemo and the gang. Curmudgeons can still be fans. Mostly though, I realized that if I wanted to go there, I should at least have a specific reason to go there, and it took me a little while to pinpoint that reason.

In Finding Nemo, one of the best moments, and probably the biggest emotional payoff, is when Dory, the cognitively impaired blue tang who at first seems to have been there for comic relief and to play the role of the wacky, plot-complicating sidekick, realizes "I look at you and…I’m home."** The setting is as spare as can be, just two characters alone against a plain backdrop, one of them not even speaking, and that's what makes it work: the characters, and their slow realization of what's happened.

Dory didn't see it coming because, well, she's Dory and she had only fleeting hints that she was lost in the first place. Marlin didn't see it coming because he'd been consumed by his quest to redeem his guilt and remorse over losing Nemo. The audience didn't see it coming because the rest of the story was zipping along at Pixar's usual frenetic-but-impeccably-timed pace and keeping us engaged with a steady parade of engaging characters. It also doesn't hurt that Ellen DeGeneres delivers the speech perfectly.

And that's where Finding Dory's trouble begins. It's just going to be really hard to top a moment like that. It's probably not a good idea to even try. If Dory's backstory stays a backstory we can carry it with us however we like. Probably better to leave that magic alone. But at the same time, you can't blame Pixar for trying anyway. "How are you going to top that?" has driven a lot of creative people to a lot of really good work. I can't imagine there wasn't a little voice in the back of someone's head saying "Challenge accepted."

Meanwhile, rendering and modeling technology march on. Realistic waves crashing on a beach? We can do that. Schools of fish circling in a cylindrical tank? No problem. Northern California vegetation in a light mist? That's the morning commute. How about some Toy Story-style kids wreaking havoc as they plunge their hands into the touch pool, kicking up clouds of sand? Done. It's not that Pixar has ever been shy about pushing the technical envelope. It just seems a bit -- visible. Technique is hardly ever meant to be visible.

And of course, the mouse must be fed. When Dory dodged under Destiny the shark at the last second, I couldn't help thinking "That'll feature somewhere in a Disney ride". And it's not hard to guess which characters were likely to make for hot-selling plush toys. Nemo, Marlin, Crush the sea turtle and whoever else have to be there because sequel.

It's not that commercial tie-ins and franchise characters are bad per se. Those server farms don't run themselves (well, at least not yet). It's just that, like the technical mastery, the commercial machinery is not supposed to actually jump out at you.

In the end, Finding Dory's weakness boils down to fundamentals: the external constraints are muscling in on the plot, and the plot is driving the characters, when it should be the other way around. In Luxo Jr., there's hardly any plot at all. The whole point is to use the technology -- really just a bunch of crunching of a bunch of numbers describing colors, geometric shapes and such -- to show us believable characters. Character wins, maybe not every single time, but almost always. That's especially true if you're Pixar, which is why between Luxo Jr. and Finding Dory, that two-minute short is the better film of the two.

Is this the end of Pixar as we know it? Is it all merchandising and sequels from here on out? Well, three of the four upcoming Pixar projects with titles are sequels (Cars 3, The Incredibles 2 and Toy Story 4). Let's hope that Lasseter's pledge that "If we have a great story, we'll do a sequel" holds. I haven't seen Monsters University or Toy Story 3 (I think), but as I recall Pixar handled Toy Story 2 pretty deftly, sequel though it was.

Really it's impressive that they haven't stumbled any more than they have, all things considered. But this one definitely feels like a stumble.

* I'm writing most of this from memory, so I'm only mostly confident it's mostly right. Corrections are welcome.

**Just put Dory home in the search bar, 14 years after Finding Nemo came out, and there it is.

[I still see it on the first page of hits, but there's a lot of Finding Dory mixed in with it now. Not sure what to make of that --D.H. Mar 2020]


Monday, March 20, 2017
