Tuesday, October 29, 2019

More on context, tool use and such

In the previous post I claimed that (to paraphrase myself fairly loosely) whether we consider behaviors that technically look like "learning", "planning", "tool use" or such to really be those things has a lot to do with context.  A specially designed robot that can turn a door handle and open the door is different from something that sees a door handle slightly out of reach, sees a stick on the ground, bends the end of the stick so it can grab the door handle and proceeds to open the door by using the stick to turn the handle and then to poke the door open.  In both cases a tool is being used to open a door, but we have a much easier time calling the second case "tool use".  The robot door-opener is unlikely to exhibit tool use in the second case.

With that in mind, it's interesting that the team that produced the hide-and-seek AI demo is hard at work using the same engine to play a Massively Multiplayer Online video game.  They argue at length, and persuasively, that this is a much harder problem than chess or go.  While the classic board games may seem harder to the average person than mere video games, from a computing perspective MMOs are night-and-day harder in pretty much every dimension:
  • You need much more information to describe the state of the game at any particular point (the state space is much larger).  A chess or go position can be described in well under 100 bytes.  To describe everything that's going on at a given moment in an MMO takes more like 100,000 bytes (about 20,000 "mostly floating point" numbers).
  • There are many more choices at any given point (the action space is much larger).  A typical chess position has a few dozen possible moves.  A typical go position may have a couple hundred.  In a typical MMO, a player may have around a thousand possible actions at a particular point, out of a total repertoire of more than 10,000.
  • There are many more decisions to make, in this case running at 30 frames per second for around 45 minutes, or around 80,000 "ticks" in all.  The AI only observes every fourth tick, so it "only" has to deal with 20,000 decision points (a back-of-the-envelope version of this arithmetic appears in the sketch just after this list).  At any given point, an action might be trivial or might be very important strategically.  Chess games are typically a few dozen moves long.  A go game generally takes fewer than 200 moves (though the longest possible go game is considerably longer).  While some moves are more important than others in board games, each requires a similar amount and type of calculation.
  • Players have complete information about the state of a chess or go game.  In MMOs, players can only see a small part of the overall universe.  Figuring out what an unseen opponent is up to and otherwise making inferences from incomplete data is a key part of the game.
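
To make the difference in scale concrete, here's a quick back-of-the-envelope sketch in Python.  The constants are just the approximate figures quoted in the list above -- not exact values pulled from any particular game, paper or API.

    # Rough scale comparison using the approximate figures quoted above.
    BYTES_PER_FLOAT = 4  # assuming 32-bit floats

    # State space: bytes needed to describe one moment of the game
    chess_state_bytes = 100                               # "well under 100 bytes"
    mmo_state_floats = 20_000                             # "about 20,000 mostly floating point numbers"
    mmo_state_bytes = mmo_state_floats * BYTES_PER_FLOAT  # ~80,000, on the order of 100,000

    # Action space: choices available at a typical decision point
    chess_moves = 35      # "a few dozen"
    go_moves = 200        # "a couple hundred"
    mmo_actions = 1_000   # "around a thousand", out of a repertoire of 10,000+

    # Decision points per game
    fps, minutes, observe_every = 30, 45, 4
    mmo_ticks = fps * 60 * minutes               # ~81,000 ticks per 45-minute game
    mmo_decisions = mmo_ticks // observe_every   # ~20,000 decision points
    chess_decisions, go_decisions = 40, 200      # "a few dozen" / "generally fewer than 200"

    print(f"state : ~{mmo_state_bytes:,} bytes for the MMO vs under {chess_state_bytes} for chess or go")
    print(f"choice: ~{mmo_actions:,} actions per decision vs ~{chess_moves} (chess) or ~{go_moves} (go)")
    print(f"length: ~{mmo_decisions:,} decisions per game vs ~{chess_decisions} (chess) or ~{go_decisions} (go)")
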
Considered as a context, an MMO is, more or less by design, much more like the kind of environment that we have to plan, learn and use tools in every day.  Chess and go, by contrast, are highly abstract, limited worlds.  As a consequence, it's much easier to say that something that looks like it's planning and using tools in an MMO really is planning and using tools in a meaningful sense.

That doesn't mean that the AI is doing so the same way we do, or at least the way we think we do, but that's a topic for a different post.

Tool use, planning and AI

A recent story in MIT Technology Review carries the headline "AI learned to use tools after nearly 500 million games of hide and seek", and the subhead "OpenAI’s agents evolved to exhibit complex behaviors, suggesting a promising approach for developing more sophisticated artificial intelligence".  This article, along with several others, is based on a blog post on OpenAI's site.  While the article is a good summary of the blog post, the blog post is just as readable while going into somewhat more depth and technical detail.  Both the article and the blog post are well worth reading, but as always the original source should take precedence.

There is, as they say, quite a bit to unpack here, and before I'm done this may well turn into another Topic That Ate My Blog.  At the moment, I'm interested in two questions:
  • What does this work say about learning and intelligence in general?
  • To what extent or in what sense do terms like "tool use" and "planning" describe what's going on here?
My answers to both questions changed significantly between reading the summary article and reading the original blog post.

As always, lurking behind stories like this are questions of definition, in particular, what do we mean by "learning", "planning" and "tool use"?  There have been many, many attempts to pin these down, but I think for the most part definitions fall into two main categories, which I'll call internal and external here.  Each has its advantages and drawbacks.

By internal definition I mean an attempt to formalize the sort of "I know it when I do it" feeling that a word like learning might trigger.  If I learn something, I had some level of knowledge before, even if that level was zero, and after learning I can rattle off a new fact or demonstrate a new skill.  I can say "today I learned that Madagascar is larger than Iceland" or "today I learned how to bake a soufflé".

If I talk about planning, I can say "here's my plan for world domination" (like I'd actually tell you about the robot army assembling itself at ... I've said too much) or "here's my plan for cleaning the house".  If I'm using a tool, I can say "I'm going to tighten up this drawer handle with a Phillips screwdriver", and so forth.  The common thread here is a conscious understanding of something particular going on -- something learned, a plan, a tool used for a specific purpose.

This all probably seems like common sense, and I'd say it is.  Unfortunately, common sense is not that helpful when digging into the foundations of cognition, or, perhaps, of anything else interesting.  We don't currently know how to ask a non-human animal to explain its thinking.  Neither do we have a particularly good handle on how a trained neural network is arriving at the result it does.  There may well be something encoded in the networks that control the hiders and seekers in the simulation, which we could point at and call "intent", but my understanding is we don't currently have a well-developed method for finding such things (though there has been progress).

If we can't ask what an experimental subject is thinking, then we're left with externally visible behavior.  We define learning and such in terms of patterns of behavior.  For example, if we define success at a task by some numerical measure, say winning percentage at hide and seek, we can say that learning is happening when behavior changes and the winning percentage increases in a way that can't be attributed to chance (in the hide-and-seek simulation, the percentage would tilt one way or another as each side learned new strategies, but this doesn't change the basic argument).
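
To make that concrete, here's a minimal sketch in Python of what such an external test might look like: compare the winning percentage over an early block of games with a later block, and ask whether the improvement is bigger than chance alone would plausibly produce.  The two-proportion z-test, the 5% threshold and the game counts are my own illustrative choices, not anything taken from the hide-and-seek work itself.

    import math

    def looks_like_learning(early_wins, early_games, late_wins, late_games, z_threshold=1.96):
        """External test for "learning": did the win rate rise by more than chance would explain?

        Uses a simple two-proportion z-test (normal approximation); the 1.96 cutoff
        corresponds to roughly a 5% significance level and is an arbitrary choice."""
        p_early = early_wins / early_games
        p_late = late_wins / late_games
        pooled = (early_wins + late_wins) / (early_games + late_games)
        se = math.sqrt(pooled * (1 - pooled) * (1 / early_games + 1 / late_games))
        z = (p_late - p_early) / se
        return z > z_threshold, p_early, p_late, z

    # Hypothetical numbers: seekers won 450 of 1,000 early games and 550 of 1,000 later ones.
    learned, p0, p1, z = looks_like_learning(450, 1000, 550, 1000)
    print(f"win rate {p0:.0%} -> {p1:.0%}, z = {z:.1f}, counts as learning: {learned}")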

This turns learning into a pure numerical optimization problem: find the weights on the neurons that produce the best winning percentage.  Neural-network training algorithms are literally doing just such an optimization.  Networks in the training phase are certainly learning, by this definition, but not in the sense that we learn by studying a text or going to a lecture.  I suspect that most machine learning researchers are fine with that, and might also argue that studying and lectures are not a large part of how we learn overall, just the part we're most conscious of as learning per se.
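
The real systems do that optimization with large-scale reinforcement learning, but the shape of the idea can be sketched with something much cruder.  The toy below is entirely made up -- a four-number "policy" and a stand-in scoring function in place of actually playing games -- and exists only to show what learning-as-optimization reduces to: nudge the weights, and keep any nudge that makes the score go up.

    import random

    def win_rate(weights):
        """Stand-in for "play a batch of games and measure the winning percentage".
        Here it's just an arbitrary function of the weights; in a real system the
        score would come from actually playing the game."""
        target = [0.2, -0.5, 0.9, 0.1]  # arbitrary "ideal" weights for the toy problem
        error = sum((w - t) ** 2 for w, t in zip(weights, target))
        return 1.0 / (1.0 + error)      # closer to the target means a higher "win rate"

    random.seed(0)
    weights = [random.uniform(-1, 1) for _ in range(4)]
    best = win_rate(weights)

    # "Learning", externally defined: adjust the numbers, keep any change that improves the score.
    for step in range(10_000):
        candidate = [w + random.gauss(0, 0.05) for w in weights]
        score = win_rate(candidate)
        if score > best:
            weights, best = candidate, score

    print(f"final 'win rate' {best:.3f} with weights {[round(w, 2) for w in weights]}")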

This tension between our common understanding of learning and the workings of things that can certainly appear to be learning goes right to why an external definition (more or less what we call an operational definition) can feel so unsatisfying.  Sure, the networks look like they're learning, but how do we know they're really learning?

The simplest answer to that is that we don't.  If we define learning as optimizing a numerical value, then pretty much anything that does that is learning.  If we define learning as "doing things that look to us like learning", then what matters is the task, not the mechanism.  Learning to play flawless tic-tac-toe might be explained away as "just optimizing a network" while learning to use a ramp to peer over the wall of a fort built by a group of hiders sure looks an awful lot like the kind of learning we do -- even though the underlying mechanism is essentially the same.

I think the same reasoning applies to tool use: Whether we call it tool use or not depends on how complex the behavior appears to be, not on the simple use of an object to perform a task.  I remember reading about primates using a stick to dig for termites as tool use and thinking "yeah, but not really".  But why not, exactly?  A fireplace poker is a tool.  A barge pole is a tool.  Why not a termite stick?  The only difference, really, is the context in which they are used.  Tending a fire or guiding a barge happens in the midst of several other tools and the actions that go with them, however simple that context may be in the case of a fireplace and andirons.  It's probably this sense of the tool use being part of a larger, orchestrated context that makes our tool use seem different.  By that logic, tool use is really just a proxy for being able to understand larger, multi-part systems.

In my view this all reinforces the point that "planning", "tool use" and such are not binary concepts.  There's no one point at which something goes from "not using tools" to "using tools", or if there is, the dividing line has to be fairly arbitrary and therefore not particularly useful.  If "planning" and "tool use" are proxies for "behaving like us in contexts where we consider ourselves to be planning and using tools", then what matters is the behavior and the context.  In the case at hand, our hiders and seekers are behaving a lot like we would, and doing it in a context that we would certainly say requires planning and intelligence.

As far as internal and external definitions go, it seems we're looking for contexts where our internal notions apply well.  In such contexts we have much less trouble saying that behavior that fits an external definition of "tool use", "planning", "learning" or whatever is compatible with those notions.