Tuesday, October 29, 2019

Tool use, planning and AI

A recent story in MIT Technology Review carries the headline "AI learned to use tools after nearly 500 million games of hide and seek" and the subhead "OpenAI’s agents evolved to exhibit complex behaviors, suggesting a promising approach for developing more sophisticated artificial intelligence."  This article, along with several others, is based on a blog post on OpenAI’s site.  The article is a good summary, but the blog post is just as readable while going into somewhat more depth and technical detail.  Both are well worth reading, though as always the original source should take precedence.

There is, as they say, quite a bit to unpack here, and before I'm done this may well turn into another Topic That Ate My Blog.  At the moment, I'm interested in two questions:
  • What does this work say about learning and intelligence in general?
  • To what extent or in what sense do terms like "tool use" and "planning" describe what's going on here?
My answers to both questions changed significantly between reading the summary article and reading the original blog post.

As always, lurking behind stories like this are questions of definition, in particular, what do we mean by "learning", "planning" and "tool use"?  There have been many, many attempts to pin these down, but I think for the most part definitions fall into two main categories, which I'll call internal and external here.  Each has its advantages and drawbacks.

By internal definition I mean an attempt to formalize the sort of "I know it when I do it" kind of feeling that a word like learning might trigger.  If I learn something, I had some level of knowledge before, even if that level was zero, and after learning I could rattle off a new fact or demonstrate a new skill.  I can say "today I learned that Madagascar is larger than Iceland" or "today I learned how to bake a soufflé".

If I talk about planning, I can say "here’s my plan for world domination" (like I’d actually tell you about the robot army assembling itself at ... I’ve said too much) or "here’s my plan for cleaning the house".  If I’m using a tool, I can say "I’m going to tighten up this drawer handle with a Phillips screwdriver", and so forth.  The common thread here is a conscious understanding of something particular going on -- something learned, a plan, a tool used for a specific purpose.

This all probably seems like common sense, and I'd say it is.  Unfortunately, common sense is not that helpful when digging into the foundations of cognition, or, perhaps, of anything else interesting.  We don't currently know how to ask a non-human animal to explain its thinking.  Neither do we have a particularly good handle on how a trained neural network arrives at the results it does.  There may well be something encoded in the networks that control the hiders and seekers in the simulation, which we could point at and call "intent", but my understanding is that we don't currently have a well-developed method for finding such things (though there has been progress).

If we can't ask what an experimental subject is thinking, then we're left with externally visible behavior.  We define learning and such in terms of patterns of behavior.  For example, if we define success at a task by some numerical measure, say winning percentage at hide and seek, we can say that learning is happening when behavior changes and the winning percentage increases in a way that can't be attributed to chance (in the hide-and-seek simulation, the percentage would tilt one way or another as each side learned new strategies, but this doesn't change the basic argument).
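
To make the operational flavor of this concrete, here is a minimal sketch in Python of what "increases in a way that can't be attributed to chance" might look like as a calculation.  The two-proportion z-test and the particular game counts are my own illustrative choices, not anything taken from the OpenAI experiments.

```python
import math

def win_rate_improved(wins_before, games_before, wins_after, games_after, z_threshold=1.96):
    """Crude operational test: did the win rate rise by more than chance would suggest?

    Uses a one-sided two-proportion z-test; assumes independent games and
    reasonably large samples.
    """
    p_before = wins_before / games_before
    p_after = wins_after / games_after
    # Pooled win rate under the null hypothesis that nothing was learned.
    pooled = (wins_before + wins_after) / (games_before + games_after)
    se = math.sqrt(pooled * (1 - pooled) * (1 / games_before + 1 / games_after))
    z = (p_after - p_before) / se
    return z > z_threshold  # only an increase counts as evidence of learning

# Hypothetical numbers: seekers won 450 of 1,000 early games, then 610 of 1,000 later games.
print(win_rate_improved(450, 1000, 610, 1000))  # True: hard to write that change off as chance
```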

This turns learning into a pure numerical optimization problem: find the weights on the neurons that produce the best winning percentage.  Neural-network training algorithms are literally doing just such an optimization.  Networks in the training phase are certainly learning by this definition, but not in the sense that we learn by studying a text or going to a lecture.  I suspect that most machine learning researchers are fine with that, and might also argue that studying and lectures are not a large part of how we learn overall, just the part we're most conscious of as learning per se.
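
To illustrate what "learning as optimization" means mechanically, here is a toy sketch, again in Python and entirely hypothetical: the score function stands in for whatever numerical measure of success we picked (the winning percentage, say), and the loop does nothing but search for weights that increase it.  Real training uses gradient-based methods rather than this random hill-climbing, but under the external definition the difference doesn't matter.

```python
import random

def score(weights):
    """Stand-in for "winning percentage": some black-box numerical measure of success.

    This toy objective peaks at weights (0.7, -0.3); a real system would play
    many games of hide and seek and report the observed win rate instead.
    """
    w0, w1 = weights
    return 1.0 - (w0 - 0.7) ** 2 - (w1 + 0.3) ** 2

def learn(iterations=2000, step=0.05):
    """Learning as pure numerical optimization: perturb the weights at random
    and keep any perturbation that improves the score."""
    weights = [random.uniform(-1, 1), random.uniform(-1, 1)]
    best = score(weights)
    for _ in range(iterations):
        candidate = [w + random.gauss(0, step) for w in weights]
        candidate_score = score(candidate)
        if candidate_score > best:
            weights, best = candidate, candidate_score
    return weights, best

# The weights drift toward the optimum; by the external definition, the system "learned".
print(learn())
```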

This tension between our common understanding of learning and the workings of things that can certainly appear to be learning goes right to why an external definition (more or less what we call an operational definition) can feel so unsatisfying.  Sure, the networks look like they're learning, but how do we know they're really learning?

The simplest answer to that is that we don't.  If we define learning as optimizing a numerical value, then pretty much anything that does that is learning.  If we define learning as "doing things that look to us like learning", then what matters is the task, not the mechanism.  Learning to play flawless tic-tac-toe might be explained away as "just optimizing a network" while learning to use a ramp to peer over the wall of a fort built by a group of hiders sure looks an awful lot like the kind of learning we do -- even though the underlying mechanism is essentially the same.

I think the same reasoning applies to tool use: Whether we call it tool use or not depends on how complex the behavior appears to be, not on the simple use of an object to perform a task.  I remember reading about primates using a stick to dig for termites as tool use and thinking "yeah, but not really".  But why not, exactly?  A fireplace poker is a tool.  A barge pole is a tool.  Why not a termite stick?  The only difference, really, is the context in which they are used.  Tending a fire or guiding a barge happens in the midst of several other tools and the actions that go with them, however simple in the case of a fireplace and andirons.  It's probably this sense of the tool use being part of a larger, orchestrated context that makes our tool use seem different.  By that logic, tool use is really just a proxy for being able to understand larger, multi-part systems.

In my view this all reinforces the point that "planning", "tool use" and such are not binary concepts.  There's no one point at which something goes from "not using tools" to "using tools", or if there is, the dividing line has to be fairly arbitrary and therefore not particularly useful.  If "planning" and "tool use" are proxies for "behaving like us in contexts where we consider ourselves to be planning and using tools", then what matters is the behavior and the context.  In the case at hand, our hiders and seekers are behaving a lot like we would, and doing it in a context that we would certainly say requires planning and intelligence.

As for internal and external definitions, it seems we're looking for contexts where our internal notions apply well.  In such contexts we have much less trouble saying that behavior fitting an external definition of "tool use", "planning", "learning" or whatever is compatible with those notions.
