Tuesday, October 29, 2019

More on context, tool use and such

In the previous post I claimed that (to paraphrase myself fairly loosely) whether we consider behaviors that technically look like "learning", "planning", "tool use" and the like to really be those things has a lot to do with context.  A specially designed robot that can turn a door handle and open the door is different from something that sees a door handle slightly out of reach, sees a stick on the ground, bends the end of the stick so it can grab the door handle, and proceeds to open the door by using the stick to turn the handle and then to poke the door open.  In both cases a tool is being used to open a door, but we have a much easier time calling the second case "tool use".  Put in the second situation, the purpose-built door-opening robot is unlikely to improvise anything of the kind.

With that in mind, it's interesting that the team that produced the hide-and-seek AI demo is hard at work using their engine to play a Massively Multiplayer Online video game.  They argue at length, and persuasively, that this is a much harder problem than chess or go.  While the classic board games may seem harder to the average person than mere video games, from a computing perspective MMOs are night-and-day harder in pretty much every dimension:
  • You need much more information to describe the state of the game at any particular point (the state space is much larger).  A chess or go position can be described in well under 100 bytes.  To describe everything that's going on at a given moment in an MMO takes more like 100,000 bytes (about 20,000 "mostly floating point" numbers).
  • There are many more choices at any given point (the action space is much larger).  A typical chess position has a few dozen possible moves.  A typical go position may have a couple hundred.  In a typical MMO, a player may have around a thousand possible actions at a particular point, out of a total repertoire of more than 10,000.
  • There are many more decisions to make.  In this case the game runs at 30 frames per second for around 45 minutes, or around 80,000 "ticks" in all.  The AI only observes every fourth tick, so it "only" has to deal with 20,000 decision points.  At any given point, an action might be trivial or might be very important strategically.  Chess games are typically a few dozen moves long.  A go game generally takes fewer than 200 (though the longest possible go game is considerably longer).  While some moves are more important than others in board games, each requires a similar amount and type of calculation.
  • Players have complete information about the state of a chess or go game.  In MMOs, players can only see a small part of the overall universe.  Figuring out what an unseen opponent is up to and otherwise making inferences from incomplete data is a key part of the game.
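
For anyone who wants to check the round numbers in the list above, here is a quick back-of-the-envelope sketch.  The frame rate, game length, observation interval and count of numbers come from the figures quoted above; the 4 bytes per number is my own assumption (32-bit floats), and the function names are purely illustrative.

```python
# Back-of-the-envelope check of the figures quoted in the list above.
FRAMES_PER_SECOND = 30      # quoted frame rate
GAME_MINUTES = 45           # quoted approximate game length
OBSERVE_EVERY_N_TICKS = 4   # the AI observes every fourth tick
STATE_NUMBERS = 20_000      # quoted count of "mostly floating point" numbers
BYTES_PER_NUMBER = 4        # ASSUMPTION: 32-bit floats, not stated in the source


def total_ticks(fps=FRAMES_PER_SECOND, minutes=GAME_MINUTES):
    """Number of game ticks in one game of the given length."""
    return fps * 60 * minutes


def decision_points(ticks, stride=OBSERVE_EVERY_N_TICKS):
    """Number of ticks at which the AI actually observes and acts."""
    return ticks // stride


def state_bytes(numbers=STATE_NUMBERS, width=BYTES_PER_NUMBER):
    """Rough size of one game-state description, in bytes."""
    return numbers * width


ticks = total_ticks()
print(ticks)                   # 81000, i.e. "around 80,000"
print(decision_points(ticks))  # 20250, i.e. roughly 20,000
print(state_bytes())           # 80000, same order as "more like 100,000"
```

The exact values land a little off the round figures, which is what you would expect from numbers quoted as "around" and "about".
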
Considered as a context, an MMO is, more or less by design, much more like the kind of environment that we have to plan, learn and use tools in every day.  Chess and go, by contrast, are highly abstract, limited worlds.  As a consequence, it's much easier to say that something that looks like it's planning and using tools in an MMO really is planning and using tools in a meaningful sense.

It doesn't mean that the AI is doing so the same way we do, or at least the way we may think we do, but that's for a different post.


  1. Ultimately, I think, it will turn out that the way we do it (though pretty amazing, when it comes down to it) is not the optimum way.

  2. But then, that could likely be said about whatever AI we come up with as well.

    (assuming we can meaningfully define "optimum" here)

    1. Point taken. But at least computers don't have endocrine systems.