Sunday, December 30, 2018

Computer chess: Dumber and smarter at the same time?

One of the long-running tags on the other blog is "dumb is smarter".  The idea is that, at least in the world of software, it's perilous to build a lot of specialized knowledge into the system.  Specialized knowledge means specialized blind spots -- any biases or incorrect assumptions in the specialization carry over to the behavior of the system itself.  In cases like spam filtering, for example, a skilled adversary can exploit these assumptions to beat the system.

I will probably follow up on the other blog at some point on how valid the examples of this that I cited really were, and to what extent recent developments in machine learning have changed the picture (spoiler alert: at least some).  Here I'd like to focus on one particular area: chess.  Since this post concerns DeepMind, part of the Alphabet family, I should probably be clear that what follows is taken from public sources I've read.


For quite a while and up until recently, the most successful computer chess programs relied mainly on brute force, bashing out millions and millions of possible continuations from a given position and evaluating them according to fairly simple rules.  There is a lot of sophistication in the programming -- making good use of multiple processors, avoiding duplicate work, representing positions and generating possible moves efficiently, and deciding how to allocate limited (though large) processing resources to a search tree that literally grows exponentially with depth -- but from the point of view of a human chess player, there's nothing particularly deep going on, just lots and lots of calculation.
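To get a rough feel for the scale involved, here's a toy back-of-the-envelope calculation.  The branching factor of 35 is a commonly cited average for chess, not an exact figure:

```python
# Why brute force needs careful pruning: with an (assumed) average of
# about 35 legal moves per chess position, the game tree grows
# exponentially with search depth.
BRANCHING_FACTOR = 35  # rough average; the real number varies by position

for depth in range(1, 9):
    print(f"depth {depth} plies: ~{BRANCHING_FACTOR ** depth:,} positions")

# Depth 6 already means ~1.8 billion positions, which is why techniques
# like alpha/beta pruning and transposition tables matter so much.
```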

For a long time, a good human player could outplay a good computer program by avoiding tactical blunders and playing for positional advantages that could eventually be turned into a winning position that even perfect tactical play couldn't save.  If the human lost, it would generally be by missing a tactical trick that the computer was able to find by pure calculation.  In any case, the human was looking at no more than dozens of possible continuations and, for the most part, calculating a few moves ahead, while the computer was looking at vastly more positions and exploring many more of them in much greater depth than a person typically would.

The sort of positional play that could beat a computer -- having a feel for pawn structure, initiative and such -- comes reasonably naturally to people, but it's not easily expressed in terms of an evaluation function.  The evaluation function is a key component of a computer chess engine: it reduces a position to a number that determines whether one position should get more attention than another.  Since there are millions of positions to evaluate, the evaluation function has to be fast.  A typical evaluation function incorporates a variety of rules of the kind you'll find in beginning chess books -- who has more material, who has more threats against the other's pieces, whose pieces have more squares (particularly central squares) to move to, are any pieces pinned or hanging, and so forth.
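As a concrete illustration, here's a toy sketch of that kind of hand-written evaluation function -- not any real engine's code, just the "add up material and mobility" idea in miniature.  The piece values and the mobility bonus are stand-ins for numbers a real engine would tune carefully:

```python
# Toy hand-written evaluation function. Positive scores favor White,
# negative favor Black.

# Classic material values in centipawns (an assumption; real engines
# tune these and add many more terms).
PIECE_VALUES = {"P": 100, "N": 320, "B": 330, "R": 500, "Q": 900}

def evaluate(white_pieces, black_pieces, white_mobility, black_mobility):
    """Score a position from White's point of view.

    white_pieces/black_pieces: lists of piece letters, e.g. ["P", "N"].
    *_mobility: number of legal moves available to each side.
    """
    material = (sum(PIECE_VALUES[p] for p in white_pieces)
                - sum(PIECE_VALUES[p] for p in black_pieces))
    # A small bonus per available move stands in for "whose pieces
    # have more squares to move to".
    mobility = 10 * (white_mobility - black_mobility)
    return material + mobility

# Example: White is up a knight but slightly less mobile.
print(evaluate(["P"] * 8 + ["N", "R"], ["P"] * 8 + ["R"], 20, 24))  # 280
```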

There's quite a bit of tuning to be done here -- is it more important to have a potential threat against the opponent's queen or to control two squares in the center? -- but once the tuning parameters are set, they're set, at least until the next release.  The computer isn't making a subtle judgment based on experience.  It's adding up numbers based on positions and alignments of pieces.

It's not that no one thought of writing a more subtle evaluation function, one that would allow the program to look at fewer, but better, positions.  It's just that it never seemed to work as well.  Put a program that looks at basic factors but looks at scads and scads of positions up against one that tries to distill the experience of human masters and only look at a few good moves, and the brute force approach has typically won.  The prime example here would be Stockfish, though I'm also counting engines such as earlier versions of Komodo as brute force since, as I understand it, they use the same alpha/beta search technique and examine large numbers of positions.  I'm having trouble finding exact numbers on that, though.
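For reference, here's a minimal sketch of the alpha/beta technique itself, run over a toy game tree (nested lists with numeric leaves) rather than real chess positions.  Real engines layer move ordering, transposition tables, and much more on top of this skeleton:

```python
# Minimal alpha/beta minimax over a toy tree. Leaves are evaluations;
# interior nodes alternate between the maximizing and minimizing player.

def alphabeta(node, alpha, beta, maximizing):
    if isinstance(node, (int, float)):   # leaf: already an evaluation
        return node
    if maximizing:
        best = float("-inf")
        for child in node:
            best = max(best, alphabeta(child, alpha, beta, False))
            alpha = max(alpha, best)
            if alpha >= beta:            # opponent won't allow this line,
                break                    # so prune the remaining siblings
        return best
    else:
        best = float("inf")
        for child in node:
            best = min(best, alphabeta(child, alpha, beta, True))
            beta = min(beta, best)
            if alpha >= beta:
                break
        return best

# Textbook-style example tree: its minimax value is 5.
tree = [[[3, 5], [6, 9]], [[1, 2], [0, -1]]]
print(alphabeta(tree, float("-inf"), float("inf"), True))
```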

This situation held for years.  Computer hardware got faster, chess programs got more finely tuned and their ratings improved, but human grandmasters could still beat them by exploiting their lack of strategic understanding.  Or at least that was the typical explanation:  Humans had a certain je ne sais quoi that computers could clearly never capture, not that the most successful ones were even trying.  Occasionally you'd hear a stronger version: computers could never beat humans since they were programmed by humans and could never know more than their programmers, or some such, but clearly there's a hole in that logic somewhere ...


Then, in 1997, Deep Blue (an IBM project not directly related to DeepMind) beat world champion Garry Kasparov in a rematch, Kasparov having won the first match between the two.  It wasn't just that Deep Blue won the games.  It outplayed Kasparov positionally just like human masters had been outplaying computers, but without employing anything you could point at and call strategic understanding.  It just looked at lots and lots of positions in a fairly straightforward way.

This isn't as totally surprising as it might seem.  The ultimate goal in chess is to checkmate the opponent.  In practice, that usually requires winning a material advantage, or opening up lines of attack, or promoting a pawn in the endgame, or some other tactical consideration.  Getting a positional advantage is just setting things up to make that more likely.  Playing a simple tactical game but looking twenty plies (half-moves) ahead turns out to be quite a bit like playing a strategic game.


Human-computer matches aren't that common.  At first, the contests were too one-sided.  Then, as computers became stronger, few grandmasters wanted to risk being the first to lose to a computer (and Kasparov is to be commended for taking up the challenge anyway).  Now, once again, the contests are too one-sided.  Human players use computers for practice and analysis, but everyone loses to them head-to-head, even with odds given (the computer plays without some combination of pieces and pawns).

At this writing, current world champion Magnus Carlsen -- by several measures the strongest human player ever, or at least in the top handful -- has less than a 2% chance of beating the current strongest computer program head to head.  With the question of "will computers ever beat the best human players at chess?" firmly settled, human players can now be compared on the basis of how often they play what the computer would play.  Carlsen finds the computer's move over 90% of the time, but it doesn't take many misses to lose a game against such a strong opponent.
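Figures like that 2% presumably come from something like the Elo expected-score formula.  The ratings in this sketch are round-number assumptions for illustration, not published ratings:

```python
# Elo expected-score formula: a rating gap of D points gives the
# weaker player an expected score of 1 / (1 + 10**(D/400)).

def expected_score(rating_a, rating_b):
    """Expected score (wins plus half of draws) for player A."""
    return 1 / (1 + 10 ** ((rating_b - rating_a) / 400))

human, engine = 2850, 3450   # hypothetical: top human vs. top engine
print(f"{expected_score(human, engine):.3f}")   # ~0.031

# The expected *score* counts draws as half a point, so the chance of
# an outright win is lower still.
```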


And now comes AlphaZero.

The "alpha" is carried over from its predecessor, AlphaGo which, by studying games between human players of the game of go and using machine learning to construct a neural network for evaluating positions, was able to beat human champion Lee Sedol.  This was particularly significant because a typical position in go has many more possible continuations than a typical chess position, making the "bash out millions of continuations" approach impractical.  Given that computer chess had only fairly recently reached the level of top human players and go programs hadn't been particularly close, it had seemed like a good bet that humans would continue to dominate go for quite a while yet.

AlphaGo, and machine learning approaches in general, use what could be regarded as a much more human approach -- not surprising, since they're based on an abstraction of animal nervous systems and brains rather than the classic Turing/von Neumann model of computing, notwithstanding that they're ultimately still using the same computing model as everyone else.  That is, they run on ordinary computer hardware, though often with a specialized "tensor processor" for handling the math.  However, the algorithm for evaluating positions is completely different.  There's still an evaluation function, but it's "run the positions of the pieces through this big nonlinear function that the training stage churned out".  Unlike a traditional chess engine, there's nothing you can really point at and say "this is how it knows about pins" or "this is how it counts up material".
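To make that concrete, here's a minimal sketch of what "run the position through a trained function" means: a tiny fully-connected network in plain Python, with random weights standing in for what training would actually produce.  The 64-numbers-per-position encoding is an assumption for illustration; real networks are vastly larger and structured very differently:

```python
# A tiny neural-network-style evaluation function. The weights below
# are random placeholders; in a trained network they would encode
# everything the system "knows" -- opaquely.
import math, random

random.seed(0)
INPUTS, HIDDEN = 64, 8   # e.g., one input per board square (an assumption)
W1 = [[random.uniform(-1, 1) for _ in range(INPUTS)] for _ in range(HIDDEN)]
W2 = [random.uniform(-1, 1) for _ in range(HIDDEN)]

def evaluate(position):
    """position: 64 numbers encoding the pieces; returns a score in (-1, 1)."""
    hidden = [max(0.0, sum(w * x for w, x in zip(row, position)))  # ReLU
              for row in W1]
    return math.tanh(sum(w * h for w, h in zip(W2, hidden)))

position = [random.choice((-1, 0, 1)) for _ in range(INPUTS)]
print(evaluate(position))
# Nothing in W1 or W2 can be pointed at and called "this is how it
# knows about pins" -- it's all just numbers.
```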

AlphaGo looks at far fewer positions than a chess-style program would, and it looks at them probabilistically: it uses its evaluation function to judge how likely it is that a particular continuation is worth looking at, then randomly chooses the actual positions to examine based on those probabilities.  It's still looking at many more positions than a human would, but many fewer than a pure brute-force approach would.
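Stripped of the surrounding tree search (AlphaZero actually embeds this in a Monte Carlo tree search with visit counts), the sampling idea looks something like the sketch below.  The moves and prior probabilities are made up for illustration:

```python
# Instead of expanding every move, sample which continuations to
# explore in proportion to the network's assessment of them.
import random

def pick_continuations(moves, priors, n_samples):
    """Randomly choose moves to explore, weighted by their priors."""
    return random.choices(moves, weights=priors, k=n_samples)

moves  = ["e4", "d4", "Nf3", "c4", "a4"]
priors = [0.35, 0.30, 0.20, 0.14, 0.01]   # made-up network outputs

print(pick_continuations(moves, priors, 10))
# "a4" will rarely be examined at all -- search effort goes where the
# evaluation function says it is likely to pay off.
```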


The "zero" is because the training stage doesn't use any outside knowledge of the game.  Unlike AlphaGo, it doesn't look at games played by humans.  It plays against itself and accumulates knowledge of what works and what doesn't.  Very roughly speaking, AlphaZero in its training stage plays different versions of its network against each other, adjusts the parameters based on what did well and what did badly, and repeats until the results stabilize.

AlphaZero does this knowing only the rules of the game, that is, what a position looks like, what moves are possible, and how to tell if the game is won, lost, drawn (tied) or not over yet.  This approach can be applied to a wide range of games -- so far go, chess and shogi (a chess-like game that originated in Japan).  In all three cases AlphaZero achieved results clearly better than the (previous) best computer players after a modest number of hours of training (though the Stockfish team makes a good case that AlphaZero had a hardware advantage and wasn't playing against the strongest configuration).  Notably, AlphaZero beat AlphaGo Zero, its zero-knowledge go-playing predecessor, 60 games to 40.


In one sense, AlphaZero is an outstanding example of Dumb is Smarter, particularly in its family history: the self-taught AlphaGo Zero decisively beat the original AlphaGo, which used nearly the same approach but was trained from human games.  AlphaZero's style of play has been widely described as unique.  Its go version has found opening ideas that had lain undiscovered for centuries.  Its chess version plays sacrifices (moves that give up material in hopes of a winning attack) that "classical" chess engines pass up because they can't prove that they're sound.  Being unbiased by exposure to human games or a human-developed evaluation function, AlphaZero can find moves that other programs would never play, and it turns out these moves are good enough to win, even against chess engines that essentially never make short-range tactical mistakes.

On the other hand, AlphaZero specifically avoids sheer brute force.  Rather than look at lots and lots of positions using a relatively simple evaluation function, it looks at many fewer, using a much more sophisticated evaluation function to sharply limit the number of positions it examines.  This is the same approach that had been tried in the past with limited success, but with two key differences:  The evaluation function is expressed as a neural network rather than a set of explicit rules, and that neural network is trained without any human input, based solely on what works in practice games against AlphaZero itself.


The Dumb is Smarter tag on Field Notes takes "dumb" to mean "no special sauce" and "smarter" to mean "gets better results".  The "smarter" part is clear.  The "dumb" part is more interesting.  There's clearly no special sauce in the training stage.  AlphaZero uses a standard machine learning approach to produce a standard neural network.

On the other hand, if you consider the program itself without knowing anything about the training stage, you have a generic engine, albeit one with a somewhat unusual randomized search algorithm, and an evaluation function that no one understands in detail.  It's all special sauce -- a set of opaque, magical parameters that somehow give the search algorithm the ability to find the right set of variations to explore.

I think it's this opaqueness that gives neural networks their particularly uncanny flavor (uncanny, etymologically, roughly means "unknowable").  The basic approach of taking some inputs and crunching some numbers on them to get an output is unequivocally dumb.  As I said above, "It's adding up numbers based on positions and alignments of pieces."  Except that those numbers are enabling it to literally make "a subtle judgment based on experience", a judgment we have no real choice but to call "smart".


Progress marches on.  At least one of the previous generation of chess engines (Komodo) has incorporated ideas from AlphaZero.  It looks like the resulting hybrid isn't dominant, at least not yet, but it does play very strong chess, and in a completely different, more human style than the "classical" version.  That's interesting all by itself.
