Sunday, December 30, 2018

Computer chess: Dumber and smarter at the same time?

[As usual, I've added inline comments for non-trivial corrections to the original text, but this one has way more than most.  I've come up to speed a bit on the current state of computer chess, so that's reflected here.  The result is not the most readable narrative, but I'm not going to try to rewrite it --D.H. Apr 2019]

One of the long-running tags on the other blog is "dumb is smarter".  The idea is that, at least in the world of software, it's perilous to build a lot of specialized knowledge into a system.  Specialized knowledge means specialized blind spots -- any biases or incorrect assumptions in the specialization carry over to the behavior of the system itself.  In cases like spam filtering, for example, a skilled adversary can exploit these assumptions to beat the system.

I will probably follow up on the other blog at some point on how valid the web.examples of this that I cited really were, and to what extent recent developments in machine learning have changed the picture (spoiler alert: at least some).  Here I'd like to focus on one particular area: chess.  Since this post includes Deep Mind, part of the Alphabet family, I should probably be clear that what follows is taken from public sources I've read.


For quite a while and up until recently, the most successful computer chess programs have relied mainly on brute force, bashing out millions and millions of possible continuations from a given position and evaluating them according to fairly simple rules [in actual tournament games, it's not unusual to see over a billion nodes evaluated for a single move].  There is a lot of sophistication in the programming, in areas like making good use of multiple processors, avoiding duplicate work, representing positions in ways that make good use of memory, generating possible moves efficiently and deciding how to allocate limited (though large) processing resources to a search tree that literally grows exponentially with depth, but from the point of view of a human chess player, there's nothing particularly deep going on, just lots and lots of calculation.

For a long time, a good human player could outplay a good computer program by avoiding tactical blunders and playing for positional advantages that could eventually be turned into a winning position that even perfect tactical play couldn't save.  If the human lost, it would generally be by missing a tactical trick that the computer was able to find by pure calculation.  In any case, the human was looking at no more than dozens of possible continuations and, for the most part, calculating a few moves ahead, while the computer was looking at vastly more positions and exploring many more of them in much greater depth than a person typically would.

The sort of positional play that could beat a computer -- having a feel for pawn structure, initiative and such -- comes reasonably naturally to people, but it's not easily expressed in terms of an evaluation function.  The evaluation function is a key component of a computer chess engine that reduces a position to a number that determines whether one position should get more attention than another.  Since there are millions of positions to evaluate, the evaluation function has to be fast.  A typical evaluation function incorporates a variety of rules of the kind you'll find in beginning chess books -- who has more material, who has more threats against the other's pieces, whose pieces have more squares (particularly central squares) to move to, are any pieces pinned or hanging, and so forth.

There's quite a bit of tuning to be done here -- is it more important to have a potential threat against the opponent's queen or to control two squares in the center? -- but once the tuning parameters are set, they're set, at least until the next release.  The computer isn't making a subtle judgment based on experience.  It's adding up numbers based on positions and alignments of pieces.

It's not that no one thought of writing a more subtle evaluation function, one that would allow the program to look at fewer, but better positions.  It's just that it never seemed to work as well.  Put a program that looks at basic factors but looks at scads and scads of positions against one that tries to distill the experience of human masters and only look at a few good moves, and the brute force approach has typically won.  The prime example here would be Stockfish, but I'm counting, engines such as earlier versions of Komodo as brute force since, as I understand it, they use the same alpha/beta search technique and examine large numbers of positions.  I'm having trouble finding exact numbers on that, though [If you look at the stats on chess.com's computer chess tournaments, it's very easy to tell who's who.  For a 15|5 time control, for example, you either see on the order of a billion nodes evaluated per move or on the order of a hundred thousand.]

[Houdini is another interesting example.  Its evaluation function puts more weight on "positional" factors such as active pieces and space, and it tries to be highly selective in which moves it devotes the most resources too.  This is, explicitly, trying to emulate the behavior of human players.  So it's not quite correct to say that programs that try to emulate humans have done worse than ones that just bash out continuations.  Houdini has done quite well.

However, from a software point of view, these programs are all largely similar.  There is an evaluation function that's explicitly coded as rules you can read, and this is used to decide how much attention to pay to what part of the search tree.

Alpha zero, by contrast, uses machine learning (aka neural network training, aka deep neural networks) to build an evaluation function that's totally opaque when considered purely as code.  There are techniques to examine what a neural network is doing, but they're not going to reduce to rules like "a bishop is worth three times as much as a pawn".  It also uses a Monte Carlo approach to examine the search tree probabilistically, which is a somewhat different way to use the evaluation function to guide the search.  As I understand it, this is not the usual practice for other engines, though it's certainly possible to incorporate a random element into a conventional chess engine.  Komodo MC comes to mind.

In short, the narrative of "purely mechanical programs beat ones that try to emulate humans" is not really right, but Alpha Zero still represents a radically different approach, and one that is in some sense structurally more similar to what things-with-brains do.  --D.H. Feb 2019]

This situation held for years.  Computer hardware got faster, chess programs got more finely tuned and their ratings improved, but human grandmasters could still beat them by exploiting their lack of strategic understanding.  Or at least that was the typical explanation:  Humans had a certain je ne sais quoi that computers could clearly never capture, not that the most successful ones were particularly trying to.  Occasionally you'd hear a stronger version: computers could never beat humans since they were programmed by humans and could never know more than their programmers, or some such, but clearly there's a hole in that logic somewhere ...


Then, in 1997, Deep Blue (an IBM project not directly related to Deep Mind) beat world champion Garry Kasparov in a rematch, Kasparov having won the first match between the two.  It wasn't just that Deep Blue won the games.  It outplayed Kasparov positionally just like human masters had been outplaying computers, but without employing anything you could point at and call strategic understanding.  It just looked at lots and lots of positions in a fairly straightforward way.

This isn't as totally surprising as it might seem.  The ultimate goal in chess is to checkmate the opponent.  In practice, that usually requires winning a material advantage, opening up lines of attack, promoting a pawn in the endgame or some other tactical consideration.  Getting a positional advantage is just setting things up to make that more likely.  Playing a simple tactical game but looking twenty plies (half-moves) ahead turns out to be quite a bit like playing a strategic game. [In fact, computer-computer chess games are almost entirely positional, since neither player is going to fall for a simple tactical trap.  That's not to say tactics don't matter.  For example I've seen any number of positions where a piece looked to be hanging, but couldn't be taken due to some tactical consideration.  What I haven't seen is a game won quickly by way of tactics.]


Human-computer matches aren't that common.  At first, the contests were too one-sided.  There was no challenge in a human grandmaster playing a computer.  Then, as computers became stronger, few grandmasters wanted to risk being the first to lose to a computer (and Kasparov is to be commended for taking up the challenge anyway).  Now, once again, the contests are too one-sided.  Human players use computers for practice and analysis, but everyone loses to them head-to-head, even with odds given (the computer plays without some combination of pieces and pawns).

At this writing, current world champion Magnus Carlsen, by several measures the strongest player ever or at the very least in the top handful, stands less than a 2% chance of beating the current strongest computer program head to head.  With the question of "will computers ever beat the best human players at chess?" firmly settled, human players can now be compared on the basis of how often they play what the computer would play.  Carlsen finds the computer move over 90% of the time, but it doesn't take many misses to lose a game against such a strong player.


And now comes AlphaZero.

The "alpha" is carried over from its predecessor, AlphaGo which, by studying games between human players of the game of go and using machine learning to construct a neural network for evaluating positions, was able to beat human champion Lee Sedol.  This was particularly significant because a typical position in go has many more possible continuations than a typical chess position, making the "bash out millions of continuations" approach impractical.  Given that computer chess had only fairly recently reached the level of top human players and go programs hadn't been particularly close, it had seemed like a good bet that humans would continue to dominate go for quite a while yet.

AlphaGo, and machine learning approaches in general, use what could be regarded as a much more human approach, not surprising since they're based on an abstraction of animal nervous systems and brains rather than the classic Turing/Von Neumann model of computing, notwithstanding that they're ultimately still using the same computing model as everyone else.  That is, they run on ordinary computer hardware, though often with a specialized "tensor processor" for handling the math.

However, the algorithm for evaluating positions is completely different.  There's still an evaluation function, but it's "run the positions of the pieces through this assemblage of multilinear functions that the training stage churned out".  Unlike a conventional chess engine, there's nothing you can really point at and say "this is how it knows about pins" or "this is how it counts up material".

AlphaGo looks at many fewer positions [by a factor of around 10,000] than a conventional chess engine would, and it looks at them probabilistically, that is, it uses its evaluation function to decide how likely it is that a particular continuation is worth looking at, then randomly chooses the actual positions to look at based on that.  It's still looking at many more positions than a human would, but many fewer than a pure brute-force approach would.


The "zero" is because the training stage doesn't use any outside knowledge of the game.  Unlike AlphaGo, it doesn't look at games played by humans.  It plays against itself and accumulates knowledge of what works and what doesn't.  Very roughly speaking, AlphaZero in its training stage plays different versions of its network against each other, adjusts the parameters based on what did well and what did badly, and repeats until the results stabilize.

AlphaZero does this knowing only the rules of the game, that is, what a position looks like, what moves are possible, and how to tell if the game is won, lost, drawn (tied) or not over yet.  This approach can be applied to a wide range of games, so far go, chess and shogi (a chess-like game which originated in Japan).  In all three cases AlphaZero achieved results clearly better than the (previous) best computer players after a modest number of hours of training (though the Stockfish team makes a good case that AlphaZero had a hardware advantage and wasn't playing against the strongest configuration).  [Recent results indicate that LC0, the strongest neural net based engine, and Stockfish, the strongest conventional engine, are very evenly matched, but LC0 doesn't have the benefit of a tensor processor to speed up its evaluation --D.H. May 2019]

Notably, AlphaZero beat AlphaGo 60 games to 40.


In one sense, AlphaZero is an outstanding example of Dumb is Smarter, particularly in beating AlphaGo, which used nearly the same approach, but trained from human games.  AlphaZero's style of play has been widely described as unique.  Its go version has found opening ideas that had lain undiscovered for centuries.  Its chess version plays sacrifices (moves that give up material in hopes of a winning attack) that conventional chess engines pass up because they can't prove that they're sound.  Being unbiased by exposure to human games or a human-developed evaluation function, AlphaZero can find moves that other programs would never play, and it turns out these moves are often good enough to win, even against chess engines that never make tactical mistakes.

On the other hand, AlphaZero specifically avoids sheer brute force.  Rather than look at lots and lots of positions using a relatively simple evaluation function, it looks at many fewer, using a much more sophisticated evaluation function to sharply limit the number of positions it examines.  This is the same approach that had been tried in the past with limited success, but with two key differences:  The evaluation function is expressed as a neural network rather than a set of explicit rules, and that neural network is trained without any human input, based solely on what works in practice games against AlphaZero itself.


The Dumb is Smarter tag on Field Notes takes "dumb" to mean "no special sauce" and "smarter" to mean "gets better results".  The "smarter" part is clear.  The "dumb" part is more interesting.  There's clearly no special sauce in the training stage.  AlphaZero uses a standard machine learning approach to produce a standard neural network.

On the other hand, if you consider the program itself without knowing anything about the training stage, you have a generic engine, albeit one with a somewhat unusual randomized search algorithm, and an evaluation function that no one understands in detail.  It's all special sauce -- a set of opaque, magical parameters that somehow give the search algorithm the ability to find the right set of variations to explore.

I think it's this opaqueness that gives neural networks their particularly uncanny flavor (uncanny, etymologically, roughly means "unknowable").  The basic approach of taking some inputs and crunching some numbers on them to get an output is unequivocally dumb.  As I said above, "It's adding up numbers based on positions and alignments of pieces."  Except that those numbers are enabling it to literally make "a subtle judgment based on experience", a judgment we have no real choice but to call "smart".


Progress marches on.  At least one of the previous generation of chess engines (Komodo) has incorporated ideas from AlphaZero [Leela has open-sourced the neural network approach wholesale].  It looks like the resulting hybrid isn't dominant, at least not yet, but it does play very strong chess, and plays in a completely different, more human, style from the conventional version.  That's interesting all by itself.