Sunday, July 14, 2019

Computer chess: where now?


In an effort to wrap up this thread, at least for a while, here's an attempt to recap some of the major points, to conjecture about what might come next, and to consider what this might tell us about chess, and about intelligence in general.

Since playing perfect chess appears intractable, we have a classic tradeoff: give up on perfect play in order to fit our actual strategy into our limited resources.  Chess theory, combined with practice to develop pattern recognition and calculation, is optimized for the human brain.  Chess theory is just shorthand for "patterns and rules that players have discovered over the centuries."  These days, that very much includes discoveries by computer players.

For computers (at least the Von Neumann-style processors currently in commercial use), there are now two main options:
  • exhaustive search to a limited depth combined with alpha/beta pruning to avoid exploring moves that can't be as good as moves already considered, using an explicit set of rules to evaluate whether a position is good or bad (known as AB, after the alpha/beta part)
  • a completely opaque neural net evaluation of positions combined with a limited randomized search of possible variations (known as NN, for the neural net part), though you'll also see mention of MCTS (Monte Carlo Tree Search), referring to the randomized search.
There are some hybrid approaches that combine explicit rules with random search, but one way or another there has to be a tradeoff of performance for limited resources.  Both halves of the list above are sketched below.
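To make the AB half concrete, here's a minimal sketch of a depth-limited negamax search with alpha/beta pruning.  `evaluate`, `legal_moves` and `make_move` are stand-ins for a real engine's evaluation function and move generator, not any particular engine's API:

```python
# Minimal depth-limited negamax with alpha/beta pruning -- a sketch, not
# a real engine.  `evaluate`, `legal_moves` and `make_move` are stand-ins
# for the engine's evaluation function and move generator.

INFINITY = 10**9

def alphabeta(position, depth, alpha=-INFINITY, beta=INFINITY):
    """Score `position` from the side to move's point of view."""
    if depth == 0 or position.is_terminal():
        return evaluate(position)      # the explicit rules live here
    for move in legal_moves(position):
        score = -alphabeta(make_move(position, move),
                           depth - 1, -beta, -alpha)
        if score >= beta:
            return beta                # opponent won't allow this line: prune
        alpha = max(alpha, score)      # best guaranteed score so far
    return alpha
```

And here's the NN half's search in miniature, a bare-bones Monte Carlo tree search.  Real NN engines replace the random playout below with the network's value estimate and bias move selection with the network's policy output, so treat this as the general shape of MCTS rather than what LC0 actually does:

```python
# Bare-bones Monte Carlo tree search (UCT).  `legal_moves`, `make_move`
# and `rollout` are stand-ins; `rollout` is assumed to return the game
# result from the viewpoint of the side to move at that position.

import math, random

class Node:
    def __init__(self, position):
        self.position = position
        self.children = {}      # move -> Node
        self.visits = 0
        self.total = 0.0        # summed results for the player who moved here

def uct_pick(node, c=1.4):
    # Pick the child with the best average result plus exploration bonus.
    return max(node.children.values(),
               key=lambda ch: ch.total / ch.visits
                              + c * math.sqrt(math.log(node.visits) / ch.visits))

def search(root, iterations=10_000):
    for _ in range(iterations):
        node, path = root, [root]
        # 1. Selection: descend while every move has already been tried.
        while node.children and len(node.children) == len(legal_moves(node.position)):
            node = uct_pick(node)
            path.append(node)
        # 2. Expansion: try one new move, if any remain.
        untried = [m for m in legal_moves(node.position) if m not in node.children]
        if untried:
            move = random.choice(untried)
            node.children[move] = Node(make_move(node.position, move))
            path.append(node.children[move])
        # 3. Simulation: random playout stands in for a NN value estimate.
        value = rollout(path[-1].position)
        # 4. Backpropagation: flip the sign at each ply going up.
        for n in reversed(path):
            value = -value
            n.visits += 1
            n.total += value
    # Play the most-visited move.
    return max(root.children, key=lambda m: root.children[m].visits)
```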

It's probably worth repeating that NN engines still consider far more possible continuations than humans can.  The numbers for human, NN and AB are roughly dozens, hundreds of thousands and billions respectively.  We can assume that those last two numbers are going to increase over time, and the first one isn't.  Moore's law may (or may not) be tailing off, but there will still be improvements in software and bigger piles of hardware in the future.  There could also be breakthroughs like some sort of quantum chess computer that can examine vastly larger numbers of possible positions, in which case all bets are off.

It's interesting to explore what human, AB and NN players do and don't have in common.  One common factor is the importance of what's traditionally called "positional" play, that is, taking into account factors beyond tactical concerns about winning material or forcing a quick checkmate.  Tactics are still important, but as the level of play rises it becomes more and more important to consider factors that influence the longer-term course of the game, factors like pawn structure, space, initiative and so forth.

Positional play is interesting from a computing standpoint because it's not the primary objective of the game -- that would be checkmate -- or even the secondary objective -- generally considered to be winning material to make it easier to checkmate, or using the threat of winning material to force the opponent into a weaker position.  The positional elements that have been developed over the centuries aren't readily obvious consequences of the rules.  They are human constructs aimed at making an intractable problem tractable.  Heuristics, in other words.  In broad terms, all three kinds of players are using the same approach -- rules of thumb plus calculation -- just in different mixes and with different rules of thumb.
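To put the "rules of thumb plus calculation" split in code terms, here's a toy version of the explicit-rules side: material plus a few positional heuristics.  The helper functions (`material_balance`, `mobility` and so on) and the centipawn weights are purely illustrative, not anything a real engine uses:

```python
# Toy evaluation: material plus a few positional rules of thumb.  The
# feature helpers and the centipawn weights are purely illustrative.

PIECE_VALUES = {'P': 100, 'N': 320, 'B': 330, 'R': 500, 'Q': 900}

def evaluate(position):
    score = material_balance(position, PIECE_VALUES)
    score += 10 * (mobility(position, 'white') - mobility(position, 'black'))
    score += 15 * (rooks_on_open_files(position, 'white')
                   - rooks_on_open_files(position, 'black'))
    score -= 25 * (doubled_pawns(position, 'white')
                   - doubled_pawns(position, 'black'))
    return score   # positive favors white; a negamax search would flip
                    # the sign when black is to move
```

This is the sort of `evaluate` the AB sketch earlier would call at the leaves of its search; the difference between engines is largely which features appear and how they're weighted.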

It seems significant, then, that computers have, in effect, rediscovered positional factors in their play, even without having them explicitly programmed in.  Deep Blue beat Kasparov, by Kasparov's own assessment, by outplaying him positionally.  The surprise was that it did this with only basic knowledge of what makes a position good or bad and that the rest emerged from looking at large numbers of possible positions, much further ahead than a human could look.

Similarly, in their training phases, NN engines like Alpha Zero learn to play good moves without any obvious tactical benefit -- and indeed some that can look like blunders or at least unnecessarily risky at first glance -- without any cues beyond what happened to win in training games.  They do seem to produce more than their share of "wild" positions, and "unorthodox" moves, but even then their play can generally be described in terms like "an unusually aggressive pawn push to initiate a kingside attack" or "a positional sacrifice to obtain more active pieces", and not (except in endgames, it seems) "no idea ... looks like it picked that move at random".

Maybe that just means that we have enough terms for things going on in chess that at least some of them are bound to apply to any particular move or position, but even when engines play moves that flout previously accepted theory, such cases are probably the memorable exception.  In a great many positions each of the three types of player would find the others' preferred moves perfectly reasonable.  They might well disagree over which exact move is best, but in most cases their evaluations will be largely similar.  Typically they will all rule out the great majority of possible moves as inferior, and they will largely agree on which moves are inferior.  For that matter, AB engines and human grandmasters have been known to flout the same rules.  It's not just NN that can throw a curve ball from time to time.

All this is to say that if three radically different approaches to chess-playing can reach a reasonable consensus in discarding almost all possible moves, and if their rankings of the best moves agree reasonably well -- though by no means perfectly -- with accepted chess theory, then most likely there is something to accepted chess theory.  Call it an emergent property of the rules.


So where do we go from here?  Can we combine the strengths of the three approaches -- roughly speaking, high-level planning for humans, calculation for AB engines and positional assessment for NN engines?  This next bit is more speculation than the nice summary I was hoping for, but here goes:

It might seem almost self-evident that if we knew the full details of how grandmasters see the board then computers would be able to do even better, but it's not like that hasn't been tried.  Chess engine developers have been working with top-level human players for decades, with good results, but not always with the best results.  Stockfish, which currently rules the AB world, benefits from a large distributed team making continual tweaks to the algorithm, with only the best making it into the next release.  Contributions can come from anyone.  I'm sure a lot of contributors are also strong chess players, but it's not a prerequisite.

Alpha Zero, of course, dispenses with outside expertise entirely.  Other NN engines try to incorporate outside knowledge, but the ones that don't, notably Alpha Zero and its open-source cousin LC0, seem to do best.

Humans clearly do things differently from computers, but as I've said previously, it's quite possible that this is because these things work for humans but not necessarily for computers.  My guess is that trying to model the way humans formulate plans when playing chess is not going to lead to stronger chess engines.  At this point, both kinds of engines have a reasonable substitute for human-style conscious planning, namely calculating possible continuations en masse.  That's a good fit for computers, and there's no strong reason to think that trying to emulate humans instead would be a better fit.

Even NN engines, which focus mainly on evaluating particular positions, benefit from calculation.  While NN engines may look downright clueless in endgames (even when they win), they look considerably less clueless at slower time controls, that is, when they are doing more calculation.  Looking far enough ahead appears indistinguishable from planning, even if that's not how we humans do it.

While the original Alpha Zero work was done in a closed environment and (to my knowledge) hasn't been directly replicated by outside researchers, it's generally understood that Alpha Zero benefitted from using tensor processing chips that sped up its evaluation (one reason the Stockfish team argued that the Alpha Zero match results weren't a fair comparison), and that for that reason it would likely beat LC0, which runs on stock hardware.  Again, NN with faster, and thus more, calculation beats NN without it.

On the other hand, one big reason that NN engines got so much attention was that they didn't just win, but they did it in a noticeably different style.  The classic example is the "fawn pawn" (apparently a mishearing of "thorn pawn"), which is an advanced pawn, typically on the sixth rank, that can't be attacked by the defender's pawns, for example a black pawn on h3 when white has castled kingside and played g3.

A strong human player would likely look at such a position and think "Yeah ... that doesn't look good for white ... ", but NN players demonstrated just how bad it is, and therefore why it can be worth playing for, even if you have to give up material, expose your own king, or push a pawn that might have to be supported by pieces instead of other pawns.  The NN games also gave some insight as to when to push such a pawn.

The ideal combination of current AB and NN approaches would evaluate positions as well as NNs but as fast as ABs.  It's not out of the question -- in fact I'd conjecture it's likely -- that AB engines can incorporate some of the knowledge that NNs have turned up directly into their evaluation functions.  The resulting code wouldn't have to capture everything the NN's network does, just enough to avoid losing to strategies like fawn pawn attacks, by skewing just enough toward positions that avoid them (something like the sketch below).
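As a toy example of that skewing, here's a hand-written detector for the specific thorn-pawn pattern described above, wired into the evaluation as one more penalty term.  The board interface and the 120-centipawn penalty are my own inventions for illustration:

```python
# Sketch: teach an explicit evaluation to fear thorn pawns.  The board
# interface (lowercase = black piece, uppercase = white) and the penalty
# value are assumptions, not anything from a real engine.

def black_thorn_pawn_on_h3(board):
    """The pattern from the text: black pawn on h3, white castled
    kingside, white pawn already on g3 so g2xh3 is impossible."""
    return (board.piece_at('h3') == 'p' and          # black pawn on h3
            board.piece_at('g3') == 'P' and          # white g-pawn has passed g2
            board.white_king_square() in ('g1', 'h1'))

def evaluate_with_thorn_penalty(board):
    score = evaluate(board)                          # the existing evaluation
    if black_thorn_pawn_on_h3(board):
        score -= 120                                 # steer away from such positions
    return score
```

The point isn't this particular pattern or number, but that a single cheap rule, mined from watching NN games, can plug one specific hole without slowing the search down.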

Some of an NN's unorthodox behavior represents what we might as well call new insight, but some of it will be due to overfitting or plain old random noise.  Nor does an AB developer have to understand exactly how the NN's evaluation works.  There's a very direct way of seeing what an NN's evaluation function is doing: play games against it and see what it does.

Because an NN is mostly evaluating individual positions, its play gives a good view into how it does that.  A hypothetical perfect player might make totally surprising moves because it can see every variation, but that's not how NNs work.  They play surprising moves because the evaluation function gives those moves good ratings.

Another option that comes to mind would be to use neural networks not for playing chess directly, but for tweaking the parameters of traditional chess engines.  I'm not sure this is a great idea, though.  One major lesson of both types of engine, in their own ways, is that it's risky to build in assumptions -- not necessarily doomed to failure, but risky.

If you're tweaking parameters in an AB engine -- e.g., how much a rook on an open file is worth in relation to a pawn -- you're tuning weights in a system with a bunch of specialized, non-linear nodes.  In a standard NN you're tuning weights in a (relatively) straightforward multilinear system.  It's not clear that an NN built for tweaking weights in a multilinear system would do a good job tweaking the weights on an AB's specialized evaluation function.  It might, but it might not.
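For reference, here's the shape of the tuning problem, with plain hill-climbing by self-play standing in for whatever learner you might plug in.  `play_match` (returning the challenger's score between 0 and 1 over a batch of games) is a hypothetical test harness:

```python
# Sketch of tuning one evaluation parameter by self-play, using plain
# hill-climbing rather than a neural net.  `play_match` is a stand-in
# for a real test harness that plays a batch of games and returns the
# challenger's score from 0 to 1.

import random

def tune(params, key, step=5, rounds=50):
    for _ in range(rounds):
        candidate = dict(params)
        candidate[key] += random.choice((-step, step))  # nudge one weight
        if play_match(candidate, params) > 0.5:         # challenger won the match?
            params = candidate                          # keep the better value
    return params

# e.g. tune({'rook_open_file': 15}, 'rook_open_file')
```

Whatever sits in the loop, the fitness signal is the same -- wins and losses -- which is part of why it's not obvious a neural net would do this particular job better than dumber methods.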

Looking at it from a more NN perspective, you could try to speed up evaluation by noticing this or that pattern of weights and replacing the generalized tensor calculations with ad-hoc functions that happen to work for those particular weights.  For example, if the weights are "sparse", that is, most of them are zero, you can write out an explicit formula that combines the terms with non-zero weights.
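Concretely, the specialization might look like this: once training is finished and the weights are frozen, a generic dot product over every feature can be replaced by an explicit formula over just the non-zero terms.  The feature indices and their interpretations here are invented:

```python
# Generic layer: a dot product over every feature, zero weights included.
def generic_score(features, weights):
    return sum(w * f for w, f in zip(weights, features))

# Specialized version, generated after noticing that only three of the
# trained weights are non-zero (indices and meanings invented):
def specialized_score(features):
    return (0.83 * features[412]       # say, rook on an open file
            - 0.41 * features[1007]    # say, doubled pawns
            + 0.27 * features[2381])   # say, passed pawn on the 6th
```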

In theory, you might come up with something that resembles some of the customary rules of thumb.  Maybe weights involving a particular kind of piece tend to emphasize empty squares that the piece could move to (open lines, mobility), or pairs of squares the piece could attack if it moved a certain way, and which hold enemy pieces (forks).  Again, it's not clear that this would help.  It might just get in the way of further improvements, if it turns out that the zero weights that made things sparse really shouldn't have been zero.

If all we know is that neural nets are combining basic features of a chess position according to some magic combination of weights, then all we can say is "there's something that current explicit evaluation functions don't seem to be capturing."  If that's how it stays, then we might expect neural networks to outperform engines like Stockfish in the long run.  They're already at least comparable, and they haven't been around for nearly as long.

It's quite possible, though, that a neural network's evaluation function is capturing a handful of quantifiable factors that current explicit evaluation functions aren't capturing.  In that case, an engine like Stockfish with an updated evaluation function informed by NN behavior should have the upper hand.  It could choose promising positions as well as, or about as well as, a neural net, but it would examine them much more efficiently.

It's also not clear how much room is left for neural networks to improve.  Playing winning chess is just plain hard.  Training a network for twice as long generally doesn't produce twice as good a result.  It may be that network-based evaluation of individual positions is about as good as it's going to get.

For that matter, though, neither does looking at twice as many moves, or even twice as deep (which should roughly square the number of positions examined), make an AB engine twice as strong.  Currently AB engines can examine billions of positions, reaching around twenty plies deep (twenty levels of move/countermove, or ten full moves) in a typical midgame position.  Looking at trillions of positions would mean looking more like thirty plies ahead.  That ought to yield some sort of improvement, but how much?  Most high-level games are draws in any case.
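Here's the arithmetic behind that estimate, assuming a constant effective branching factor: if a billion positions buys twenty plies, a thousand times as many buys only about seven more.

```python
# If b^20 = 1e9, the effective branching factor b is about 2.8, and
# 1e12 positions reach log(1e12)/log(b) plies -- a bit under thirty.
import math

b = 10 ** (9 / 20)                        # ~2.82
depth = math.log(10 ** 12) / math.log(b)  # ~26.7 plies
print(round(b, 2), round(depth, 1))
```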

It's interesting that AB and NN engines appear to be more or less evenly matched.  This may be evidence that we're reaching the limits of what computer chess players can do.  Or it may not.


In the discussion above, I talk mostly about evaluation functions, numerical weights and so forth, and not so much about "planning" or "understanding" or "insight".  This was a deliberate choice.  Chess engines are nothing more or less than computer programs.  It's fascinating that they can clearly display some aspects that we associate with intelligence, while clearly lacking others.  Neural networks don't fundamentally change this -- they're still just doing calculations -- but they do seem to capture another chunk of what brains do, namely fuzzy pattern matching.

Much discussion of developments like AlphaZero and similar neural network-based programs focuses on what we're learning about algorithms, and in particular how to harness "deep neural networks" (that is, neural networks with many layers) to solve problems that are widely agreed to require intelligence.

That's fair enough, but in that process we're also learning about the problems themselves.  If intelligence is the ability to solve certain kinds of problems, then solving problems by machine tells us something about what it means to be intelligent.  While it's natural to want to move the goal posts and try to narrow "intelligence" down so it continues to mean "behavior we can't emulate reasonably well with machines," as I argued in the first of the previous posts, that's probably a losing game.  The winning game, I think, is gaining a better understanding of various capabilities that we tend to throw under the blanket term "intelligence".
