Intermittent Conjecture: computing

Showing posts with label computing. Show all posts

Wednesday, September 25, 2024

Amplified intelligence, or what makes a computer a computer?

I actually cut two chunks out of What would superhuman intelligence even mean?. I think the one that turned into this post is the more interesting of the two, but this one's short and I didn't want to discard it entirely.

Two very clear cases of amplified human intelligence are thousands of years old: writing and the abacus. Both of them amplify human memory, long-term for writing and short-term for the abacus. Is a person reading a clay tablet or calculating with an abacus some sort of superhuman combination of human and technology? No? Why not?

Calculating machines and pieces of writing are passive. They don't do anything on their own. They need a human, or something like a human, to have any effect. Fair enough. To qualify as superhuman by itself, a machine needs some degree of autonomy.

Autonomous machines are more recent than computing and memory aids. The first water clocks were probably built two or three thousand years ago, and there is a long tradition in several parts of the world of building things that, given some source of power, will perform a sequence of actions on their own without any external guidance.

But automata like clocks and music boxes are built to perform a particular sequence of actions from start to finish, though some provide a way to change the program between performances. Many music boxes use some sort of drum that encodes the notes of the tune and can be swapped out to play a different tune, for example. Nevertheless, once the automaton starts its performance, it's going to perform whatever it's been set up to perform.

There's one more missing piece: The ability to react to the external world, to do one thing based on one stimulus and a different thing based on a different stimulus, that is, to perform conditional actions. Combine this with some sort of updatable memory and you have the ability to perform different behavior based on something that happened in the past, or even multiple things that happened at different points in the past.

My guess is that both of those pieces are also older than we might think, but the particular combination of conditional logic and memory that they use is the real difference between the modern computers that first appeared in the mid twentieth century and the automata of the centuries before.

Sunday, May 2, 2021

Things are more like they are now than they have ever been

I hadn't noticed until I looked at the list, but it looks like this is post 100 for this blog. As with the other blog, I didn't start out with a goal of writing any particular number of posts, or even on any particular schedule. I can clearly remember browsing through a couple dozen posts early on and feeling like a hundred would be a lot. Maybe I'd get there some day or maybe I wouldn't. In any case, it's a nice round number, in base 10 anyway, so I thought I'd take that as an excuse to go off in a different direction from some of the recent themes like math, cognition and language.

The other day, a colleague pointed me at Josh Bloch's A Brief, Opinionated History of the API (disclaimer: Josh Bloch worked at Google for several years, and while he was no longer at Google when he made the video, it does support Google's position on the Google v. Oracle suit). What jumped out at me, probably because Bloch spends a good portion of the talk on it, was just how much the developers of EDSAC, generally considered "the second electronic digital stored-program computer to go into regular service", anticipated, in 1949.

Bloch argues that its subroutine library -- literally a file cabinet full of punched paper tapes containing instructions for performing various common tasks -- could be considered the first API (Application Program Interface), but the team involved also developed several other building blocks of computing, including a form of mnemonic assembler (a notation for machine instructions designed for people to read and write without having to deal with raw numbers) and a boot loader (a small program whose purpose is to load larger programs into the computer memory). For many years, their book on the subject, Preparation of Programs for Electronic Digital Computers, was required reading for anyone working with computers.

This isn't the first "Wow, they really thought of everything" moment I've had in my field of computing. Another favorite is Ivan Sutherland's Sketchpad (which I really thought I'd already blogged about, but apparently not), generally considered the first fully-developed example of a graphical user interface. It also laid foundations for object-oriented programming and offers an early example of constraint-solving as a way of interacting with computers. Sutherland wrote it in 1963 as part of his PhD work.

These two pioneering achievements lie either side of the 1950s, a time that Americans often tend to regard as a period of rigid conformity and cold-war paranoia in the aftermath of World War II (as always, I can't speak for the rest of the world, and even when it comes to my own part, my perspective is limited). Nonetheless, it was also a decade of great innovation, both technically and culturally. The Lincoln X-2 computer that Sketchpad ran on, released in 1958, had over 200 times the memory EDSAC had in 1949 (it must also have run considerably faster, but I haven't found the precise numbers). This development happened in the midst of a major burst of progress throughout computing. To pick a few milestones:

In 1950, Alan Turing wrote the paper that described the Imitation Game, now generally referred to as the Turing test.
In 1951, Remington Rand released the UNIVAC-I, the first general-purpose production computer in the US. The transition from one-offs to full production is a key development in any technology.
In 1951, the solid-state transistor was developed.
In 1952, Grace Hopper published her first paper on compilers. The terminology of the time is confusing, but she was specifically talking about translating human-readable notation, at a higher level than just mnemonics for machine instructions, into machine code, exactly what the compilers I use on a daily basis do. Her first compiler implementation was also in 1952.
In 1953, the University of Machester prototyped its Transistor Computer, the world's first transistorized computer, beginning a line of development that includes all commercial computers running today (as of this writing; I'm counting current quantum computers as experimental).
In 1956, IBM prototyped the first hard drive, a technology still in use (though it's on the way out now that SSDs are widely available).
In 1957, the first FORTRAN compiler appeared. In college, we loved to trash FORTRAN (in fact "FORTRASH" was the preferred name), but FORTRAN played a huge role in the development of scientific computing, and is still in use to this day.
In 1957, the first COMIT compiler appeared, developed by Victor Yngve et. al.. While the language itself is quite obscure, it begins a line of development in natural-language processing, one branch of which eventually led to everyone's favorite write-only language, Perl.
In 1958, John McCarthy developed the first LISP implementation. LISP is based on Alonzo Church's lambda calculus, a computing model equivalent in power to the Turing/Von Neumann model that CPU designs are based on, but much more amenable to mathematical reasoning. LISP was the workhorse of much early research in AI and its fundamental constructs, particularly lists, trees and closures, are still in wide use today (Java officially introduced lambda expressions in 2014). Its explicit treatment of programs as data is foundational to computer language research. Its automatic memory management, colloquially known as garbage collection, came along a bit later, but is a key feature of several currently popular languages (and explicitly not a key feature of some others). For my money, LISP is one of the two most influential programming languages, ever.
Also in 1958, the ZMMD group gave the name ALGOL to the programming language they were working on. The 1958 version included "block statements", which supported what at the time was known as structured programming and is now so ubiquitous no one even notices there's anything special about it. The shift from "do this action, now do this calculation and go to this step in the instructions if the result is zero (or negative, etc.)" to "do these things as long as this condition is true" was a major step in moving from a notation for what the computer was doing to a notation specifically designed to help humans to work with algorithms. Two years later, Algol 60 codified several more significant developments from the late 50s, resulting in a language famously described as "an improvement on its predecessors and many of its successors". Most if not all widely-used languages -- for example Java, C/C++/C#, Python, JavaScript/ECMAScript, Ruby ... can trace their control structures and various other features directly back to Algol, making it, for my money, the other of the two most influential programming languages, ever.
In 1959, the CODASYL committee published the specification for COBOL, based on Hopper's work on FLOW-MATIC from 1950-1959. As with FORTRAN, COBOL is now the target of widespread derision, and its PICTURE clauses turned out to be a major issue in the notorious Y2K panic. Nonetheless, it has been hugely influential in business and government computing and until not too long ago more lines of code were written in COBOL than anything else (partly because COBOL infamously requires more lines of code than most languages to do the same thing)
In 1959, Tony Hoare wrote Quicksort, still one of the fastest ways to sort a list of items, the subject of much deep analysis and arguably one of the most widely-implemented and influential algorithms ever written.

This is just scratching the surface of developments in computing during the 1950s, and I've left off one of the great and needless tragedies of the field: Alan Turing's suicide in 1954.

On a different note, in 1958, the National Advisory Committee on Aeronautics became the National Aeronautics and Space Administration and disbanded its pool of computers, that is, people who performed computations, and Katherine Johnson began her career in aerospace technology in earnest.

It wasn't just a productive decade in computing. Originally, I tried to list some of the major developments elsewhere in the sciences, and in art and culture in general in 1950s America, but I eventually realized that there was no way to do it without sounding like one of those news-TV specials and also leaving out significant people and accomplishments through sheer ignorance. Even in the list above, in a field I know something about, I'm sure I've left out a lot, and someone else might come up with a completely different list of significant developments.

As I was thinking through this, though, I realized that I could write much the same post about any of a number of times and places. The 1940s and 1960s were hardly quiet. The 1930s saw huge economic upheaval in much of the world. The Victorian era, also often portrayed as a period of stifling conformity, not to mention one of the starkest examples of rampant imperialism, was also a time of great technical innovation and cultural change. The idea of the Dark Ages, where supposedly nothing of note happened between the fall of Rome and the Renaissance, has largely been debunked, and so on and on.

All of the above is heavily tilted toward "Western" history, not because it has a monopoly on innovation, but simply because I'm slightly less ignorant of it. My default assumption now is that there has pretty much always been major innovation affecting large portions of the world's population, often in several places at once, and the main distinction is how well we're aware of it.

While Bloch's lecture was the jumping-off point for this post, I didn't take too long for me to realize that the real driver was one of the recurring themes from the other blog: not-so-disruptive technology. That in turn comes from my nearly instinctive tendency to push back against "it's all different now" narratives and particularly the sort of breathless hype that, for better or worse, the Valley has excelled in for generations.

It may seem odd for someone to be both a technologist by trade and a skeptical pourer-of-cold-water by nature, but in my experience it's actually not that rare. I know geeks who are eager first adopters of new shiny things, but I think there are at least as many who make a point of never getting version 1.0 of anything. I may or may not be more behind-the-times than most, but the principle is widely understood: Version 1.0 is almost always the buggiest and generally harder to use than what will come along once the team has had a chance to catch a breath and respond to feedback from early adopters. Don't get me wrong: if there weren't early adopters, hardly anything would get released at all. It's just not in my nature to be one.

There are good reasons to put out a version 1.0 that doesn't do everything you'd want it to and doesn't run as reliably as you'd like. The whole "launch and iterate" philosophy is based on the idea that you're not actually that good at predicting what people will like or dislike, so you shouldn't waste a lot of time building something based on your speculation. Just get the basic idea out and be ready to run with whatever aspect of it people respond to.

Equally, a startup, or a new team within an established company, will typically only command a certain amount of resources (giving a new team or company carte blanche often doesn't end well). At some point you have to get more resources in, either from paying customers or from whoever you can convince that yes, this is really a thing. Having a shippable if imperfect product makes that case much more effectively than having a bunch of nice-looking presentations and well-polished sales pitches. Especially when dealing with paying customers.

But there's probably another reason to put things out in a hurry. Everyone knows that tech, and software in general, moves fast (whether or not it also breaks stuff). In other words, there's a built-in cultural bias toward doing things quickly whether it makes sense or not, and then loudly proclaiming how fast you're moving and, therefore, how innovative and influential you must be. I think this is the part I tend to react to. It's easy to confuse activity with progress, and after seeing the same avoidable mistakes made a few times in the name of velocity, the eye can become somewhat jaundiced.

As much as I may tend toward that sort of reaction, I don't particularly enjoy it. A different angle is to go back and study, appreciate, even celebrate, the accomplishments of people who came before. The developments I mentioned above are all significant advances. They didn't appear fully-formed out of a vacuum. Each of them builds on previous developments, many just as significant but not always as widely known.

Looking back and focusing on achievements, one doesn't see the many false starts and oversold inventions that went nowhere, just the good bits, the same way that we remember and cherish great music from previous eras and leave aside the much larger volume of unremarkable or outright bad.

Years from now, people will most likely look back on the present era in much the same way and pick out the developments that really mattered, leaving aside much of the commotion surrounding it. It's not that the breathless hype is all wrong, much less that everything important has already happened, just that from the middle of it all it's harder to pick out what's what. Not that there's a lack of opinions on the matter.

The quote in the tile has been attributed to several people, but no one seems to know who really said it first.

Wednesday, May 22, 2019

What's chess to a computer?

It's now quite clear that computer chess is going to eat this blog for a while, so I'm in the process of unpacking several points that have come up in my earlier attempts to just make a few quick points and move on. I'm also coming to realize that, since I'm having even more trouble than usual organizing this into a coherent whole, I'll probably end up with N posts worth of material spread across N+1 posts, if not more. Apologies. This particular post is an excursion into computer science land to look at computer chess from that point of view, rather than trying to understand it as chess.

Chess, fundamentally, falls into a large class of problems called tree searches, in which a number of interconnected structures are explored by proceeding from a starting point to those that are connected directly to it, and those connected directly to the first set, and so forth. There are lots of different ways to do this, depending mostly on what order you do the visiting in.

Examples of tree searches, besides finding good chess moves, include figuring out the best way to route packets in a network, solving a puzzle where odd-shaped blocks have to be arranged to form a cube, finding the shortest path through a maze and following reflections and refractions to find what color a particular pixel in an image will be (ray-tracing). There are many others.

Each point in a tree search is called a node. In chess, the nodes are chess positions (including where the pieces are, which side is to move and a few other bits such as whether each side's king has moved). At any point in a tree search you are either at a leaf node, meaning you have no further choices to make -- in chess, you have arrived at a win, loss or draw, or more often simply decided not to explore any deeper -- or not, in which case you have one or more child nodes to choose from. In chess, the children of a position are all the positions that result from legal moves in the current position.

Since there is no element of chance in chess, there is one and only one game tree that starts at the initial position (the root node) and includes every possible position resulting from every series of legal moves. This means that in theory there it's possible to know whether any position is a win for white, a win for black or a draw, assuming each side plays perfectly. That includes the initial position, so in theory there is an optimal strategy for each player and all perfectly-played games of chess have the same result.

The problem is that the game tree for chess is incomprehensibly huge. There are way, way more legal chess positions than fundamental particles in the known universe. No computer will ever be able to examine more than a minuscule fraction of them. That doesn't necessarily mean that chess will never be "solved", in the sense of proving what the optimal strategies are for each player and what the result will be with perfect play. In theory there could be a way to rule out all but a tiny fraction of possible positions and exhaustively search what's left. In practice, we're nowhere near being able to do that.

Instead, we're left with approximate approaches. Either exhaustively search a tiny, tiny fraction of the possibilities, which is what AB engines do, or find some sort of approximate measure of what constitutes a "good" position and "just play good moves", using a partial search of the possible continuations to double-check that there aren't any obvious blunders. This is what humans and NN engines do.

One thing that makes chess particularly interesting as a search problem is that it's not totally random, but it's not totally orderly either.

Imagine a search tree as a series of branching passageways. At the end of every passageway, after however many branches, is a card reading "black wins", "white wins" or "draw". If all you know is that the cards are there then you might as well pick randomly at each junction. On the other hand, if you have a map of the entire complex of passageways and which card is at the end of each of them, you can find your way perfectly. Depending on where the cards are, you might or might not be able to win against a perfect player, but if your opponent makes a mistake you'll know exactly how to take advantage.

In chess there are far too many passages to map out, but there are markings on the walls. For example, there are several types of endgame positions which even a human can always play perfectly. If you reach a position where you have a king and a rook and your opponent only has a king (and it's not a stalemate to start out with), you can win by following a pretty simple recipe.

There are also databases of positions ("tablebases") with known outcomes, found by exhaustively searching backwards from all possible checkmate positions. Currently every endgame with seven or fewer pieces (including the kings) is known to be either a win for white, win for black or a draw. If it shows up by working backward from one of the possible checkmates, it's a win. If it didn't, it's a draw (the process of working backwards from known positions is itself a tree search).

There are a few kinds of positions that are well-known to be immediate wins for one side or the other, regardless of what other pieces are on the board. The classic example is a back-rank mate, where one side's king is blocked between the edge of the board and its own pawns and the other side has major pieces able to give check. It's simple to calculate whether the defending side can capture all the pieces that can give check or can safely interpose (block). If not, it's game over. My understanding is that chess engines don't bother to special-case these, since they're typically looking several moves ahead anyway.

And then there's a lot of gray area. If one side gains a material advantage, it will generally win, eventually, but this is far from an ironclad rule. There are plenty of cases where one side will give up material (sacrifice) in order to gain an advantage. This ranges from spectacular combinations where one side gives up a queen, or more, in order to get a quick checkmate, to "positional sacrifices" where one side gives up something small, generally a pawn, in order to get an initiative. Whether a positional sacrifice will be worth it is usually not clear-cut, though in some well-known cases (for example, the Queen's gambit), it's generally agreed that the other side is better off not taking the material and should concentrate on other factors instead.

There are also temporary sacrifices where one side gives up material in order to quickly win back more. If you're playing against a strong player and it looks like they've left a piece hanging, be very careful before taking it.

In short, which side has more material is a clue to which side will win, and it's often a good idea to try to win material, but it's not a guarantee. This goes equally well for other generally accepted measures of how well one is doing. Maybe it's worth it to isolate or double one of your pawns in order to gain an initiative, but often it's not. Games between human masters often turn on whose estimations of which small trade-offs are correct.

From a computing point of view, since it's impossible to follow all the passageways, we have to rely on the markings on the wall, that is, the characteristics of particular positions. An AB engine will look exhaustively at all the passages that it explores (subject to the "alpha/beta pruning" that gives them their name, which skips moves that can't possibly lead to better results than are already known). But in most positions it will have to give up its search before it finds a definitive result. In that case it has to rely on its evaluation function. Basically it's saying "I've looked ten branches ahead and if I make this move the worst that can happen is I'll get to a position worth X".

That's much better than picking randomly, but it only works well if you explore quite deeply. A good human player can easily beat a computer that's only searching three moves (six plies) ahead because the human can easily find something that looks good in the short term but turns bad a few moves later. Once an AB engine is looking dozens of moves ahead, though, there are many fewer short-term traps that don't pan out until late enough for the search to miss them.

This is a lot like the problem of local minima in hill-climbing algorithms. If you're trying to get to the highest point in a landscape, but it's so dark or foggy you can only see a short distance around you, your best bet could well be to go uphill until you can't any more, even though that fails if you end up atop a small hill that's not actually the high point. The further you can see, the better chance you have of finding your way to the true high point, even if you sometimes have to go downhill for a while to get on the right path.

Humans and neural networks, however, are doing something slightly different. They can't look very far, but they're able to read the landscape better. It's kind of like being on a mountainside in the fog and being able to look at the vegetation, or smell the wind, or examine the types of rock in the area or whatever, and be able to say that the peak is more likely this way than that way.

This wouldn't work if there weren't some sort of correlation between the features of a position and how likely it is to ultimately lead to a win. It's not a property of tree-searching algorithms. It's a property of chess as a tree-searching problem. It's probably a property of any chess-like game that humans find interesting, because that's probably what we find interesting about chess: there's some order to it, in that some positions are clearly better than others, but there's some mystery as well, since we often can't tell for sure which option is better.

Thursday, November 9, 2017

syl·lab·i·fi·ca·tion

[Author's note: When I started this, I thought it was going to touch on deep questions of language and cognition. It ended up kinda meandering around some random bits of computer word-processing. This happens sometimes. I'm posting it anyway since, well, it's already written. --D.H.]

Newspaper and magazine articles are traditionally typeset in narrow, justified columns. "Justified" here means that every line is the same width (unlike, say, with most blog posts). If the words aren't big enough to fill out a line, the typesetter will widen the spaces to fill it out. If the words are a bit too long, the typesetter might move the last word to the next line and then add space to what's left.

Originally, a typesetter was a person who physically inserted pieces of lead type into a form. Later, it was a person operating a Linotype™ or similar machine to do the same thing. These days it's mostly done by software.

Technically, laying out a paragraph to minimize the amount of extra space is not trivial, but certainly feasible, the kind of thing that would make a good undergraduate programming exercise. Several algorithms are available. They may not always produce results as nice as an experienced human typesetter, but they do well enough for most purposes.

One option for getting better line breaks and better-looking paragraphs is to hyphenate. If your layout looks funny because you've got floccinaucinihilipilification in the middle of a line, you might try breaking it up as, say floccinaucinihili-
pilification. It will probably be easier to lay out those two pieces rather than trying to make room for one large one.

You can't just throw a hyphen in anywhere. There's a strong tendency to read whatever comes before and after the hyphen as independent units, so you don't want to break at wee-
knights or pre-
aches.

In many languages, probably most, this isn't a big problem. For example, Spanish has an official set of rules that gives a clear hyphenation for any word (actually there are several of these, depending on what country you're in). It's hard for English, though, for the same reason that spelling is hard for English -- English spelling is historical, not phonetic, and has so far resisted attempts at ~~standardisation~~ standardization and fonetissizing.

So instead we have the usual suspects, particularly style guides produced by various academic and media organizations. This leads to statements like this one from the Chicago Manual of Style:

Chicago favors a system of word division based on pronunciation and more or less demonstrated by the recommendations in Webster’s tenth.

The FAQ list that that comes from has a few interesting cases, though I'm not sure that "How should I hyphenate Josephine Bellver's last name?" actually qualifies as a frequently asked question. The one that interests me here concerns whether it should be "bio-logy" or "biol-ogy". CMOS opts for "biol-ogy", going by pronunciation rather than etymology.

Which makes sense, in that consistently going by pronunciation probably makes reading easiest. But it's also a bit ironic, in that English spelling is all about etymology over pronunciation.

Either approach is hard for computers to cope with, since they both require specific knowledge that's not directly evident from the text. It's common to teach lists of rules, which computers do deal with reasonably well, but the problem with lists of rules for English is that they never, ever work. For example, it's going to be hard to come up with a purely rule-based approach that divides "bark-ing" but also "bar-keeper".

This is why style guides tend to fall back on looser guidance like "divide the syllables as they're pronounced". Except -- whose pronunciation? When I was a kid I didn't pronounce an l in also or an n in government (I've since absorbed both of those from my surroundings). I'm pretty sure most American speakers don't pronounce a t in often. So how do you hyphenate those according to pronunciation?

Fortunately, computers don't have to figure this out. A hyphenation dictionary for 100,000 words will cost somewhere around a megabyte, depending on how hard you try to compress it. That's nothing in modern environments where a minimal "Hello world" program can run into megabytes all by itself (it doesn't have to, but it's very easy to eat a few megabytes on a trivial program without anyone noticing).

But what if the hyphenator runs across some new coinage or personal name that doesn't appear in the dictionary -- for example, whoever put the dictionary together didn't know about Josephine Bellver? One option is just not to try to hyphenate those. A refinement of that would be to allow the author to explicitly add a hyphen. This should be the special "optional hyphen" character, so that you don't get hyphens showing up in the middle of lines if you later edit the text. That way if you invent a really long neologism, it doesn't have to mess up your formatting.

If there's a point to any of this, it's that computers don't have to follow specific rules, except in the sense that anything a computer does follows specific rules. While it might be natural for a compugeek to try to come up with the perfect hyphenation algorithm, the better engineering solution is probably to treat every known word as a special case and offer a fallback (or just punt) when that fails.

This wasn't always the right tradeoff. Memory used to be expensive, and a tightly-coded algorithm will be much smaller than a dictionary. But even then, there are tricks to be employed. One of my all-time favorite hacks compressed a spelling dictionary down to a small bitmap that didn't even try to represent the actual words. I'd include a link, but the only reference I know for it, Programming Pearls by Jon Bentley, isn't online.

Saturday, November 4, 2017

Surrender, puny humans!

A while ago, Deep Mind's AlphaGo beat human champion Lee Sedol at the game of go. This wasn't just another case of machines beating humans at games of skill.

Granted, from a mathematical point of view it was nothing special. Games like go, chess, checkers/draughts and tic-tac-toe, can in theory be "solved" by simply bashing out all the possible combinations of moves and seeing which ones lead to wins for which players.

Naturally the technical definition of "games like go, etc." is a bit, well, technical, but the most important stipulations are

perfect information -- each player has the same knowledge of the game as the others
no random elements

That leaves out card games like poker and bridge (imperfect information, random element) and Parcheesi (random element) and which-hand-did-I-hide-the-stone-in (imperfect information), but it includes most board games (Reversi, Connect 4, Pente, that game where you draw lines to make squares on a field of dots, etc. -- please note that most of these are trademarked).

From a practical point of view, there is sort of a pecking order:

Tic-tac-toe is so simple that you can write down the best strategy on a piece of paper. Most people grow bored of it quickly since the cat always wins if everyone plays correctly, and pretty much everyone can.
Games like ghost or Connect 4 have been "strongly solved", meaning that there's a known algorithm for determining whether a given position is a win, loss or draw for the player whose turn it is. Typically the winning strategy is fairly complex, in some cases too complex for a human to reasonably memorize. A human will have no chance of doing better than a computer for such games (unless the computer is programmed to make mistakes), but might be able to do as well.
Checkers is too complex for humans to play perfectly, but it has been "weakly solved". This means that it's been proved that with perfect play, the result is always a draw, but, not all legal positions have been analyzed, and there is currently nothing that will always be able to tell you if a particular position is a win for either side, or a draw. In other words, for a weakly solved game, we can answer win/loss/draw for the initial position, and typically many others, but not for an arbitrary position.
Chess has not been solved, even in theory, but computer chess players that bash out large numbers of sequences of moves can consistently beat even the best human players.

In most cases the important factor in determining where a game fits in this order is the "branching factor", which is the average number of moves available at any given point. In tic-tac-toe, there are nine first moves, eight second moves, and so on, and since the board is symmetrical there are effectively even fewer. In many positions there's really only one (win with three-in-a-row or block your opponent from doing that).

In Connect 4, there are up to six legal moves in any position. In checkers there can be a dozen or so. In chess, a couple dozen is typical. As with tic-tac-toe there are positions where there is only one legal move, or only one that makes sense, but those are relatively rare in most games.

In go, there are typically more than a hundred different possible moves, and go positions tend not to be symmetrical. Most of the time a reasonably strong human player will only be looking at a small portion of the possible moves. In order to have any hope of analyzing a situation, a computer has to be able to narrow down the possibilities by a similar amount. But to beat a human, it has to be able to find plays that a human will miss.

I've seen go described as a more "strategic" game, one that humans can develop a "feel" for that computers can't emulate, but that's not entirely true. Tactics can be quite important. Much of the strategy revolves around deciding which tactical battles to pursue and which to leave for later or abandon entirely. At least, that's my understanding. I'm not really a go player.

AlphaGo, and programs like it, solved the narrowing-down problem by doing what humans do: collecting advice from strong human players and studying games played by them. Historically this has meant a programmer working with an expert player to formulate rules that computers can interpret, along with people combing through games to glean more rules.

As I understand it (and I don't know anything more about Deep Mind or AlphaGo than the public), AlphaGo used machine learning techniques to automate this process, but the source material was still games played by human players. [Re-reading this in light of a more recent post, I see I left out a significant point: AlphaGo (and AlphaZero) encode their evaluation of positions -- their understanding of the game -- as neural networks rather than explicit rules. While a competent coder could look at the code for explicit rules and figure out what they were doing, no on really knows how to decode what a neural network is doing, at least not to the same level of detail -- D.H. Jan 2019]

The latest iteration (AlphaGo Zero, of course) dispenses with human input. Rather than studying human games, it plays against itself, notes what works and what doesn't, and tries again after incorporating that new knowledge. Since it's running on a pretty hefty pile of hardware, it can do this over and over again very quickly.

This approach worked out rather well. AlphaGo Zero can beat the AlphaGo that beat Lee Sedol, making it presumably the strongest go player in the world. [and it has since done the same thing with chess and shogi, though its superiority in chess is not clear-cut. See the link above for more details. -- D.H. Jan 2019]

On the one hand, this is not particularly surprising. It's a classic example of what I call "dumb is smarter" on the other blog, where a relatively straightforward approach without a lot of built in assumptions can outperform a carefully crafted system with lots of specialized knowledge baked in. This doesn't mean that dumb is necessarily smartest, only that it often performs better than one might expect, because the downside to specialized knowledge is specialized blind spots.

On the other hand, this is all undeniably spooky. An AI system with no baked-in knowledge of human thought is able, with remarkably little effort, to outperform even the very best of us at a problem that had long been held up as something unreachable by AI, something that only human judgement could deal with effectively. If computers can beat us at being human, starting essentially from scratch (bear in mind that the hardware that all this is running on is largely designed and built by machine these days), then what, exactly are we meat bags doing here?

So let's step back and look at the actual problem being solved: given a position on a go board, find the move that is most likely to lead to capturing the most stones and territory at the end of the game.

Put that way, this is a perfectly well-posed optimization problem of the sort that we've been using computers to solve for decades. Generations, really, at this point. Granted, one particular solution -- bashing out all possible continuations from a given position -- is clearly not best suited, but so what? Finding the optimum shape -- or at least a better one -- for an airplane wing isn't well suited to that either, but we've made good progress on it anyway using different kinds of algorithms.

So "chess-style algorithms suck at go, therefore go is inherently hard" was a bad argument from the get-go.

From what I've seen in the press, even taking potential hype with a grain of salt, AlphaGo Zero is literally rewriting the books on go, having found opening moves that have escaped human notice for centuries. But that doesn't mean this is an inherently hard problem. Humans failing to find something they're looking for for centuries means it's a hard problem for humans.

We humans are just really bad at predicting what kinds of problems are inherently hard, which I'd argue is the same as being hard to solve by machine*. Not so long ago the stereotype of a genius was someone who "knew every word in the dictionary" or "could multiply ten-digit numbers immediately", both of which actually turned out to be pretty easy to solve by machine.

Once it was clear that some "genius" problems were easy for machines, attention turned to things that were easy for people but hard for machines. There have been plenty of those -- walking, recognizing faces, translating between speech and text, finding the best move on the go board. Those held out for quite a long time as "things machines will never be able to do", but the tide has been turning on them as well thanks, I think, to two main developments:

We can now build piles of hardware that have, in a meaningful sense, more processing power than human brains.
With these new piles of hardware, techniques that looked promising in the past but never really performed are now able to perform well, the main example being neural network-style algorithms

At this point, I'm convinced that trying to come up with ever fuzzier and more human things that only human brains will ever be able to do is a losing bet. Maybe not now, but in the long run. I will not be surprised at all if I live to see, say

Real time speech translation that does as well as a human interpreter.
Something that can write a Petrarchan sonnet on a topic of choice, say the futility of chasing perfection, that an experienced and impartial reviewer would describe as "moving", "profound" and "original".
Something that could read a novel and write a convincing essay on it comparing the story to specific experiences in the real world, and answer questions about it in a way that left no choice but to say that in some meaningful sense the thing "understood" what it read.
Something that it would be hard to argue didn't have emotions -- though the argument would certainly be made.

[On the other hand, I also won't be shocked if these don't totally pan out in the next few decades --D.H. Feb 2019]

These all shade into Turing test territory. I've argued that, despite Alan Turing's genius and influence, the Turing test is not necessarily a great test of whatever we mean by intelligence, and in particular it's easy to game because people are predisposed to assume intelligence. I've also argued that "the Singularity" is an ill-defined concept, but that's really a different thread. Nevertheless, I expect that, sooner or later, we will be able to build things that pass a Turing test with no trickery, in a sense that most people can agree on.

And that's OK.

Or at least, we're going to have to figure out how to be OK with it. Stopping it from happening doesn't seem like a realistic option.

This puts us firmly in the territory of I, Robot and other science fiction of its era and more recently (the modern Westworld reboot comes to mind), which is one reason I chose the cheesy title I did. Machines can already do a lot of things better than we can, and the list will only grow over time. At the moment we still have a lot of influence over how that happens, but that influence will almost certainly decrease over time (the idea behind the Singularity is that this will happen suddenly, in fact nearly instantaneously, once the conditions are right).

The question now is how to make best use of what influence we still have while we still have it. I don't really have any good, sharp answers to that, but I'm pretty sure it's the right question.

* There's a very well-developed field, complexity theory, dealing in what kinds of problems are hard or easy for various models of computing in an absolute, quantifiable sense. This is largely distinct from the question of what kinds of games or other tasks computers should be good at, or at least better than humans at, though some games give good examples of various complexity classes. One interesting result is that it's often easy (in a certain technical sense) to produce good-enough approximate solutions to problems that are provably very hard to solve exactly. Another interesting result is that it can be relatively tricky to find hard examples of problems that are known to be hard in general.

Wednesday, July 19, 2017

The human perspective and its limits

A couple more points occurred to me after I hit "publish" on the previous post. Both of them revolve around subjectivity versus objectivity, and to what extent we might be limited by our human perspective.

In trying to define whether a kind of behavior is simple or complex, I gave two different notions which I claimed were equivalent: how hard it is to describe and how hard it is to build something to copy it.

The first is, in a sense, subjective, because it involves our ability to describe and understand things. Since we describe things using language, it's tied to what fits well with language. The second is much more objective. If I build a chess-playing robot, something with no knowledge of human language or of chess could figure out what it was doing, at least in principle.

One of the most fundamental results in computer science is that there are a number of very simple computing models (stack machines, lambda calculus, combinators, Turing machines, cellular automata, C++ templates ... OK, maybe not always so simple) which are "functionally complete". That means that any of them can compute any "total recursive function". This covers a wide range of problems, from adding numbers to playing chess to finding cute cat videos and beyond.

It doesn't matter which model you choose. Any of them can be used to simulate any of the others. Even a quantum computer is still computing the same kinds of functions [um ... not 100% sure about that ... should run that down some day --D.H.]. The fuss there is about the possibility that a quantum computer could compute certain difficult functions exponentially faster than a non-quantum computer.

Defining a totally recursive function for a problem basically means translating it into mathematical terms, in other words, describing it objectively. Computability theory says that if you can do that, you can write a program to compute it, essentially building something to perform the task (generally you tell a general-purpose computer to execute the code you wrote, but if you really want to you can build a physical circuit to do the what the computer would do).

So the two notions, of describing a task clearly and producing something to perform it are, provably, equivalent. There are some technical issues with the notion of complexity here that I'm going to gloss over. The whole P = NP thing revolves around whether being able to state a problem simply means being able to solve it simply, but when it comes to deciding whether recognizing faces is harder than walking, I'm going to claim we can leave that aside.

The catch here is that my notion of objectivity -- defining a computable function -- is ultimately based on mathematics, which in turn is based on our notion of what it means to prove something (the links between computing and theorem proving are interesting and deep, but we're already in deep enough as it is). Proof, in turn, is -- at least historically -- based on how our minds work, and in particular how language works. Which is what I called "subjective" at the top.

So, is our notion of how hard something is to do mechanically -- my ostensibly "objective" definition -- limited by our modes of reasoning, particularly verbal reasoning, or is verbal/mathematical reasoning a fundamentally powerful way of describing things that we happened to discover because we developed minds capable of apprehending it? I'd tend to think the latter, but then maybe that's just a human bias.

Second, as to our tendency to think that particularly human things like language and house-building are special, that might not just be arrogance, even if we're not really as special as we'd like to think. We have a theory of mind, and not just of human minds. We attribute very human-like motivations to other animals, and I'd argue that in many, maybe most, cases we're right. Moreover, we also attribute different levels of consciousness to different things (things includes machines, which we also anthropomorphize).

There's a big asymmetry there: we actually experience our own consciousness, and we assume other people share the same level of consciousness, at least under normal circumstances, and we have that confirmed as we communicate our consciousnesses to each other. It's entirely natural, then, to see our own intelligence and consciousness, which we see from the inside in the case of ourselves and close up in the case of other people, as particularly richer and more vivid. This is difficult to let go of when trying to study other kinds of mind, but it seems to me it's essential at least to try.