
Sunday, December 3, 2023

What would superhuman intelligence even mean?

Artificial General Intelligence, or AGI, so the story goes, isn't here yet, but it's very close.  Soon we will share the world with entities that are our intellectual superiors in every way, that have their own understanding of the world and can learn any task and execute it flawlessly, solve any problem perfectly and generally outsmart us at every turn.  We don't know what the implications of this are (and it might not be a good idea to ask the AGIs), but they're certainly huge, quite likely existential.

Or at least, that's the story.  For a while now, my feeling has been that narratives like this one say more about us than they do about AI technology in general or about AGI in particular.

At the center of this is the notion of AGI itself.  I gave a somewhat extreme definition above, but not far, I think, from what many people think it is.  OpenAI, whose mission is to produce it, has a more focused and limited definition.  While the most visible formulation is that an AGI would be "generally smarter than humans", the OpenAI charter defines it as "a highly autonomous system that outperforms humans at most economically valuable work".  While "economically valuable work" may not be the objective standard that it's trying to be here -- valuable to whom? by what measure? -- it's still a big step up from "generally smarter".

Google's DeepMind team (as usual, I don't really know anything you don't, and couldn't tell you anyway) lays out more detailed criteria, based on three properties: autonomy, performance and generality.  A system can exhibit various levels of each of these, from zero (a desk calculator, for example, would score low across the board) to superhuman, meaning able to do better than any human.  In this view there is no particular dividing line between AGI and not-AGI, but anything that scored "superhuman" on all three properties would have to be on the AGI side.  The paper calls this Artificial Superintelligence (ASI), and dryly evaluates it as "not yet achieved".
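To make the taxonomy concrete, here's the grid as a tiny data structure.  The level names follow the paper, but the example system and where I've placed it are my own rough guesses, not DeepMind's official ratings:

```python
# A sketch of the DeepMind-style grid: levels along three axes.  The level
# names follow the paper; the example ratings are my own rough guesses.
from dataclasses import dataclass

LEVELS = ["No AI", "Emerging", "Competent", "Expert", "Virtuoso", "Superhuman"]

@dataclass
class Rating:
    name: str
    performance: str   # how well it does what it does
    generality: str    # how wide a range of tasks it handles
    autonomy: str      # how much it acts without a human in the loop

def is_asi(r: Rating) -> bool:
    # "Artificial Superintelligence": superhuman across the board.
    return (r.performance == "Superhuman"
            and r.generality == "Superhuman"
            and r.autonomy == "Superhuman")

stockfish = Rating("Stockfish", performance="Superhuman",
                   generality="Emerging", autonomy="Emerging")

print(is_asi(stockfish))  # False: superhuman, but only in one narrow domain
```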

There are several examples of superhuman intelligence in current AI systems.  This blog's favorite running example, chess engines, can consistently thrash the world's top human players, but they're not very general (more on that in a bit).  The AlphaFold system can predict how a string of amino acids will fold up into a protein better than any top scientist, but again, it's specialized to a particular task.  In other words, current AIs may be superhuman, but not in a general way.

As to generality, LLMs such as ChatGPT and Bard are classified as "Emerging AGI", which is the second of six levels of generality, just above "No AI" and below Competent, Expert, Virtuoso and Superhuman.  The authors do not consider LLMs, including their own, as "Competent" in generality.  Competent AGI is "not yet achieved." I tend to agree.

So what is this "generality" we seek?

Blaise Agüera y Arcas and Peter Norvig (both at Google, but not at DeepMind, at least not at the time) argue that LLMs are, in fact, AGI.  That is, flawed though they are, they're not only artificial intelligence, which is not really in dispute, but general.  They can converse on a wide range of topics, perform a wide range of tasks, work in a wide range of modalities, including text, images, video, audio and robot sensors and controls, use a variety of languages, including some computer languages, and respond to instructions.  If that's not generality, then what is?

On the one hand, that seems hard to argue with, but on the other hand, it's hard to escape the feeling that at the end of the day, LLMs are just producing sequences of words (or images, etc.), based on other sequences of words (or images, etc.).  While it's near certain that they encode some sorts of generalizations about sequences of words, they also clearly don't encode very much if anything about what the words actually mean.

By analogy, chess engines like Stockfish make fairly simple evaluations of individual positions, at least from the point of view of a human chess player.  There's nothing in Stockfish's evaluation function that says "this position would be good for developing a queenside attack supported by a knight on d5".  However, by evaluating huge numbers of positions, it can nonetheless post a knight on d5 that will end up supporting a queenside attack.

A modern chess engine doesn't try to just capture material, or follow a set of rules you might find in a book on chess strategy.  It performs any number of tactical maneuvers and implements any number of strategies that humans have developed over the centuries, and some that they haven't.  If that's not general, what is?

And yet, Stockfish is obviously not AGI.  It's a chess engine.  Within the domain of chess, it can do a wide variety of things in a wide variety of ways, things that, when a human does them, require general knowledge as well as understanding, planning and abstract thought.  An AI that had the ability to form abstractions and make plans in any domain it encounters, including domains it hasn't encountered before, would have to be considered an AGI, and such an AI could most likely learn how to play chess well, but that doesn't make Stockfish AGI.

I think much the same thing is going on with LLMs, though there's certainly room for disagreement.  Agüera y Arcas and Norvig see multiple domains like essay writing, word-problem solving, Italian-speaking, Python-coding and so forth.  I see basically a single domain of word-smashing.  Just like a chess engine can turn a simple evaluation function and tons of processing power into a variety of chess-related abilities, I would claim that an LLM can turn purely formal word-smashing and tons of training text and processing power into a variety of word-related abilities.

The main lesson of LLMs seems to be that laying out coherent sequences of words in a logical order certainly looks like thinking, but, even though there's clearly more going on than in an old-fashioned Markov chain, there's abundant evidence that they're not doing anything like what we consider "thinking" (I explore this a bit more in this post and in some others with the AI tag).
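For comparison, that "old-fashioned Markov chain" is easy to sketch: a bigram model that picks each next word based only on the current one.  The toy corpus here is mine, purely for illustration:

```python
# A toy bigram Markov chain: each next word depends only on the current word.
# Whatever LLMs encode, it's clearly more than this -- but both are, at
# bottom, producing sequences of words based on other sequences of words.
import random
from collections import defaultdict

corpus = "the cat sat on the mat and the cat ran".split()

# Transition table: word -> list of words observed to follow it.
nexts = defaultdict(list)
for a, b in zip(corpus, corpus[1:]):
    nexts[a].append(b)

def generate(start, n, seed=0):
    random.seed(seed)
    out = [start]
    for _ in range(n):
        followers = nexts.get(out[-1])
        if not followers:
            break    # dead end: no word was ever seen after this one
        out.append(random.choice(followers))
    return " ".join(out)

print(generate("the", 5))
```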


What's missing, then?  The DeepMind paper argues that metacognitive skills are an important missing piece, perhaps the most important one.  While the term is mentioned several times, it is never really sharply defined.  It variously includes "learning", "the ability to learn new tasks or the ability to know when to ask for clarification or assistance from a human", "the ability to learn new skills", "the ability to learn new skills and creativity"  and "learning when to ask a human for help, theory of mind modeling, social-emotional skills".  Clearly, learning new skills is central, but there is a certain "we'll know it when we see it" quality to all this.

This isn't a knock on the authors of the paper.  A recurring theme in the development of AI, as the hype dies down about the latest development, is trying to pinpoint why the latest development isn't the AGI everyone's been looking for.  By separating out factors like performance and autonomy, the paper makes it clear that we have a much better handle on what those mean, and the remaining mystery is in generality.  Generality comprises a number of things that current AIs don't do.  You could make a case that current LLMs show some level of learning and creativity, but I agree with the assessment that this is "emerging" and not "competent".

An LLM can write you a poem about a tabby cat in the style of Ogden Nash, but it won't be very good.  Or all that much like Ogden Nash. More importantly, it won't be very creative.  LLM-generated poems I've seen tend to have a similar structure: Opening lines that are generally on-topic and more or less in style, followed by a meandering middle that throws in random things about the topic in a caricature of the style, followed by a conclusion trying to make some sort of banal point.

Good poems usually aren't three-part essays in verse form.  Even in poems that have that sort of structure, the development is carefully targeted and the conclusion tells you something insightful and unexpected.


It's not really news that facility with language is not the same as intelligence, or that learning, creativity, theory of mind and so on are capabilities that humans currently have in ways that AIs clearly don't, but the DeepMind taxonomy nonetheless sharpens the picture and that's useful.

I think what we're really looking for in AGI is something that will make better decisions than we do, for some notion of "better".  That "for some notion" bit isn't just a bit of boilerplate or an attempt at a math joke.  People differ, pretty sharply sometimes, on what makes a decision better or worse.  Different people bring different knowledge, experience and temperaments to the decision-making process, but beyond that, we're not rational beings and never will be.

Making better decisions probably does require generality in the sense of learning and creativity, but the real goal is something even more elusive: judgment.  Wisdom, even.  Much of the concern over AGI is, I think, about judgment.

We don't want to create something powerful with poor judgment.  What constitutes good or poor judgment is at least partly subjective, but when it comes to AIs, we at least want that judgment to regard the survival of humanity as a good thing.  One of the oldest nightmare scenarios, far older than computers or Science Fiction as a genre, is the idea that some all-powerful, all-wise being will judge us, find us wanting and destroy us.  As I said at the top, our concerns about AGI say more about us than they do about AI.

The AI community does talk about judgment, usually under the label of alignment.  Alignment is a totally separate thing from generality or even intelligence.  "Is it generally intelligent?" is not just a different question, but a different kind of question, from "Does its behavior align with our interests?" In other words, "good judgment" means "good for us".  I'm not going to argue against that, or at least not very enthusiastically.

Alignment is a concern when a thing can make decisions, or influence us to make decisions, in the real world.  Technology to amplify human intelligence is ancient (writing, for example), as is technology to influence our decisions (think rolling dice or drawing lots for ancient examples, but also any technology, such as a spreadsheet, that we come to rely on to make decisions).

Technology that can make decisions based on an information store it can also update is less than a century old.  While computing pioneers were quick to recognize that this was a very significant development, it's no surprise that we're still coming to grips with just what it means a few decades later. 

Intelligence is important here not for its own sake, but because it relates to concepts like risk, judgment and alignment.  To be an active threat, something has to be able to influence the real world, and it has to be able to make decisions on its own.  That ability to make decisions is where intelligence comes in.  

Computers have been involved in controlling things like power plants and weapons for most of the time computers have been around, but until recently control systems have only been able to implement algorithms that we directly understand.  If the behavior isn't what we expect, it's because a part failed or we got the control algorithm wrong.  With the advent of ML systems (not just LLMs), we now have a new potential failure mode: The control system is doing what we asked, but we don't really understand what that means.

This is actually not entirely new, either.  It took a while to understand that some systems are chaotic, and that certain kinds of non-linear feedback can lead to unpredictable behavior even though the control system itself is simple and you know the inputs with a high degree of precision.  But there's now a well-developed theory of when non-linear systems go chaotic and what kinds of behavior they can exhibit.  There's nothing like that for ML models, which introduce a whole new level of opaqueness.
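The textbook example of this is the logistic map: a one-line update rule that, for the right parameter value, blows a one-part-in-a-billion difference in starting conditions up into completely different trajectories:

```python
# The logistic map x -> r*x*(1-x).  At r = 3.9 it's chaotic: two starting
# points differing by one part in a billion end up on completely different
# trajectories within a couple hundred steps, even though the rule itself
# is trivially simple and fully known.
def diverge_step(x, y, r=3.9, tol=0.1, max_steps=200):
    """Return the first step at which the two trajectories differ by more
    than tol, or None if they never do within max_steps."""
    for step in range(1, max_steps + 1):
        x = r * x * (1 - x)
        y = r * y * (1 - y)
        if abs(x - y) > tol:
            return step
    return None

print(diverge_step(0.2, 0.2 + 1e-9))   # a few dozen steps, then divergence
print(diverge_step(0.2, 0.2))          # None: identical inputs stay identical
```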

This strongly suggests that we should tread very carefully before, say, putting an ML model in charge of launching nuclear missiles, but currently, and for quite a while yet as far as I can tell, whether to do such a thing is still a human decision.  If some sort of autonomous submarine triggers a nuclear war, that's ultimately a failure in human judgment for building the sub and putting nuclear missiles on it.


Well, that went darker than I was expecting.  Let's go back to the topic: What would superhuman intelligence even mean?  The question could mean two different things:

  • How do you define superhuman intelligence?  It's been over 70 years since Alan Turing asked if machines could think, but we still don't have a good answer.  We have a fairly long list of things that aren't generally intelligent, including current LLMs except perhaps in a limited sense, and we're pretty sure that having capabilities like the ability to learn new tasks is a key factor, but we don't have a good handle on what it really means to have such a capability.
  • What are the implications of something having superhuman intelligence?  This is an entirely different question, having to do with what kind of decisions do we allow an AI to make about what sort of things.  The important factors here are risk and judgment.

These are two very different questions, but they're related.

It's natural to think of them together.  In particular, when some new development comes along that may be a step toward AGI (first question), it's natural, and useful, to think of the implications (second question). But that needs to be done carefully.  It's easy to follow a chain of inference along the lines of

  • X is a major development in AI
  • So X is a breakthrough on the way to AGI
  • In fact, X may even be AGI
  • So X has huge implications
Those middle steps tie a particular technical development to the entire body of speculation about what it would mean to have all-knowing super-human minds in our midst, going back to well before there were even computers.  Whatever conclusions you've come to in that discussion -- AGI will solve all the world's problems, AGI will be our demise, AGI will disrupt the jobs market and the economy, whether for better or for worse, or humans will keep being humans and AGI will have little effect one way or another, or something else -- this latest development X has those implications.

My default assumption is that humans will keep being humans, but there's a lot I don't know.  My issue, really, is with the chain of inference.  The debate over whether something like an LLM is generally intelligent is mostly about how you define "generally intelligent".  Whether you buy my view on LLMs, or Agüera y Arcas and Norvig's has little if anything to do with what the economic or social impacts will be.

The implications of a particular technical development, in AI or elsewhere, depend on the specifics of that development and the context it happens in.  While it's tempting to ask "Is it AGI?" and assume that "no" means business as usual while "yes" has vast consequences, I doubt this is a useful approach.

The development of HTTP and cheap, widespread internet connectivity has had world-wide consequences, good and bad, with no AI involved.  Generative AI and LLMs may well be a step toward whatever AGI really is, but at this point, a year after ChatGPT launched and a couple of years after generative AIs like DALL-E came along, it's hard to say just what direct impact this generation of AIs will have.

I would say, though, that the error bars have narrowed.  A year ago, they ranged from "ho-hum" to "this changes everything".  The upper limit seems to have dropped considerably in the interim, while the lower limit hasn't really moved.

Tuesday, October 29, 2019

More on context, tool use and such

In the previous post I claimed that (to paraphrase myself fairly loosely) whether we consider behaviors that technically look like "learning", "planning", "tool use" or such to really be those things has a lot to do with context.  A specially designed robot that can turn a door handle and open the door is different from something that sees a door handle slightly out of reach, sees a stick on the ground, bends the end of the stick so it can grab the door handle and proceeds to open the door by using the stick to turn the handle and then to poke the door open.  In both cases a tool is being used to open a door, but we have a much easier time calling the second case "tool use".  The robot door-opener is unlikely to exhibit tool use in the second case.

With that in mind, it's interesting that the team that produced the hide-and-seek AI demo is busily at work on using their engine to play a Massively Multiplayer Online video game.  They argue at length, and persuasively, that this is a much harder problem than chess or go.  While the classic board games may seem harder to the average person than mere video games, from a computing perspective MMOs are night-and-day harder in pretty much every dimension:
  • You need much more information to describe the state of the game at any particular point (the state space is much larger).  A chess or go position can be described in well under 100 bytes.  To describe everything that's going on at a given moment in an MMO takes more like 100,000 bytes (about 20,000 "mostly floating point" numbers).
  • There are many more choices at any given point (the action space is much larger).  A typical chess position has a few dozen possible moves.  A typical go position may have a couple hundred.  In a typical MMO, a player may have around a thousand possible actions at a particular point, out of a total repertoire of more than 10,000.
  • There are many more decisions to make, in this case running at 30 frames per second for around 45 minutes, or around 80,000 "ticks" in all.  The AI only observes every fourth tick, so it "only" has to deal with 20,000 decision points.  At any given point, an action might be trivial or might be very important strategically.  Chess games are typically a few dozen moves long.  A go game generally takes fewer than 200 (though the longest possible go game is considerably longer).  While some moves are more important than others in board games, each requires a similar amount and type of calculation.
  • Players have complete information about the state of a chess or go game.  In MMOs, players can only see a small part of the overall universe.  Figuring out what an unseen opponent is up to and otherwise making inferences from incomplete data is a key part of the game.
Considered as a context, an MMO is, more or less by design, much more like the kind of environment that we have to plan, learn and use tools in every day.  Chess and go, by contrast, are highly abstract, limited worlds.  As a consequence, it's much easier to say that something that looks like it's planning and using tools in an MMO really is planning and using tools in a meaningful sense.
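Plugging in the rough numbers above (all taken from the bullet points, so treat them as order-of-magnitude estimates):

```python
# Back-of-envelope comparison, using the rough numbers quoted above.
chess_state_bytes = 100        # a chess position fits in well under 100 bytes
mmo_state_bytes = 100_000      # ~20,000 "mostly floating point" numbers

chess_moves = 40               # a few dozen legal moves in a typical position
mmo_actions = 1_000            # typical actions available at one point

ticks = 30 * 60 * 45           # 30 frames/second for about 45 minutes
decision_points = ticks // 4   # the AI only observes every fourth tick

print(mmo_state_bytes // chess_state_bytes)   # state is ~1,000x bigger
print(mmo_actions // chess_moves)             # ~25x more choices per decision
print(decision_points)                        # ~20,000 decisions per game
```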

It doesn't mean that the AI is doing so the same way we do, or at least may think we do, but that's for a different post.

Tool use, planning and AI

A recent story in MIT Technology Review carries the headline AI learned to use tools after nearly 500 million games of hide and seek, and the subhead OpenAI’s agents evolved to exhibit complex behaviors, suggesting a promising approach for developing more sophisticated artificial intelligence.  This article, along with several others, is based on a blog post on OpenAI's site.  While the article is a good summary of the blog post, the blog post is just as readable while going into somewhat more depth and technical detail.  Both the article and the blog post are well worth reading, but as always the original source should take precedence.

There is, as they say, quite a bit to unpack here, and before I'm done this may well turn into another Topic That Ate My Blog.  At the moment, I'm interested in two questions:
  • What does this work say about learning and intelligence in general?
  • To what extent or in what sense do terms like "tool use" and "planning" describe what's going on here?
My answers to both questions changed significantly between reading the summary article and reading the original blog post.

As always, lurking behind stories like this are questions of definition, in particular, what do we mean by "learning", "planning" and "tool use"?  There have been many, many attempts to pin these down, but I think for the most part definitions fall into two main categories, which I'll call internal and external here.  Each has its advantages and drawbacks.

By internal definition I mean an attempt to formalize the sort of "I know it when I do it" kind of feeling that a word like learning might trigger.  If I learn something, I had some level of knowledge before, even if that level was zero, and after learning I could rattle off a new fact or demonstrate a new skill.  I can say "today I learned that Madagascar is larger than Iceland" or "today I learned how to bake a soufflé".

If I talk about planning, I can say "here's my plan for world domination" (like I'd actually tell you about the robot army assembling itself at ... I've said too much) or "here's my plan for cleaning the house".  If I'm using a tool, I can say "I'm going to tighten up this drawer handle with a Phillips screwdriver", and so forth.  The common thread here is a conscious understanding of something particular going on -- something learned, a plan, a tool used for a specific purpose.

This all probably seems like common sense, and I'd say it is.  Unfortunately, common sense is not that helpful when digging into the foundations of cognition, or, perhaps, of anything else interesting.  We don't currently know how to ask a non-human animal to explain its thinking.  Neither do we have a particularly good handle on how a trained neural network is arriving at the result it does.  There may well be something encoded in the networks that control the hiders and seekers in the simulation, which we could point at and call "intent", but my understanding is we don't currently have a well-developed method for finding such things (though there has been progress).

If we can't ask what an experimental subject is thinking, then we're left with externally visible behavior.  We define learning and such in terms of patterns of behavior.  For example, if we define success at a task by some numerical measure, say winning percentage at hide and seek, we can say that learning is happening when behavior changes and the winning percentage increases in a way that can't be attributed to chance (in the hide-and-seek simulation, the percentage would tilt one way or another as each side learned new strategy, but this doesn't change the basic argument).

This turns learning into a pure numerical optimization problem: find the weights on the neurons that produce the best winning percentage.  Neural-network training algorithms are literally doing just such an optimization.  Networks in the training phase are certainly learning, by definition, but certainly not in the sense that we learn by studying a text or going to a lecture.  I suspect that most machine learning researchers are fine with that, and might also argue that studying and lectures are not a large part of how we learn overall, just the part we're most conscious of as learning per se.
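A toy version of "learning as numerical optimization" fits in a few lines: nudge a weight at random, keep the nudge if the score improves.  The one-dimensional win-rate function here is invented for illustration; real training adjusts millions of weights, usually with gradients rather than random nudges, but the external definition is satisfied either way:

```python
# "Learning" by the external definition: behavior changes and the score
# goes up.  The win_rate function is a made-up stand-in for "winning
# percentage as a function of the network weights".
import random

def win_rate(w):
    # A smooth bump that peaks at w = 3.
    return 1.0 / (1.0 + (w - 3.0) ** 2)

random.seed(0)
w = 0.0
for _ in range(2000):
    candidate = w + random.uniform(-0.1, 0.1)
    if win_rate(candidate) > win_rate(w):
        w = candidate   # keep the change: the score improved

print(round(w, 2))  # ends up near 3, where the win rate peaks
```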

This tension between our common understanding of learning and the workings of things that can certainly appear to be learning goes right to why an external definition (more or less what we call an operational definition) can feel so unsatisfying.  Sure, the networks look like they're learning, but how do we know they're really learning?

The simplest answer to that is that we don't.  If we define learning as optimizing a numerical value, then pretty much anything that does that is learning.  If we define learning as "doing things that look to us like learning", then what matters is the task, not the mechanism.  Learning to play flawless tic-tac-toe might be explained away as "just optimizing a network" while learning to use a ramp to peer over the wall of a fort built by a group of hiders sure looks an awful lot like the kind of learning we do -- even though the underlying mechanism is essentially the same.

I think the same reasoning applies to tool use: Whether we call it tool use or not depends on how complex the behavior appears to be, not on the simple use of an object to perform a task.  I remember reading about primates using a stick to dig termites as tool use and thinking "yeah, but not really".  But why not, exactly?  A fireplace poker is a tool.  A barge pole is a tool.  Why not a termite stick?  The only difference, really, is the context in which they are used.  Tending a fire or guiding a barge happens in the midst of several other tools and actions with them, however simple in the case of a fireplace and andirons.  It's probably this sense of the tool use being part of a larger, orchestrated context that makes our tool use seem different.  By that logic, tool use is really just a proxy for being able to understand larger, multi-part systems.

In my view this all reinforces the point that "planning", "tool use" and such are not binary concepts.  There's no one point at which something goes from "not using tools" to "using tools", or if there is, the dividing line has to be fairly arbitrary and therefore not particularly useful.  If "planning" and "tool use" are proxies for "behaving like us in contexts where we consider ourselves to be planning and using tools", then what matters is the behavior and the context.  In the case at hand, our hiders and seekers are behaving a lot like we would, and doing it in a context that we would certainly say requires planning and intelligence.

As far as internal and external definitions, it seems we're looking for contexts where our internal notions seem to apply well.  In such contexts we have much less trouble saying that behavior that fits an external definition of "tool use", "planning", "learning" or whatever is compatible with those notions.

Sunday, December 30, 2018

Computer chess: Dumber and smarter at the same time?

[As usual, I've added inline comments for non-trivial corrections to the original text, but this one has way more than most.  I've come up to speed a bit on the current state of computer chess, so that's reflected here.  The result is not the most readable narrative, but I'm not going to try to rewrite it --D.H. Apr 2019]

One of the long-running tags on the other blog is "dumb is smarter".  The idea is that, at least in the world of software, it's perilous to build a lot of specialized knowledge into a system.  Specialized knowledge means specialized blind spots -- any biases or incorrect assumptions in the specialization carry over to the behavior of the system itself.  In cases like spam filtering, for example, a skilled adversary can exploit these assumptions to beat the system.

I will probably follow up on the other blog at some point on how valid the examples of this that I cited really were, and to what extent recent developments in machine learning have changed the picture (spoiler alert: at least some).  Here I'd like to focus on one particular area: chess.  Since this post discusses DeepMind, part of the Alphabet family, I should probably be clear that what follows is taken from public sources I've read.


For quite a while and up until recently, the most successful computer chess programs have relied mainly on brute force, bashing out millions and millions of possible continuations from a given position and evaluating them according to fairly simple rules [in actual tournament games, it's not unusual to see over a billion nodes evaluated for a single move].  There is a lot of sophistication in the programming, in areas like making good use of multiple processors, avoiding duplicate work, representing positions in ways that make good use of memory, generating possible moves efficiently and deciding how to allocate limited (though large) processing resources to a search tree that literally grows exponentially with depth, but from the point of view of a human chess player, there's nothing particularly deep going on, just lots and lots of calculation.
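The core of that brute-force recipe -- search deep, score the leaves with the evaluation function, prune lines that can't matter -- is a few lines of alpha-beta.  This is a sketch over a hand-built toy game tree, not anything resembling a real engine:

```python
# Minimal alpha-beta search over a hand-built toy game tree.  A real engine
# generates moves from a board and evaluates millions of leaf positions;
# the control flow at the heart of it, though, is essentially this.
def alphabeta(node, maximizing, alpha=float("-inf"), beta=float("inf")):
    if isinstance(node, (int, float)):   # leaf: the evaluation function's score
        return node
    if maximizing:
        best = float("-inf")
        for child in node:
            best = max(best, alphabeta(child, False, alpha, beta))
            alpha = max(alpha, best)
            if beta <= alpha:
                break    # prune: the opponent will never allow this line
        return best
    else:
        best = float("inf")
        for child in node:
            best = min(best, alphabeta(child, True, alpha, beta))
            beta = min(beta, best)
            if beta <= alpha:
                break    # prune
        return best

# Leaves are evaluation scores; nested lists are positions with moves to try.
tree = [[3, 5], [2, [9, 1]], [0, -1]]
print(alphabeta(tree, True))  # 3: the best the first player can force
```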

For a long time, a good human player could outplay a good computer program by avoiding tactical blunders and playing for positional advantages that could eventually be turned into a winning position that even perfect tactical play couldn't save.  If the human lost, it would generally be by missing a tactical trick that the computer was able to find by pure calculation.  In any case, the human was looking at no more than dozens of possible continuations and, for the most part, calculating a few moves ahead, while the computer was looking at vastly more positions and exploring many more of them in much greater depth than a person typically would.

The sort of positional play that could beat a computer -- having a feel for pawn structure, initiative and such -- comes reasonably naturally to people, but it's not easily expressed in terms of an evaluation function.  The evaluation function is a key component of a computer chess engine that reduces a position to a number that determines whether one position should get more attention than another.  Since there are millions of positions to evaluate, the evaluation function has to be fast.  A typical evaluation function incorporates a variety of rules of the kind you'll find in beginning chess books -- who has more material, who has more threats against the other's pieces, whose pieces have more squares (particularly central squares) to move to, are any pieces pinned or hanging, and so forth.

There's quite a bit of tuning to be done here -- is it more important to have a potential threat against the opponent's queen or to control two squares in the center? -- but once the tuning parameters are set, they're set, at least until the next release.  The computer isn't making a subtle judgment based on experience.  It's adding up numbers based on positions and alignments of pieces.
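A caricature of such an evaluation function, with the traditional beginner's-book piece values and made-up weights for everything else:

```python
# A caricature of a classical evaluation function: a weighted sum of simple
# features, from the side to move's point of view.  Piece values are the
# traditional beginner's-book ones; the other weights are invented -- these
# are exactly the "tuning parameters" that are fixed until the next release.
PIECE_VALUES = {"P": 1, "N": 3, "B": 3, "R": 5, "Q": 9}

MOBILITY_WEIGHT = 0.1   # per extra square our pieces can move to
CENTER_WEIGHT = 0.3     # per extra central square we control

def evaluate(my_pieces, their_pieces,
             my_mobility, their_mobility,
             my_center, their_center):
    material = (sum(PIECE_VALUES[p] for p in my_pieces)
                - sum(PIECE_VALUES[p] for p in their_pieces))
    mobility = MOBILITY_WEIGHT * (my_mobility - their_mobility)
    center = CENTER_WEIGHT * (my_center - their_center)
    return material + mobility + center

# Up a knight, but slightly cramped: the numbers just get added up.
# No judgment, no experience -- arithmetic.
score = evaluate("QRRBBNNP", "QRRBBNP",
                 my_mobility=20, their_mobility=28,
                 my_center=2, their_center=3)
print(round(score, 1))  # 1.9
```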

It's not that no one thought of writing a more subtle evaluation function, one that would allow the program to look at fewer, but better positions.  It's just that it never seemed to work as well.  Put a program that looks at basic factors but looks at scads and scads of positions against one that tries to distill the experience of human masters and only look at a few good moves, and the brute force approach has typically won.  The prime example here would be Stockfish, but I'm counting engines such as earlier versions of Komodo as brute force since, as I understand it, they use the same alpha/beta search technique and examine large numbers of positions.  I'm having trouble finding exact numbers on that, though.  [If you look at the stats on chess.com's computer chess tournaments, it's very easy to tell who's who.  For a 15|5 time control, for example, you either see on the order of a billion nodes evaluated per move or on the order of a hundred thousand.]

[Houdini is another interesting example.  Its evaluation function puts more weight on "positional" factors such as active pieces and space, and it tries to be highly selective in which moves it devotes the most resources to.  This is, explicitly, trying to emulate the behavior of human players.  So it's not quite correct to say that programs that try to emulate humans have done worse than ones that just bash out continuations.  Houdini has done quite well.

However, from a software point of view, these programs are all largely similar.  There is an evaluation function that's explicitly coded as rules you can read, and this is used to decide how much attention to pay to what part of the search tree.

AlphaZero, by contrast, uses machine learning (aka neural network training, aka deep neural networks) to build an evaluation function that's totally opaque when considered purely as code.  There are techniques to examine what a neural network is doing, but they're not going to reduce to rules like "a bishop is worth three times as much as a pawn".  It also uses a Monte Carlo approach to examine the search tree probabilistically, which is a somewhat different way to use the evaluation function to guide the search.  As I understand it, this is not the usual practice for other engines, though it's certainly possible to incorporate a random element into a conventional chess engine.  Komodo MC comes to mind.

In short, the narrative of "purely mechanical programs beat ones that try to emulate humans" is not really right, but AlphaZero still represents a radically different approach, and one that is in some sense structurally more similar to what things-with-brains do.  --D.H. Feb 2019]

This situation held for years.  Computer hardware got faster, chess programs got more finely tuned and their ratings improved, but human grandmasters could still beat them by exploiting their lack of strategic understanding.  Or at least that was the typical explanation:  Humans had a certain je ne sais quoi that computers could clearly never capture, not that the most successful ones were particularly trying to.  Occasionally you'd hear a stronger version: computers could never beat humans since they were programmed by humans and could never know more than their programmers, or some such, but clearly there's a hole in that logic somewhere ...


Then, in 1997, Deep Blue (an IBM project not directly related to Deep Mind) beat world champion Garry Kasparov in a rematch, Kasparov having won the first match between the two.  It wasn't just that Deep Blue won the games.  It outplayed Kasparov positionally just like human masters had been outplaying computers, but without employing anything you could point at and call strategic understanding.  It just looked at lots and lots of positions in a fairly straightforward way.

This isn't as totally surprising as it might seem.  The ultimate goal in chess is to checkmate the opponent.  In practice, that usually requires winning a material advantage, opening up lines of attack, promoting a pawn in the endgame or some other tactical consideration.  Getting a positional advantage is just setting things up to make that more likely.  Playing a simple tactical game but looking twenty plies (half-moves) ahead turns out to be quite a bit like playing a strategic game. [In fact, computer-computer chess games are almost entirely positional, since neither player is going to fall for a simple tactical trap.  That's not to say tactics don't matter.  For example I've seen any number of positions where a piece looked to be hanging, but couldn't be taken due to some tactical consideration.  What I haven't seen is a game won quickly by way of tactics.]


Human-computer matches aren't that common.  At first, the contests were too one-sided.  There was no challenge in a human grandmaster playing a computer.  Then, as computers became stronger, few grandmasters wanted to risk being the first to lose to a computer (and Kasparov is to be commended for taking up the challenge anyway).  Now, once again, the contests are too one-sided.  Human players use computers for practice and analysis, but everyone loses to them head-to-head, even with odds given (the computer plays without some combination of pieces and pawns).

At this writing, current world champion Magnus Carlsen, by several measures the strongest player ever or at the very least in the top handful, stands less than a 2% chance of beating the current strongest computer program head to head.  With the question of "will computers ever beat the best human players at chess?" firmly settled, human players can now be compared on the basis of how often they play what the computer would play.  Carlsen finds the computer move over 90% of the time, but it doesn't take many misses to lose a game against such a strong player.


And now comes AlphaZero.

The "alpha" is carried over from its predecessor, AlphaGo, which, by studying games between human players of the game of go and using machine learning to construct a neural network for evaluating positions, was able to beat human champion Lee Sedol.  This was particularly significant because a typical position in go has many more possible continuations than a typical chess position, making the "bash out millions of continuations" approach impractical.  Given that computer chess had only fairly recently reached the level of top human players and go programs hadn't been particularly close, it had seemed like a good bet that humans would continue to dominate go for quite a while yet.

AlphaGo, and machine learning approaches in general, use what could be regarded as a much more human approach, not surprising since they're based on an abstraction of animal nervous systems and brains rather than the classic Turing/Von Neumann model of computing, notwithstanding that they're ultimately still using the same computing model as everyone else.  That is, they run on ordinary computer hardware, though often with a specialized "tensor processor" for handling the math.

However, the algorithm for evaluating positions is completely different.  There's still an evaluation function, but it's "run the positions of the pieces through this assemblage of multilinear functions that the training stage churned out".  Unlike a conventional chess engine, there's nothing you can really point at and say "this is how it knows about pins" or "this is how it counts up material".
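Mechanically, that assemblage amounts to something like the following: layers of weighted sums squashed through a nonlinearity.  The shapes and weights here are arbitrary placeholders standing in, at absurdly reduced scale, for the millions of parameters a training run would produce.

```python
import math

# What "run the positions through this assemblage of functions" means
# mechanically: one hidden layer of weighted sums squashed through tanh,
# then a final weighted sum producing a single score.  The weights would
# come from training; here they're arbitrary placeholders.

def forward(features, w1, b1, w2, b2):
    """Tiny two-layer network: feature vector -> hidden layer -> score."""
    hidden = [math.tanh(sum(w * x for w, x in zip(row, features)) + b)
              for row, b in zip(w1, b1)]
    return sum(w * h for w, h in zip(w2, hidden)) + b2
```

Nothing in those numbers corresponds to "pins" or "material"; whatever knowledge the network has is smeared across all the weights at once, which is exactly why there's nothing you can point at.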

AlphaGo looks at many fewer positions [by a factor of around 10,000] than a conventional chess engine would, and it looks at them probabilistically, that is, it uses its evaluation function to decide how likely it is that a particular continuation is worth looking at, then randomly chooses the actual positions to look at based on that.  It's still looking at many more positions than a human would, but many fewer than a pure brute-force approach would.
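One simple way to sketch that "randomly chooses based on that" step: weight each candidate move by a prior from the evaluation network and sample in proportion to the weights.  This is a crude stand-in for illustration only; the actual selection rule also accounts for how often each branch has already been visited.

```python
import random

# A crude sketch of probabilistic move selection: each candidate move
# gets a prior weight (in AlphaZero's case, from the network), and we
# sample which branch to explore in proportion to those weights.

def pick_branch(moves, priors, rng=random.random):
    """Sample one move with probability proportional to its prior."""
    total = sum(priors)
    r = rng() * total
    for move, p in zip(moves, priors):
        r -= p
        if r <= 0:
            return move
    return moves[-1]  # guard against floating-point rounding
```

Repeat this selection many times and the promising branches soak up most of the attention, while weak branches still get an occasional look rather than being written off entirely.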


The "zero" is because the training stage doesn't use any outside knowledge of the game.  Unlike AlphaGo, it doesn't look at games played by humans.  It plays against itself and accumulates knowledge of what works and what doesn't.  Very roughly speaking, AlphaZero in its training stage plays different versions of its network against each other, adjusts the parameters based on what did well and what did badly, and repeats until the results stabilize.

AlphaZero does this knowing only the rules of the game, that is, what a position looks like, what moves are possible, and how to tell if the game is won, lost, drawn (tied) or not over yet.  This approach can be applied to a wide range of games, so far go, chess and shogi (a chess-like game which originated in Japan).  In all three cases AlphaZero achieved results clearly better than the (previous) best computer players after a modest number of hours of training (though the Stockfish team makes a good case that AlphaZero had a hardware advantage and wasn't playing against the strongest configuration).  [Recent results indicate that LC0, the strongest neural net based engine, and Stockfish, the strongest conventional engine, are very evenly matched, but LC0 doesn't have the benefit of a tensor processor to speed up its evaluation --D.H. May 2019]

Notably, AlphaZero beat AlphaGo 60 games to 40.


In one sense, AlphaZero is an outstanding example of Dumb is Smarter, particularly in beating AlphaGo, which used nearly the same approach, but trained from human games.  AlphaZero's style of play has been widely described as unique.  Its go version has found opening ideas that had lain undiscovered for centuries.  Its chess version plays sacrifices (moves that give up material in hopes of a winning attack) that conventional chess engines pass up because they can't prove that they're sound.  Being unbiased by exposure to human games or a human-developed evaluation function, AlphaZero can find moves that other programs would never play, and it turns out these moves are often good enough to win, even against chess engines that never make tactical mistakes.

On the other hand, AlphaZero specifically avoids sheer brute force.  Rather than look at lots and lots of positions using a relatively simple evaluation function, it looks at many fewer, using a much more sophisticated evaluation function to sharply limit the number of positions it examines.  This is the same approach that had been tried in the past with limited success, but with two key differences:  The evaluation function is expressed as a neural network rather than a set of explicit rules, and that neural network is trained without any human input, based solely on what works in practice games against AlphaZero itself.


The Dumb is Smarter tag on Field Notes takes "dumb" to mean "no special sauce" and "smarter" to mean "gets better results".  The "smarter" part is clear.  The "dumb" part is more interesting.  There's clearly no special sauce in the training stage.  AlphaZero uses a standard machine learning approach to produce a standard neural network.

On the other hand, if you consider the program itself without knowing anything about the training stage, you have a generic engine, albeit one with a somewhat unusual randomized search algorithm, and an evaluation function that no one understands in detail.  It's all special sauce -- a set of opaque, magical parameters that somehow give the search algorithm the ability to find the right set of variations to explore.

I think it's this opaqueness that gives neural networks their particularly uncanny flavor (uncanny, etymologically, roughly means "unknowable").  The basic approach of taking some inputs and crunching some numbers on them to get an output is unequivocally dumb.  As I said above, "It's adding up numbers based on positions and alignments of pieces."  Except that those numbers are enabling it to literally make "a subtle judgment based on experience", a judgment we have no real choice but to call "smart".


Progress marches on.  At least one of the previous generation of chess engines (Komodo) has incorporated ideas from AlphaZero [Leela has open-sourced the neural network approach wholesale].  It looks like the resulting hybrid isn't dominant, at least not yet, but it does play very strong chess, and plays in a completely different, more human, style from the conventional version.  That's interesting all by itself.

Thursday, April 6, 2017

Big vocabulary, or just big words?

The other day I was reading an article that used a couple of words I hadn't seen in a while, say anodyne or encomium.  I more-or-less remembered what they meant, and it was reasonably clear from context what they meant, but I still ended up looking them up.  I had two feelings about this: on the one hand, did the author really have to drag those out?  Why not just use Plain English?  On the other hand, they were correctly used, and apt, so what's the big deal?  I'm sure I've thrown out a word or two here that I could have replaced with something more familiar, maybe with a little rewording.

But I'm not here to critique style.  What stuck in my head about this incident was how conspicuous an unusual word can be (and besides that, unusual words tend to stick out).  The article itself was probably a thousand words or so, maybe more, but it was those two that changed the whole reading experience.

This wasn't just because of the extra time it took to look the words up and make sure I knew what they meant.  That's a speed bump these days, reading an online article with search bar and dictionary app at the ready, maybe an extra minute, if that.  Even if I hadn't had a dictionary handy, I could have gotten the good out of the article without knowing exactly what those words meant.

The real issue lies deeper in human perception: We (and living things with recognizable brains in general) are finely tuned to notice discrepancies.  In a field of green grass it's the shape of that predator, or that prey,  or that particularly tasty plant, or whatever, that stands out.  In an article of a thousand words, it's the unusual ones that stand out.

I could go on and on, but it's worth particularly noting how important this is in social environments.  We can spot an unfamiliar accent in seconds.  We can spot someone dressed differently, or with different features than we usually see, well before we're even aware that we have.  The other night I was watching a TV show with a foreign actor playing an American, and everything was just fine until they said "not" with a British "o".   It didn't ruin the whole show -- this was a single vowel, not Dick Van Dyke in Mary Poppins -- but it was noticeable enough I still recall it out of an hour of tense drama.

(I have to say that dialect coaching has gotten a lot better over the past couple of decades.  Time was, movie stars talked like movie stars, with a kind of over-enunciated diction that didn't sound like anyone in real life, and if a character was meant to sound foreign, pretty much anything would do.   This is doubtless because in the early days of "talking pictures" the medium was still transitioning from the stage, a theatre actor was used to projecting up to the cheap seats and a fake accent was as good as a fake beard since everything was a hand-painted set and there probably weren't that many people in the audience who knew what a true Elbonian accent sounded like anyway.  Today pretty much every part of that is different, and we expect realism -- Billy Bob Thornton's all-too-valid complaint about "that Southern accent that no one in the South actually speaks with" notwithstanding.)

Where was I?

I've argued before that we often seem to care most about distinctions when they matter least. Vocabulary is largely another example of that.  Unless you're reading Finnegans Wake or something equally chewy, you're probably OK just skimming over anything you don't know and looking it up later.  Even that blowhard commentator with the two-dollar words is trying to get a point across and isn't going to let the vocabulary get completely in the way.

As a corollary to that, you don't need to know very many unusual words in order to stand out.  If you know a few dozen and use them appropriately, you'll almost certainly draw attention (if you learn a few dozen and use them inappropriately you'll also draw attention, but probably not the kind you want).  This can happen naturally if you run across a rare-ish but useful word or two in your reading from time to time and hold onto it for future use.  There's something nice about, say, cogent that is hard to reword cleanly, the distinction between terse and concise is sometimes worth making, and so on.

Contrast that with the average human vocabulary.  This is a hard thing to measure, but if you've heard something on the order of "uneducated people have a vocabulary of 2000 words while educated people know 20,000", rest assured that's complete bunk.  If we're measuring vocabulary, we have to measure "listemes", that is, things that you just have to learn by rote because you can't work them out from their parts.

This includes all kinds of things:
  • proper names of people and places
  • distinct senses of words, particularly small words like out and by, which can have quite a few, depending on how you count.
  • idioms large and small, like in touch or look up (in its non-literal senses) to classics like red herring, two-dollar word and that's the way the cookie crumbles.
  • cultural references, which are kind of like names and kind of like idioms
  • fine points that we don't generally think of as idioms, but are idiomatic nonetheless, like fried egg meaning a particular way of frying an egg, as distinct from scrambling an egg or -- for whatever reason -- trying to fry a whole egg in a pan without removing the shell
I'm not trying to give a full taxonomy of things-that-you-just-have-to-learn, but I hope that gives the general idea.   The main point is that there are lots and lots of these, the categories they might fall into are somewhat arbitrary, and how many you know doesn't have a great deal to do with how many literary classics you've read.

I'm not really familiar with the research on this, but my understanding is that the average person knows somewhere in the hundreds of thousands of listemes, and a large portion of them are commonly understood.  On top of those, we can add a smaller portion of jargon, slang or sesquipedalianisms.  That part, people will notice.  But it's a relatively small part.

Wednesday, June 20, 2012

What is, or isn't a theory of mind? Part 1: Objects

One notion of self-awareness revolves around the notion of a theory of mind, that is, a mental model of mental models.

Strictly speaking, having a theory of mind doesn't imply self-awareness.  I could have a very elaborate concept of others' mental states without being aware of my own.  This would go beyond normal human frailty like my not being aware of why I did some particular thing or forgetting how others might be affected by my actions.  It would mean being able to make inferences like "he likes ice cream so there will probably be a brownie left" without being aware that I like things.  That seems unlikely, but neurology is a wide and varied landscape.  There may well be people with just such a condition.

This is clearly brain-stretching stuff, so let's try to ease into it.  In this post, I want to start with a smaller, related problem:  What would a theory of objects look like, and how could you tell if something has it?  What we're trying to describe here is some sort of notion that the world contains discrete objects, which have a definite location and boundaries and which we can generally interact with, for example causing them to move.  This leaves room for things that aren't discrete objects, like sunshine or happiness or time, but it does cover a lot of interesting territory.

Not every living thing seems to have such a theory of objects.  A moth flying toward a light probably doesn't have any well-developed understanding of what the light is.  Rather, it is capable of flying in a given direction, and some part of its nervous system associates "more light in that direction" with "move in that direction".  It's the stimulus of the light, not the object producing the light, that the moth is responding to.  In other words, this is a simple stimulus-response interaction.

On the other hand, a theory of objects is not some deep unique property of the human mind.  A dog chasing a frisbee clearly does not treat the frisbee as an oval blob of color.  It treats it as a discrete, moving object with a definite position and trajectory in three dimensions.

You might fool the dog for a moment by pretending to throw the frisbee but hanging onto it instead, but the dog will generally abandon the chase for the not-flying disc in short order and retarget itself on the real thing.  It can recognize discs of different shapes and sizes and react to them as things to be thrown and caught.  It's hard to imagine a creature doing such a thing without some abstract mental representation of the disc -- and of you for that matter.

Likewise a bird or a squirrel stashing food for the winter and recovering it months later must have some representation of places and, if not objects, a "food-having" attribute to apply to those places.  That they are able to pick individual nuts and take them to those places also implies some sort of capability beyond reacting to raw sense data.

(but on the other hand ... ants are able to move objects from place to place, bees are able to locate and return to flowers ... my fallback here and elsewhere is that rather than one single thing we can call "theory of object" there must be many different object-handling facilities, some more elaborate than others ... and dogs and people have more elaborate facilities than do insects).

I've been careful in the last few paragraphs to use terms like "facility" and "representation" instead of "concept" or "idea" or such.  I'm generally fine with more loaded terms, which suggest something like thought as we know it, but just there I was trying to take a particularly mechanistic view.

So what sort of experiment could we conduct to determine whether something has a theory of objects, as opposed to just reacting to particular patterns of light, sound and so forth?  We are looking for situations where an animal with a theory of objects would behave differently from one without.

One key property of objects is that they can persist even when we can't sense them.  Technically, this goes under the name of object permanence.  For example, if I set up a screen, stand to one side of it and throw a frisbee behind it, I won't be surprised if a dog heads for the other side of the screen in anticipation of the frisbee reappearing from behind it.  Surely that demonstrates that the dog has a concept of the frisbee as an object.

Well, not quite.  Maybe the dog just instinctively reacts to a moving blob of color and continues to move in that direction until something else captures its attention.  Ok then, what if the frisbee doesn't come out from behind the screen?  Perhaps I've placed a soft net behind the screen that catches the frisbee soundlessly.  If the dog soon abandons its chase and goes off to do something else, we can't tell much.  But if it immediately rushes behind the screen, that's certainly suggestive.

However ... one can continue to play devil's advocate here.  After all, the two scenes, of the frisbee emerging or staying hidden, necessarily look different.  In one case there is a moving blob of color -- causing the dog to move -- followed by another blob of moving color.  In the other, there is no second movement.  So perhaps the hard-wiring is something like "Move in the direction of a moving blob of color.  If it disappears for X amount of time, move back toward the source."  That wouldn't quite explain why the dog might go behind the screen, but with a bit more thought we can probably explain that away.

What we need in order to really put the matter to rest is a combinatorial explosion.  A combinatorial explosion occurs when a few basic pieces can produce a huge number of combinations.  For example, a single die can show any of 6 numbers, two dice can show 36 combinations, three can show 216, four can show 1296 and so forth.  As the number of dice grows, it quickly becomes impractical to keep track of all the possible combinations separately.

If something, for whatever reason, reacts to combinations of eight dice that total less than 10 one way and those that total 10 or more a different way, it's hard to argue that it's simply reacting to the 9 particular combinations (all ones, eight different ways to get a two and seven ones) that total less than ten one way and the other 1,679,607 the other way.  Rather, the simplest explanation is that it has some concept of number.  On the other hand, if we're only experimenting with a single die, and a one gets a different reaction from the other numbers, it might well be that a lone circle has some sort of special status.
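Those numbers are small enough to check by brute force:

```python
from itertools import product

# Brute-force check of the dice numbers above: enumerate all 6**8
# equally likely outcomes for eight dice and count those totaling
# less than 10.

total = 6 ** 8
under_10 = sum(1 for roll in product(range(1, 7), repeat=8)
               if sum(roll) < 10)

print(total)             # 1679616
print(under_10)          # 9
print(total - under_10)  # 1679607
```

Which is, of course, exactly the sort of exhaustive enumeration that quickly stops being an option as the number of dice grows, for a brain or a computer.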

In the case of the frisbee and screen experiment, we might add more screens and have grad students stand behind them and randomly catch the frisbee and throw it back the other way.  If there are, say, five screens and the dog can follow the frisbee from the starting position to screen four, back to screen two and finally out the far side, and can consistently follow randomly chosen paths of similar complexity, we might as well accept the obvious:  A dog knows what a frisbee is.

Why not just accept the obvious to begin with?  Because not all obvious cases are so obvious.  When we get into borderline cases, our intuition becomes unreliable.  Different people can have completely different rock-solid intuitions and the only way to sort it out is to run an experiment that can distinguish the two cases.


This is where we are with primate psychology and theories of mind.  It's pretty clear that chimps (and here I really mean chimps and/or bonobos), for example, have much of the same cognitive machinery we do, including not only a theory of objects and some ability to plan, but also such things as an understanding of social hierarchies and kinship relations.

On the other hand, attempts to teach chimps human language have been fairly unconvincing.  It's clear that they can learn vocabulary.  This is notable, even though understanding of vocabulary is not unique to primates.  There are dogs, for example, that can reliably retrieve any of dozens of objects from a different room by name.

There has been much less success, however, with understanding sentences with non-trivial syntax, on the order of "Get me the red ball from the blue box under the table" when there is also, say, a red box with a blue ball in it on the table.  Clearly chimps have some concept of containment, and color, and spatial relationships, but that doesn't seem to carry through to their language facility such as it is.

So what facilities do we and don't we have in common?  In particular, do our primate cousins have some sort of theory of mind?

That brings us back to the original question of what constitutes a theory of mind, and the further question of what experiments could demonstrate its presence or absence.

People who work closely with chimps are generally convinced that they can form some concept of what their human companions are thinking and can adjust their behavior accordingly.  However, we humans are strongly biased toward attributing mental states to anything that behaves enough like it has them -- we're prone to assuming things (including ourselves, some might say) are smarter than they are.

Direct interaction in a naturalistic setting is valuable, and most likely better for the chimp subjects, but working with a chimp that has every appearance of understanding what you're up to doesn't necessarily rule out more simplistic explanations.  For example, if the experimenter looks toward something and the ape looks in the same direction, did it do so because it reasoned that the experimenter was intentionally looking that direction and therefore there must be something of interest there, or simply out of some instinct to follow the gaze of other creatures?

These are thornier questions, with quite a bit of research and debate accumulated around them over the past several decades.  I want to say more about them, though not necessarily in the next post.  I'm still reading up at the moment.

[I ended up continuing in this post]

Thursday, October 7, 2010

How much do we know?

The question here is not how much does humanity know collectively, or how much do we know about some given topic compared to how much we don't, or what portion of things can we reasonably say we "know" as opposed to believing or being "fairly sure" or such.  Those are all interesting questions, but what I'm after here is more literal.  How much does a typical human being know, by some objective measure?

To get the flavor of the question, it has been estimated that the average high-school graduate knows about 40,000 vocabulary items, or listemes.  A listeme is a word, word part or collection of words that you have to memorize in order to understand, as opposed to something you can understand by breaking it into parts you already know. For example
  • There are two listemes in "listemes": listeme itself and the plural marker -s.  If you understand both of those, you can understand their combination [Or three: list, -eme and -s, if you're a linguist and familiar with morpheme, phoneme and such -- see below -- D.H.].
  • Typical acronyms and such are listemes: USA or LOL, for example, even though the parts they stand for are well known, because you have to know which words the letters stand for.
  • Idioms are listemes.  Knowing flying and saucer is not enough to know flying saucer.
  • Proper names are listemes.  You have to learn that Muskegon is a city and that Michael Jordan is a former NBA player, even if you already know that Michael and Jordan are names.
  • To some extent, different senses of words count as different listemes.  Knowing that you can eat off a plate doesn't tell you how to plate something in gold or what it means for a batter to step up to the plate.
  • Listemes are somewhat subjective.  Someone well-versed in Latin might see intermittent or conjecture as made up of simpler parts, while for most of us they're one listeme each, and of course different languages have largely different listemes.
Each listeme binds a largely arbitrary sign to a meaning.  At a bare minimum, then, our typical high school grad knows 40,000 items, however much knowledge an item might represent.  Now, I make no pretense of knowing how the mind really represents such things, but the title of this blog is Intermittent Conjecture, so it seems that by a miraculous coincidence I've left myself room to speculate.

I would guess that typical listemes are associated with bundles of memories and their relations to other memories.   For example, plate might perhaps conjure up images of typical dinner plates and memories of eating and setting the table and such; images of plated items one may have encountered or a representation of the plating process; images from a baseball game with a batter in stance or a runner sliding into home.

Similarly to how words may be defined with other words, these bundles of images will typically overlap.  A memory of a dinner plate may include an image of a table, or of eating, making "something you put on the table" or "something you eat off of" natural, if incomplete, answers to "What's a plate?"

I've used "memory" and "image" fairly interchangeably here, but I suspect that the images that concepts are built on are nothing like fully detailed pictures or movies.  Rather, they're highly abstracted, with only the relevant features retained.

By this line of speculation, those 40,000 listemes might represent 400,000 or 1,000,000 or more images, grouped into concepts and with arbitrary signs attached.  There is much, much more to the picture, of course, but again we're just trying to get a rough estimate of what's in a typical brain.

Words are only one window into the contents of the mind.  We also know things we can't easily put into words, which is one reason I had wanted to talk about different kinds of knowledge and formal vs informal education.  We learn to walk instinctively, and so it's much harder to characterize what sort of things one must "know" in order to walk, yet if we can learn something, there must be some kind of knowledge involved.  Likewise for other skills like skiing or playing the trumpet, which we learn consciously and in many cases formally, but without necessarily learning a lot of vocabulary in the process.

We can also make associations unconsciously and non-verbally.  When the pioneer Lucky Bill in the post I linked to above looks off and sees bad weather brewing in the clouds, he probably doesn't have words for what he's sensing, but it's definitely something he's learned and knows, just as he knows how to let his horse know it's time to go.  This knowledge may well be built on the same sort of memories and images that we pin language onto, but it's not readily accessible to language.

If we take a mental image -- an abstracted memory -- as the basic unit of knowledge, with images grouped into concepts which may or may not have language attached, then it seems plausible that an adult human could have millions or tens of millions of such images.  We must also allow some capacity for storing the relations among the various images, concepts, signs and so forth, but such "metadata" tends to be much smaller than the data it helps organize (see this post on the other blog for more on that).

Being a compugeek, the handiest objective measure of information I have is the byte.  Leaving aside that images may differ widely in size and taking an image to be on the order of a megabyte -- a completely wild guess which may well be off by orders of magnitude -- that would put our mental storage capacity on the order of terabytes or dozens of terabytes.
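The arithmetic behind that guess is simple enough to sketch.  To be clear, every number here is an assumption carried over from the paragraph above -- the image size in particular is a wild guess that may be off by orders of magnitude -- so this is just the estimate made explicit, not a measurement of anything:

```python
# Back-of-envelope estimate of mental "storage", using the post's guesses.
# All inputs are assumptions, not measured quantities.

MB = 10**6   # bytes per megabyte (decimal)
TB = 10**12  # bytes per terabyte (decimal)

bytes_per_image = 1 * MB  # guess: ~1 MB per abstracted mental image

# "millions or tens of millions" of images, per the estimate above
for n_images in (1_000_000, 10_000_000, 50_000_000):
    total_tb = n_images * bytes_per_image / TB
    print(f"{n_images:>10,} images -> ~{total_tb:g} TB")
```

Running this gives roughly 1, 10 and 50 TB for the three cases, which is where "terabytes or dozens of terabytes" comes from.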

Until fairly recently, that was a lot of storage, but these days it's not a staggering figure.  As far as putting together something of the same order as a human brain, we may just now be reaching a necessary, but not necessarily sufficient, technological milestone.

I'm happy to learn that the wild stab in the dark given above turns out to be reasonably in line with other wild stabs in the dark.  See for example this Google Answers page (I didn't have a lot of luck tracing this back to the literature, but since it's all guesswork I'm not going to worry about it [and Google Answers itself disappeared a while ago]).

Monday, September 13, 2010

Learning: Formal and otherwise

In the previous post, I tried to paint (and then poke at) a stereotypical picture of "book learning" vs. "real-world learning," also known as "street smarts" (except there aren't really any proper streets where Bill lives).  Which kind of learning is better?  Depends on which you think you have more of, of course.

Cognitive science is a well-studied discipline with many interesting results on learning and other activities of the mind.  One of its most significant results is that we don't have a single, general learning capacity, but a variety of learning mechanisms.  Learning to ride a bike is different from learning a language is different from learning people's names is different from learning calculus, etc.  There is good experimental evidence to support all this.

In the typical "book learning"/"real-world learning" dichotomy, formal education is held to be narrow and divorced from the world at large.  But formal learning is not a monolith.  Different subjects require different combinations of lecture, research, rote memorization, structured practice, unstructured practice and so forth.  Teaching calculus well is different from teaching the oboe well is different from teaching experimental chemistry well is different from teaching Shakespeare well.

But what is formal education, anyway?  Does it have to take place in a classroom or for course credit?  Coaching a sport well is a highly structured exercise with its own terminology and a well-developed body of theory and practice.  Likewise for apprenticing to a trade.  The very fact of a recognized trade implies a set of rules and conventions -- forms, in other words -- to be followed.  Formality is about such structures, not the particular venue for learning.

Even taking a broader view of formal learning, though, there is plenty going on outside those bounds.  Learning one's first language, or one's culture, or whether one likes bleu cheese, or the way to the grocery store, or the faces and names of friends and family, or how to walk -- these all happen even without codified rules or explicit teaching, and each has its own character (though learning language and learning culture tend to be closely intertwined).

To the extent it can even be made clearly, any distinction between formal and informal learning is exceedingly coarse-grained compared to the mosaics that are actual minds and the intricate subdivisions within each category.  Ironically enough, it's science, putatively cold and reductionist, that has developed and provided support for this basic insight.

Thursday, September 9, 2010

The pioneer and the dude

The American West, not so long ago in the big picture: Into town rides a dude, that is to say, a city-dweller from the East.  Call him William.  You can spot him a mile away.  He's dressed funny -- a dark suit in the heat of the day, a hat that'll blow off with the first good gust, polished shoes that won't look so good once he steps off his horse.  Which he can't sit on right, anyhow.  Probably won't last a week.

Watching from a distance is an old hand.  Call him Lucky Bill, Lucky because you need to be a bit lucky to have made it this far.  You couldn't necessarily pick him out in a crowd.  In fact, he and his horse look like part of the landscape.  Lucky Bill looks off to the west for half a second.  Weather coming in.  Better get going.  With a low clicking sound and a subtle movement he tells his horse to move.  The horse already knew to go, maybe from a shift in weight, maybe from some other cue.

Lucky's route home takes him right by William.  As they pass, each has one thought of the other: "How ignorant."

Each has a point.  Has Lucky heard of Ovid, or Milton?  Can he even read?  Can he tie a proper Ascot?  Does he even know the name Beau Brummell?  Put him in the middle of any dinner party in New York and he'd be a curiosity at best.

But of course, New York is a long ways away.  We're on Lucky's turf, and here you need to tie a lasso, not a necktie.  Not much use for Milton and Ovid unless they can help keep a herd from getting spooked.  Better to stick to basics, like how to split wood and build a good fire.


In the proper context, neither William is an ignoramus; each is an expert with extensive knowledge gained from years of experience.  Outside that context, however, it's a different story.

Except that Lucky Bill is just William the dude a few years on.  It wasn't easy, and yes, there was a good bit of luck involved, but the raw greenhorn in the funny get-up was quick enough on the uptake to make a go of it.  His hands are calloused now, his face weathered and his locks shaggy.  His mind is a compendium of crucial local knowledge that's saved his life on at least one occasion.  Does he still remember his poets?  Well yes, he does, and he's not the only one in the area.  The local poetry society meets every other Tuesday, rotating through its (four) members' houses.  Weather and such permitting.

How did he get to where he is now?  How much did he have to learn, and how did he learn it?  Did any of his previous knowledge carry over and if so, how?  What did he have to leave by the wayside and why?  Ample room for conjecture here ...