Intermittent Conjecture

Friday, July 4, 2025

Update: AB and NN chess engines

When I last looked in on computer chess, AlphaZero had recently made waves by beating Stockfish after spending nine hours training by playing games against itself with no outside interference. As I understand it, the configuration that Stockfish was running wasn't its strongest, but this result was still impressive: A chess engine that looked at relatively few positions but used a neural network to evaluate them (an "NN" engine) beat an engine that looked at billions using a hand-tuned human-written algorithm (an "AB" engine). Soon an open-source engine based on Alpha Zero, Leela Chess Zero (LC0), was doing impressively well in tournaments.

The hallmark of NN engines was that they would play wild-looking moves that neither a human chess master nor an engine like Stockfish would have played at the time, moves which looked risky or even downright reckless, but often turned out to lead to a crushing advantage, all of this because similar-looking moves had led to good results in practice games.

At this writing, LC0 is still doing quite well in tournaments, but not quite as well as Stockfish, which consistently beats it. So AB wins, right?

Well, not quite. At the heart of an AB engine is the evaluation function, which takes a position on the board and returns a number that says how good the position is. The rest of the engine is dedicated to efficiently searching the tree of possible moves, replies to moves, replies to replies and so on typically a few dozen levels, to find the move that leads to the best possible positions against the opponent's best moves, according to the evaluation function.

There is a whole lot of software engineering behind making this as efficient as possible, including a technique called alpha-beta pruning that gave rise to the "AB" designation. The principle behind alpha-beta pruning is simple: Stop looking at the continuations from a move as soon as you know that the opponent can do better than it would with your current best move, but my brain gets completely befuddled when I try to understand the code, probably because the rule is applied recursively for both sides, so the meaning of "better" flips each time you switch sides in the search (alpha represents the score of the player's best move so far in the search; beta represents the same thing for the opponent).

Until recently, evaluation functions had been carefully crafted to extract features from a position, like how much material each side had, which pieces had good or bad mobility, how each side's pawns were structured and so forth, and combine those using carefully-selected rules to arrive at a final evaluation.

A significant part of this is figuring out how much weight to assign to each feature in what circumstances. Essentially, this means answering questions like "Is it better to have an extra pawn, or better mobility and pawn structure?". The actual answer is "It depends. We need a rule for deciding how much weight to give each of the features we extracted." This in turn might vary depending on the particulars of the situation. Some things are more important in the middlegame, where there is more material on the board, than in the endgame, for example.

One of the reasons for Stockfish's success is its well-designed test framework for evaluating new code, including new evaluation functions. Different versions of the engine, including versions with different evaluation functions, are systematically played against each other and only changes that win make it into the next version.

Extracting features and carefully tuning various parameters that determine how to combine them certainly seems like what I previously called an "ML-friendly problem", and it didn't take too long for someone to try that out. The result was the NNUE, a neural network that takes the positions of the pieces, with special attention given to the kings, and produces a numerical evaluation. The NNUE was good enough in testing to find its way into the official release, where it remains to this day.

So NN wins, right?

Well, not quite. A pure NN engine like LC0 is applying a large and relatively quite slow neural net to a relatively small number of positions. It doesn't look ahead very far. In principle, an NN engine might look at only the positions after each possible move in the current position, typically a couple dozen. In practice, they look at hundreds of thousands, which is far more than a human player could, but still far fewer than an AB engine does. The power of an NN engine comes from the weightings in its neural net, which in turn come from playing large numbers of training games.

By comparison, the NNUE is tiny. Here's a picture of its weightings for one particular release, and here's a little more technical detail. The NNUE has around a hundred thousand one-bit input parameters and four layers. A parameter file runs to a few dozen megabytes, most of which are for the input weights in the first layer.

Just as importantly "The efficiency of NNUE is due to incremental update of the input layer outputs in make and unmake move, where only a tiny fraction of its neurons need to be considered in case of [no] king moves." This is the result of hand-optimization, not some emergent property of the neural net.

LC0's network is much larger, though still tiny compared to the ChatGPTs of the world (which don't even really now the rules of chess, as this fairly sharply-worded piece argues).

If that's all too vague for you (it is for me), the NNUE code runs on a standard CPU and can do hundreds of millions of evaluations per second, while LC0 prefers running its network on a GPU and does tens of thousands of evaluations per second.

By looking at orders of magnitude more positions than LC0, Stockfish is in effect trusting its neural network much, much less than an NN engine does and instead relies on very deep searches to determine which move to play.

Put another way, its actual evaluation is the aggregate of billions of simplistic evaluations, which happen to use a small neural net, rather than a few hundred thousand sophisticated evaluations using a much larger neural net. More simply, Stockfish is looking at many, many positions quickly while an NN engine is looking at many fewer positions more carefully.

The NNUE is essentially automating the process of extracting features from a position and deciding how to combine them. There's nothing particularly mysterious going on. As far as I understand it, its evaluations are similar to those produced by the older code, though different enough to lead to better outcomes when fed into the AB algorithm.

Even in the case of NN engines, the neural net isn't doing all the work. It's still running in a framework of "look at the possible moves, look at the replies to each move, and so on, with AB pruning". That framework wasn't created by a neural net. It was coded for computers by humans decades ago, in the 1950, to automate something human chess players already did.

That is, a naturally-evolved neural network, the human brain, developed both the concept of looking at moves and counter-moves and its realization as code. No LLM has developed code for a successful chess engine, or even come anywhere close*. This is, at least so far, a notable difference between LLMs and natural neural nets.

Within the framework that actual chess engines are built on, it turns out that a bit of neural network-based code can be helpful. past a certain quite small size, though, adding more NN doesn't seem to help.

* To be a really fair test, the LLM would need to have been trained on a corpus that only mentioned, say, tree searches and the rules of chess, without mentioning anything like alpha-beta pruning or the idea of applying tree-searching to the problem of playing chess. I think it's a very good bet that current chatbots don't meet that standard, so if you've seen something like chess-engine code generated by an LLM, the simplest explanation is that there are similar things in the corpus it was trained on. This is to say nothing of actually producing a full chess engine that uses the tree search as its basis.

Saturday, April 5, 2025

ML-friendly problems and unexpected consequences

This started out as "one more point before I go" in the previous post, but it grew enough while I was getting that one ready to publish that it seemed like it should have its own post.

Where machine learning systems like LLMs do unexpectedly well, like in mimicking our use of language, it might not be because they've developed unanticipated special abilities. Maybe ML being good at generating convincing text says as much about the problem of generating convincing text as it does about the ML doing it.

The current generation of chatbots makes it pretty clear that producing language that's hard to distinguish from what a person would produce isn't actually that hard a problem, if you have a general pattern-matcher (and a lot of training text and computing power). In that case, the hard part, that people have spent decades trying to perfect and staggering amounts of compute power implementing, is the general pattern-matcher itself.

We tend to look at ML systems as problem solvers, and fair enough, but we can also look at current ML technology as a problem classifier. That is, you can sort problems according to whether ML is good at them. From that point of view, producing convincing text, recognizing faces, spotting tumors in radiological images, producing realistic (though still somewhat funny-looking) images and videos, spotting supernovas in astronomical images, predicting how proteins will fold, along with many other problems, are all examples of pattern-matching that a general ML-driven pattern-matcher can solve as well as, or even better than, our own naturally evolved neural networks can.

Not knowing a better term, I'll call these ML-friendly problems. In the previous post, I argued that understanding the structure of natural languages is a separate problem from understanding what meaning natural language is conveying. Pretty clearly, understanding the structure of natural languages is an ML-friendly problem. If you buy that understanding meaning is a distinct problem, I would argue that we don't know one way or another whether it's ML-friendly, partly, I would further argue, because we don't know nearly as much about what that problem involves.

From about 150 years ago into the early 20th century, logicians made a series of discoveries about what we call reasoning and developed formal systems to describe it. This came out of a school of thought, dating back to Leibniz (and as usual, much farther and wider if you look for it), holding that if we could capture rules describing how reasoning worked, we could use those rules to remove all uncertainty from any kind of thought.

Leibniz envisioned a world where, "when there are disputes among persons, we can simply say: Let us calculate, without further ado, to see who is right". That grand vision failed, of course, both because, as Gödel and others discovered, formal logic has inescapable limitations, but also because formal reasoning captures only a small portion of what our minds actually do and how we reason about the world.

Nonetheless, it succeeded in a different sense. The work of early 20th-century logicians was essential to the development of computing in the mid-20th century. For example, LISP -- for my money one of the two most influential programming languages ever, along with ALGOL -- was based directly Church's lambda calculus. I run across and/or use Java lambda expressions on a near-daily basis. For another example, Turing's paper on the halting problem used the same proof technique of diagonalization that Gödel borrowed from Cantor to prove incompleteness, and not by accident.

Current ML technology captures another, probably larger, chunk of what naturally-evolved minds do. Just as formal logic broke open a set of problems in mathematics, ML has broken open a set of problems in computing. Just as formal logic didn't solve quite as wide a range of problems as people thought it might, ML might not solve quite the range of problems people today think it might, but just as formal logic also led to significant advances in other ways, so might ML.

Embedding and meaning

I a previous post entitled Experiences, mechanisms, behaviors and LLMs, I discussed a couple of strawman objections to the idea that an LLM isn't doing anything particularly intelligent: that it's "just manipulating text" and it's "just doing calculations".

The main argument was that "just" is doing an awful lot of work there. Yes, an LLM is "just" calculating and manipulating text, but it's not "just" doing so in the same way as an early system like ELIZA, which just turned one sentence template into another, or even a 90s-era Markov chain, which just generates text based on how often which words appeared directly after which others in a sample text.

In both of those cases, we can point at particular pieces of code or data and say "those are the templates it's using", or "there's the table of probabilities" and explain directly what's going on. Since we can point at the exact calculations going on, and the data driving them, and we understand how those work, it's easy to say that the earlier systems aren't understanding text the way we do.

We can't do that with an LLM, even if an LLM generating text is doing the same general thing as a simple Markov chain. We can say "here's the code that's smashing tensors to produce output text from input text", and we understand the overall strategy, but the data feeding that strategy is far beyond our understanding. Unlike the earlier systems, there's way, way too much of it. It's structured, but that structure is much too complex to fit in a human brain, at least as a matter of conscious thought. Nonetheless, the actual of behavior shows some sort of understanding of the text without having to stretch the meaning of the word "understanding".

In the earlier post, I also said that even if an LLM encodes a lot about how words are used and in which contexts -- which it clearly does -- the LLM doesn't know the referents of those words -- it doesn't know what it means for water to be wet or what it feels like to be thirsty -- and so it doesn't understand text in the same sense we do.

This feels similar to appeals like "but a machine can't have feelings", which I generally find fairly weak, but that wasn't quite the argument I was trying to make. While cleaning up a different old post (I no longer remember which one), I ran across a reference that sharpens the picture by looking more closely at the calculations/manipulations an LLM is actually doing.

I think the first post I mentioned, on experiences etc. puts a pretty solid floor under what sort of understanding an LLM has of text, namely that it encodes some sort of understanding of how sentences are structured and how words (and somewhat larger units) associate with each other. Here, I hope to put a ceiling over that understanding by showing more precisely in what way LLMs don't understand the meaning of text in the way that we do.

Taking these together, we can roughly say that LLMs understand the structure of text but not the meaning, but the understanding of structure is deep enough that an LLM can extract information from a large body of text that's meaningful to us.

In much of what follows, I'm making use of an article in Quanta Magazine that discusses how LLMs do embeddings, that is, how they turn a text (or other input) into a list of vectors to feed into the tensor-smashing machine. It matches up well with papers I've read and a course I've taken, and I found it well-written, so I'd recommend it even if you don't read any further here.

Despite the name, a Large Language Model doesn't process language directly. The core of an LLM drives the processing of a list of tokens. A token is a vector -- an ordered list of numbers of a given length -- that represents a piece of the actual input.

To totally make up an example, if vectors are three numbers long, and a maps to (1.2, 3.0, -7.5), list maps to (6.4, -3.2, 1.6), of maps to (27.5, 9.8, 2.0), and vectors maps to (0.7, 0.3, 6.8), then a list of vectors maps to [(1.2, 3.0, -7.5), (6.4, -3.2, 1.6), (27.5, 9.8, 2.0), (0.7, 0.3, 6.8)].

Here I'm using parentheses for vectors, which in this case always have three numbers, and square brackets for lists, which can have any length (including zero for the empty list, []). In practice, the vectors will have many more than three components. Thousands is typical. The list of vectors encoding a text will be however long the text is.

The particular mapping from input to tokens is called the embedding*. The overall idea is to encode similarities along various dimensions. There are (practically) infinitely many ways to do this mapping. Over time this has evolved from a mostly-manual process, to an automated process using hand-written code, to the current state of the art, which uses machine learning techniques on large bodies of text. The first two approaches are pretty easy to understand.

An ML-produced embedding (that is, the procedure for turning an actual list of words into tokens), on the other hand, relies on a mass of numbers created during a training phase. This mass of numbers drives a generic algorithm that turns words into large vectors. While the numbers themselves don't really lend themselves to easy analysis, people have noticed interesting patterns in the results of applying embedding.

Because the model-building phase is looking at streams of text, it's not surprising that the embedding itself captures information about what words appear in what contexts in that text. For example in typical training corpora, dog and cat appear much more often in contexts like my pet ___ than, say, chair does. They are also likely to occur in conjunction with terms like paw and fur, while other words won't, and so forth.

While we don't really understand exactly how the embedding-building stage of training an LLM extracts relations like this, the article in Quanta gives the example that in one particular embedding the vector for king minus the one for man plus the one for woman is approximately equal to the one for queen (you add or subtract vectors component by component, so (1.2, 3.0, -7.5) + (6.4, -3.2, 1.6) = (7.6, -0.2, -5.9) and so on).

It's long been known that use in similar contexts correlates with similarity in meaning. But we're talking about implied similarities in meaning here, not actual meanings. You can know an analogy like cat : fur :: person : hair without knowing anything about what a cat is, or a person, or fur or hair.

That may seem odd from our perspective. A person would solve a problem like cat : fur :: person : ? by thinking about cats and people, and what about a person is similar to fur for a cat, because we're embodied in the world and we have experience of hair, cats, fur and so forth. Odd as it might seem to know that cat : fur :: person : hair without knowing what any of those things is, that's essentially what's going on with an LLM. It understands relations between words, based on how they appear in a mass of training text, but that's all it understands.

But what, exactly, is the difference between understanding how a word relates to other words and understanding what it means? There are schools of thought that claim there is no difference. The meaning of a word is how it relates to other words. If you believe that, then there's a strong argument that an LLM understands words the same way we do, and about as well as we do.

Personally, I don't think that's all there is to it. The words we use to express our reality are not our reality. For one thing, we can also use the same words to express completely different realities. We can use words in new ways, and the meaning of words can and does shift over time. There are experiences in our own reality that defy expression in words.

Words are something we use to convey meaning, but they aren't that meaning. Meaning ultimately comes from actual experiences in the real world. The way words relate to each other clearly captures something about what they actually mean -- quite a bit of it, by the looks of things -- but just as clearly it doesn't capture everything.

I have no trouble saying that the embeddings that current LLMs use encode something significant about how words relate to each other, and that the combination of the embedding and the LLM itself has a human-level understanding of how language works.

That's not nothing. It's something that sets current LLMs apart from anything before them, and it's an interesting result. For one thing, it goes a long way toward clarifying what's understanding of the world and what's just understanding of how language works and what combinations of words people actually use.

If an LLM is good at it, then it's something about how language works. If an LLM isn't good at it, then it's probably something about the world itself. I'll have a bit more to say about that in the next (shorter) post.

Because LLMs know about language, but not what it represents in the real world, we shouldn't be surprised that LLMs hallucinate, and we shouldn't expect them to stop hallucinating just because they're trained on larger and larger corpora of text.

The earlier post distinguished among behavior, mechanism and experience. An LLM is capable of linguistic behavior very similar to a person's.

The mechanism of an LLM may, or may not, be similar as far as language processing. We may well learn rules like the way that we use the in relation to nouns in a way that's similar to training an LLM. Whether that's the case or not, an LLM, by design, lacks a mechanism for tying words to anything in the real world. This probably accounts for much of the difference between what we would say and what an LLM would say.

All of this is separate from subjective experience. One could imagine a robot that builds up a store of interactions with the world, processes them into some more abstract representation and associates words with them. But even if that is more similar to what we do in terms of mechanism, it says nothing about what the robot might or might not be experiencing subjectively, even if it becomes harder to rule out the possibility that the robot is experiencing the world as we do.

* Wikipedia seems to think it's only an embedding if it's done using feature learning, but that seems overly strict. Mathematically, an embedding is any map from one domain into another, no matter how it's produced.

Thursday, March 27, 2025

Losing my marbles over entropy

In a previous post on Entropy, I offered a garbled notion of "statistical symmetry." I'm currently reading Carlo Rovelli's The Order of Time, and chapter two laid out the idea that I was grasping at concisely, clearly and -- because Rovelli is an actual physicist -- correctly.

What follows is a fairly long and rambling discussion of the same toy system as the previous post, of five marbles in a square box with 25 compartments. It does eventually circle back to the idea of symmetry, but it's really more of a brain dump of me trying to make sure I've got the concepts right. If that sounds interesting, feel free to dive in. Otherwise, you may want to skip this one.

In the earlier post, I described a box split into 25 little compartments with marbles in five of the compartments. If you start with, say, all the marbles on one row (originally I said on one diagonal, but that just made things a bit messier) and give the box a good shake, the odds that the marbles all end up in the same row that they started in are low, about one in 50,000 for this small example. So far, so good.

But this is really true for any starting configuration -- if there are twenty-five compartments in a five-by-five grid, numbered from left to right then top to bottom, and the marbles start out in, say, compartments 2, 7, 8, 20 and 24, the odds that they'll still be in those compartments after you shake the box are exactly the same, about one in 50,000.

On the one hand, it seems like going from five marbles in a row to five marbles in whatever random positions they end up in is making the box more disordered. On the other hand, if you just look at the positions of the individual marbles, you've gone from a set of five numbers from 1 to 25 ... to a set of numbers from 1 to 25, possibly the one you started with. Nothing special has happened.

This is why the technical definition of entropy doesn't mention "disorder". The actual definition of entropy is in terms of microstates and macrostates. A microstate is a particular configuration of the individual components of a system, in this case, the positions of the marbles in the compartments. A macrostate is a collection of microstates that we consider to be equivalent in some sense.

Let's say there are two macrostates: Let's call any microstate with all five marbles in the same row lined-up, and any other microstate scattered. In all there are 53,130 microstates (25 choose 5). Of those, five have all the marbles in a row (one for each row), and the other 53,125 don't. That is, there are five microstates in the lined-up microstate and 53,125 in the scattered microstate.

The entropy of a macrostate is related to the number of microstates consistent with that macrostate (for more context, see the earlier post on entropy, which I put a lot more care into). Specifically, it is the logarithm of the number of such states, multiplied by a factor called the Boltzmann constant to make the units come out right and to scale the numbers down, because in real systems the numbers are ridiculously large (though not as large as some of these numbers), and even their logarithms are quite large. Boltzman's constant is 1.380649×10⁻²³ Joules per Kelvin.

The natural logarithm of 5 is about 1.6 and the natural logarithm of 53,125 is about 10.9. Multiplying by Boltzmann's constant doesn't change their relative size: The scattered macrostate has about 6.8 times the entropy of the lined-up macrostate.

If you start with the marbles in the low-entropy lined-up macrostate and give the box a good shake, 10,625 times out of 10,626 you'll end up in the higher-entropy scattered macrostate. Five marbles in 25 compartments is a tiny system, considering that there are somewhere around 10,800,000,000,000,000,000,000,000 molecules in a milliliter of water. In any real system, except cases like very low-temperature systems with handfuls of particles, the differences in entropy are large enough that "10,625 times out of 10,626" turns into "always" for all intents and purposes.

This distinction between microstates and macrostates gives a rigorous basis for the intuition that going from lined-up marbles to scattered-wherever marbles is a significant change, while going from one particular scattered state to another isn't.

In both cases, the marbles are going from one microstate to another, possibly but very rarely the one they started in. In the first case, the marbles go from one macrostate to another. In the second, they don't. Macrostate changes are, by definition, the ones we consider significant, in this case, between lined-up and scattered. Because of how we've defined the macrostates, the first change is significant and the second isn't.

Let's slice this a bit more finely and consider a scenario where only part of a system can change at any given time. Suppose you don't shake up the box entirely. Instead, you take out one marble and put it back in a random position, including, possibly, the one it came from. In that case, the chance of going from lined-up to scattered is 20 in 21, since out of the 21 positions the marble can end up in, only one, its original position, has the marbles all lined up, and in any case it doesn't matter which marble you choose.

What about the other way around? Of the 53,120 microstates in the scattered macrostate, only 500 have four of the five marbles in one row. For any microstate, there are 105 different ways to take one marble out and replace it: Five marbles times 21 empty places to put it, including the place it came from.

For the 500 microstates with four marbles in a row, only one of those 105 possibilities will result in all five marbles in a row: Remove the lone marble that's not in a row and put it in the only empty place in the row of four. For the other 52,615 microstates in the scattered macrostate, there's no way at all to end up with five marbles lined up by moving only one marble.

So there are 500 cases where the scattered macrostate becomes lined-up, 500*104 cases where it might but doesn't, and 52,615*105 cases where it couldn't possibly. In all, that means that the odds are 11,153.15 to one against scattered becoming lined-up by removing and replacing one marble randomly.

Suppose that the marbles are lined up at some starting time, and every time the clock ticks, one marble gets removed and replaced randomly. After one clock tick, there is a 104 in 105 chance that the marbles will be in the high-entropy scattered state. How about after two ticks? How about if we let the clock run indefinitely -- what portion of the time will the system spend in the lined-up macrostate?

The there are tools to answer questions like this, particularly Markov chains and stochastic matrices (that's the same Markov Chain that can generate random text that resembles an input text). I'll spare you the details, but the answer requires defining a few more macrostates, one for each way to represent the number five as the sum of whole numbers: [5], [4, 1], [3, 2], [3, 1, 1], [2, 2, 1], [2, 1, 1, 1] and [1, 1, 1, 1, 1].

The macrostate [5] comprises all microstates with five marbles in one row, the macrostate [4, 1] comprises all microstates with four marbles in one row and one in another row, the macrostate [2, 2, 1] comprises all microstates with two marbles in one row, two marbles in another row and one marble in a third one, and so forth.

Here's a summary

Macrostate	Microstates	Entropy
[5]	5	1.6
[4,1]	500	6.2
[3,2]	2,000	7.6
[3,1,1]	7,500	8.9
[2,2,1]	15,000	9.6
[2,1,1,1]	25,000	10.1
[1,1,1,1,1]	3,125	8.0

The Entropy column is the natural logarithm of the Microstates column, without multiplying by Boltzmann's constant. Again, this is just to give a basis for comparison. For example [2,1,1,1] is the highest-entropy state, and [2,2,1] has four times the entropy of [5].

It's straightforward, but tedious, to count the number of ways one macrostate can transition to another. For example, of the 105 transitions for [3,2], 4 end up in [4,1], 26 end up back in [3,2] (not always by putting the removed marble back where it was), 30 end up in [3, 1, 1] and 45 end up in [2, 2, 1]. Putting all this into a matrix and taking the matrix to the 10th power (enough to see where this is converging) gives

Macrostate	% time	% microstates
[5]	.0094	.0094
[4,1]	.94	.94
[3,2]	3.8	3.8
[3,1,1]	14	14
[2,2,1]	28	28
[2,1,1,1]	47	47
[1,1,1,1,1]	5.9	5.9

The second column is the result of the tedious matrix calculations. The third column is just the size of the macrostate as the portion of the total number of microstates. For example, there are 500 microstates in [4,1], which is 0.94% of the total, which is also the portion of the time that the matrix calculation says system will spend in [4, 1]. Technically, this means the system is ergodic, which means I didn't have to bother with the matrix and counting all the different transitions.

Even in this toy example, the system will spend very little of its time in the low-entropy lined-up state [5], and if it ever does end up there, it won't stay there for long.

Given some basic assumptions, a system that evolves over time, transitioning from microstate to microstate, will spend the same amount of time in any given microstate (as usual, that's not quite right technically), which means that the time spent in each macrostate is proportional to its size. Higher-entropy states are larger than lower-entropy states, and because entropy is a logarithm, they're actually a lot larger.

For example, the odds of an entropy decrease of one millionth of a Joule per Kelvin are about one in e^(10¹⁷). That's a number with somewhere around 40 quadrillion digits. To a mathematician, the odds still aren't zero, but to anyone else they would be.

For all but the tiniest, coldest systems, the chance of entropy decreasing even by a measurable amount are not just small, but incomprehensibly small. The only systems where the number of microstates isn't incomprehensibly huge are are small collections of particles near absolute zero.

I'm pretty sure I've read about experiments where such a system can go from a higher-entropy state to a very slightly lower-entropy state and vice versa, though I haven't had any luck tracking them down. Even if no one's ever done it, such a system wouldn't violate any laws of thermodynamics, because the laws of thermodynamics are statistical (and there's also the question of definition over whether such a system is in equilibrium).

So you're saying ... there's a chance? Yes, but actually no, in any but the tiniest, coldest systems. Any decrease in entropy that could actually occur in the real world and persist long enough to be measured would be in the vicinity of 10⁻²³ Joules per Kelvin, which is much, much too small to be measured except under very special circumstances.

For example, if you have 1.43 grams of pure oxygen in a one-liter container at standard temperature and pressure, it's very unlikely that you know any of the variables involved -- the mass of the oxygen, its purity, the size of the container, the temperature or the pressure, to even one part in a billion. Detecting changes 100,000,000,000,000 times smaller than that is not going to happen.

But none of that is what got me started on this post. What got me started was that the earlier post tried to define some sort of notion of "statistical symmetry", which isn't really a thing, and what got me started on that was my coming to understand that higher-entropy states are more symmetrical. That in turn was jarring because entropy is usually taken as a synonym for disorder, and symmetry is usually taken as a synonym for order.

Part of the resolution of that paradox is that entropy is a measure of uncertainty, not disorder. The earlier post got that right, but evidently that hasn't stopped my for hammering on the point for dozens more paragraphs and a couple of tables in this one, using a slightly different marbles-in-compartments example.

The other part is that more symmetry doesn't really mean more order, at least not in the way that we usually think about it.

From a mathematical point of view, a symmetry of an object is something you can do to it that doesn't change some aspect of the object that you're interested in. For example, if something has mirror symmetry, that means that it looks the same in the mirror as it does ordinarily.

It matters where you put the mirror. The letter W looks the same if you put a mirror vertically down the middle of it -- it has one axis of symmetry. The letter X looks the same if you put the mirror vertically in the middle, but it also looks the same if you put it horizontally in the middle -- it has two axes of symmetry.

Another way to say this is that if you could draw a vertical line through the middle of the W and rotate the W out of the page around that line, and kept going for 180 degrees until the W was back in the page, but flipped over, it would still look the same. If you chose some other line, it would look different (even if you picked a different vertical line, it would end up in a different place). That is, if you do something to the W -- rotate it around the vertical line through the middle -- it ends up looking the same. The aspect you care about here is how the W looks.

To put it somewhat more rigorously: if f is the particular mapping that takes each point to its mirror image across the axis, then f takes the set of points in the W to the exact same set of points. Any point on the axis maps to itself, and any point off the axis maps to its mirror image, which is also part of the W. The map f is defined for every point on the plane and it moves all of them except for the axis. The aspect we care about, which f doesn't change, is whether a particular point is in the W.

If you look at all the things you can do to an object without changing the aspect you care about, you have a mathematical group. For a W, there are two things you can do: leave it alone and flip it over. For an X, you have four options: leave it alone, flip it around the vertical axis, flip it around the horizontal axis, or do both. Leaving an object alone is called the identity transformation, and it's always considered a symmetry, because math. An asymmetrical object has only that symmetry (it's symmetry group is trivial).

In normal speech, saying something is symmetrical usually means it has the same symmetry group as a W -- half of it is a mirror image of the other half. Technically, it has bilateral symmetry. In some sense, though, an X is more symmetrical, since its symmetry group is larger, and a hexagon, which has 12 elements in its symmetry group, is more symmetrical yet.

A figure with 19 sides, each of which is the same lopsided squiggle, would have a symmetry group of 19 (rotate by 1/19 of a full circle, 2/19 ... 18/19, and also don't rotate at all). That would make it more symmetrical than a hexagon, and quite a bit more symmetrical than a W, but if you asked people which was most symmetrical, they would probably put the 19-sided squigglegon last of the three.

Our visual system is mostly trained to recognize bilateral symmetry. Except for special situations like reflections in a pond, pretty much everything in nature with bilateral symmetry is an animal, which is pretty useful information when it comes to eating and not being eaten. We also recognize rotational symmetry, which includes flowers and some sea creatures, also useful information.

It would make sense, then, that in day to day life, "more symmetrical" generally means "closer to bilateral symmetry". If a house has an equal number of windows at the same level on either side of the front door, we think of it as symmetrical, even though the windows may not be exactly the same, the door itself probably has a doorknob on one side or the other and so forth, so it's not quite exactly symmetrical. We'd still say it's pretty symmetrical, even though from a mathematical point of view it either has bilateral symmetry or it doesn't (and in the real world, nothing we can see is perfectly symmetrical).

That should go some way toward explaining why, along with so many other things, symmetry doesn't necessarily mean the same thing in its mathematical sense as it does ordinarily. The mathematical definition includes things that we don't necessarily think of as symmetry.

Continuing with shapes and their symmetries, you can think of each shape as a macrostate. You can associate a microstate with each mapping (technically, in this case, any rigid transformation of the plane) that leaves the shape unchanged. The macrostate W has two microstates: one for the identity transformation, which leaves the plane unchanged, and one for the mirror transformation around the W's axis.

The X macrostate has four microstates, one for the identity, one for the flip around the vertical axis, one for the flip around the horizontal axis, and one for flipping around one axis and then the other (in this case, it doesn't matter what order you do it in). The X macrostate has a larger symmetry group, which is the same as saying it has more entropy.

In this context, a symmetry is something you can do to the microstate without changing the macrostate. A larger symmetry group -- more symmetry -- means more microstates for the same macrostate, which means more entropy, and vice-versa. They're two ways of looking at the same thing.

In the case of the marbles in a box, a symmetry is any way of switching the positions of the marbles, including not switching them around at all. Technically, this is a permutation group.

For any given microstate, some of the possible permutations just switch the marbles around in their places (for example, switching the first two marbles in a lined-up row), and some of them will move marbles to different compartments. For a microstate of the lined-up macrostate [5], there are many fewer permutations that leave the marbles in the same macrostate (all in one row, though not necessarily the same row) than there are for [2, 1, 1, 1]. Even though five marbles in a row looks more symmetrical, since it happens to have bilateral visual symmetry, it's actually a much less symmetrical macrostate than [2, 1, 1, 1], even though most of its microstates will just look like a jumble.

In the real world, distributing marbles in boxes is really distributing energy among particles, generally a very large number of them. Real particles can be in many different states, many more than the marble/no marble states in the toy example, and different states can have the same energy, which makes the math a bit more complicated. Switching marbles around is really exchanging energy among particles, and there are all sorts of intricacies about how that happens.

Nonetheless, the same basic principles hold: Entropy is a measure of the number of microstates for a given macrostate, and a system in equilibrium will evolve toward the highest-entropy macrostate available, and stay there, simply because the probability of anything else happening is essentially zero.

And yeah, symmetry doesn't necessarily mean what you think it might.

Wednesday, September 25, 2024

Amplified intelligence, or what makes a computer a computer?

I actually cut two chunks out of What would superhuman intelligence even mean?. I think the one that turned into this post is the more interesting of the two, but this one's short and I didn't want to discard it entirely.

Two very clear cases of amplified human intelligence are thousands of years old: writing and the abacus. Both of them amplify human memory, long-term for writing and short-term for the abacus. Is a person reading a clay tablet or calculating with an abacus some sort of superhuman combination of human and technology? No? Why not?

Calculating machines and pieces of writing are passive. They don't do anything on their own. They need a human, or something like a human, to have any effect. Fair enough. To qualify as superhuman by itself, a machine needs some degree of autonomy.

Autonomous machines are more recent than computing and memory aids. The first water clocks were probably built two or three thousand years ago, and there is a long tradition in several parts of the world of building things that, given some source of power, will perform a sequence of actions on their own without any external guidance.

But automata like clocks and music boxes are built to perform a particular sequence of actions from start to finish, though some provide a way to change the program between performances. Many music boxes use some sort of drum that encodes the notes of the tune and can be swapped out to play a different tune, for example. Nevertheless, once the automaton starts its performance, it's going to perform whatever it's been set up to perform.

There's one more missing piece: The ability to react to the external world, to do one thing based on one stimulus and a different thing based on a different stimulus, that is, to perform conditional actions. Combine this with some sort of updatable memory and you have the ability to perform different behavior based on something that happened in the past, or even multiple things that happened at different points in the past.

My guess is that both of those pieces are also older than we might think, but the particular combination of conditional logic and memory that they use is the real difference between the modern computers that first appeared in the mid twentieth century and the automata of the centuries before.

AGI, goals and influence

While putting together What would superhuman intelligence even mean? I took out a few paragraphs that seemed redundant at the time. While I think that post is better for the edit, when I re-read the deleted material, I realized that there was one point in them that I didn't explicitly make in the finished post. Here's the argument (If you have "chess engine" on your AI-post bingo card, yep, you can mark it off yet again. I really do think it's an apt example, but I'm even getting tired of mentioning it):

When it comes to question what are the implications of AGI?, actual intelligence is one factor among many. A superhuman chess engine poses little if any risk. A simple non-linear control system that can behave chaotically is a major risk if it's controlling something dangerous.

To the extent that a control system with some sort of general superintelligence is hard to predict and may make decisions that don't align with our priorities, it would be foolhardy to put it directly in charge of something dangerous. Someone might do that anyway, but that's a hazard of our imperfect human judgment. A superhuman AI is just one more dangerous thing that humans have the potential to misuse.

The more interesting risk is that an AI with limited control of something innocuous could leverage that into more and more control, maybe through extortion -- give the system control of the power plants or it will destroy all the banking data -- or persuasion -- someone hooks a system up to social media where its accounts convince people in power to put it in charge of the power plants.

These are worthy scenarios to contemplate. History is full of examples of human intelligences extorting or persuading people to do horribly destructive things, so why would an AGI be any different? Nonetheless, in my personal estimation, we're still quite a ways from this actually happening.

Current LLMs can sound persuasive if you don't fact-check them and don't let them go on long enough to say something dumb -- which in my experience is not very long -- but what would a chatbot ask for? Whom would it ask? How would the person or persons carry out its instructions? (I initially said "its will", rather than "its instructions", but there's nothing at all to indicate that a chatbot has anything resembling will)

You could imagine some sort of goal-directed agent using a chatbot to generate persuasive arguments on its behalf, but, at least as it stands, I'd say the most likely goal-directed agent for this would a human being using a chatbot to generate a convincing web of deception. But human beings are already highly skilled at conning other human beings. It's not clear what new risk generative AI presents here.

Certainly, an autonomous general AI won't trigger a cataclysm in the real world if it doesn't exist, so in that sense, the world is safer without it. Eventually, though, the odds are good that something will come along that meets DeepMind's definition of AGI (or ASI). Will that AI's skills include parlaying whatever small amount of influence it starts with into something more dangerous? Will its goals include expanding its influence, even if we don't think they do at first?

The idea of an AI with seemingly harmless goals becoming an existential threat to humanity is a staple in fiction (and the occasional computer game). It's good that people have been exploring it, but it's not clear what conclusions to draw from those explorations, beyond a general agreement that existential threats to humanity are bad. Personally, I'm not worried yet, at least not about AGI itself, but I've been wrong many times before.

Sunday, September 22, 2024

Experiences, mechanisms, behaviors and LLMs

This is another post that sat in the attic for a few years. It overlaps a bit with some later posts, but I thought it was still worth dusting off and publishing. By "dusting off", I mean "re-reading, trying to edit, and then rewriting nearly everything but the first few paragraphs from scratch, making somewhat different points."

Here are some similar-looking questions:

Someone writes an application that can successfully answer questions about the content of a story it's given. Does it understand the story?
Other primates can watch each other, pick up cues such as where the other party is looking, and react accordingly. Do they have a "theory of mind", that is, some sort of mental model of what the other is thinking, or are they just reacting directly to where the other party is looking and other superficial clues (see this previous post for more detail)?
How can we tell if something, whether it's a person, another animal, an AI or something else, is really conscious, that is, having conscious experiences as opposed to somehow unconsciously doing everything a conscious being would do?
In the case of the hide-and-seek machine learning agents (see this post and this one), do the agents have some sort of model of the world?
How can you tell if something, whether it's a baby human, another animal or an AI, has object permanence, that is, the ability to know that an object exists somewhere that it can't directly sense?
In the film Blade Runner, is Dekker a replicant?

These are all questions about how things, whether biological or not, understand and experience the world (the story that Blade Runner is based on says this more clearly in its title, Do Androids Dream of Electric Sheep?). They also have a common theme of what you can know about something internally based on what you can observe about it externally. That was originally going to be the main topic, but the previous post on memory covered most of the points I really wanted to make, although from a different angle.

In any case, even though the questions seem similar, some differences appear when you dig into and try to answer them.

The question of whether something is having conscious experiences, or just looks like it, also known as the "philosophical zombie" problem, is different from the others in that it can't be answered objectively, because having conscious experiences is subjective by definition. As to Dekker, well, isn't it obvious?

There are several ways to interpret the others, according to a distinction I've already made in a couple of other posts:

Does the maybe-understander experience the same things as we do when we feel we understand something (perhaps an "aha ... I get it now" sort of feeling). As with the philosophical zombie problem, this is in the realm of philosophy, or at least it's unavoidably subjective. Call this the question of experience.
Does the maybe-understander do the same things we do when understanding something (in some abstract sense). For example, if we read a story that mentions "tears in rain", does the understander have something like memories of crying and of being in the rain, that it combines into an understanding of "tears in rain" (there's a lot we don't know about how people understand things, but it's probably roughly along those lines). Call this the question of mechanism.
Does the maybe-understander behave similarly to how we do if we understand something? For example, if we ask "What does it mean for something to be washed away like tears in rain?" can it give a sensible answer? Call this the question of behavior.

The second interpretation may seem like the right one, but it has practical problems. Rather than just knowing what something did, like what answer it gave to your questions, you have to be able to tell what internal machinery it has and how it uses it, which is difficult to do objectively (I go into this from a somewhat different direction in the previous post).

The third interpretation is much easier to answer rigorously and objectively, but, once you've decided on a set of test cases, what does a "yes" answer actually mean? At the time of this writing, chatbots can give a decent answer to a question like the one about tears in rain, but it's also clear that they don't have any direct experience of tears, or rain.

Over the course of trying to understand AI in general, and the current generation in particular, I've at least been able to clarify my own thinking concerning experience, mechanism and behavior: It would be nice to be able to answer the question of experience, but that's not going to happen. It's not even completely possible when it comes to other people, much less other animals or AIs, even if you take the commonsense position that other people do have the same sorts of experiences as you do.

You and I might look at the same image or read the same text and say similar things about it, but did you really experience understanding it the way I did? How can I really know? The best I can do is ask more questions, look for other external cues (did you wrinkle your forehead when I mentioned something that seemed very clear to me?) and try to form a conclusion as best I can.

Even understanding of individual words is subjective in this sense. The classic question is whether I understand the word blue the same way you do. Even if some sort of functional MRI can show that neurons are firing in the same general way in our brains when we encounter the word blue, what's to say I don't experience blueness in the same way you experience redness and vice versa?

The question of behavior is just the opposite. It's knowable, but not necessarily satisfying. The question of mechanism is somewhere in between. It's somewhat knowable. For example, the previous post talks about how memory in transformer-based models appears to be fundamentally different from our memory (and that of RNN-based models). It's somewhat satisfying to know something more about how something works, in this case being able to say "transformers don't remember things in the same sense that we do". [Re-reading, I've had to make a couple of edits for clarity, because words like how can refer to both mechanism and behavior, which is exactly what I'm trying to distinguish. In this particular case, the difference is in both behavior and mechanism -- D.H. Jan 2025]

Nonetheless, as I discussed in a different previous post, the problem of behavior is most relevant when it comes to figuring out the implications of having some particular form of AI in the real world. There's a long history of attempts to reason "This AI doesn't have X, like we do, therefore it isn't generally intelligent like we are" or "If an AI has Y, like we do, it will be generally intelligent and there will be untold consequences", only to have an AI appear that people agree has Y but doesn't appear to be generally intelligent. The latest Y appears to be "understanding of natural language".

But let's take a closer look at that understanding, from the point of view of behavior. There are several levels of understanding natural language. Some of them are:

Understanding of how words fit together in sentences. This includes what's historically been called syntax or grammar, but also more subtle issues like how people say big, old, gray house rather than old, gray, big house
Understanding the content of a text, for example being able to answer "yes" to Did the doctor go to the store? from a text like The doctor got up and had some breakfast. Later, she went to the store. Questions like these don't require any detailed understanding of what words actually mean.
Understanding meaning that's not directly in a text. If the text is The doctor went to the store, but the store was closed. What day was it? The doctor remembered that the regular Wednesday staff meeting was yesterday. There was a sign on the door: Open Sun - Wed 10 to 6, Sat noon to 6, then to correctly answer Did the doctor go to the store with something like Yes, but it was Thursday and the store was closed, rather than a simple yes without further explanation.

From a human point of view, the stories in the second and third bullet points may seem like the same story in different words, but from an AI point of view one is much harder than the other. But current chatbots can do all three of these, so from a behavioral point of view it's hard to argue that they don't understand text, even though they clearly don't use the same mechanisms.

This is a fairly recent development. The earlier draft of this post noted that chatbots at the time might do fine for a prompt that required knowing that Thursday comes after Wednesday but completely fail on the same prompt using Sunday and Monday. Current [Sep 2024] models do much better with this sort of thing, so in some sense they know more and understand better than the ones from 2019, even if it's not clear what the impact of this has been in the world at large.

Chatbots don't have direct experience of the physical world or social conventions. What they do have is the ability to process text about experiences in the physical world and social conventions. One way of looking at a chatbot is as a simulation of "what would the internet say about this?" or, a bit more precisely, "based on the contents of the training text, what text would be generated in response to the prompt given?" Since that text was written (largely) by people with experiences of the physical world and social conventions, a good simulation will produce results similar to those of a person.

From the point of view of behavior, this is interesting. An LLM is capturing something about the training text that enables behavior that we would attribute to understanding.

It might be interesting to combine a text-based chatbot that can access textual information about the real world with a robot actually embedded in the physical world, and I think there have been experiments along those lines. A robot understands the physical world in the sense of being able to perceive things and interact with them physically. In what sense would the combined chatbot/robot system understand the physical world?

From the point of view of mechanism, there are obvious objections to the idea that chatbots understand the text they're processing. In my view, these are valid, but how relevant they are depends on your perspective. Let's look at a couple of possible objections.

It's just manipulating text. This hearkens back to early programs like ELIZA, which manipulated text in very obvious ways, like responding to I feel happy with Why do you feel happy? because the program will respond to I feel X with Why do you feel X? regardless of what X is. While the author of ELIZA never pretended it was understanding anything, it very much gave the appearance of understanding if you were willing to believe it could understand to begin with, something many people, including the author, found deeply unsettling.

On the one hand, it's literally true that an LLM-based chatbot is just manipulating text. On the other hand, it's doing so in a far from obvious way. Unlike ELIZA, an LLM is able to encode, one way or another, something about how language is structured, facts like "Thursday comes after Wednesday" and implications like "if a store's hours say it's open on some days, then it's closed on the others" (an example of "the exception proves the rule" in the original sense -- sorry, couldn't help it).

As the processing becomes more sophisticated, the just in It's just manipulating text does more and more work. At the present state of the art, a more accurate statement might be It's manipulating text in a way that captures something meaningful about its contents.

It's just doing calculations: Again, this is literally true. At the core of a current LLM is a whole lot of tensor-smashing, basically multiplying and adding numbers according to a small set of well-defined rules, quadrillions of times (the basic unit of computing power for the chips that are used is the teraflop, or trillion floating-point arithmetic operations per second, single chips can do hundreds of teraflops and there may be many such chips involved in answering a particular query).

But again, that just is doing an awful lot of work. Fundamentally, computers do two things

They perform basic calculations, such as addition, multiplication and various logical operations, on blocks of bits
They copy data from one location to another, based on the contents of blocks of bits

That second bullet point includes both conditional logic (since the instruction pointer is one place to put data) and the "pointer chasing" that together underlie a large swath of current software and were particularly important in early AI efforts. While neural net models do a bit of that, the vast bulk of what they do is brute calculation. If anything, they're the least computer science-y and most calculation-heavy AIs [to be clear, there's still a lot of legit CS in making all that work at scale -- D.H. Jan 2025].

Nonetheless, all that calculation is driving something much more subtle, namely simulating the behavior of a network of idealized neurons, which collectively behave in a way we only partly understand. If an app for, say, calculating the price of replacing a deck or patio does a calculation, we can follow along with it and convince ourselves that the calculation is correct. When a pile of GPUs cranks out the result of running a particular stream of input through a transformer-based model, we can make educated guesses as to what it's doing, but at in many contexts the best description is "it does what it does".

In other words, it's just doing calculations may look the same as it's just doing something simple, but that's not really right. It's doing lots and lots and lots of simple things on far too much data for a human brain to understand directly.

All of this is just another way to say that while the question of mechanism is interesting, and we might even learn interesting things about our own mental mechanisms by studying it, it's not particularly helpful in figuring out what to actually do regarding the current generation of AIs.

Tuesday, September 10, 2024

Tying up a few loose ends about models and memory

Most of the time when I write a post, I finish it up before going on to the next one. Sometimes I'll keep a draft around if something else comes up before I have something that feels ready to publish, and sometimes weeks or even months can pass between then and actually publishing, but I still prefer to publish the current post before starting a new one.

However, a while ago ... nearly five years ago, it looks like ... I ran across an article on a demo by Open AI that played games of hide-and-seek in virtual environments. Over the course of hundreds of millions of games, the hiders and seekers developed strategies and counter-strategies, including some that the authors of the article called "tool use".

I've put that in "scare quotes" ("scare quotes" around "scare quotes" because what's scary about noting that it was someone else who said something?), but I don't really have a problem with calling something like moving objects around in a world, real or virtual, to get an advantage "tool use" (those are use/mention quotes, if anyone's keeping score).

As usual, though, I'm more interested in the implications than the terminology, and this seemed like another example of trying to extrapolate from "sure, we can use terms like tool use and planning here with a straight face" to "AI systems are about to develop whatever it is we think is special about our intelligence, which means they might be about to take over the world."

Writing that brought a thought to mind that I'm not sure I've really articulated before: To whatever extent we've taken over the world, it's taken us on the order of 70,000 years to get here, depending on how you count. In that light, it seems a bit odd to conclude that anything else with intelligence similar to ours will be running the place overnight, especially if we know they're coming.

But I'm digressing from what was already a digression. In the process of putting together several posts prompted by that article, and still being in that process when ChatGPT happened, I ended up pondering some questions that didn't quite make it into other posts, at least not in the form that they originally occurred to me.

So here we are:

First, I was most intrigued by the idea that the hide-and-seek agents seemed to have object permanence, that is, the ability to remember that something exists even when you can't see it or otherwise perceive it directly.

This is famously a milestone in human development. As with many if not most cognitive abilities, understanding of object permanence has evolved over time, and there is no singular point at which babies normally "acquire object permanence" (call those whatever kind of quotes you like).

Newborn babies do not appear to have any kind of object permanence, but in their first year or two they pass several milestones, including what the Wikipedia article I linked to calls "Coordination of secondary circular reactions", which among other things means "the child is now able to retrieve an object when its concealment is observed" (straight-up "this is what the article said" quotes there, and I think I'll stop this game now).

The hide-and-seek agents seem to have similar abilities, particularly being able to return to the site of an object they've discovered or to track how many objects have been moved out of sight to the left versus to the right. There are two interesting questions here:

Do the hide-and-seek agents have the same object permanence capabilities as humans?
Do the hide-and-seek agents have object permanence in the same way as humans?

I'm making the same distinction here that I have in previous posts. The first question can be answered directly: Put together an experiment that requires remembering where objects were or which way they've gone and see if the agents perform similarly to humans.

The second is more difficult to answer, because it can't be answered directly. Instead, we have to form a theory about how humans are able to track the existence of unseen objects, and then test whether that theory is consistent with what humans actually do, and then, once there is a way of testing whether someone or something has that particular mechanism, try the same tests on the hide-and-seek agents. Assuming that all goes well, you still don't have an airtight case, but you have reason to believe that the agents are doing similar things to what humans do when demonstrating object permanence (in some particular set of senses).

There's actually a third question: Are the hide-and-seek agents experiencing objects and events in their world the same way we experience objects and events in our world? I would call that a philosophical question, probably unknowable in some fundamental sense. That's not to say that there's no point in exploring it, or exploring whether or not such things are knowable, just that at this point we're far outside the realm of verifiable experiments -- unless some clever philosopher is able to devise an experiment that will give us a meaningful answer.

The interesting part here is that we have a pretty good idea how agents such as the hide-and-seek agents are able to have capabilities like object permanence. In broad strokes, a hide-and-seek agent is consuming a stream of inputs analogous to our own sensory inputs such as sight and sound. In particular (quoting from the OpenAI blog post):

The agents can see objects in their line of sight and within a frontal cone.
The agents can sense distance to objects, walls, and other agents around them using a lidar-like sensor.

At any given time step, the agents are given a summary of what is visible at what distance at that time (rather than, say, getting an image and having to deduce from the pixels what objects are where), or at least I believe this is what the blog post means by "Agents use an entity-centric state-based representation of the world" From this, each agent produces a stream of actions: move, grab an object or lock an object (which prevents other agents from moving it).

In between the stream of inputs and the actions taken at a particular timestep is a neural network which is trained to extract the important parts from the input stream and turn them into actions. This neural network is trained based on the results of millions of simulated games of hide-and-seek, but it's static for any particular game. In some sense, it's encoding a memory of what happened in all the games it's been trained on -- producing this particular stream of actions in response to this particular stream of input resulted in success or failure, times many millions -- but it's not encoding anything about the current game.

Just going by the blog post, I can't tell exactly what sort of memory the agents do have, but from the context of how transformer-based models work, it is a memory of the input stream, either from the beginning of the current game or over a certain window. That is, at any particular timestep, the agent can not only use what it can sense at that time step, but also what it has sensed at previous time steps.

This makes object permanence a little less mysterious. If an agent sensed a box dead ahead and ten units away, then it turned 90 degrees to the right and went three units forward, it's not too surprising for it to act as though there is now a box three units behind it and ten units to the left, given that it remembers both of those things happening.

The key here is "act as though". In the same situation, a person would have some sort of mental image of a box in a particular location. The only things that the hide-and-seek agent is explicitly remembering about the current game is what it's sensed so far.

Presumably, there is something in the neural net that turns "I saw a box at this distance" followed by "I moved in such-and-such a way" into a signal deeper in the net that in some sense means "there is a box at this location", in some sort of robust and general way so that it can encode box locations in general, not just any particular example. Even deeper layers can then use this representation of the world to work out what kinds of actions will have the greatest chance of success. This is probably not exactly what's going on but ... something along those lines.

Is it possible that humans do something similar when remembering locations of objects? It's possible, but people don't always seem to have sequences of events in mind when remembering where objects are. I can be helpful to remember things like "I came downstairs with my keys and then I was talking to you and I think I left the keys on the table", but it doesn't seem to be necessary. If I tell you that I left the keys on the table in a room of a house you've never been to, you can still find the keys. If all I remember is that I left the keys on the table, but I'm not exactly sure how that came to be, I can still find them.

In other words, we seem to form mental images of places and the objects in them. While one way to form such an image is by experiencing moving through a place and observing objects in it, it's not the only way, and we can still access our mental map of places and things in them even after the original sequence of experiences is long forgotten.

We appear to remember things after doing significant processing and throwing away the input that led to the memories (or at least separating our memory of what happened from the memory of what's where). The way that transformer-based models handle sequences of events is not only different from what we appear to do, it's deliberately different.

Bear in mind that I'm not an expert here. I've done a bit of training on the basics of neural net-based ML and I've read up a bit on transformers and related architectures, so I think what follows is basically accurate, but I'm sure an actual expert would have some notes and corrections.

One definition before we dive in: token is the general term for an item in a stream of input, representing something on the order of a word of text or the description of what an agent senses, after it's been boiled down to a vector of numbers by a procedure that varies depending on the particular kind of input.

The problem of attention -- how heavily to weight different tokens in a stream of input -- has been the subject of active research for decades. Transformers handle this differently from other types of models. The previous generation of models used Recurrent Neural Networks (RNNs) that did something more like maintaining short-term memory of what's going on. Each input token is processed by a net to produce two sets of signals: output signals that say what to do at that particular point, and hidden state signals, that are fed back as inputs when processing the next input token.

In some sense, the hidden state signals signals represent the state of the model's memory at that point. Giving a token extra attention means boosting its signal in the hidden state that will be used in processing the next token, and indirectly in processing the tokens after that.

This has two problems: First, because the inputs to the net depend on the hidden state outputs from previous tokens, you have to compute one token at a time, which means you can't just throw more hardware at processing more tokens. More hardware might make each individual step faster, but only up to the limits of current hardware. It's going to take 10,000 steps to process 10,000 tokens, no matter what.

Second, essentially since everything that's come before is boiled down into a set of hidden state signals, the longer ago an input token was processed, the less influence it can have on the final result (the "vanishing gradient problem"). Even if a token has a large influence on the hidden state when it's processed, that influence will get washed out as more tokens are processed.

Unfortunately, events that happened long ago can be more important than ones that happened more recently. Imagine someone saying "I don't think that ..." followed by a long, overly-detailed explanation of what they don't think. The "not" in "don't" may well be more important than the fourth bullet point in the middle.

Even though an RNN works roughly the same way that our brains work, receiving inputs one at a time and maintaining some sort of memory of what's happened, models based purely on hidden state don't perform very well, probably because our own memories do more than just maintain a set of feedback signals. There have been attempts to use more sophisticated forms of memory in RNNs, particularly "Long Short-Term Memory" (LSTM). This works better than just using hidden state, and it was the state of the art before transformers came along.

Transformers take a completely different approach. At each step, they take as input the entire stream of tokens so far. At timestep 1, the model's output is based on what's happening then. At timestep 2, it's based on what happened at timestep 1 and what's happening at timestep 2, and so on. If you only give the model "this happened at timestep 1 and this happened at timestep 2", it should produce the same results whether or not it was ever asked to produce a result for timestep 1.

Processing an input stream at one timestep does not affect how it will process an input stream at any other timestep. The only remembering going on is remembering the whole of the input stream. This means that any token in the input stream can be given as much importance as any other.

A transformer consists of two parts. The first digests the entire input stream and picks out the important parts. It can do this in multiple ways. One "head" in a language-processing model might weight based on what words are next to each other. Another might pay attention to verbs and their objects. Input tokens are tagged with their position in the stream, so a transformer trained to work on text could weight "I don't think that ..." in early positions as being important, or look for some types of words close to other types of words.

Whatever actually comes out of that stage goes into another network that decides what output to actually produce (this network actually consists of multiple stages, and the whole attention-and-other-processing setup can be repeated and stacked up, but that's the basic structure).

A transformer-based model does this at every timestep, which means that the first input token is processed at every timestep, the second one is processed at every timestep but the first, and so forth. This means that handling twice as long a stream of input will require approximately four times as much processing, three times as much will require nine times as much and so on. Technically, the amount of processing require grows quadratically with the size of the input.

For similar reasons, the network that handles attention grows quadratically in the size of the input, at least without some sort of optimization. In this sense, a transformer is less efficient than an RNN, since it will use more computing resources.

Crucially, though, this can all be done by "feed-forward" networks, that is, networks that don't have feedback loops. If you want to be able to process a longer stream of input tokens, you'll need a larger network for the attention stage, and probably more for the later stages as well since there will probably be more output from the attention stage, but you can make both of those bigger by throwing more hardware at them.

Processing twice as big an input stream requires more hardware, but it doesn't take twice as much "wall time" (time on the clock on the wall), even if it takes four times as much CPU time (total time spent by all the processors). Being able to handle a long stream of input quickly is what enables networks to incorporate what happened in the whole history of a stream when deciding what to output.

Transformer-based models, which currently give the best results, don't process events in the world the same way we do. They don't remember anything from input token to input token (that is, timestep to timestep). Instead, they remember everything that has happened up to the current time, and figure out what to do based on that.

This produces the same kind of effects as our memories do, including the effect of object permanence. In our case, if we see a ball roll behind a wall, we remember that there's a ball behind the wall (assuming nothing else happens). In a transformer-based hide-and-seek model, an agent's behavior will likely differ for an input stream that includes a ball moving behind a wall than for one that doesn't, so the model acts like it remembers that there's a ball behind the wall.

It looks like humans are doing something the hide-and-seek agents don't do when dealing with a world of objects, namely maintaining a mental map of the world, even though the agents can produce similar results to what we can. Again, this shouldn't be too surprising. Chess engines are capable of "positional play" and other behaviors that were once thought to be unique to humans even though they clearly use different mechanisms. Chatbots can produce descriptions of seeing, smelling and tasting things that they've clearly never seen, smelled or tasted, and so forth.

Are we "safe" (definitely scare quotes) since these agents aren't forming mental images in the same way we appear to? Wouldn't that mean that they lack the "true understanding" that we have, or some other quality unique to us, and therefore they won't be able to outsmart us? I would say don't bet on it. Chess engines may not have the same sense of positional factors as humans, but they still play much stronger chess.

So are we doomed, then? I wouldn't bet on that either, for reasons I go into in this post and elsewhere.

The one thing that seems clear is that human memory of the world doesn't work the same way as it does for the hide-and-seek agents, or for AIs built on similar principles. In both cases there appears to be some sort of processing of a stream of sense input into a model of what's where. The difference seems to be more that the memory part is happening at a different stage and has a completely different structure.

Sunday, August 11, 2024

Metacognition, metaphor and AGI

In the recent post on abstract thought, I mentioned a couple of meta concepts: metacognition and metaphor.

Metacognition is the ability to think about thinking. I've discussed it before, particularly in this post and these two posts.
Metaphor is a bit harder to define, though there is no shortage of definitions, but the core of it involves using the understanding of one thing to understand a different thing. I've also discussed this before, particularly in this post and this one.

When I was writing the post on abstract thought, I had it in mind that these two abilities have more to do with what we would call "general intelligence" (artificial or not), so I wanted to try to get into that here, without knowing exactly where I'll end up.

In that earlier post, I identified two kinds of abstraction:

Defining things in terms of properties, for example, a house is a building that people live in. I concluded that this isn't essential to general intelligence. At this point, I'd say it's more a by-product of how we think, particularly how we think about words.
Identifying discrete objects (in some general sense) out of the stream of sensory input we encounter, for example, being able to say "that sound was a dog barking". I concluded that this is basic equipment for dealing with the world. At this point, I'd say it's worth noting that LLMs don't do this at all. They have it done for them by the humans that produce the words they're trained on and receive as prompts. On the other hand specialized AIs, like speech recognizers, do exactly this.

It was the first kind of abstraction that led me back to thinking about metaphor.

Like the second kind of abstraction, metaphor is everywhere, to the point that we don't even recognize it until we think to look. For example:

the core of it (a concept has a solid center, with other, softer parts around it)
I had it in mind (the mind is a container of ideas)
I wanted to try to get into that (a puzzle is a space to explore; you know more about it when inside it than outside)
without knowing exactly where I'll end up (writing a post is going on a journey, destination unknown)
at this point (again, writing a post is a journey)
this is basic equipment (mental abilities are tools and equipment)
led me back to thinking (a chain of thought is a path one can follow)
to the point (likewise)

While there's room for discussion as to the details, in each of those cases I'm talking about something in the mind (concepts, the process of writing a blog post ...) in terms of something tangible (a soft object with a core, a journey in the physical world ...).

Metaphor is certainly an important part of intelligence as we experience it. It's quite possible, and I would personally say likely, that the mental tools we use for dealing with the physical world are also used in dealing with less tangible things. For example, the mental circuitry involved in trying to follow what someone is saying probably overlaps with the mental circuitry involved in trying to follow someone moving in the physical world.

This would include not only focusing one's attention on the other person, but also building a mental model of the other person's goals so as to anticipate what they will do next, and also recording what the person has already said in a similar way to recording where one has already been along a path of motion. If some of the same mental machinery is involved in both processes -- listening to someone speak, and physically following them -- then on some level we probably experience the two similarly. If so, it should be no surprise that we use some of the same words in talking about the two.

The overlap is not exact, or else we actually would be talking about the same things, but the overlap is there nonetheless. This can happen in more than one way at the same time. If you're speaking aggressively to me, I might experience that in a similar way to being physically menaced, and I might say things like Back off or Don't attack me, even while I might also say I'm not following you if I can't quite understand what you're saying, but I still feel like it's meant aggressively.

It's interesting that these examples of metaphor, about processing what someone is saying, also involve metacognition, thinking about what the other person is thinking. That's not always the case (consider this day is just rushing by me or it looks like we're out of danger). Rather, we use metaphor when thinking about thinking because we use metaphor generally when thinking about things.

If you buy that metaphor is a key part of what we think of as our own intelligence, is it a key part of what we would call "general intelligence" in an AI? As usual, that seems more like a matter of definition. I've argued previously that the important consideration with artificial general intelligence is its effect. For example, we worry about trying to control a rogue AI that can learn to adapt to our attempts to control it. This ability to adapt might or might not involve metaphor. It might well involve metacognition -- modeling what we're thinking as we try to control it, but maybe not.

Consider chess engines. As noted elsewhere, it's clear that chess engines aren't generally intelligent, but it's also clear that they are superhuman in their abilities. Human chess players clearly use metaphor in thinking about chess, not just attack and defense, but space, time, strength, weakness, walls, gaps, energy and many others. Classic AB chess engines (bash out huge numbers of possible continuations and evaluate them using an explicit formula) clearly don't use metaphor.

The situation with neural network (NN) engines (bash out fewer possible continuations and evaluate them using a neural net) is slightly muddier, since in some sense the evaluation function is looking for similarities with other chess positions, but that's the key point: the NN is comparing chess positions to other chess positions, not to physical-world concepts like space, strength and weakness. You could plausibly say that NNs use analogy, but metaphor involves understanding one thing in terms of a distinct other thing.

Likewise, neither sort of chess engine builds a model of what its opponent is thinking, only of the possible courses of action that the opponent might take, regardless of how it decides to take them. By contrast, human chess players very frequently think about what their opponent might be thinking (my opponent isn't comfortable with closed positions, so I'm going to try to lock up the pawn structure). Human chess players, being human, do this because we humans do this kind of thing constantly when dealing with other people anyway.

One the one hand, metaphors only become visible when we use words to describe things. On the other hand, metaphor (I claim here) comes out of using the mental machinery for dealing with one thing to deal with another thing (and in particular, re-using the machinery for dealing with the physical world to deal with something non-physical). More than that, it comes out of using the same mental machinery and, in some sense, being aware of doing it, if only in experiencing some of the same feelings in each case (there's a subtle distinction here between being aware and being consciously aware, which might be interesting to explore, but not here).

If we define an AGI as something of our making that is difficult to control because it can learn and adapt to our attempts to control it, then we shouldn't assume that it does so in the same ways that we do. Meta-thought like explicitly creating a model of what someone (or something) else is thinking, and using metaphor to understand one thing in terms of another may be key parts of our intelligence, but I don't see any reason to think they're necessarily part of being an AGI in the sense I just gave.

The other half of this is chains of reasoning like "If this AI can do X, which is key to our intelligence, then it must be generally intelligent like we consider ourselves to be" rests on whether abilities like metacognition and metaphorical reasoning are sufficient for AGI.

That may or may not be the case (and it would help if we had a better understanding of AGI and intelligence in general), but so far there's a pretty long track record of things, for example being able to deal with natural language fluently, turning out not to necessarily lead to AGI.