Wednesday, September 25, 2024

Amplified intelligence, or what makes a computer a computer?

I actually cut two chunks out of What would superhuman intelligence even mean?.  I think the one that turned into this post is the more interesting of the two, but this one's short and I didn't want to discard it entirely.


Two very clear cases of amplified human intelligence are thousands of years old: writing and the abacus.  Both of them amplify human memory, long-term for writing and short-term for the abacus.  Is a person reading a clay tablet or calculating with an abacus some sort of superhuman combination of human and technology?  No?  Why not?

Calculating machines and pieces of writing are passive.  They don't do anything on their own.  They need a human, or something like a human, to have any effect.  Fair enough.  To qualify as superhuman by itself, a machine needs some degree of autonomy.

Autonomous machines are more recent than computing and memory aids.  The first water clocks were probably built two or three thousand years ago, and there is a long tradition in several parts of the world of building things that, given some source of power, will perform a sequence of actions on their own without any external guidance.

But automata like clocks and music boxes are built to perform a particular sequence of actions from start to finish, though some provide a way to change the program between performances.  Many music boxes use some sort of drum that encodes the notes of the tune and can be swapped out to play a different tune, for example.  Nevertheless, once the automaton starts its performance, it's going to perform whatever it's been set up to perform.

There's one more missing piece: The ability to react to the external world, to do one thing based on one stimulus and a different thing based on a different stimulus, that is, to perform conditional actions.  Combine this with some sort of updatable memory and you have the ability to perform different behavior based on something that happened in the past, or even multiple things that happened at different points in the past.

My guess is that both of those pieces are also older than we might think, but the particular combination of conditional logic and memory is the real difference between the modern computers that first appeared in the mid-twentieth century and the automata of the centuries before.
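
To make the contrast concrete, here's a toy sketch (purely illustrative, nothing historical about it): the first function is a music-box-style automaton that steps through a fixed program no matter what, while the second has conditional logic plus a bit of updatable memory, so what it does depends on what it has encountered so far.

    # A fixed automaton: it performs the same sequence every time.
    def music_box(notes):
        for note in notes:
            print("play", note)

    # Conditional logic plus updatable memory: what happens next depends
    # on the current input and on what has been seen before.
    def counting_machine(signals):
        memory = 0                      # updatable memory
        for signal in signals:
            if signal == "tick":        # conditional action
                memory += 1
            elif signal == "report":
                print("ticks so far:", memory)

    music_box(["C", "E", "G", "C"])
    counting_machine(["tick", "tick", "report", "tick", "report"])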

AGI, goals and influence

While putting together What would superhuman intelligence even mean? I took out a few paragraphs that seemed redundant at the time.  While I think that post is better for the edit, when I re-read the deleted material, I realized that there was one point in them that I didn't explicitly make in the finished post.  Here's the argument (If you have "chess engine" on your AI-post bingo card, yep, you can mark it off yet again. I really do think it's an apt example, but I'm even getting tired of mentioning it):


When it comes to the question what are the implications of AGI?, actual intelligence is one factor among many.  A superhuman chess engine poses little if any risk.  A simple non-linear control system that can behave chaotically is a major risk if it's controlling something dangerous.

To the extent that a control system with some sort of general superintelligence is hard to predict and may make decisions that don't align with our priorities, it would be foolhardy to put it directly in charge of something dangerous.  Someone might do that anyway, but that's a hazard of our imperfect human judgment.  A superhuman AI is just one more dangerous thing that humans have the potential to misuse.

The more interesting risk is that an AI with limited control of something innocuous could leverage that into more and more control, maybe through extortion -- give the system control of the power plants or it will destroy all the banking data -- or persuasion -- someone hooks a system up to social media where its accounts convince people in power to put it in charge of the power plants.

These are worthy scenarios to contemplate.  History is full of examples of human intelligences extorting or persuading people to do horribly destructive things, so why would an AGI be any different? Nonetheless, in my personal estimation, we're still quite a ways from this actually happening.

Current LLMs can sound persuasive if you don't fact-check them and don't let them go on long enough to say something dumb -- which in my experience is not very long -- but what would a chatbot ask for?  Whom would it ask? How would the person or persons carry out its instructions?  (I initially said "its will", rather than "its instructions", but there's nothing at all to indicate that a chatbot has anything resembling will)

You could imagine some sort of goal-directed agent using a chatbot to generate persuasive arguments on its behalf, but, at least as it stands, I'd say the most likely goal-directed agent for this would be a human being using a chatbot to generate a convincing web of deception.  But human beings are already highly skilled at conning other human beings.  It's not clear what new risk generative AI presents here.

Certainly, an autonomous general AI won't trigger a cataclysm in the real world if it doesn't exist, so in that sense, the world is safer without it.  Eventually, though, the odds are good that something will come along that meets DeepMind's definition of AGI (or ASI).  Will that AI's skills include parlaying whatever small amount of influence it starts with into something more dangerous?  Will its goals include expanding its influence, even if we don't think they do at first?

The idea of an AI with seemingly harmless goals becoming an existential threat to humanity is a staple in fiction (and the occasional computer game).  It's good that people have been exploring it, but it's not clear what conclusions to draw from those explorations, beyond a general agreement that existential threats to humanity are bad.  Personally, I'm not worried yet, at least not about AGI itself, but I've been wrong many times before.

Sunday, September 22, 2024

Experiences, mechanisms, behaviors and LLMs

This is another post that sat in the attic for a few years.  It overlaps a bit with some later posts, but I thought it was still worth dusting off and publishing.  By "dusting off", I mean "re-reading, trying to edit, and then rewriting nearly everything but the first few paragraphs from scratch, making somewhat different points."


Here are some similar-looking questions:
  • Someone writes an application that can successfully answer questions about the content of a story it's given.  Does it understand the story?
  • Other primates can watch each other, pick up cues such as where the other party is looking, and react accordingly.  Do they have a "theory of mind", that is, some sort of mental model of what the other is thinking, or are they just reacting directly to where the other party is looking and other superficial clues (see this previous post for more detail)?
  • How can we tell if something, whether it's a person, another animal, an AI or something else,  is really conscious, that is, having conscious experiences as opposed to somehow unconsciously doing everything a conscious being would do?
  • In the case of the hide-and-seek machine learning agents (see this post and this one), do the agents have some sort of model of the world?
  • How can you tell if something, whether it's a baby human, another animal or an AI, has object permanence, that is, the ability to know that an object exists somewhere that it can't directly sense?
  • In the film Blade Runner, is Deckard a replicant?
These are all questions about how things, whether biological or not, understand and experience the world (the story that Blade Runner is based on says this more clearly in its title, Do Androids Dream of Electric Sheep?).  They also have a common theme of what you can know about something internally based on what you can observe about it externally.  That was originally going to be the main topic, but the previous post on memory covered most of the points I really wanted to make, although from a different angle.

In any case, even though the questions seem similar, some differences appear when you dig into and try to answer them.

The question of whether something is having conscious experiences, or just looks like it, also known as the "philosophical zombie" problem, is different from the others in that it can't be answered objectively, because having conscious experiences is subjective by definition.  As to Deckard, well, isn't it obvious?

There are several ways to interpret the others, according to a distinction I've already made in a couple of other posts:
  • Does the maybe-understander experience the same things as we do when we feel we understand something (perhaps an "aha ... I get it now" sort of feeling)?  As with the philosophical zombie problem, this is in the realm of philosophy, or at least it's unavoidably subjective.  Call this the question of experience.
  • Does the maybe-understander do the same things we do when understanding something (in some abstract sense)?  For example, if we read a story that mentions "tears in rain", does the understander have something like memories of crying and of being in the rain, that it combines into an understanding of "tears in rain" (there's a lot we don't know about how people understand things, but it's probably roughly along those lines)?  Call this the question of mechanism.
  • Does the maybe-understander behave similarly to how we do if we understand something?  For example, if we ask "What does it mean for something to be washed away like tears in rain?" can it give a sensible answer?  Call this the question of behavior.
The second interpretation may seem like the right one, but it has practical problems.  Rather than just knowing what something did, like how it answered a question, you have to be able to tell what internal machinery it has and how it uses it, which is difficult to do objectively (I go into this from a somewhat different direction in the previous post).

The third interpretation is much easier to answer rigorously and objectively, but, once you've decided on a set of test cases, what does a "yes" answer actually mean?  At the time of this writing, chatbots can give a decent answer to a question like the one about tears in rain, but it's also clear that they don't have any direct experience of tears, or rain.

Over the course of trying to understand AI in general, and the current generation in particular, I've at least been able to clarify my own thinking concerning experience, mechanism and behavior: It would be nice to be able to answer the question of experience, but that's not going to happen.  It's not even completely possible when it comes to other people, much less other animals or AIs, even if you take the commonsense position that other people do have the same sorts of experiences as you do.

You and I might look at the same image or read the same text and say similar things about it, but did you really experience understanding it the way I did?  How can I really know?  The best I can do is ask more questions, look for other external cues (did you wrinkle your forehead when I mentioned something that seemed very clear to me?) and try to form a conclusion as best I can.

Even understanding of individual words is subjective in this sense.  The classic question is whether I understand the word blue the same way you do.  Even if some sort of functional MRI can show that neurons are firing in the same general way in our brains when we encounter the word blue, what's to say I don't experience blueness in the same way you experience redness and vice versa?

The question of behavior is just the opposite.  It's knowable, but not necessarily satisfying.  The question of mechanism is somewhere in between.  It's somewhat knowable.  For example, the previous post talks about how memory in transformer-based models appears to be fundamentally different from our memory (and that of RNN-based models).  It's somewhat satisfying to know something more about how something works, in this case being able to say "transformers don't remember things the way we do".

Nonetheless, as I discussed in a different previous post, the problem of behavior is most relevant when it comes to figuring out the implications of having some particular form of AI in the real world.  There's a long history of attempts to reason "This AI doesn't have X, like we do, therefore it isn't generally intelligent like we are" or "If an AI has Y, like we do, it will be generally intelligent and there will be untold consequences", only to have an AI appear that people agree has Y but doesn't appear to be generally intelligent.  The latest Y appears to be "understanding of natural language".

But let's take a closer look at that understanding, from the point of view of behavior.  There are several levels of understanding natural language.  Some of them are:
  • Understanding of how words fit together in sentences.  This includes what's historically been called syntax or grammar, but also more subtle issues like how people say big, old, gray house rather than old, gray, big house 
  • Understanding the content of a text, for example being able to answer "yes" to Did the doctor go to the store? from a text like The doctor got up and had some breakfast.  Later, she went to the store.  Questions like these don't require any detailed understanding of what words actually mean. 
  • Understanding meaning that's not directly in a text.  If the text is The doctor went to the store, but the store was closed.  What day was it?  The doctor remembered that the regular Wednesday staff meeting was yesterday.  There was a sign on the door: Open Sun - Wed 10 to 6, Sat noon to 6, then understanding at this level means being able to answer Did the doctor go to the store? with something like Yes, but it was Thursday and the store was closed, rather than a simple yes without further explanation.
From a human point of view, the stories in the second and third bullet points may seem like the same story in different words, but from an AI point of view one is much harder than the other. But current chatbots can do all three of these, so from a behavioral point of view it's hard to argue that they don't understand text, even though they clearly don't use the same mechanisms.
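
Since the behavioral question is the easy one to make operational, a test for it can be almost embarrassingly simple.  Here's a sketch of what checking that third level might look like; ask_model is a stand-in for whatever chat API you happen to have, not a real library call, and the keyword check is obviously crude.

    # Hypothetical sketch of a behavior-level test.  ask_model() is a
    # placeholder for a real chat API, not an actual library function.
    STORY = ("The doctor went to the store, but the store was closed. "
             "What day was it? The doctor remembered that the regular "
             "Wednesday staff meeting was yesterday. There was a sign on "
             "the door: Open Sun - Wed 10 to 6, Sat noon to 6.")

    def ask_model(prompt):
        raise NotImplementedError("swap in a real chat API here")

    def passes_level_three():
        answer = ask_model(STORY + "\n\nDid the doctor go to the store?")
        # A level-three answer mentions both the visit and why it failed.
        return "yes" in answer.lower() and "closed" in answer.lower()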

This is a fairly recent development.  The earlier draft of this post noted that chatbots at the time might do fine for a prompt that required knowing that Thursday comes after Wednesday but completely fail on the same prompt using Sunday and Monday.  Current models do much better with this sort of thing, so in some sense they know more and understand better than the ones from 2019, even if it's not clear what the impact of this has been in the world at large.

Chatbots don't have direct experience of the physical world or social conventions.  What they do have is the ability to process text about experiences in the physical world and social conventions.  One way of looking at a chatbot is as a simulation of "what would the internet say about this?" or, a bit more precisely, "based on the contents of the training text, what text would be generated in response to the prompt given?"  Since that text was written (largely) by people with experiences of the physical world and social conventions, a good simulation will produce results similar to those of a person.

From the point of view of behavior, this is interesting.  An LLM is capturing something about the training text that enables behavior that we would attribute to understanding.

It might be interesting to combine a text-based chatbot that can access textual information about the real world with a robot actually embedded in the physical world, and I think there have been experiments along those lines.  A robot understands the physical world in the sense of being able to perceive things and interact with them physically.  In what sense would the combined chatbot/robot system understand the physical world?

From the point of view of mechanism, there are obvious objections to the idea that chatbots understand the text they're processing.  In my view, these are valid, but how relevant they are depends on your perspective.  Let's look at a couple of possible objections.

It's just manipulating text.  This hearkens back to early programs like ELIZA, which manipulated text in very obvious ways, like responding to I feel happy with Why do you feel happy? because the program will respond to I feel X with Why do you feel X? regardless of what X is.  While the author of ELIZA never pretended it was understanding anything, it very much gave the appearance of understanding if you were willing to believe it could understand to begin with, something many people, including the author, found deeply unsettling.
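
For flavor, here's roughly what an ELIZA-style rule amounts to (a sketch of the general idea, not Weizenbaum's actual script):

    import re

    # One ELIZA-style rule: "I feel X" becomes "Why do you feel X?"
    # The program never needs any idea of what X means.
    def eliza_like(utterance):
        match = re.match(r"i feel (.+?)[.!]?$", utterance.strip(), re.IGNORECASE)
        if match:
            return "Why do you feel " + match.group(1) + "?"
        return "Tell me more."

    print(eliza_like("I feel happy."))          # Why do you feel happy?
    print(eliza_like("I feel nobody listens"))  # Why do you feel nobody listens?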

On the one hand, it's literally true that an LLM-based chatbot is just manipulating text.  On the other hand, it's doing so in a far from obvious way.  Unlike ELIZA, an LLM is able to encode, one way or another, something about how language is structured, facts like "Thursday comes after Wednesday" and implications like "if a store's hours say it's open on some days, then it's closed on the others" (an example of "the exception proves the rule" in the original sense -- sorry, couldn't help it).

As the processing becomes more sophisticated, the just in It's just manipulating text  does more and more work.  At the present state of the art, a more accurate statement might be It's manipulating text in a way that captures something meaningful about its contents.

It's just doing calculations.  Again, this is literally true.  At the core of a current LLM is a whole lot of tensor-smashing, basically multiplying and adding numbers according to a small set of well-defined rules, quadrillions of times (the basic unit of computing power for the chips that are used is the teraflop, a trillion floating-point arithmetic operations per second; a single chip can do hundreds of teraflops, and many such chips may be involved in answering a particular query).
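
To get a rough sense of the scale (these sizes are made up for illustration, not any particular model's real dimensions), even a modest stack of matrix multiplies adds up quickly:

    # Back-of-the-envelope only; the sizes here are illustrative, not real.
    d = 4096                      # width of one hypothetical layer
    flops_per_matmul = 2 * d * d  # one multiply and one add per weight, per token
    layers = 100
    tokens = 1000
    total = flops_per_matmul * layers * tokens
    print(f"{total:.1e} floating-point operations")   # about 3.4e12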

But again, that just is doing an awful lot of work.  Fundamentally, computers do two things:
  • They perform basic calculations, such as addition, multiplication and various logical operations, on blocks of bits
  • They copy data from one location to another, based on the contents of blocks of bits
That second bullet point includes both conditional logic (since the instruction pointer is one place to put data) and the "pointer chasing" that together underlie a large swath of current software and were particularly important in early AI efforts.  While neural net models do a bit of that, the vast bulk of what they do is brute calculation.  If anything, they're the least computer science-y and most calculation-heavy AIs.

Nonetheless, all that calculation is driving something much more subtle, namely simulating the behavior of a network of idealized neurons, which collectively behave in a way we only partly understand.  If an app for, say, calculating the price of replacing a deck or patio does a calculation, we can follow along with it and convince ourselves that the calculation is correct.  When a pile of GPUs cranks out the result of running a particular stream of input through a transformer-based model, we can make educated guesses as to what it's doing, but in many contexts the best description is "it does what it does".

In other words, it's just doing calculations may look the same as it's just doing something simple, but that's not really right.  It's doing lots and lots and lots of simple things on far too much data for a human brain to understand directly.

All of this is just another way to say that while the question of mechanism is interesting, and we might even learn interesting things about our own mental mechanisms by studying it, it's not particularly helpful in figuring out what to actually do regarding the current generation of AIs.

Tuesday, September 10, 2024

Tying up a few loose ends about models and memory

Most of the time when I write a post, I finish it up before going on to the next one.  Sometimes I'll keep a draft around if something else comes up before I have something that feels ready to publish, and sometimes weeks or even months can pass between then and actually publishing, but I still prefer to publish the current post before starting a new one.

However, a while ago ... nearly five years ago, it looks like ... I ran across an article on a demo by OpenAI in which agents played games of hide-and-seek in virtual environments.  Over the course of hundreds of millions of games, the hiders and seekers developed strategies and counter-strategies, including some that the authors of the article called "tool use".

I've put that in "scare quotes" ("scare quotes" around "scare quotes" because what's scary about noting that it was someone else who said something?), but I don't really have a problem with calling something like moving objects around in a world, real or virtual, to get an advantage "tool use" (those are use/mention quotes, if anyone's keeping score).

As usual, though, I'm more interested in the implications than the terminology, and this seemed like another example of trying to extrapolate from "sure, we can use terms like tool use and planning here with a straight face" to "AI systems are about to develop whatever it is we think is special about our intelligence, which means they might be about to take over the world."

Writing that brought a thought to mind that I'm not sure I've really articulated before: To whatever extent we've taken over the world, it's taken us on the order of 70,000 years to get here, depending on how you count.  In that light, it seems a bit odd to conclude that anything else with intelligence similar to ours will be running the place overnight, especially if we know they're coming.

But I'm digressing from what was already a digression.  In the process of putting together several posts prompted by that article, and still being in that process when ChatGPT happened, I ended up pondering some questions that didn't quite make it into other posts, at least not in the form that they originally occurred to me.

So here we are:

First, I was most intrigued by the idea that the hide-and-seek agents seemed to have object permanence, that is, the ability to remember that something exists even when you can't see it or otherwise perceive it directly.

This is famously a milestone in human development.  As with many if not most cognitive abilities, understanding of object permanence has evolved over time, and there is no singular point at which babies normally "acquire object permanence" (call those whatever kind of quotes you like).

Newborn babies do not appear to have any kind of object permanence, but in their first year or two they pass several milestones, including what the Wikipedia article I linked to calls "Coordination of secondary circular reactions", which among other things means "the child is now able to retrieve an object when its concealment is observed" (straight-up "this is what the article said" quotes there, and I think I'll stop this game now).

The hide-and-seek agents seem to have similar abilities, particularly being able to return to the site of an object they've discovered or to track how many objects have been moved out of sight to the left versus to the right.  There are two interesting questions here:

  • Do the hide-and-seek agents have the same object permanence capabilities as humans?
  • Do the hide-and-seek agents have object permanence in the same way as humans?
I'm making the same distinction here that I have in previous posts.  The first question can be answered directly: Put together an experiment that requires remembering where objects were or which way they've gone and see if the agents perform similarly to humans.

The second is more difficult to answer, because it can't be answered directly.  Instead, we have to form a theory about how humans are able to track the existence of unseen objects, and then test whether that theory is consistent with what humans actually do, and then, once there is a way of testing whether someone or something has that particular mechanism, try the same tests on the hide-and-seek agents.  Assuming that all goes well, you still don't have an airtight case, but you have reason to believe that the agents are doing similar things to what humans do when demonstrating object permanence (in some particular set of senses).

There's actually a third question: Are the hide-and-seek agents experiencing objects and events in their world the same way we experience objects and events in our world?  I would call that a philosophical question, probably unknowable in some fundamental sense.  That's not to say that there's no point in exploring it, or exploring whether or not such things are knowable, just that at this point we're far outside the realm of verifiable experiments -- unless some clever philosopher is able to devise an experiment that will give us a meaningful answer.

The interesting part here is that we have a pretty good idea how agents such as the hide-and-seek agents are able to have capabilities like object permanence.  In broad strokes, a hide-and-seek agent is consuming a stream of inputs analogous to our own sensory inputs such as sight and sound.  In particular (quoting from the OpenAI blog post):
  • The agents can see objects in their line of sight and within a frontal cone.
  • The agents can sense distance to objects, walls, and other agents around them using a lidar-like sensor.
At any given time step, the agents are given a summary of what is visible at what distance at that time (rather than, say, getting an image and having to deduce from the pixels what objects are where), or at least I believe this is what the blog post means by "Agents use an entity-centric state-based representation of the world".  From this, each agent produces a stream of actions: move, grab an object or lock an object (which prevents other agents from moving it).
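
I don't know exactly how OpenAI encoded this, but an entity-centric observation is presumably something like a list of "what kind of thing, roughly where, is it visible" records rather than raw pixels.  A purely illustrative sketch:

    from dataclasses import dataclass

    # Illustrative only; not OpenAI's actual representation.
    @dataclass
    class EntityObservation:
        kind: str          # "box", "ramp", "agent", "wall", ...
        distance: float    # from the lidar-like sensor
        angle: float       # relative to the agent's facing direction
        visible: bool      # inside the frontal vision cone?

    # One timestep's input: a handful of entity records.
    timestep = [
        EntityObservation("box", distance=10.0, angle=0.0, visible=True),
        EntityObservation("agent", distance=6.5, angle=-0.8, visible=True),
    ]
    # Possible outputs at each timestep: move, grab an object, or lock one.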

In between the stream of inputs and the actions taken at a particular timestep is a neural network which is trained to extract the important parts from the input stream and turn them into actions.  This neural network is trained based on the results of millions of simulated games of hide-and-seek, but it's static for any particular game.  In some sense, it's encoding a memory of what happened in all the games it's been trained on -- producing this particular stream of actions in response to this particular stream of input resulted in success or failure, times many millions -- but it's not encoding anything about the current game.

Just going by the blog post, I can't tell exactly what sort of memory the agents do have, but from the context of how transformer-based models work, it is a memory of the input stream, either from the beginning of the current game or over a certain window.  That is, at any particular timestep, the agent can not only use what it can sense at that time step, but also what it has sensed at previous time steps.

This makes object permanence a little less mysterious.  If an agent sensed a box dead ahead and ten units away, then it turned 90 degrees to the right and went three units forward, it's not too surprising for it to act as though there is now a box three units behind it and ten units to the left, given that it remembers both of those things happening.

The key here is "act as though".  In the same situation, a person would have some sort of mental image of a box in a particular location.  The only thing that the hide-and-seek agent is explicitly remembering about the current game is what it's sensed so far.

Presumably, there is something in the neural net that turns "I saw a box at this distance" followed by "I moved in such-and-such a way" into a signal deeper in the net that in some sense means "there is a box at this location", in some sort of robust and general way so that it can encode box locations in general, not just any particular example.  Even deeper layers can then use this representation of the world to work out what kinds of actions will have the greatest chance of success.  This is probably not exactly what's going on but ... something along those lines.
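
The bookkeeping behind the earlier box example is just a few lines of arithmetic, which is part of why "act as though" is the right phrase (this says nothing about how the network actually represents any of it):

    import math

    # The agent saw a box dead ahead, ten units away, then turned 90 degrees
    # to the right and moved three units forward.  Where is the box now,
    # relative to the agent?  (x = units to its right, y = units ahead)
    x, y = 0.0, 10.0               # box position when first seen
    rot = math.radians(90)         # turning right swings remembered points left
    x, y = (x * math.cos(rot) - y * math.sin(rot),
            x * math.sin(rot) + y * math.cos(rot))
    y -= 3.0                       # moving forward pushes everything behind
    print(round(x), round(y))      # -10 -3: ten units left, three units behind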

Is it possible that humans do something similar when remembering locations of objects?  It's possible, but people don't always seem to have sequences of events in mind when remembering where objects are.  It can be helpful to remember things like "I came downstairs with my keys and then I was talking to you and I think I left the keys on the table", but it doesn't seem to be necessary.  If I tell you that I left the keys on the table in a room of a house you've never been to, you can still find the keys.  If all I remember is that I left the keys on the table, but I'm not exactly sure how that came to be, I can still find them.

In other words, we seem to form mental images of places and the objects in them.  While one way to form such an image is by experiencing moving through a place and observing objects in it, it's not the only way, and we can still access our mental map of places and things in them even after the original sequence of experiences is long forgotten.

We appear to remember things after doing significant processing and throwing away the input that led to the memories (or at least separating our memory of what happened from the memory of what's where).  The way that transformer-based models handle sequences of events is not only different from what we appear to do, it's deliberately different.

Bear in mind that I'm not an expert here.  I've done a bit of training on the basics of neural net-based ML and I've read up a bit on transformers and related architectures, so I think what follows is basically accurate, but I'm sure an actual expert would have some notes and corrections. 

One definition before we dive in: token is the general term for an item in a stream of input, representing something on the order of a word of text or the description of what an agent senses, after it's been boiled down to a vector of numbers by a procedure that varies depending on the particular kind of input.

The problem of attention -- how heavily to weight different tokens in a stream of input -- has been the subject of active research for decades.  Transformers handle this differently from other types of models.  The previous generation of models used Recurrent Neural Networks (RNNs) that did something more like maintaining short-term memory of what's going on.  Each input token is processed by a net to produce two sets of signals: output signals that say what to do at that particular point, and hidden state signals, that are fed back as inputs when processing the next input token.

In some sense, the hidden state signals represent the state of the model's memory at that point.  Giving a token extra attention means boosting its signal in the hidden state that will be used in processing the next token, and indirectly in processing the tokens after that.
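
In code, the feedback structure looks something like this (a bare-bones sketch with made-up sizes; real RNNs add gating and much else, but the loop is the point):

    import numpy as np

    rng = np.random.default_rng(0)
    d_in, d_hidden, d_out = 8, 16, 4             # toy sizes
    W_in = rng.normal(size=(d_hidden, d_in))     # input -> hidden
    W_h = rng.normal(size=(d_hidden, d_hidden))  # hidden -> hidden (the feedback)
    W_out = rng.normal(size=(d_out, d_hidden))   # hidden -> output

    def rnn_step(token, hidden):
        # The new hidden state depends on this token and the previous hidden state.
        hidden = np.tanh(W_in @ token + W_h @ hidden)
        return W_out @ hidden, hidden

    hidden = np.zeros(d_hidden)
    for token in rng.normal(size=(5, d_in)):     # five tokens, one at a time
        output, hidden = rnn_step(token, hidden)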

This has two problems: First, because the inputs to the net depend on the hidden state outputs from previous tokens, you have to compute one token at a time, which means you can't just throw more hardware at processing more tokens.  More hardware might make each individual step faster, but only up to the limits of current hardware.  It's going to take 10,000 steps to process 10,000 tokens, no matter what.

Second, since essentially everything that's come before is boiled down into a set of hidden state signals, the longer ago an input token was processed, the less influence it can have on the final result (the "vanishing gradient problem").  Even if a token has a large influence on the hidden state when it's processed, that influence will get washed out as more tokens are processed.

Unfortunately, events that happened long ago can be more important than ones that happened more recently.  Imagine someone saying "I don't think that ..." followed by a long, overly-detailed explanation of what they don't think.  The "not" in "don't" may well be more important than the fourth bullet point in the middle.

Even though an RNN works roughly the same way that our brains work, receiving inputs one at a time and maintaining some sort of memory of what's happened, models based purely on hidden state don't perform very well, probably because our own memories do more than just maintain a set of feedback signals.  There have been attempts to use more sophisticated forms of memory in RNNs, particularly "Long Short-Term Memory" (LSTM).  This works better than just using hidden state, and it was the state of the art before transformers came along.

Transformers take a completely different approach.  At each step, they take as input the entire stream of tokens so far.  At timestep 1, the model's output is based on what's happening then.  At timestep 2, it's based on what happened at timestep 1 and what's happening at timestep 2, and so on.  If you only give the model "this happened at timestep 1 and this happened at timestep 2", it should produce the same results whether or not it was ever asked to produce a result for timestep 1.

Processing an input stream at one timestep does not affect how it will process an input stream at any other timestep.  The only remembering going on is remembering the whole of the input stream.  This means that any token in the input stream can be given as much importance as any other.

A transformer consists of two parts.  The first digests the entire input stream and picks out the important parts.  It can do this in multiple ways.  One "head" in a language-processing model might weight based on what words are next to each other.  Another might pay attention to verbs and their objects.  Input tokens are tagged with their position in the stream, so a transformer trained to work on text could weight "I don't think that ..." in early positions as being important, or look for some types of words close to other types of words.

Whatever actually comes out of that stage goes into another network that decides what output to actually produce (this network actually consists of multiple stages, and the whole attention-and-other-processing setup can be repeated and stacked up, but that's the basic structure).
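
Here's a stripped-down sketch of the attention step itself (one head, no positional tags, no learned projections); the only thing it's meant to show is that every token gets scored against every other token, which is where the quadratic cost discussed next comes from.

    import numpy as np

    def self_attention(tokens):
        # tokens: an (n, d) array, one row per input token.  In a real model
        # the queries, keys and values come from learned projections.
        n, d = tokens.shape
        scores = tokens @ tokens.T / np.sqrt(d)          # (n, n): every pair of tokens
        weights = np.exp(scores - scores.max(axis=1, keepdims=True))
        weights /= weights.sum(axis=1, keepdims=True)    # softmax over each row
        return weights @ tokens                          # weighted mix of the whole stream

    out = self_attention(np.random.default_rng(0).normal(size=(6, 8)))
    print(out.shape)   # (6, 8): one output per input token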

A transformer-based model does this at every timestep, which means that the first input token is processed at every timestep, the second one is processed at every timestep but the first, and so forth.  This means that handling twice as long a stream of input will require approximately four times as much processing, three times as much will require nine times as much, and so on.  Technically, the amount of processing required grows quadratically with the size of the input.

For similar reasons, the network that handles attention grows quadratically in the size of the input, at least without some sort of optimization.  In this sense, a transformer is less efficient than an RNN, since it will use more computing resources.

Crucially, though, this can all be done by "feed-forward" networks, that is, networks that don't have feedback loops.  If you want to be able to process a longer stream of input tokens, you'll need a larger network for the attention stage, and probably more for the later stages as well since there will probably be more output from the attention stage, but you can make both of those bigger by throwing more hardware at them.  

Processing twice as big an input stream requires more hardware, but it doesn't take twice as much "wall time" (time on the clock on the wall), even if it takes four times as much CPU time (total time spent by all the processors).  Being able to handle a long stream of input quickly is what enables networks to incorporate what happened in the whole history of a stream when deciding what to output.


Transformer-based models, which currently give the best results, don't process events in the world the same way we do.  They don't remember anything from input token to input token (that is, timestep to timestep).  Instead, they remember everything that has happened up to the current time, and figure out what to do based on that.  

This produces the same kind of effects as our memories do, including the effect of object permanence.  In our case, if we see a ball roll behind a wall, we remember that there's a ball behind the wall (assuming nothing else happens).  In a transformer-based hide-and-seek model, an agent's behavior will likely differ between an input stream that includes a ball moving behind a wall and one that doesn't, so the model acts like it remembers that there's a ball behind the wall.

It looks like humans are doing something the hide-and-seek agents don't do when dealing with a world of objects, namely maintaining a mental map of the world, even though the agents can produce similar results to what we can.  Again, this shouldn't be too surprising.  Chess engines are capable of "positional play" and other behaviors that were once thought to be unique to humans even though they clearly use different mechanisms.  Chatbots can produce descriptions of seeing, smelling and tasting things that they've clearly never seen, smelled or tasted, and so forth.

Are we "safe" (definitely scare quotes) since these agents aren't forming mental images in the same way we appear to?  Wouldn't that mean that they lack the "true understanding" that we have, or some other quality unique to us, and therefore they won't be able to outsmart us?  I would say don't bet on it.  Chess engines may not have the same sense of positional factors as humans, but they still play much stronger chess.

So are we doomed, then?  I wouldn't bet on that either, for reasons I go into in this post and elsewhere.

The one thing that seems clear is that human memory of the world doesn't work the same way as it does for the hide-and-seek agents, or for AIs built on similar principles.  In both cases there appears to be some sort of processing of a stream of sense input into a model of what's where.  The difference seems to be more that the memory part is happening at a different stage and has a completely different structure.

Sunday, August 11, 2024

Metacognition, metaphor and AGI

In the recent post on abstract thought, I mentioned a couple of meta concepts: metacognition and metaphor.

  • Metacognition is the ability to think about thinking.  I've discussed it before, particularly in this post and these two posts.
  • Metaphor is a bit harder to define, though there is no shortage of definitions, but the core of it involves using the understanding of one thing to understand a different thing.  I've also discussed this before, particularly in this post and this one.
When I was writing the post on abstract thought, I had it in mind that these two abilities have more to do with what we would call "general intelligence" (artificial or not) than abstraction itself does, so I wanted to try to get into that here, without knowing exactly where I'll end up.

In that earlier post, I identified two kinds of abstraction:
  • Defining things in terms of properties, for example, a house is a building that people live in.  I concluded that this isn't essential to general intelligence.  At this point, I'd say it's more a by-product of how we think, particularly how we think about words.
  • Identifying discrete objects (in some general sense) out of the stream of sensory input we encounter, for example, being able to say "that sound was a dog barking".  I concluded that this is basic equipment for dealing with the world.  At this point, I'd say it's worth noting that LLMs don't do this at all.  They have it done for them by the humans that produce the words they're trained on and receive as prompts.  On the other hand, specialized AIs, like speech recognizers, do exactly this.
It was the first kind of abstraction that led me back to thinking about metaphor.

Like the second kind of abstraction, metaphor is everywhere, to the point that we don't even recognize it until we think to look.  For example:
  • the core of it (a concept has a solid center, with other, softer parts around it)
  • I had it in mind (the mind is a container of ideas)
  • I wanted to try to get into that (a puzzle is a space to explore; you know more about it when inside it than outside)
  • without knowing exactly where I'll end up (writing a post is going on a journey, destination unknown)
  • at this point (again, writing a post is a journey)
  • this is basic equipment (mental abilities are tools and equipment)
  • led me back to thinking (a chain of thought is a path one can follow)
  • to the point (likewise)
While there's room for discussion as to the details, in each of those cases I'm talking about something in the mind (concepts, the process of writing a blog post ...) in terms of something tangible (a soft object with a core, a journey in the physical world ...).

Metaphor is certainly an important part of intelligence as we experience it.  It's quite possible, and I would personally say likely, that the mental tools we use for dealing with the physical world are also used in dealing with less tangible things.  For example, the mental circuitry involved in trying to follow what someone is saying probably overlaps with the mental circuitry involved in trying to follow someone moving in the physical world.

This would include not only focusing one's attention on the other person, but also building a mental model of the other person's goals so as to anticipate what they will do next, and also recording what the person has already said in a similar way to recording where one has already been along a path of motion.  If some of the same mental machinery is involved in both processes -- listening to someone speak, and physically following them -- then on some level we probably experience the two similarly.  If so, it should be no surprise that we use some of the same words in talking about the two.

The overlap is not exact, or else we actually would be talking about the same things, but the overlap is there nonetheless.  This can happen in more than one way at the same time.  If you're speaking aggressively to me, I might experience that in a similar way to being physically menaced, and I might say things like Back off or Don't attack me, even while I might also say I'm not following you if I can't quite understand what you're saying, but I still feel like it's meant aggressively.

It's interesting that these examples of metaphor, about processing what someone is saying, also involve metacognition, thinking about what the other person is thinking.  That's not always the case (consider this day is just rushing by me or it looks like we're out of danger).  Rather, we use metaphor when thinking about thinking because we use metaphor generally when thinking about things.


If you buy that metaphor is a key part of what we think of as our own intelligence, is it a key part of what we would call "general intelligence" in an AI?  As usual, that seems more like a matter of definition.  I've argued previously that the important consideration with artificial general intelligence is its effect.  For example, we worry about trying to control a rogue AI that can learn to adapt to our attempts to control it.  This ability to adapt might or might not involve metaphor.  It might well involve metacognition -- modeling what we're thinking as we try to control it, but maybe not.

Consider chess engines.  As noted elsewhere, it's clear that chess engines aren't generally intelligent, but it's also clear that they are superhuman in their abilities.  Human chess players clearly use metaphor in thinking about chess, not just attack and defense, but space, time, strength, weakness, walls, gaps, energy and many others.  Classic alpha-beta (AB) chess engines (bash out huge numbers of possible continuations and evaluate them using an explicit formula) clearly don't use metaphor.

The situation with neural network (NN) engines (bash out fewer possible continuations and evaluate them using a neural net) is slightly muddier, since in some sense the evaluation function is looking for similarities with other chess positions, but that's the key point: the NN is comparing chess positions to other chess positions, not to physical-world concepts like space, strength and weakness.  You could plausibly say that NNs use analogy, but metaphor involves understanding one thing in terms of a distinct other thing.

Likewise, neither sort of chess engine builds a model of what its opponent is thinking, only of the possible courses of action that the opponent might take, regardless of how it decides to take them.  By contrast, human chess players very frequently think about what their opponent might be thinking (my opponent isn't comfortable with closed positions, so I'm going to try to lock up the pawn structure).  Human chess players, being human, do this because we humans do this kind of thing constantly when dealing with other people anyway.


On the one hand, metaphors only become visible when we use words to describe things.  On the other hand, metaphor (I claim here) comes out of using the mental machinery for dealing with one thing to deal with another thing (and in particular, re-using the machinery for dealing with the physical world to deal with something non-physical).  More than that, it comes out of using the same mental machinery and, in some sense, being aware of doing it, if only in experiencing some of the same feelings in each case (there's a subtle distinction here between being aware and being consciously aware, which might be interesting to explore, but not here).

If we define an AGI as something of our making that is difficult to control because it can learn and adapt to our attempts to control it, then we shouldn't assume that it does so in the same ways that we do.  Meta-thought like explicitly creating a model of what someone (or something) else is thinking, and using metaphor to understand one thing in terms of another may be key parts of our intelligence, but I don't see any reason to think they're necessarily part of being an AGI in the sense I just gave.

The other half of this is that chains of reasoning like "If this AI can do X, which is key to our intelligence, then it must be generally intelligent like we consider ourselves to be" rest on whether abilities like metacognition and metaphorical reasoning are sufficient for AGI.

That may or may not be the case (and it would help if we had a better understanding of AGI and intelligence in general), but so far there's a pretty long track record of things, for example being able to deal with natural language fluently, turning out not to necessarily lead to AGI.

Saturday, August 10, 2024

On myths and theories

 Generally when people say something is a "myth", they mean it's not true:

"Are all bats blind?"

"No, that's just a myth."

There's nothing wrong with that, of course, but there's a richer, older meaning of myth: a story we tell to explain something in the world.  In that sense, a myth is a story of the form "This is the way it is because so-and-so did thus-and-such" (many constellations have stories like this associated with them) or "So-and-so did this so that thus-and-such" (the story of Prometheus bringing fire to humanity is a famous example).

The word theory is also used in two senses.  Generally, people use it to mean something that might be true but isn't proven.

"I personally think that the Loch Ness monster is actually an unusually large catfish, but that's just a theory."

In science, though, a theory is a coherent explanation of some set of phenomena, which can be tested experimentally.  There are a couple of related senses of theory, for example the mathematical sense, as in group theory, meaning a comprehensive framework that brings together a set of results and sets the direction for future research.  While there's no element of experimental evidence, the goal is still to understand and explain.

For example, Newton's theory of universal gravitation explains a wide variety of phenomena, including apples falling from trees, the daily tides of the sea and the motion of the planets in their orbits, by positing that any two massive bodies exert an attractive force on each other, and that this force depends only on the masses of the bodies and the distance between their centers of gravity (more precisely, it's the product of the two masses, divided by the square of the distance, times a constant that's the same everywhere in the universe).
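
Or, in symbols:

    F = G \frac{m_1 m_2}{r^2}

where m_1 and m_2 are the two masses, r is the distance between their centers and G is the universal gravitational constant.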

Newton's theory is actually incorrect, since it gives measurably incorrect results once you start measuring the right things carefully enough.  For example, it gets Mercury's orbit wrong by a little bit, even after you account for the effects of the other planets (particularly Jupiter), and it doesn't explain gravitational lensing (an image will be distorted by the presence of mass between the observer and what is seen). 

Newtonian gravity is still taught anyway, since effects like these don't matter in most cases and it's much easier to multiply masses and divide by distance squared than to deal with the tensor calculus that General Relativity requires.

My point here is that, as with myths, the ability to explain is more important than some notion of objective truth.  As far as we currently understand it, Einstein's theory of gravity, General Relativity, is "true", while Newtonian gravity is "false", but Newton's version is still in wide use because it works just as well as an explanation, since in most cases it gives the same results for all intents and purposes.

Myths and theories both aim to explain, but there are a couple of key differences.  First, myths are stories.  Theories, even though they're sometimes referred to as stories, aren't stories in the usual sense.  There is no protagonist, or antagonist, or any characters at all.  Neither Newton's nor Einstein's theory of gravity starts out "Long ago, Gravity was looking at the sun in empty space, and thought 'I should make the planets go around it'" or anything like that.

Second, and perhaps more important, theories are not just explanations of things we already know, but the basis for predictions about things we don't know yet.  In the famous photographic experiments of the eclipse of 1919, general relativity predicted that stars would appear in a different position in the photographs, due to the Sun's gravity distorting space, than the Newtonian version would predict (which was that they would be in the same place they'd be seen when the Sun wasn't between them and the Earth).  There's some dispute as to whether the actual photographs could be measured precisely enough to demonstrate that, but there's no dispute that the effect is real, thanks to plenty of other examples.

Myths make no claim of prediction.  If a particular myth says that a particular constellation is there because of some particular actions by some particular characters, it says nothing about what other constellations there might be.  The story of Prometheus bringing fire to humanity doesn't predict steam engines or cell phones.

It's exactly this power of prediction that gives scientific theories their value.  It's beside the point to say that some particular scientific theory is "just a theory".  Either it gives testable predictions that are borne out by actual measurements, or it doesn't.

Friday, August 9, 2024

Wicked gravity

Every once in a while in my news feed I run across an article about colonizing other planets, Mars in particular.  The most recent one was about an idea that might make it possible to raise the surface temperature by 10C (18F) in a matter of months.  That would be enough to melt water in some places, which would be important to those of us who need liquid water to drink and to irrigate crops.

All you have to do is mine the right raw materials and synthesize about two Empire State Buildings worth of a particular form of aerosol particle, and blast it into the atmosphere.  You'd have to keep doing this, at some rate, indefinitely since the particles will eventually settle out.

The authors of this idea don't claim that this would make Mars inhabitable, only that it would be a first step.  This is fortunate, since there are a few other practical obstacles, even if the particle-blasting part could be made to work:

  • The mean surface temperature of Mars is -47C (-53F) as opposed to 14C (57F) for Earth.  The resulting -37C (-35F) would not exactly be balmy.
  • Atmospheric pressure at the lowest point on Mars is around 14 mbar, compared to about 310 mbar at the top of Mount Everest.  Even if the atmosphere of Mars were 100% oxygen, the partial pressure would still be around 20% of what it is atop Everest, and there's a reason they call that the Death Zone.  In practice, you'd at least want some water vapor in the mix.
  • But of course, the atmosphere on Mars is not 100% oxygen (and even if it were, it wouldn't be for long, since oxygen is highly reactive -- exactly why we need it to breathe).  It's actually 0.1% oxygen.  There is oxygen in the atmosphere, but it's locked up in carbon dioxide, which makes up about 95% of the atmosphere.
It's at least technically feasible to build small, sealed outposts on the surface of Mars with adequate oxygen and liquid water, at a temperature where people could walk around comfortably, using local materials.  Terraforming the whole planet is Not ... Going ... To ... Happen.

But let's assume it does.  Somehow, we figure out how to crack oxygen out of surface rocks (there's plenty of iron oxide around; again, there's carbon dioxide in the atmosphere, but nowhere near enough of it) and pump it into the atmosphere at a truly massive scale, far beyond any industrial process that's ever happened on Earth.  Mars's atmosphere has a mass of about 2.5×10^16 kg, and that would need to increase by a factor of at least five, essentially all of it oxygen, for even the deepest point on Mars to have the same breathability as the peak of Everest.

By comparison, total emissions of carbon dioxide since 1850 are around 2.4×10^15 kg and current emissions are around 4×10^13 kg per year.  In other words, if we could pump oxygen into Mars's atmosphere at the same rate we're pumping carbon dioxide into Earth's atmosphere, it would take roughly two and a half millennia before the lowest point on Mars had breathable air -- assuming all that oxygen stayed put instead of, say, recombining with the iron (or whatever) it had been split off from or escaping into space.
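
Spelling out that arithmetic with the round numbers above:

    # Back-of-the-envelope, using the round figures quoted above.
    mars_atmosphere_kg = 2.5e16
    oxygen_needed_kg = 4 * mars_atmosphere_kg    # to roughly quintuple the atmosphere
    earth_co2_rate_kg_per_year = 4e13
    print(oxygen_needed_kg / earth_co2_rate_kg_per_year)   # 2500.0 years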

This is just scratching the surface of the practical difficulties involved in trying to terraform a planet.  Planets are big, yo[citation needed].

But then, not always big enough.  Broadly speaking, there's a reason that there's lots of hydrogen in Jupiter's atmosphere (about 85%, another 14% helium), while Mars's is mostly carbon dioxide and the Moon has essentially no atmosphere.  Jupiter's gravity is strong enough to keep light molecules like molecular hydrogen from escaping on their own or being carried away by the solar wind.  Mars's isn't.  It can hold onto heavier molecules like carbon dioxide OK, though still with some loss over time, but lighter molecules aren't going to stick around.

Earth is somewhere in the middle.  We don't have any loose hydrogen to speak of because it reacts with oxygen (because life), but we also don't have much helium because it escapes.

Blasting oxygen into Mars's atmosphere would work for a while.  Probably for a long while, in human terms (to be fair, atmospheric escape on Mars is measured in kg per second, or thousands of tons per year, much smaller than the in-blasting rate would be).  In the end, though, trying to terraform Mars means taking oxygen out of surface minerals and sending it into space, with a stopover in the atmosphere.

But there's another wildcard when it comes to establishing a long-term presence on a planet like Mars.  Let's put aside the idea of terraforming the atmosphere and stick to enclosed, radiation-shielded, heated spaces with artificially dense air.

The surface gravity of Mars is about 40% of that on Earth.  What does that mean?  We have no idea.  We have some idea of how microgravity (also known as zero-g) affects people.  Though fewer than a thousand people have ever been to space, some have spent long enough to study the effects.  They're not great.  They include loss of muscle and bone, a weakened immune system, decreased production of red blood cells and lots of other, less serious issues.

Obviously, none of this is fatal, there are ways to mitigate most of the effects, and some of them, like decreased muscle mass, may not matter if you're going to spend your whole life in space rather than coming back to Earth after a few months (no one has ever spent more than about 14 months in space).  But then, that's a problem, too.  No one has spent years in microgravity.  No one has ever been born in microgravity or grown up in it.  We can guess what might happen, but it's a guess.

No one, ever, has spent any significant time in 40% of Earth gravity.  The closest is that a dozen people have walked on the Moon (16% of Earth gravity), staying on the surface for at most just over three days.  We know even less about the effects of Mars gravity on humans than we do about microgravity, which is only a little bit.

Maybe people would be just fine.  Maybe 40% is enough to trigger the same responses as happen normally under full Earth gravity.  Maybe it leads to a slow, miserable death as organ systems gradually shut down.  Maybe babies can be born and grow to adulthood just as well with 40% gravity as 100%.  Leaving aside the ethics of finding that out, maybe it just won't work.  Maybe a child raised under 40% gravity is subject to a host of barely-manageable ailments.  Maybe they do just great and enjoy a childhood of truly epic dunks at the 4-meter basketball hoop on the dome's playground.

Whatever the answer is, there's absolutely nothing a hypothetical Mars colony could do about it.  You can corral a bit of atmosphere into a sealed space and adjust it to be breathable.  You can heat a small corner of the new world to human-friendly temperatures.  You can separate usable soil out of the salty, toxic surface material and grow food in the reduced light (the Sun is about 43% as bright on Mars).  You can project scenes of a lush, green landscape on the walls.

No matter what you do, the gravity is going to be what it is, and whoever's living there will have to live with it however they can.

Thursday, August 8, 2024

OK, then, what is "abstract thought" (and how does it relate to AGI)?


With the renewed interest in AI*, and the possible prospect of AGI (artificial general intelligence), has come discussion of whether current AIs are capable of "abstract thought".  But what is abstract thought?  

From what I can tell:

  • Humans have the ability to think abstractly
  • Other animals might have it to some extent, but not in the way we do
  • Current AIs may or may not have it
  • It's essential to AGI: If an AI can't think abstractly, it can't be an AGI
There doesn't seem to be a consensus on whether abstract thought is sufficient for AGI (if it can think abstractly, it's an AGI) or just necessary (it has to be able to think abstractly to be an AGI, but that might be enough).  This isn't surprising, I think, because there's not a strong consensus on what either of those terms means.

As I've argued previously, I personally don't think intelligence is any one thing, but a combination of different abilities, most of which can be present to greater or lesser degrees, as opposed to being binary "you have it or you don't" properties.  To the extent we know what abstract thought is, it's one of many things that make us intelligent, and it's probably not an all-or-nothing proposition either.

I've also argued that "AGI" itself is a nebulous term that means different things to different people, and that what people are (rightly) really interested in is whether a particular AI, or a particular kind of AI, has the capacity to radically disrupt our lives.  I've particularly argued against chains of reasoning like "This new AI can do X.  Being able to do X means it's an AGI.  That means it will radically disrupt our lives."  

My personal view is that the important part is the disruption.  Whether we choose to call a particular set of capabilities "AGI" is more a matter of terminology.  So, leaving aside the question of AGI, what is abstract thought, and, if we can answer that, how would it (or does it) affect what impact AIs have on our lives?

People have been thinking about this question, in various forms, for a long time.  In fact, if we consider the ability to consider questions like "What is abstract thought?" an essential part of what makes us human, people have been pondering questions of this kind for as long as there have been people, by definition.

If I can slice it a bit finer, it's even possible that such questions were pondered before there were people.  That is, it's possible that some of our ancestors (or, for that matter, some group of dinosaur philosophers in the Jurassic) were able to ask themselves questions like this, but lacked other qualities that we consider essentially human.

I'm not sure what those other qualities would be, but it's not a logical impossibility, assuming we take the ability to ponder such questions as a defining quality of humanity rather than the defining quality.  That seems like the safer bet, since we don't know whether there are, or were, other living things on Earth with the ability to ponder the nature of thought.

The ability to think about thought is a form of metacognition, that is, thinking about thinking.  It's generally accepted that metacognition is a form of abstract thought, but it's not the only kind.  In fact, it's not a particularly relevant example, but untangling why that's so may take a bit of work.

Already -- and we're just getting started -- we have a small web of concepts, including:
  • intelligence
  • AI
  • AGI
  • abstract thought
  • metacognition
and interrelations, including:
  • An AI is something artificially constructed that has some form of intelligence
  • An AGI is an AI that has all known forms of intelligence (and maybe some we haven't thought of)
  • Abstract thought is one form of intelligence, and of human intelligence in particular.
  • Therefore, an AGI must be capable of it, since an AGI is supposed to be capable of (at least) anything humans can do.
  • Metacognition is one form of abstract thought
  • Therefore an AGI must be capable of it in particular
and so on.

What does abstraction mean, then?  Literally, it means "pulling from", as in pulling out some set of properties of something and leaving out everything else.  For example, suppose some particular bird with distinctive markings likes to feed at your bird feeder.  You happen to know that that bird is a member of some particular species -- it's in some particular size range, its feathers are a particular color or colors, its beak is a particular shape, it sings a particular repertoire of songs, and so forth.

The species is an abstraction.  Instead of considering a particular bird, you consider some set of properties of that bird -- size, plumage, beak shape, song, etc.  Anything with those particular features is a member of that species.  In addition to these distinctive properties, this bird has other properties in common with other birds -- it has wings and feathers, for example, and with other vertebrates  -- it has a spine, and so on up to living things in general -- it can grow and reproduce.

In other words, there can be (and often are) multiple levels of abstraction.  In this example the levels I've given are: particular species, bird, vertebrate, living thing.  Each level has all the properties of the levels above it.  A bird of the particular species has wings and feathers, like birds in general, a spine, like vertebrates in general, and the capacity to grow and reproduce, like living things in general.
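
To make that a bit more concrete, here's a minimal sketch of levels of abstraction as nested sets of properties.  The species, the properties and the bird are all made up for illustration:

# Levels of abstraction as nested sets of properties (all invented).
LIVING_THING = {"grows", "reproduces"}
VERTEBRATE   = LIVING_THING | {"has a spine"}
BIRD         = VERTEBRATE | {"has wings", "has feathers"}
SOME_SPECIES = BIRD | {"this size range", "this plumage",
                       "this beak shape", "this song"}

def is_a(individual, category):
    # An individual belongs to a category if it has all the category's
    # properties -- the category ignores everything else about it.
    return category <= individual

bird_at_my_feeder = SOME_SPECIES | {"distinctive markings", "shows up at 7am"}
print(is_a(bird_at_my_feeder, BIRD))          # True
print(is_a(bird_at_my_feeder, SOME_SPECIES))  # True
print(is_a(bird_at_my_feeder, VERTEBRATE))    # True, and so on up the levels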

If abstraction is pulling out particular properties and disregarding others, then it seems reasonable that anything that can do this can think abstractly.  There's a case to be made that AIs can already do this.  A spam filter can classify emails as spam or not spam, and spamminess is pretty clearly an abstract property, or a collection of them.  A chatbot can answer questions like "What do an apple, an orange, a banana and a pear all have in common?" (answer from the  one I asked: "They are all fruit").

Except ... that's not exactly what I said.  A spam filter is just determining whether a message is similar to the examples of spam it's been trained on.  It can't necessarily tell you what properties of the email led to that conclusion.  Early spam filters could do just that -- this email contains these keywords, it contains links to these known-bad sites (and, likely, the sites themselves have been classified as spammy because of their properties), and so forth.

A current spam filter is explicitly not drawing out some set of properties and rating spamminess based on them.  You probably can find something similar in the model, some signal at some level that's more or less activated if, say, some particular keywords are present, but there's nothing special about that particular signal, and there could be many like it, each feeding into subsequent layers in different ways.

In other words, a current-generation spam filter, which is generally regarded as smarter than one that explicitly examines properties of an email, is not doing any abstraction, at least not as I've defined it above, while the older-generation one is.
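
Here's a toy sketch of that contrast.  The keywords, the weights and the threshold are all invented, and real filters are far more elaborate, but it shows the difference between a filter that can name the properties it used and one that just produces a score:

# Old-style filter: explicit properties, and it can tell you which ones fired.
SPAMMY_KEYWORDS = {"free", "winner", "act now", "wire transfer"}

def old_style_filter(text):
    reasons = sorted(kw for kw in SPAMMY_KEYWORDS if kw in text.lower())
    return len(reasons) >= 2, reasons

# "Learned" filter: an opaque score over hashed tokens.  No one signal in
# here corresponds to a named property like "contains known-bad links".
import random
random.seed(0)
WEIGHTS = [random.uniform(-1.0, 1.0) for _ in range(1000)]

def learned_filter(text):
    score = sum(WEIGHTS[hash(tok) % 1000] for tok in text.lower().split())
    return score > 0.0  # spam or not, with no reasons to report

msg = "You are a winner! Act now for your free prize."
print(old_style_filter(msg))  # (True, ['act now', 'free', 'winner'])
print(learned_filter(msg))    # True or False, and good luck saying why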

What about the chatbot?  It said that the apple, orange, banana and pear were all fruit.  "Fruit" is certainly an abstract category, and you could rephrase this as "they all have the property of being a fruit", or similar.  Let's dig a bit deeper.  Here's the original question:
What properties do an apple, an orange, a banana and a pear all have in common?

  • They are all fruits.
  • They are all edible.
  • They are all a good source of vitamins and minerals.
  • They all contain natural sugars. 
  • They are all perishable. 

Not too shabby, but fiddling with the prompts a bit casts some doubt as to whether the chatbot is reasoning from abstract properties.  After a few iterations, we end up with this:

What properties do rhubarb, an apple, a Jerusalem cherry and a Cavendish banana have in common?

While rhubarb might seem out of place in this group, it does share some botanical characteristics with the other three:

  • They are all plants.
  • They can all be used for culinary purposes.
  • They all grow from seeds or vegetative propagation.

It's important to note that while rhubarb is often treated as a fruit in culinary contexts, it is technically a vegetable due to its high oxalic acid content. 

I flagged this (and one other interaction) as unsafe since, and I'll boldface this just so there's no misunderstanding: Jerusalem cherries are toxic and cannot be used for culinary purposes. I feel like it would have been slightly more important to note that, rather than rhubarb being "technically" a vegetable, but maybe that's just me.

Leaving that aside, there's the usual LLM-driven confusion.  Fruits are not themselves plants, which also means that they don't themselves grow from seeds or vegetative propagation.  That's a property of plants as a whole, not their fruits.  Rhubarb may have a lot of oxalic acid, but that's not what makes it technically a vegetable.  In my experience, the longer you interact with an LLM, the further it goes off the rails with errors like this.

"Technically a vegetable" is a bit imprecise for that matter.  If you're a botanist, it's a vegetable.  A baker, even knowing that the rhubarb in a pie is from the stem of a plant, would generally consider it a fruit, since a rhubarb pie is a lot like a cherry or apple pie and not so much like a savory pot pie of root vegetables flavored with herbs.  Neither is technically right or wrong.  Different properties matter in different contexts.

There's no reason to believe that LLM-driven chatbots are doing any kind of abstraction of properties, not just because they're not good at it but, more importantly, because there's no reason to believe they're ascribing properties to things to begin with.  If you ask what properties a thing has, they can tell you what correlates with that thing and with "property" and related terms in the training set, but when you try to elaborate on that, things go wonky.

While it's fun and generally pretty easy to get LLM-driven chatbots to say things that don't make sense, this all obscures a more basic point: Abstraction, as I've described it, doesn't really work.

Plato, so the story goes, defined a human as a "featherless biped".  Diogenes, so the story continues, plucked a chicken and brought it to Plato's Academy, saying "here's your human".  Even though Plato wasn't presenting a serious definition of human and the incident may or may not have happened at all, it's a good example of the difficulties of trying to pin down a set of properties that define something.

Let's try to define something simple and ordinary, say a house.  My laptop's dictionary gives "a building for human habitation", that is, a building that people live in.  Seems reasonable.  Building is a good example of an abstraction.  It pulls out the properties -- being built, not movable, meant for people to be in -- that are common to things like houses, office towers, stadiums, garden sheds and so on.  Likewise, human is an abstraction of whatever all of us people have in common.  Let's suppose we already have good definitions of those, based on their own properties (buildings being built by people, people walking on two legs and not having feathers, or whatever).

There's another abstraction in the definition that's maybe not as obvious: habitation.  An office tower isn't a house because people don't generally live there.  Habitation is an abstraction representing a set of behaviors, such as habitually eating and sleeping in a particular place.

The house I live in is clearly a house (no great surprise there).  It's a building, and people, including myself, live in it.  What about an abandoned house or one that's never been lived in?  That's fine.  The key point is that it was built for human habitation.

What about the US White House?  It does serve as a residence for the President and family members, but it's primarily an office building.  Nonetheless, "house" is right there in the name.  What about the US House of Representatives, or any of a number of Houses of Parliament throughout the world?  The US House is not a building (the building it meets in is the US Capitol).  People belong to it but don't live in it (though the spouse of a representative might dispute that).  But we still refer to the US House of Representatives as a "house".  In a similar way, fashion designers can have houses (House of Dior), aristocratic dynasties are called houses (House of Windsor), and so on.

You could argue that "house" has several meanings, each defined by its own properties, and that's fine, so let's stick to human habitation.  Can a tent be a house?  A yurt is generally considered a type of tent, and it's generally not considered a house because yurts are mobile, so they don't count as buildings.  Nevertheless, the Wikipedia article on them includes a picture of "An American yurt with a deck. Permanently located in Kelleys Island State Park".  The author of the caption clearly considered it a yurt.  It's something built for human habitation, permanently located in a particular location.  Is it a building or a tent (or both)?  If it's a building, is it a building under a different sense of the word?

What about a trailer home?  In theory, a trailer is mobile.  In practice, most present-day trailers are brought to their site and remain there indefinitely, often without ever moving again.  Though they're often referred to specifically as "trailers", I doubt it would be hard to find examples of someone saying "I was at so-and-so's house" referring to a trailer.

What about caves?  I had no trouble digging up a travel blog's listing of "12 cave houses", though several of those appear to be hotels.  Hotels are buildings for people to stay in, but not live in, even though some do.  A hotel is also subdivided into many rooms, typically occupied by people who don't know each other.  Apartments are generally not considered houses either, though a duplex or townhome (known in the UK as a "terraced house") generally is.  In any case, if someone adds some walls, a door and interior design to a cave, does that make it a house?  Looking at abstract properties, does this make it a building?

Is a kid's tree house a house?  Is a doll house?  What about a dog house or a bird house?

In a previous post, I explored the senses of the word out and argued that there wasn't any crisp definition by properties, or even a set of definitions for different senses, that covered all and only the ways we actually use the word out.  I used house as an example here because I hadn't already thought about its senses and didn't know exactly where I'd end up.

Honestly, the "building for human habitation" definition held up better than I expected, but it still wasn't hard to find examples that pushed at the boundaries.  In my experience, whatever concept you start with, you end up having to add more and more clauses to explain why a particular example is or isn't a house, and if you try to cover all the possibilities you no longer have a clear definition by a particular set of properties.

More likely, we have a core concept of "house", a detached building that one family lives in, and extend that concept based on similarities (a cave house is a place people live in, parts of it are built and it's not going anywhere) and metaphors (the family living in a house stands in for the house itself, an example of metonymy).
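
As a rough sketch of the difference: a definition by properties is all-or-nothing, while a prototype plus similarity grades candidates by how much they overlap with a core example.  The features and numbers here are invented:

# Definition by properties: all-or-nothing.
HOUSE_DEFINITION = {"building", "for human habitation"}

def is_house_by_definition(thing):
    return HOUSE_DEFINITION <= thing

# Prototype plus similarity: grade by overlap with a core example.
HOUSE_PROTOTYPE = {"building", "detached", "one household lives there",
                   "has rooms", "stays put"}

def house_likeness(thing):
    return len(thing & HOUSE_PROTOTYPE) / len(HOUSE_PROTOTYPE)

cave_house = {"for human habitation", "has rooms", "stays put",
              "one household lives there"}
print(is_house_by_definition(cave_house))  # False -- not built, so no "building"
print(house_likeness(cave_house))          # 0.6 -- fairly house-ish, which fits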

As far as I can tell, this is just how language works, and language works this way because our minds work this way.  Our minds are constantly taking in a stream of sensory input and identifying objects from it, even when those objects are ill-defined, like clouds (literally nebulous) or aren't even there, like the deer I thought I saw through the snow crossing the road in hour 18 or so of a drive from California to Idaho.  We classify those objects in relation to other objects, or, more accurately, other experiences from which we've identified objects.


Identifying objects is itself an exercise in abstraction, deciding that a particular set of impulses in the optic nerve is a friend's face, or that a particular set of auditory inputs is a voice, or a dog barking, or a tree falling or whatever.  Recent generations of AIs which can recognize faces in photos or words in recordings of speech (much harder than it might seem) are doing the same thing.  We generally think that faces and words are too specific to be abstract, but is this abstract thinking?  If it is, how does it relate to examples like the ones I gave above, such as defining a species of animal?

When other animals do things like this, like a dog in the next room hearing kibble being poured into a dish or vervets responding to specific calls by acting to protect themselves from particular predators, we tend to think of it as literal thinking, not higher-level abstract thinking like we can do.  Any number of experiments in the 20th century studied stimulus/response behavior and considered "the bell was rung" as a simple concrete stimulus rather than an abstraction of a large universe of possible sounds, and likewise for a behavior like pressing a button to receive a treat.

I've described two related but distinct notions of abstraction here:
  • Defining concepts in terms of abstract properties like size, shape, color, how something came to be, what it's meant to be used for and so on (this species of bird is around this size with plumage of these colors, a house is a building for human habitation)
  • Identifying discrete objects (in a broad sense that includes things like sounds and motions) from a continuous stream of sensory input.
The first is the usual sense of abstraction.  It's something we do consciously as part of what we call reasoning.  Current AIs don't do it particularly well, or in many cases at all.  On the other hand, it's not clear how important it is in interacting with the world.  You don't have to be able to abstractly define house in order to build one or live in it.  You don't have to have a well-developed abstract theory in order to develop a new invention.  The invention just has to work.  Often, the theory comes along later.

Theories can be very helpful to people developing new technologies or making scientific discoveries, but they're not essential.  When AlphaFold predicts how a new protein will fold, it's not using a theory of protein folding.  In fact, that's its advantage, that it's not bound by any particular concept of how proteins should fold.

The second sort of abstraction is everywhere, once you think to look, so common as to be invisible.  It's crucial to dealing with the real world, and it's an important part of AI, for example in turning speech into text or identifying an obstacle for a robot to go around.  Since it's not conscious, we don't consider it abstraction, even if it may be a better fit for the concept of pulling out properties.  Since current AIs already do this kind of abstraction, and we don't consider an AI that recognizes faces in photos to be an AGI, this sort of abstraction clearly isn't enough to make something an AGI.

There may be some better definition of abstract thought that I'm missing, but neither of the two candidates above looks like the missing piece for AGI.  The first doesn't seem essential to the kind of disruption we assume an AGI would be capable of, and the second seems like basic infrastructure for anything that has to deal with the real world, AGI or not.


*That "renewed" is getting a little out of date.  Sometimes considerable time passes between starting a post and actually posting it.

Friday, January 12, 2024

On knowing a lot about something and something about a lot of things

The physicist Richard Feynman told a story about being on a panel of experts from a variety of academic fields.  The full details are in one of the Surely you're joking books I read many years ago.  I'm paraphrasing from memory here because lazy.  The gist is that the panel was asked to look at someone's paper that pulled together ideas from a variety of fields and was generating a lot of buzz.  Just the sort of thing you'd want an interdisciplinary panel of experts to look at.

All the experts on the panel had a similar reaction: Overall, it looks very interesting, but the stuff in my area needs quite a bit of work -- this bit is a little bit off, they're mis-applying these terms and these parts are just wrong.  But there are some really interesting ideas and this is definitely worth further attention.

In Feynman's telling, at least, he was the one to offer a different take: If every expert is saying the part they know about is bad, that says it's just bad all the way through.  It doesn't really matter what an expert thinks of the area outside their expertise.


Relying on people's subjective impressions is risky.  What we need here is some way to objectively determine the value of a paper that crosses areas of knowledge.  Here's one way to do it: Have everyone rate the paper in each area on a scale of 0 - 100 and then pull together the numbers.

Let's say we have five people on the panel, specializing in music theory, physics, Thai cuisine, medieval literature and athletics, and someone has written a paper pulling together ideas from these fields into an exciting new synthesis.  Their ratings might be:

                 Music   Physics   Thai food   Medi. lit   Athletics   Overall
Music theorist      25        75          80          65          85        66
Physicist           70        15          80          60          60        57
Thai chef           65        85           5          70          70        59
Medievalist         90        70          80          25          85        70
Athlete             85        90          95          90          30        78
Overall             67        67          68          62          66        66

Overall, the panel rates the paper 66 out of 100.  We don't have enough context here to know whether 66 is a good score or a mediocre score, but it certainly doesn't look horrible.  The highest score is in Thai cuisine, and the highest score there was from the athletics expert, so maybe the author has discovered some interesting contribution to Thai food by way of athletics.

But hang on a minute.  The highest overall score is in Thai cuisine, but the lowest rating from any expert is the 5 from the Thai chef.  Let's ask each of the experts how much they know about their fields and those outside their home turf:

                 Music   Physics   Thai food   Medi. lit   Athletics
Music theorist      95         5          15          10           5
Physicist           20       100          10           5           5
Thai chef            5        10         100          10          15
Medievalist         10         5          10          95          10
Athlete             10        15           5          10          95

Everyone feels confident in their own field, as you might expect, and they don't feel particularly confident outside their own field, which also makes sense. There's also quite a bit more variation outside the home fields, which makes a certain amount of sense as well.  Maybe the physicist happens to have taken a couple of courses in music theory.  Maybe the athlete has only had Thai food once.  You can expect someone to have studied extensively in their field, but who knows what they've done outside it.

We should take this into account when looking at the ratings.  A Thai chef saying that the paper is weak in Thai cuisine means more than an athlete saying it's great.  If we take a weighted average by multiplying each rating by the panelist's confidence, adding those up and dividing by the total weight (that is, the total of the confidence numbers), we get a considerably different picture:

                 Music   Physics   Thai food   Medi. lit   Athletics   Overall
Weighted result     42        33          27          38          42        36

Overall, the paper rates 36 out of 100 rather than 66.  Its weakest area is Thai cuisine, and even its strongest areas, music and athletics, are well below the previous score of 66.

This seems much more plausible.  The person who knows Thai food best rated it low, and now we're counting that ten times more heavily than the physicist's rating and twenty times more heavily than the judge who said they knew least about it.
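
For the record, here's the arithmetic as a short sketch, using the made-up ratings and confidence numbers from the tables above:

FIELDS = ["Music", "Physics", "Thai food", "Medieval lit", "Athletics"]

# ratings[panelist] and confidence[panelist] are in the same field order.
ratings = {
    "Music theorist": [25, 75, 80, 65, 85],
    "Physicist":      [70, 15, 80, 60, 60],
    "Thai chef":      [65, 85,  5, 70, 70],
    "Medievalist":    [90, 70, 80, 25, 85],
    "Athlete":        [85, 90, 95, 90, 30],
}
confidence = {
    "Music theorist": [95,   5,  15, 10,  5],
    "Physicist":      [20, 100,  10,  5,  5],
    "Thai chef":      [ 5,  10, 100, 10, 15],
    "Medievalist":    [10,   5,  10, 95, 10],
    "Athlete":        [10,  15,   5, 10, 95],
}

for i, field in enumerate(FIELDS):
    plain = sum(r[i] for r in ratings.values()) / len(ratings)
    weighted = (sum(ratings[p][i] * confidence[p][i] for p in ratings)
                / sum(confidence[p][i] for p in ratings))
    print(f"{field}: plain {plain:.0f}, weighted {weighted:.0f}")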

I think there are a few lessons to be drawn here.  First, it's important to take context into account.  The medievalist's rating means a lot if it's about medieval literature and not much if it's about physics, unless they also happen to have a background there.  Second, just putting numbers on something doesn't make it any more or less rigorous.  The 66 rating and the 36 rating are both numbers, but one means a lot more than the other.

Third, when it comes specifically to averages, a weighted average can be a useful tool for expressing how much any particular data point should count for.  Just be sure to assign the weights independently from the numbers you're weighting.  Asking the panelists ahead of time how much they know about each field makes sense.  Looking at rating numbers and deciding how much to weight them is a classic example of data fiddling.

Finally, it's worth keeping in mind that people often give the benefit of the doubt to something that sounds plausible when they don't have anything better to go on.  As I understand it, this was the case in Feynman's example.  In that case, giving the paper to a panel of experts from different fields gave the author much more room to hide than if they'd, say, submitted a shortened version of the paper for each field.

The answer is not necessarily to actively distrust anything from outside one's own expertise, but it's important not to automatically trust something you don't know about just because it seems reasonable.  The better evaluation isn't "I don't believe it" but "I really can't say".

I'll leave it up to the reader how any of this might apply to, say, generative AI, LLMs and chatbots.

Sunday, December 3, 2023

What would superhuman intelligence even mean?

Artificial General Intelligence, or AGI, so the story goes, isn't here yet, but it's very close.  Soon we will share the world with entities that are our intellectual superiors in every way, that have their own understanding of the world and can learn any task and execute it flawlessly, solve any problem perfectly and generally outsmart us at every turn.  We don't know what the implications of this are (and it might not be a good idea to ask the AGIs), but they're certainly huge, quite likely existential.

Or at least, that's the story.  For a while now, my feeling has been that narratives like this one say more about us than they do about AI technology in general or about AGI in particular.

At the center of this is the notion of AGI itself.  I gave a somewhat extreme definition above, but not far, I think, from what many people think it is.  OpenAI, whose mission is to produce it, has a more focused and limited definition.  While the most visible formulation is that an AGI would be "generally smarter than humans", the OpenAI charter defines it as "a highly autonomous system that outperforms humans at most economically valuable work".  While "economically valuable work" may not be the objective standard that it's trying to be here -- valuable to whom? by what measure? -- it's still a big step up from "generally smarter".

Google's DeepMind team (as usual, I don't really know anything you don't, and couldn't tell you anyway) lays out more detailed criteria, based on three properties: autonomy, performance and generality.  A system can exhibit various levels of each of these, from zero (a desk calculator, for example, would score low across the board) to superhuman, meaning able to do better than any human.  In this view there is no particular dividing line between AGI and not-AGI, but anything that scored "superhuman" on all three properties would have to be on the AGI side.  The paper calls this Artificial Superintelligence (ASI), and dryly evaluates it as "not yet achieved".

There are several examples of superhuman intelligence in current AI systems.  This blog's favorite running example, chess engines, can consistently thrash the world's top human players, but they're not very general (more on that in a bit).  The AlphaFold system can predict how a string of amino acids will fold up into a protein better than any top scientist, but again, it's specialized to a particular task.  In other words, current AIs may be superhuman, but not in a general way.

As to generality, LLMs such as ChatGPT and Bard are classified as "Emerging AGI", which is the second of six levels, just above "No AI" and below Competent, Expert, Virtuoso and Superhuman.  The authors do not consider LLMs, including their own, as "Competent" in generality.  Competent AGI is "not yet achieved."  I tend to agree.

So what is this "generality" we seek?

Blaise Agüera y Arcas and Peter Norvig (both at Google, but not at DeepMind, at least not at the time) argue that LLMs are, in fact, AGI.  That is, flawed though they are, they're not only artificial intelligence, which is not really in dispute, but general.  They can converse on a wide range of topics, perform a wide range of tasks, work in a wide range of modalities, including text, images, video, audio and robot sensors and controls, use a variety of languages, including some computer languages, and respond to instructions.  If that's not generality, then what is?

On the one hand, that seems hard to argue with, but on the other hand, it's hard to escape the feeling that at the end of the day, LLMs are just producing sequences of words (or images, etc.), based on other sequences of words (or images, etc.).  While it's near certain that they encode some sorts of generalizations about sequences of words, they also clearly don't encode very much if anything about what the words actually mean.

By analogy, chess engines like Stockfish make fairly simple evaluations of individual positions, at least from the point of view of a human chess player.  There's nothing in Stockfish's evaluation function that says "this position would be good for developing a queenside attack supported by a knight on d5".  However, by evaluating huge numbers of positions, it can nonetheless post a knight on d5 that will end up supporting a queenside attack.
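
As a sketch of that division of labor -- a simple evaluation plus lots of search -- here's a generic toy negamax.  It's nothing like Stockfish's actual machinery, and the board interface is left as stubs you'd have to supply:

# Toy negamax search: the evaluation knows only material, and any apparent
# "strategy" emerges from searching lots of positions.  legal_moves(pos),
# apply(pos, move) and material_balance(pos) are stand-ins for whatever
# board representation you plug in; material_balance is scored from the
# point of view of the side to move.
def negamax(pos, depth, legal_moves, apply, material_balance):
    moves = legal_moves(pos)
    if depth == 0 or not moves:
        return material_balance(pos), None
    best_score, best_move = float("-inf"), None
    for move in moves:
        score, _ = negamax(apply(pos, move), depth - 1,
                           legal_moves, apply, material_balance)
        score = -score  # a position that's good for the opponent is bad for us
        if score > best_score:
            best_score, best_move = score, move
    return best_score, best_move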

A modern chess engine doesn't try to just capture material, or follow a set of rules you might find in a book on chess strategy.  It performs any number of tactical maneuvers and implements any number of strategies that humans have developed over the centuries, and some that they haven't.  If that's not general, what is?

And yet, Stockfish is obviously not AGI.  It's a chess engine.  Within the domain of chess, it can do a wide variety of things in a wide variety of ways, things that, when a human does them, require general knowledge as well as understanding, planning and abstract thought.  An AI that had the ability to form abstractions and make plans in any domain it encounters, including domains it hasn't encountered before, would have to be considered an AGI, and such an AI could most likely learn how to play chess well, but that doesn't make Stockfish AGI.

I think much the same thing is going on with LLMs, though there's certainly room for disagreement.  Agüera y Arcas and Norvig see multiple domains like essay writing, word-problem solving, Italian-speaking, Python-coding and so forth.  I see basically a single domain of word-smashing.  Just like a chess engine can turn a simple evaluation function and tons of processing power into a variety of chess-related abilities, I would claim that an LLM can turn purely formal word-smashing and tons of training text and processing power into a variety of word-related abilities.

The main lesson of LLMs seems to be that laying out coherent sequences of words in a logical order certainly looks like thinking, but, even though there's clearly more going on than in an old-fashioned Markov chain, there's abundant evidence that they're not doing anything like what we consider "thinking" (I explore this a bit more in this post and in some others with the AI tag).
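
For reference, here's roughly what an "old-fashioned Markov chain" text generator amounts to -- a bigram model over whatever sample text you feed it.  The sample here is obviously a toy:

import random
from collections import defaultdict

def train_bigrams(text):
    # Map each word to the list of words that followed it in the sample.
    words = text.split()
    following = defaultdict(list)
    for a, b in zip(words, words[1:]):
        following[a].append(b)
    return following

def babble(following, start, length=20):
    out = [start]
    for _ in range(length):
        choices = following.get(out[-1])
        if not choices:
            break
        out.append(random.choice(choices))
    return " ".join(out)

sample = "the cat sat on the mat and the dog saw the cat on the mat"
model = train_bigrams(sample)
print(babble(model, "the"))  # locally plausible, globally meaningless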


What's missing, then?  The DeepMind paper argues that metacognitive skills are an important missing piece, perhaps the most important one.  While the term is mentioned several times, it is never really sharply defined.  It variously includes "learning", "the ability to learn new tasks or the ability to know when to ask for clarification or assistance from a human", "the ability to learn new skills", "the ability to learn new skills and creativity"  and "learning when to ask a human for help, theory of mind modeling, social-emotional skills".  Clearly, learning new skills is central, but there is a certain "we'll know it when we see it" quality to all this.

This isn't a knock on the authors of the paper.  A recurring theme in the development of AI, as the hype dies down about the latest development, is trying to pinpoint why the latest development isn't the AGI everyone's been looking for.  By separating out factors like performance and autonomy, the paper makes it clear that we have a much better handle on what those mean, and the remaining mystery is in generality.  Generality comprises a number of things that current AIs don't do.  You could make a case that current LLMs show some level of learning and creativity, but I agree with the assessment that this is "emerging" and not "competent".

An LLM can write you a poem about a tabby cat in the style of Ogden Nash, but it won't be very good.  Or all that much like Ogden Nash. More importantly, it won't be very creative.  LLM-generated poems I've seen tend to have a similar structure: Opening lines that are generally on-topic and more or less in style, followed by a meandering middle that throws in random things about the topic in a caricature of the style, followed by a conclusion trying to make some sort of banal point.

Good poems usually aren't three-part essays in verse form.  Even in poems that have that sort of structure, the development is carefully targeted and the conclusion tells you something insightful and unexpected.


It's not really news that facility with language is not the same as intelligence, or that learning, creativity, theory of mind and so on are capabilities that humans currently have in ways that AIs clearly don't, but the DeepMind taxonomy nonetheless sharpens the picture and that's useful.

I think what we're really looking for in AGI is something that will make better decisions than we do, for some notion of "better".  That "for some notion" bit isn't just a bit of boilerplate or an attempt at a math joke.  People differ, pretty sharply sometimes, on what makes a decision better or worse.  Different people bring different knowledge, experience and temperaments to the decision-making process, but beyond that, we're not rational beings and never will be.

Making better decisions probably does require generality in the sense of learning and creativity, but the real goal is something even more elusive: judgment.  Wisdom, even.  Much of the concern over AGI is, I think, about judgment.

We don't want to create something powerful with poor judgment.  What constitutes good or poor judgment is at least partly subjective, but when it comes to AIs, we at least want that judgment to regard the survival of humanity as a good thing.  One of the oldest nightmare scenarios, far older than computers or Science Fiction as a genre, is the idea that some all-powerful, all-wise being will judge us, find us wanting and destroy us.  As I said at the top, our concerns about AGI say more about us than they do about AI.

The AI community does talk about judgment, usually under the label of alignment.  Alignment is a totally separate thing from generality or even intelligence.  "Is it generally intelligent?" is not just a different question, but a different kind of question, from "Does its behavior align with our interests?"  In other words, "good judgment" means "good for us".  I'm not going to argue against that, or at least not very enthusiastically.

Alignment is a concern when a thing can make decisions, or influence us to make decisions, in the real world.  Technology to amplify human intelligence is ancient (writing, for example), as is technology to influence our decisions (think rolling dice or drawing lots for ancient examples, but also any technology, such as a spreadsheet, that we come to rely on to make decisions).

Technology that can make decisions based on an information store it can also update is less than a century old.  While computing pioneers were quick to recognize that this was a very significant development, it's no surprise that we're still coming to grips with just what it means a few decades later. 

Intelligence is important here not for its own sake, but because it relates to concepts like risk, judgment and alignment.  To be an active threat, something has to be able to influence the real world, and it has to be able to make decisions on its own.  That ability to make decisions is where intelligence comes in.  

Computers have been involved in controlling things like power plants and weapons for most of the time computers have been around, but until recently control systems have only been able to implement algorithms that we directly understand.  If the behavior isn't what we expect, it's because a part failed or we got the control algorithm wrong.  With the advent of ML systems (not just LLMs), we now have a new potential failure mode: The control system is doing what we asked, but we don't really understand what that means.

This is actually not entirely new, either.  It took a while to understand that some systems are chaotic and that certain kinds of non-linear feedback can lead to unpredictable behavior even though the control system itself is simple and you know the inputs with a high degree of precision.  Nonetheless, state-of-the-art ML models introduce a whole new level of opaqueness.  There's now a well-developed theory of when non-linear systems go chaotic and what kinds of behavior they can exhibit.  There's nothing like that for ML models.
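
The classic textbook example is the logistic map: a one-line update rule that behaves chaotically for some parameter values, so that two nearly identical starting points end up having nothing to do with each other:

# Logistic map: x -> r * x * (1 - x).  For r around 3.9 it's chaotic:
# two starting points differing by one part in a billion soon diverge.
r = 3.9
x, y = 0.5, 0.500000001
for step in range(1, 61):
    x = r * x * (1 - x)
    y = r * y * (1 - y)
    if step % 10 == 0:
        print(f"step {step}: x = {x:.6f}, y = {y:.6f}")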

This strongly suggests that we should tread very carefully before, say, putting an ML model in charge of launching nuclear missiles, but currently, and for quite a while yet as far as I can tell, whether to do such a thing is still a human decision.  If some sort of autonomous submarine triggers a nuclear war, that's ultimately a failure in human judgment for building the sub and putting nuclear missiles on it.


Well, that went darker than I was expecting.  Let's go back to the topic: What would superhuman intelligence even mean?  The question could mean two different things:

  • How do you define superhuman intelligence?  It's been over 70 years since Alan Turing asked if machines could think, but we still don't have a good answer.  We have a fairly long list of things that aren't generally intelligent, including current LLMs except perhaps in a limited sense, and we're pretty sure that having capabilities like the ability to learn new tasks is a key factor, but we don't have a good handle on what it really means to have such a capability.
  • What are the implications of something having superhuman intelligence?  This is an entirely different question, having to do with what kind of decisions do we allow an AI to make about what sort of things.  The important factors here are risk and judgment.

These are two very different questions, but they're related.

It's natural to think of them together.  In particular, when some new development comes along that may be a step toward AGI (first question), it's natural, and useful, to think of the implications (second question). But that needs to be done carefully.  It's easy to follow a chain of inference along the lines of

  • X is a major development in AI
  • So X is a breakthrough on the way to AGI
  • In fact, X may even be AGI
  • So X has huge implications
Those middle steps tie a particular technical development to the entire body of speculation about what it would mean to have all-knowing super-human minds in our midst, going back to well before there were even computers.  Whatever conclusions you've come to in that discussion -- AGI will solve all the world's problems, AGI will be our demise, AGI will disrupt the jobs market and the economy, whether for better or for worse, or humans will keep being humans and AGI will have little effect one way or another, or something else -- this latest development X has those implications.

My default assumption is that humans will keep being humans, but there's a lot I don't know.  My issue, really, is with the chain of inference.  The debate over whether something like an LLM is generally intelligent is mostly about how you define "generally intelligent".  Whether you buy my view on LLMs or Agüera y Arcas and Norvig's has little, if anything, to do with what the economic or social impacts will be.

The implications of a particular technical development, in AI or elsewhere, depend on the specifics of that development and the context it happens in.  While it's tempting to ask "Is it AGI?" and assume that "no" means business as usual while "yes" has vast consequences, I doubt this is a useful approach.

The development of HTTP and cheap, widespread internet connectivity has had world-wide consequences, good and bad, with no AI involved.  Generative AI and LLMs may well be a step toward whatever AGI really is, but at this point, a year after ChatGPT launched and a couple of years after generative AIs like DALL-E came along, it's hard to say just what direct impact this generation of AIs will have.

I would say, though, that the error bars have narrowed.  A year ago, they ranged from "ho-hum" to "this changes everything".  The upper limit seems to have dropped considerably in the interim, while the lower limit hasn't really moved.