Sunday, December 3, 2023

What would superhuman intelligence even mean?

Artificial General Intelligence, or AGI, so the story goes, isn't here yet, but it's very close.  Soon we will share the world with entities that are our intellectual superiors in every way, that have their own understanding of the world and can learn any task and execute it flawlessly, solve any problem perfectly and generally outsmart us at every turn.  We don't know what the implications of this are (and it might not be a good idea to ask the AGIs), but they're certainly huge, quite likely existential.

Or at least, that's the story.  For a while now, my feeling has been that narratives like this one say more about us than they do about AI technology in general or about AGI in particular.

At the center of this is the notion of AGI itself.  I gave a somewhat extreme definition above, but not far, I think, from what many people think it is.  OpenAI, whose mission is to produce it, has a more focused and limited definition.  While the most visible formulation is that an AGI would be "generally smarter than humans", the OpenAI charter defines it as "a highly autonomous system that outperforms humans at most economically valuable work".  While "economically valuable work" may not be the objective standard that it's trying to be here -- valuable to whom? by what measure? -- it's still a big step up from "generally smarter".

Google's DeepMind team (as usual, I don't really know anything you don't, and couldn't tell you anyway) lays out more detailed criteria, based on three properties: autonomy, performance and generality.  A system can exhibit various levels of each of these, from zero (a desk calculator, for example, would score low across the board) to superhuman, meaning able to do better than any human.  In this view there is no particular dividing line between AGI and not-AGI, but anything that scored "superhuman" on all three properties would have to be on the AGI side.  The paper calls this Artificial Superintelligence (ASI), and dryly evaluates it as "not yet achieved".

There are several examples of superhuman intelligence in current AI systems.  This blog's favorite running example, chess engines, can consistently thrash the world's top human players, but they're not very general (more on that in a bit).  The AlphaFold system can predict how a string of amino acids will fold up into a protein better than any top scientist, but again, it's specialized to a particular task.  In other words, current AIs may be superhuman, but not in a general way.

As to generality, LLMs such as ChatGPT and Bard are classified as "Emerging AGI", which is the second of six levels of generality, just above "No AI" and below Competent, Expert, Virtuoso and Superhuman.  The authors do not consider LLMs, including their own, as "Competent" in generality.  Competent AGI is "not yet achieved." I tend to agree.

So what is this "generality" we seek?

Blaise Agüera y Arcas and Peter Norvig (both at Google, but not at DeepMind, at least not at the time) argue that LLMs are, in fact, AGI.  That is, flawed though they are, they're not only artificial intelligence, which is not really in dispute, but general.  They can converse on a wide range of topics, perform a wide range of tasks, work in a wide range of modalities, including text, images, video, audio and robot sensors and controls, use a variety of languages, including some computer languages, and respond to instructions.  If that's not generality, then what is?

On the one hand, that seems hard to argue with, but on the other hand, it's hard to escape the feeling that at the end of the day, LLMs are just producing sequences of words (or images, etc.), based on other sequences of words (or images, etc.).  While it's near certain that they encode some sorts of generalizations about sequences of words, they also clearly don't encode very much if anything about what the words actually mean.

By analogy, chess engines like Stockfish make fairly simple evaluations of individual positions, at least from the point of view of a human chess player.  There's nothing in Stockfish's evaluation function that says "this position would be good for developing a queenside attack supported by a knight on d5".  However, by evaluating huge numbers of positions, it can nonetheless post a knight on d5 that will end up supporting a queenside attack.

A modern chess engine doesn't try to just capture material, or follow a set of rules you might find in a book on chess strategy.  It performs any number of tactical maneuvers and implements any number of strategies that humans have developed over the centuries, and some that they haven't.  If that's not general, what is?

And yet, Stockfish is obviously not AGI.  It's a chess engine.  Within the domain of chess, it can do a wide variety of things in a wide variety of ways, things that, when a human does them, require general knowledge as well as understanding, planning and abstract thought.  An AI that had the ability to form abstractions and make plans in any domain it encounters, including domains it hasn't encountered before, would have to be considered an AGI, and such an AI could most likely learn how to play chess well, but that doesn't make Stockfish AGI.

I think much the same thing is going on with LLMs, though there's certainly room for disagreement.  Agüera y Arcas and Norvig see multiple domains like essay writing, word-problem solving, Italian-speaking, Python-coding and so forth.  I see basically a single domain of word-smashing.  Just like a chess engine can turn a simple evaluation function and tons of processing power into a variety of chess-related abilities, I would claim that an LLM can turn purely formal word-smashing and tons of training text and processing power into a variety of word-related abilities.

The main lesson of LLMs seems to be that laying out coherent sequences of words in a logical order certainly looks like thinking, but, even though there's clearly more going on than in an old-fashioned Markov chain, there's abundant evidence that they're not doing anything like what we consider "thinking" (I explore this a bit more in this post and in some others with the AI tag).
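For comparison, here is roughly what an old-fashioned Markov chain text generator does.  This is a minimal Python sketch, with function names and a sample sentence made up for illustration rather than taken from any particular system: each next word is chosen purely from a record of which words followed the current word in the training text.

```python
import random
from collections import defaultdict

def build_chain(text):
    """Map each word to the list of words that followed it in the text."""
    words = text.split()
    chain = defaultdict(list)
    for current, following in zip(words, words[1:]):
        chain[current].append(following)
    return chain

def generate(chain, start, length, rng=None):
    """Walk the chain from `start`, choosing each next word at random."""
    rng = rng or random.Random()
    output = [start]
    for _ in range(length - 1):
        followers = chain.get(output[-1])
        if not followers:  # dead end: this word never had a successor
            break
        output.append(rng.choice(followers))
    return output

# Hypothetical training text, just big enough to show the idea.
sample = "the cat sat on the mat and the dog sat on the cat"
chain = build_chain(sample)
print(" ".join(generate(chain, "the", 8, random.Random(0))))
```

Every adjacent pair of words in the output appeared somewhere in the training text, and that's all the generator knows.  An LLM is doing a great deal more than this, but the sketch gives a sense of the baseline.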


What's missing, then?  The DeepMind paper argues that metacognitive skills are an important missing piece, perhaps the most important one.  While the term is mentioned several times, it is never really sharply defined.  It variously includes "learning", "the ability to learn new tasks or the ability to know when to ask for clarification or assistance from a human", "the ability to learn new skills", "the ability to learn new skills and creativity"  and "learning when to ask a human for help, theory of mind modeling, social-emotional skills".  Clearly, learning new skills is central, but there is a certain "we'll know it when we see it" quality to all this.

This isn't a knock on the authors of the paper.  A recurring theme in the development of AI, as the hype dies down about the latest development, is trying to pinpoint why the latest development isn't the AGI everyone's been looking for.  By separating out factors like performance and autonomy, the paper makes it clear that we have a much better handle on what those mean, and the remaining mystery is in generality.  Generality comprises a number of things that current AIs don't do.  You could make a case that current LLMs show some level of learning and creativity, but I agree with the assessment that this is "emerging" and not "competent".

An LLM can write you a poem about a tabby cat in the style of Ogden Nash, but it won't be very good.  Or all that much like Ogden Nash. More importantly, it won't be very creative.  LLM-generated poems I've seen tend to have a similar structure: Opening lines that are generally on-topic and more or less in style, followed by a meandering middle that throws in random things about the topic in a caricature of the style, followed by a conclusion trying to make some sort of banal point.

Good poems usually aren't three-part essays in verse form.  Even in poems that have that sort of structure, the development is carefully targeted and the conclusion tells you something insightful and unexpected.


It's not really news that facility with language is not the same as intelligence, or that learning, creativity, theory of mind and so on are capabilities that humans currently have in ways that AIs clearly don't, but the DeepMind taxonomy nonetheless sharpens the picture and that's useful.

I think what we're really looking for in AGI is something that will make better decisions than we do, for some notion of "better".  That "for some notion" bit isn't just a bit of boilerplate or an attempt at a math joke.  People differ, pretty sharply sometimes, on what makes a decision better or worse.  Different people bring different knowledge, experience and temperaments to the decision-making process, but beyond that, we're not rational beings and never will be.

Making better decisions probably does require generality in the sense of learning and creativity, but the real goal is something even more elusive: judgment.  Wisdom, even.  Much of the concern over AGI is, I think, about judgment.

We don't want to create something powerful with poor judgment.  What constitutes good or poor judgment is at least partly subjective, but when it comes to AIs, we at least want that judgment to regard the survival of humanity as a good thing.  One of the oldest nightmare scenarios, far older than computers or Science Fiction as a genre, is the idea that some all-powerful, all-wise being will judge us, find us wanting and destroy us.  As I said at the top, our concerns about AGI say more about us than they do about AI.

The AI community does talk about judgment, usually under the label of alignment.  Alignment is a totally separate thing from generality or even intelligence.  "Is it generally intelligent?" is not just a different question, but a different kind of question, from "Does its behavior align with our interests?" In other words, "good judgment" means "good for us".  I'm not going to argue against that, or at least not very enthusiastically.

Alignment is a concern when a thing can make decisions, or influence us to make decisions, in the real world.  Technology to amplify human intelligence is ancient (writing, for example), as is technology to influence our decisions (think rolling dice or drawing lots for ancient examples, but also any technology, such as a spreadsheet, that we come to rely on to make decisions).

Technology that can make decisions based on an information store it can also update is less than a century old.  While computing pioneers were quick to recognize that this was a very significant development, it's no surprise that we're still coming to grips with just what it means a few decades later. 

Intelligence is important here not for its own sake, but because it relates to concepts like risk, judgment and alignment.  To be an active threat, something has to be able to influence the real world, and it has to be able to make decisions on its own.  That ability to make decisions is where intelligence comes in.  

Computers have been involved in controlling things like power plants and weapons for most of the time computers have been around, but until recently control systems have only been able to implement algorithms that we directly understand.  If the behavior isn't what we expect, it's because a part failed or we got the control algorithm wrong.  With the advent of ML systems (not just LLMs), we now have a new potential failure mode: The control system is doing what we asked, but we don't really understand what that means.

This is actually not entirely new, either.  It took a while to understand that some systems are chaotic and that certain kinds of non-linear feedback can lead to unpredictable behavior even though the control system itself is simple and you know the inputs with a high degree of precision.  Nonetheless, state-of-the-art ML models introduce a whole new level of opaqueness.  There's now a well-developed theory of when non-linear systems go chaotic and what kinds of behavior they can exhibit.  There's nothing like that for ML models.
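The standard textbook illustration of simple feedback going chaotic is the logistic map, x -> r * x * (1 - x).  The sketch below is not a model of any real control system; the function names and the particular starting values are assumptions chosen for illustration.  It runs the same one-line rule from two starting points that differ by one part in ten billion and reports how far apart the two runs ever get.

```python
def logistic_orbit(r, x0, steps):
    """Iterate the feedback rule x -> r * x * (1 - x) and return every value."""
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1 - xs[-1]))
    return xs

def max_gap(r, x0, nudge, steps):
    """Largest difference between two orbits whose starts differ by `nudge`."""
    a = logistic_orbit(r, x0, steps)
    b = logistic_orbit(r, x0 + nudge, steps)
    return max(abs(p - q) for p, q in zip(a, b))

# Same simple rule, two settings of the feedback strength r.
print(max_gap(2.5, 0.2, 1e-10, 100))  # settles down: the gap stays tiny
print(max_gap(4.0, 0.2, 1e-10, 100))  # chaotic: the gap becomes large
```

At r = 2.5 both runs settle onto the same value and the tiny initial difference never matters.  At r = 4 the difference roughly doubles with each step, so knowing the input to ten decimal places buys only a few dozen steps of useful prediction.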

This strongly suggests that we should tread very carefully before, say, putting an ML model in charge of launching nuclear missiles, but currently, and for quite a while yet as far as I can tell, whether to do such a thing is still a human decision.  If some sort of autonomous submarine triggers a nuclear war, that's ultimately a failure in human judgment for building the sub and putting nuclear missiles on it.


Well, that went darker than I was expecting.  Let's go back to the topic: What would superhuman intelligence even mean?  The question could mean two different things:

  • How do you define superhuman intelligence?  It's been over 70 years since Alan Turing asked if machines could think, but we still don't have a good answer.  We have a fairly long list of things that aren't generally intelligent, including current LLMs except perhaps in a limited sense, and we're pretty sure that having capabilities like the ability to learn new tasks is a key factor, but we don't have a good handle on what it really means to have such a capability.
  • What are the implications of something having superhuman intelligence?  This is an entirely different question, having to do with what kinds of decisions we allow an AI to make about what sorts of things.  The important factors here are risk and judgment.

These are two very different questions, but they're related.

It's natural to think of them together.  In particular, when some new development comes along that may be a step toward AGI (first question), it's natural, and useful, to think of the implications (second question). But that needs to be done carefully.  It's easy to follow a chain of inference along the lines of

  • X is a major development in AI
  • So X is a breakthrough on the way to AGI
  • In fact, X may even be AGI
  • So X has huge implications
Those middle steps tie a particular technical development to the entire body of speculation about what it would mean to have all-knowing superhuman minds in our midst, going back to well before there were even computers.  Whatever conclusions you've come to in that discussion -- AGI will solve all the world's problems, AGI will be our demise, AGI will disrupt the jobs market and the economy, whether for better or for worse, or humans will keep being humans and AGI will have little effect one way or another, or something else -- this latest development X has those implications.

My default assumption is that humans will keep being humans, but there's a lot I don't know.  My issue, really, is with the chain of inference.  The debate over whether something like an LLM is generally intelligent is mostly about how you define "generally intelligent".  Whether you buy my view on LLMs or Agüera y Arcas and Norvig's has little if anything to do with what the economic or social impacts will be.

The implications of a particular technical development, in AI or elsewhere, depend on the specifics of that development and the context it happens in.  While it's tempting to ask "Is it AGI?" and assume that "no" means business as usual while "yes" has vast consequences, I doubt this is a useful approach.

The development of HTTP and cheap, widespread internet connectivity has had world-wide consequences, good and bad, with no AI involved.  Generative AI and LLMs may well be a step toward whatever AGI really is, but at this point, a year after ChatGPT launched and a couple of years after generative AIs like DALL-E came along, it's hard to say just what direct impact this generation of AIs will have.

I would say, though, that the error bars have narrowed.  A year ago, they ranged from "ho-hum" to "this changes everything".  The upper limit seems to have dropped considerably in the interim, while the lower limit hasn't really moved.

Monday, October 30, 2023

Language off in the weeds

While out walking, I paused to look at a stand of cattails (genus Typha) growing in a streambed leading to a pond.  "That's a pretty ..." I thought to myself, but what would be the word for the area they were growing in? Marsh? Wetland? Swamp? Bog? Something else?

I've long been fascinated by this sort of distinction.  If you don't have much occasion to use them, those words may seem interchangeable, but they're not.  Technically

  • A wetland is just what it says ... any kind of land area that's wet most or all of the time
  • A marsh is a wetland with herbaceous plants (ones without woody stems) but not trees
  • A bog is a marsh that accumulates peat
  • A swamp is a forested wetland, that is, it does have trees
Wikimedia also has a nice illustration of swamps, marshes and other types of land.  By that reckoning, I was looking at a marsh, which was also the word that came to mind at the time.

This sort of definition by properties is everywhere, especially in dictionaries, encyclopedias and other reference works.  Here, the properties are:
  • Is it land, as opposed to a body of water?
  • Is it wet most or all of the time?
  • Does it have trees?
  • Does it accumulate peat?
The first two are true for all of the words above.  For the last two, there are three possibilities: yes, no and don't care/not specified.  That makes nine possibilities in all:

Trees?        Peat?         Word
Yes           Yes           peat swamp
Yes           No            ?
Yes           Don't care    swamp
No            Yes           bog/peat bog
No            No            ?
No            Don't care    marsh
Don't care    Yes           peatland
Don't care    No            ?
Don't care    Don't care    wetland
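
The table can also be generated mechanically, which makes the gaps easy to count.  Here's a small Python sketch; the names VALUES, WORDS and wetland_table are made up for illustration, and the word assignments are just the ones from the table.

```python
from itertools import product

VALUES = ("yes", "no", "don't care")

# (trees, peat) -> word, copied from the table; a missing key means
# there is no common word for that combination.
WORDS = {
    ("yes", "yes"): "peat swamp",
    ("yes", "don't care"): "swamp",
    ("no", "yes"): "bog/peat bog",
    ("no", "don't care"): "marsh",
    ("don't care", "yes"): "peatland",
    ("don't care", "don't care"): "wetland",
}

def wetland_table():
    """Return every (trees, peat, word) combination; word is None if unnamed."""
    return [(t, p, WORDS.get((t, p))) for t, p in product(VALUES, repeat=2)]

rows = wetland_table()
for trees, peat, word in rows:
    print(f"{trees:<11} {peat:<11} {word or '?'}")

unnamed = [(t, p) for t, p, w in rows if w is None]
print(len(rows), "combinations,", len(unnamed), "without a common word")
```

All three unnamed combinations are the ones where peat is specifically absent.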

As far as I know, there's no common word for the various types of wetland if they specifically don't accumulate peat.  You could always say "peatless swamp" and so forth, but it doesn't look like anyone says this much.  Probably people don't spend that much time looking for swamps with no peat.

Leaving aside the empty spaces, the table above gives a nice, neat picture of the various kinds of wetland and what to call them.  As usual, this nice picture is deceptive.
  • I took the definitions from Wikipedia, which aims to be a reference work.  It's exactly the kind of place where you'd expect to see this kind of definition by properties
  • The Wikipedia articles are about the wetlands themselves, not about language.  They may or may not touch on how people use the various words in practice or whether that lines up with the nice, technical definitions
  • The way the table is set up suggests that a peatland is a particular kind of wetland, but that's not quite true.  A peatland is land, wet or not, where you can find peat.  Permafrost and tundra can be peatland and often are, but they're not wetlands.  Similarly, a moor is generally grassy open land that might be boggy, if it's low-lying, but can also be hilly and dry.  Both peatlands and moors can be wetlands, but they aren't necessarily
  • Even if you take the definitions above at face value, if you have a lake in the middle of some woodlands with a swampy area and a marshy area in between the lake and the woods, there's no sharp line where the woods become swamp, or the swamp becomes marsh, or the marsh becomes lake.
The Wikipedia article for Fen sums this up nicely:
"Rigidly defining types of wetlands, including fens, is difficult for a number of reasons. First, wetlands are diverse and varied ecosystems that are not easily categorized according to inflexible definitions. They are often described as a transition between terrestrial and aquatic ecosystems with characteristics of both. This makes it difficult to delineate the exact extent of a wetland. Second, terms used to describe wetland types vary greatly by region. The term bayou, for example, describes a type of wetland, but its use is generally limited to the southern United States. Third, different languages use different terms to describe types of wetlands. For instance, in Russian, there is no equivalent word for the term swamp as it is typically used in North America. The result is a large number of wetland classification systems that each define wetlands and wetland types in their own way. However, many classification systems include four broad categories that most wetlands fall into: marsh, swamp, bog, and fen."

A fen here means "a type of peat-accumulating wetland fed by mineral-rich ground or surface water."  It's that water that seems to make the difference between a bog and a fen: "Typically, this [water] input results in higher mineral concentrations and a more basic pH than found in bogs." (bogs tend to be more acidic).  We could try to account for this in the table above by adding an Acidic? (or Basic?) column, but then we'd have 27 rows with a bunch of question marks in the blank spaces.

In that same paragraph, the article says "Bogs and fens, both peat-forming ecosystems, are also known as mires."  If you buy that definition, it might fit better than peatland in the "trees: don't care, peat: yes" row.

This is all part of a more general pattern: Definitions by properties are a good way to do technical definitions, but people, including technical people when they're not talking about work, don't really care about technical definitions.  For most purposes, radial categories do a better job of describing how people actually use words.  More on that in this post.

All of this is leaving out an important property of bogs and mires: you can get bogged down in a bog and mired in a mire.  Most of these words are old enough that the origins are hard to trace, but bog likely comes from a word for "soft", which more than hints at this (mire is likely related to moss).

This suggests that what we call something depends at least in part on how we experience it.  The interesting part is that defining properties like wetness, grassiness, softness and the presence of peat are also based on experience, which makes untangling the role of experience a bit tricky.


Just because we can distinguish meanings doesn't mean those distinctions are useful, but I'd say they are useful here, and in most cases where we use different words for similar things.  The distinctions are useful because we can draw larger conclusions from them.  For example:
  • It's easier to see what's on the other side of a marsh, since there are no trees in the way
  • A marsh will be sunnier than a swamp
  • There will be different kinds of animals in a swamp than a marsh
  • You can get peat from a bog.  Even today, peat is still a useful material, so it's not surprising that it's played a role in how we've used words for places that may or may not have it.
  • And it's also not surprising that people talk about peat bogs and peat swamps but generally don't specifically call out bogs and swamps without it.
Even the more general term wetland is drawing a useful distinction.  A wetland is, well, wet.  There's a good chance you could get stuck in the mud in a wetland, or even drown, not something that would happen in a desert unless there had been a downpour recently (which does happen, of course).


Let's take a completely different example: Victorian cutlery.  Upper middle-class Victorian society cared quite a bit about which fork or spoon to use when.  Much of this, of course, is about marking membership in the in-group.  If you were raised in that sort of society, you would Just Know which fork was for dinner and which for salad.  If you didn't know that, you obviously weren't raised that way and it was instantly clear that there could be any number of other things that you wouldn't know to do, or not do (If you ever have to bluff your way through, work from the outside in -- the salad fork will be on the outside since salad is served first -- and don't worry, something else will probably give you away anyway).

However, there are still useful distinctions being made, and they're right there in the names.  A salad fork is a bit smaller and better suited for picking up small pieces of lettuce and such.  A dinner fork is bigger, and better for, say, holding something still while you slice it with a knife.  A soup spoon is bigger than a teaspoon so it doesn't take forever to finish your soup, a dinner knife is sharper than a butter knife, a butter knife works better for spreading butter, and so forth.

It's no different for the impressive array of specialized utensils that one might have encountered at the time (and can still find, in many cases).  A grapefruit spoon has a sharper point with a serrated edge so you can dig out pieces of grapefruit.  A honey dipper holds more honey than a plain spoon and honey flows off it more steadily (unless you have a particularly steady spoon hand) and so on.  I have an avocado slicer with a grabber that makes it much easier to get the pit out.  It's very clear (at least once you've used it) that it's an avocado slicer and not well suited for much else.  You can do perfectly well without such things, but they can also be nice to have.  

Consider one more example: The fondue fork, which has a very long, thin stem and two prongs with barbs on them.  You could call it, say, a barb-pronged longfork, and that would be nice and descriptive.  If someone asked you for a barb-pronged longfork and you had to fetch it from a drawer of unfamiliar utensils, you'd have a pretty good chance of finding it.  If someone asked for a "fondue fork" and you didn't know what that was, you'd pretty much be stuck.  The same is true for grapefruit spoons, dinner knives and so forth.  All language use depends on shared context and assumptions about it.


I think there's something general going on here, that how we experience and interact with things isn't just a factor in how we name them, but central to it.  Even abstract properties like softness or dryness are rooted in experience.  Fens and bogs have different soil characteristics, but the names are much older than the chemical theory behind pH levels.

We call it a fondue fork because it's used for putting bits of food in a fondue pot (and, just as importantly, for getting them back out).  A fondue fork has certain qualities, like the long stem and the barbs, that make it well-suited for that task, but they're not directly involved in how we name it.

Words like fen and bog are distinct because fens and bogs support distinct kinds of plant and animal life, moving through a fen is different from moving through a bog, and so forth.  A difference in pH level is a cause of this difference, but that's incidental.  There are almost certainly areas that are called fens that have bog-like pH levels or vice versa.  You could insist that such a fen (or bog) is incorrectly named, but why?


Properties do play a role in how we name things.  Swamps have trees.  Marshes don't.  A knife has a sharp edge.  A fork is split into two or more tines.  A spoon can hold a small amount of liquid.  What we don't have, though, is some definitive list of properties of things, so that someone presented with a teaspoon could definitively say: "This thing is an eating utensil.  It can hold a small amount of liquid.  That amount is less than the limit that separates teaspoons from soup spoons.  Therefore, it's a teaspoon."

In many contexts, it may look like there is such a list of properties.  With marsh and swamp, we can clearly distinguish based on a property -- trees or no trees.  Sometimes, as with red-winged blackbird or needle-nose pliers, but not for marsh and swamp, we use properties directly to build names for things.

But there are thousands of possible properties for things -- sizes, shapes, colors, material properties, temperature, where they are found, who makes them, and on and on.  Of the beyond-astronomically many possible combinations, only a tiny few describe real objects with real names.

At the very least, there has to be some way of narrowing down what properties might even possibly apply to some class of objects.  Stars are classified by properties like mass (huge) and temperature (very hot by human standards), but we don't distinguish, say, a fugue from a sonata based on whether the temperature is over 30,000 Kelvin.

It's not impossible, at least in principle, to create a decision tree or similar structure for handling this.  You could start with dividing things into material objects, like stars, and immaterial ones, like sonatas and fugues.  Within each branch of the tree, only some of the possible properties of things would apply.  After some number of branches, you should reach a point where only a few possible properties apply.  If you're categorizing wetlands, you know that the temperature classifications for stars don't apply, and neither do the various properties used to classify musical forms, but properties like "produces peat" and "has trees" do apply.
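
As a rough sketch of what such a structure might look like, here is a tiny hand-built tree in Python.  Everything in it is hypothetical: the questions, the branch layout and the property lists are invented to match the examples in this post, not taken from any real classification system.

```python
# Inner nodes ask a yes/no question; leaves list the properties that are
# worth asking about for that kind of thing.
TREE = {
    "question": "material object?",
    "yes": {
        "question": "astronomical body?",
        "yes": {"applies": ["mass", "temperature"]},        # stars
        "no": {"applies": ["has trees", "produces peat"]},  # wetlands
    },
    "no": {"applies": ["number of voices", "form"]},        # fugues, sonatas
}

def applicable_properties(tree, answers):
    """Follow yes/no answers down the tree and return the leaf's property list."""
    node = tree
    while "applies" not in node:
        node = node["yes" if answers[node["question"]] else "no"]
    return node["applies"]

wetland = {"material object?": True, "astronomical body?": False}
fugue = {"material object?": False}
print(applicable_properties(TREE, wetland))  # ['has trees', 'produces peat']
print(applicable_properties(TREE, fugue))    # ['number of voices', 'form']
```

Even this toy version shows the catch: someone had to decide in advance which questions to ask and in what order, and a real tree would need a great many more branches.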

In practice, though, even carefully constructed classification systems based on properties, like the Hornbostel-Sachs system for musical instruments discussed in this post, can only go so far.  Property-based systems of classification tend to emphasize particular aspects of the things being categorized, such as (in the case of Hornbostel-Sachs) how they are built and how sound is produced from them.  This often lines up reasonably well with how we use words, but I don't think properties are fundamental.

Rather, how we experience things is fundamental, or at least closer to whatever is fundamental.  Properties describe particular aspects of how we experience something, so it's not surprising that they're useful, but neither should it be surprising that they're not the whole story.

Saturday, October 28, 2023

AI, AI and AI

I have a draft post from just over a year ago continuing a thread on intelligence in general, and artificial intelligence in particular.  In fact, I have two draft AI posts at the moment.  There's also one from early 2020 pondering how the agents in OpenAI's hide-and-seek demo figure out that an agent on the opposing team is probably hiding out of sight behind a barrier.

I was pretty sure at the time of the earlier draft that they do this by applying the trained neural network not just to the last thing that happened, but to a window of what's happened recently.  In other words, they have a certain amount of short-term memory, but anything like long-term memory is encoded in the neural net itself, which isn't updated during the game.  This ought to produce effects similar to the "horizon effect" in early chess engines, where a player that could look ahead, say, three moves and see that a particular move was a blunder would play another move that led to the same blunder, but only after four moves.
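
To make the guess concrete, here is a toy Python sketch of a fixed-size window of recent observations.  It illustrates the idea in the paragraph above and nothing more: it is not how the hide-and-seek agents are actually implemented, and the class, the observation strings and the hard-coded rule standing in for the trained network are all made up.

```python
from collections import deque

class WindowedAgent:
    """Toy agent that only 'remembers' the last `window` observations."""

    def __init__(self, window):
        # Once the deque is full, each new observation pushes out the oldest.
        self.recent = deque(maxlen=window)

    def observe(self, observation):
        self.recent.append(observation)

    def suspects_hider_behind_barrier(self):
        # Stand-in for the trained network: a fixed rule over the window.
        return "opponent went behind barrier" in self.recent

agent = WindowedAgent(window=3)
agent.observe("opponent went behind barrier")
agent.observe("nothing in view")
agent.observe("nothing in view")
print(agent.suspects_hider_behind_barrier())  # True: still inside the window
agent.observe("nothing in view")
print(agent.suspects_hider_behind_barrier())  # False: it has scrolled out
```

The rule itself never changes during the game.  The only thing that changes is what's in the window, so anything that happened more than a window's length ago might as well never have happened.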

I'm still pretty sure that's what's going on, and I was going to put that into a post one of these days as soon as I read through enough of the source material to confirm that understanding, but I never got around to it.

Because ChatGPT-4 happened.

ChatGPT is widely agreed to have been a major game changer and, yeah ... but which games?

From a personal point of view, my musings on how AIs work and what they might be able to do are now relevant to what my employer talks about in its quarterly earnings reports, so that put a damper on things as well.  For the same reason, I'll be staying away from anything to do with the internals or how AI might play into the business side.  As usual, everything here is me speaking as a private citizen, about publicly available information.

Out in the world at large, I recall a few solid months of THIS CHANGES EVERYTHING!!!, which seems to have died down into a steady stream of "How to use AI for ..." and "How to deal with the impact of AI in your job."  I've found some of this interesting, but a lot of it exasperating, which leads me to the title.

There are at least three very distinct things "AI" could reasonably mean at the moment:

  • The general effort to apply computing technology to things that humans and other living things have historically been much better at, things like recognizing faces in pictures, translating from one language to another, walking, driving a car and so forth.
  • Large language models (LLMs) like the ones behind ChatGPT, Bard and company
  • Artificial General Intelligence (AGI), a very vague notion roughly meaning "able to do any mental task a human can do, except faster and better"

There are several posts under the AI tag here (including the previous post) poking and prodding at all three of those, but here I'm interested in the terms themselves.

To me, the first AI is the interesting part, what I might even call "real AI" if I had to call anything that.  It's been progressing steadily for decades.  LLMs are a part of that progression, but they don't have much to do with, say, recognizing faces or robots walking. All of these applications involve neural networks with back propagation (I'm pretty sure walking robots use neural nets), but training a neural net with trillions of words of speech won't help it recognize faces or control a robot walking across a frozen pond because ... um, words aren't faces or force vectors?

If you ask a hundred people at random what AI is, though, you probably won't hear the first answer very much.  You'll hear the last two quite a bit, and more or less interchangeably, which is a shame, because they have very little in common.

LLMs are a particular application of neural nets.  They encode interesting information about the contents of a large body of text.  That encoded information isn't limited to what we think of as the factual content of the training text, and that's a significant result.  For example, if you give an LLM an email and ask it to summarize the contents, it can do so even though it wasn't explicitly trained to summarize email, or even to summarize unfamiliar text in general.

To be clear, summarizing an email is different from summarizing part of the text that an LLM has been trained on.  You could argue that in some very broad sense the LLM is summarizing the text it's been trained on when it answers a factual question, but the email someone just sent you isn't in that training text.

Somehow, the training phase has built a model, based in part on some number of examples of summaries, of how a text being summarized relates to a summary of that text.  The trained LLM uses that to predict what could reasonably come after the text of the email and a prompt like "please summarize this", and it does a pretty good job.
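
To make "predict what could reasonably come after" a little more concrete, here is a deliberately tiny Python sketch.  It is a word-pair (bigram) counter, not a neural net, and it is nothing like the scale or structure of a real LLM; the corpus and the names `train_bigrams` and `predict_next` are invented for illustration.  It only shows the basic shape of the task: learn from text which words tend to follow which, then use that to guess a continuation.

```python
from collections import Counter, defaultdict


def train_bigrams(text):
    """Count, for each word, which words follow it in the training text."""
    words = text.split()
    followers = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        followers[current][nxt] += 1
    return followers


def predict_next(followers, word):
    """Return the most frequent follower of `word`, or None if it was never seen."""
    counts = followers.get(word)
    if not counts:
        return None
    return counts.most_common(1)[0][0]


corpus = "the cat sat on the mat and the cat slept"
model = train_bigrams(corpus)
prediction = predict_next(model, "the")
print(prediction)  # cat
```

A real LLM conditions on far more than the previous word, which is part of why it can do things like summarize an email it has never seen, but the output is still a prediction of plausible following text.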

That's certainly not nothing.  There may or may not be a fundamental difference between answering a factual question based on text that contains a large number of factual statements and performing a task based on a text that contains examples of the task, or descriptions of the task being done, but in any case an LLM summarizing an email is doing something that isn't directly related to the text it's been trained on, and that it wasn't specifically trained to do.

I'm pretty sure this is not a new result with LLMs, but seeing the phenomenon with plain English text that any English speaker can understand is certainly a striking demonstration.

There are a couple of reasons to be cautious about linking this to AGI.

First, to my knowledge, there isn't any widely-accepted definition of what AGI actually is.  From what I can tell, there's a general intuition of something like The Computer from mid 20th-century science fiction, something that you can give any question or task to and it will instantly give you the best possible answer.  

"Computer, what is the probability that this planet is inhabited?"
"Computer, devise a strategy to thwart the enemy"
"Computer, set the controls for the heart of the Sun"
"Computer, end conflict among human beings"

This may seem like an exaggeration or strawman, but at least one widely-circulated manifesto literally sets forth that "Artificial Intelligence is best thought of as a universal problem solver".

There's quite a bit of philosophy, dating back centuries and so probably much, much farther, about what statements like that might even mean, but whatever they mean, it's abundantly clear by now that an LLM is not a universal problem solver, and neither is anything else that's currently going under the name AI.

In my personal opinion, even a cursory look under the hood and kicking of the tires ought to be enough to determine that nothing like a current LLM will ever be a universal problem solver.  This is not just a matter of what kinds of responses current LLM-based chatbots give.  It's also a matter of what they are.  The neural net model underpinning this is based pretty explicitly on how biological computers like human brains work.  Human brains are clearly not universal problem solvers, so why would an LLM be?

There's an important distinction here, between "general problem solver", that is, able to take on an open-ended wide variety of problems, and "universal", able to solve any solvable problem.  Human brains are general problem solvers, but nothing known so far, including current AIs, is universal.

This may sound like the argument that, because neural nets are built and designed by humans, they could never surpass a human's capabilities.  That's never been a valid argument and it's not the argument here.  Humans have been building machines that can exceed human capabilities for a long, long time, and computing machines that can do things that humans can't have been around for generations or centuries, depending on what you count as a computing machine and a human capability.

The point is that neural nets, or any other computing technology, have a particular set of problems that they can solve with feasible resources in a feasible amount of time.  The burden of proof is on anyone claiming that this set is "anything at all", that by building a network bigger and faster than a human brain, and giving it more data than a human brain could hope to handle, neural nets will not only be able to solve problems that a human brain can't -- which they already can -- but will be able to solve any problem.

So next time you see something talking about AI, consider which AI they're referring to.  It probably won't be the first (the overall progress of machines doing things that human brains also do).  It may well be the second (LLMs), in which case the discussion should probably be about things like how LLMs work or what a particular LLM can do in a particular context.

If it's talking about AGI, it should probably be trying to untangle what that means or giving particular reasons why some particular approach could solve some particular class of problems.

If it's just saying "AI" and things along the lines of "Now that ChatGPT can answer any question and AGI is right around the corner ...", you might look for a bit more on how those two ideas might connect.

My guess is there won't be much.



Sunday, June 18, 2023

AI seems to be back. What is it now?

In one of the earliest posts here, several years ago, I mused "What was AI?"  At the time, the term AI seemed to have fallen out of favor, at least in the public consciousness, even though there were applications, like phones with speech recognition, that were very much considered AI when they were in the research phase.  My conclusion was that some of this was fashion, but a lot of it was because the popular conception of AI was machines acting like humans.  After all, the examples we saw in pop culture, like, say, C-3PO in Star Wars, were written by humans and portrayed by humans.

There's a somewhat subtle distinction here: A phone with speech recognition is doing something a human can do, but it's not acting particularly like a human.  It's doing the same job as a human stenographer, whether well or badly, but most people aren't stenographers, and even stenographers don't spend most of their time taking dictation (or at least they shouldn't).

Recently, of course, there's been a new wave of interest in AI, and talk of things like "Artificial General Intelligence", which hadn't exactly been on most people's lips before ChatGPT-4 came out.  To avoid focusing too much on one particular example, I'll call things like ChatGPT-4 "LLM chatbots", LLM for Large Language Model.

By many measures, an LLM chatbot isn't a major advance.  As the "-4" part says, ChatGPT-4 is one in a series, and other LLM chatbots were developed incrementally as well.  Under the hood, an LLM chatbot is a particular application of neural net-based machine learning, which was a significant advance, to the particular problem of generating plausible-sounding text in response to a prompt.

But goodness, do they produce plausible-sounding text.

A response from an LLM chatbot may contain completely made-up "facts", it may well break down on closer examination by followup questions or changing the particulars of the prompt, and it may have a disturbing tendency to echo widely-held preconceptions whether they're accurate or not, but if you just read through the response and give it the benefit of the doubt on anything you're not directly familiar with, something people are strongly inclined to do, then it sounds like the response of someone who knows what they're talking about.  The grammar is good, words are used like people would use them, the people and things mentioned are generally real and familiar, and so on.

In other words, when it comes to generating text, an LLM chatbot does a very good job of acting like a human.  If acting like a human is the standard for AI, then an LLM chatbot is definitely an AI, in a way that a speech-transcribing phone app or a model that can pick out supernovae from a mass of telescope images just isn't.

But our perception of whether something is acting intelligent in a human way is heavily tilted toward language use.  All kinds of animals can recognize images and many can respond to speech, but only we can produce large volumes of text in human languages in response to a prompt.  Until now, that is.

Since LLM chatbots are an obvious major advance in acting-like-a-human, it's natural to assume that they represent a major advance in understanding what intelligence is generally, but those are two very different things.  As far as I can tell, we're not really any closer to understanding what "general intelligence" might mean, or how to produce it, than we were before.

To be sure, LLMs have shown some interesting behaviors that may offer hints as to what intelligence might be.  Once the model gets big enough, it seems to get better at learning from small samples.  For example, if you train a model on a huge amount of English text and a little bit of Italian and a little bit of Python code, it seems that it can do a pretty good job of producing plausible Italian or plausible-looking Python.  The Python code might well not do anything useful, unless it's close to one of the examples that the model was trained on, and it might not even build, but it looks like Python someone wrote.  There's some question as to whether this ability arises suddenly once the model hits a certain size, or actually happens gradually, but it's definitely a thing.

It seems likely that, after enough training data and with enough depth to the underlying network, an LLM develops connections that in some way represent something general about the structure of languages, at least ones like Italian and Python that are reasonably close in structure to the English that makes up the bulk of the training data.  It would be interesting to explore that, because it may capture something about language structure that current theories of grammar miss.


It seems reasonable that a suitably deep model trained on larger amounts of language would be able to capture something useful about the structure of language.  Beyond that, though, is it possible that an LLM can capture something about the world that the body of text is talking about?  If the training data contains millions of words about water, and rivers, and thirst, and drinking water, and clouds, rain and so forth, will it in some sense know that water is wet and that rain falls from clouds and some of it drains into rivers that flow into the sea, and that living things need water and that if a person is thirsty they can drink water and not be thirsty any more?

From interacting with an LLM chatbot, it's certainly easy to conclude at first that the answer is yes.  For example, I just asked a chatbot "If I'm thirsty and I drink water, will I still be thirsty?" and got back

Generally, if you are thirsty and you drink water, you will no longer be thirsty. This is because thirst is a signal from your body that it needs water. When you drink water, your body's water levels will increase and the thirst signal will be turned off.

However, there are some cases where you might still be thirsty after drinking water ...

and so on.  That seems like case closed.  One good way to find out if someone knows something is to ask them questions about it and see if the answers make sense.  I did, and the answer made sense.

The problem is, LLM chatbots have a tendency to go down rabbit holes once you get into details.  Later, I asked "If I'm thirsty and I drink ambrosia, will I still be thirsty?"  The answer starts off reasonably:

In Greek mythology, ambrosia is the food of the gods. It is said to be a delicious and intoxicating beverage that grants immortality. If you were to drink ambrosia, you would no longer feel thirst, as it would quench your thirst permanently.

So far, so good.  Going on ...

However, ambrosia is not a real substance, so it is impossible to say for sure what would happen if you drank it.

This is a bit ... off.  If ambrosia is not a real substance, then you can't drink it, period.  It's not that it's impossible to say for sure what would happen; it's impossible to say, period.  The answer goes on to conclude:

However, it is an interesting question to ponder, and it is one that has been explored in many works of fiction.

OK, I'll bite.  "What are some works of fiction that explore the drinking of ambrosia?"  By "explore", I'd expect something like a storyline that features a person, or multiple people, drinking ambrosia in various amounts or circumstances and discusses the consequences of the various possibilities.  I wouldn't want something that just "features" or "includes" the drinking of ambrosia.

Nitpicking?  I mean, isn't it close enough just to mention some stories that include ambrosia and count that as exploring?

No.  If the chatbot has a true understanding of the concepts behind the words, it will catch this distinction.  If it has merely picked up from its training data that "explores" is a word often used when discussing fiction, it won't.  In my experience, LLM chatbots miss this sort of distinction over and over again.

So what works did it come up with?

  • The Odyssey by Homer: In this epic poem, the hero Odysseus drinks ambrosia and nectar while he is a guest of the gods on Mount Olympus. This gives him the strength and vitality to continue his journey home.
The first time through, I took this at face value.  It's been a while since I read the Odyssey.  But it doesn't look like this scene actually occurs.  Ambrosia is mentioned, briefly, in other contexts.  Even if this scene were in the poem, would it really be exploring the drinking of ambrosia?  Not really.  It would just record the common understanding, in Homer's time and place, of what ambrosia was and what it did.
  • The Bacchae by Euripides: In this tragedy, the god Dionysus gives ambrosia to his followers, which makes them immortal. However, the ambrosia also makes them insane, and they go on a rampage, killing people and destroying property.
That would count as an exploration in my book.  One problem, though: That's not the plot of The Bacchae.  From what I can make out, without carefully reading through the whole play, there's no mention at all of ambrosia in The Bacchae.
  • The Tempest by William Shakespeare: In this play, the magician Prospero uses ambrosia to create a magical elixir that can grant youth, beauty, and immortality.
Nope.  The word "ambrosia" is not even in the play.
  • The Last Olympian by Rick Riordan: In this book in the Percy Jackson series, the Greek gods give ambrosia to the demigod heroes to heal their wounds and give them strength.
That's probably about right -- I'm not going to take time to look, given the rest of the entries -- but again, not really exploring the question of what it would be like to drink ambrosia.
  • The Mortal Instruments by Cassandra Clare: In this series of fantasy novels, the main characters drink ambrosia to heal their wounds and give them strength.
As far as I can tell, this is not so.

OK, so maybe the chatbot understands water perfectly well, but it's not good with literature and finer points like what's in it?  Let's try something more direct.

"If I have a bell, a book, a candle and some water, how can I stack them?"

(I picked stacking things because I ran across a pretty impressive example in a news item, of a chatbot suggesting a solution the author hadn't thought of.  I have no doubt this happened, but I strongly suspect that the author took the impressive example as "proof" and didn't press on by, say, asking variants of the original question.)

  • There are a few ways to stack a bell, a book, a candle, and some water. Here are a few examples:
    • Bell, book, candle, water: This is the most straightforward way to stack the items. The bell would be on the bottom, followed by the book [somehow balanced on top of the bell ...], the candle, and the water.
    • Water, bell, book, candle: This stack is a bit more challenging, as the water would need to be supported by the bell and the book [wait, what?]. However, it is possible to do this by placing the water in a small container, such as a cup or a bowl [so in the first stack, the water isn't in a container?].
    • ...

I think the easiest explanation for this one is that the chatbot doesn't really understand what physical objects and substances are and how they interact.

In some meaningful sense, the model understands something interesting about the structure of languages, or at least some languages, because it was trained on a mass of text.  It can also model associations among words, for example that "water" tends to be associated with words like "wet" and "drink" and "flow".

However, since its training had nothing to do with actual water, or anything else in the physical world, it's no surprise that it doesn't show any real understanding of how things interact in the physical world.  Since the training text doubtless included reviews and discussions of various works of fiction, it can imitate something talking about fiction, but it misses details like the meaning of "explore", or even that if you say a story explores something, that thing should actually appear in the story.


So, after that fairly long digression, how does all this fit together?
  • Except among people closely associated with the research, "AI" generally means "acting like a human" and "doing things that are particularly human", like using language
  • LLM chatbots are pretty good at acting human in that sense ...
  • ... including making up plausible-looking responses when they don't really know the answer
  • But language is just one part of the picture.
  • "General Intelligence" is not at all well-defined, but if it includes some sort of general understanding of the world and how to solve problems in it, then there's no real reason to think LLM chatbots have it, or are even close to acquiring it ...
  • ... even if they're sometimes good at looking that way

Saturday, June 17, 2023

Where did I put my car keys, and when did civilization begin?

Some mysteries, like "Where did I put my car keys?" can be solved by discovering new information.  Some of the more interesting ones, though, may resolve by realizing you were asking the wrong question in the first place.

For example, physicists spent a long time trying to understand the medium that light waves propagated in.  Just like ocean waves propagate in water and sound waves propagate in all kinds of material -- but not in a vacuum -- it seemed that light waves must propagate in some sort of medium.  "Luminiferous aether", they called it.

But that brought up questions of what happens to light if you're moving with respect to that medium.  Sounds in the air will sound higher-pitched if you're moving through still air toward the source (an example of the Doppler effect), and sound will reach you sooner if the wind is blowing and you're downwind.

There didn't seem to be a "downwind" with light.  The Earth orbits the Sun at about 0.01% of the speed of light, not much, but enough that a careful measurement should detect a change in frequency depending on which direction light is moving and where the Earth is in its orbit.  But it didn't, and people spent a lot of time trying to figure out what was happening with the aether until Einstein put forth a theory (special relativity) that started with the idea that there was no aether.
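
For a sense of scale, here is the arithmetic for Earth's orbital speed as a fraction of the speed of light, as a small Python sketch.  The orbital speed is the commonly quoted average of about 29.78 km/s, and the variable names are just for illustration.

```python
# Average orbital speed of the Earth around the Sun (approximate).
EARTH_ORBITAL_SPEED_KM_S = 29.78
# Speed of light in a vacuum (exact by definition of the metre).
SPEED_OF_LIGHT_KM_S = 299_792.458

ratio = EARTH_ORBITAL_SPEED_KM_S / SPEED_OF_LIGHT_KM_S
percent = ratio * 100

print(f"{percent:.4f}% of the speed of light")
```

That works out to roughly one part in ten thousand, small but within reach of a careful enough optical measurement.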

I just got done scanning through the older posts on this blog to see whether I'd discussed a question that comes up from time to time, in various forms, when discussing human prehistory: "What happened a few thousand years ago in human evolution, that enabled us to move from hunter-gatherer societies to full-blown civilization?"  The closest I could find was a comment at the end of a post on change in human technology:

How did civilization and technology develop in several branches of the human family tree independently, but not to any significant extent in others?

This is not quite the same question, but it's still not a great question because it's loaded with similar assumptions.   All societies have technology and rules of living together, so we're really talking about who has "advanced technology" or "higher forms of social organization" or whatever, which are not exactly the most objective designations.  But even taking those at face value, I think this is another "wrong question" like "What happens if you're moving with respect to the aether?"

Even if you try to stick to mostly objective criteria like whether or not there are cities (civilization ultimately derives from the same roots as Latin civitas -- city -- and civis -- citizen), or whether a particular group of people could smelt iron, once you go back a few thousand years there's a lot we don't know about what happened where, and even where we think we do know, the definitions are still a bit fuzzy.  How big does a settlement have to be to be considered a city?  How much iron do you have to smelt before you're in the "iron age"?  Any amount? Enough to make a sword?  Enough to manufacture swords by the hundred?

Wikipedia (at this writing) defines a civilization as "any complex society characterized by the development of a state, social stratification, urbanization, and symbolic systems of communication beyond natural spoken language (namely, a writing system)" with eight separate supporting citations.  I didn't check the page history, but one gets the impression that this definition evolved over time and much discussion.

By this definition, civilizations started appearing as soon as writing appeared.  In other words, writing is the limiting factor from the list above.  The first known examples (so far) of writing, Sumerian cuneiform and Egyptian hieroglyphs, are about 5400 years old.  By that time there had been cities for thousands of years.  Terms like "state" and "social stratification" are harder to pin down from hard archeological evidence, or even to define objectively in a way people can agree on, but it's pretty clear that, however you slice it, they came well before cuneiform and hieroglyphics.

It may be hard to pin down exactly what a state is, but it's not hard to find examples that people will agree are states.  Most of the world's population now lives in places that most people agree are states, even though there are disagreements about which people are subject to the rules of which state or whether a particular nation's government is effectively functioning as a state.  Nonetheless, if you asked most political scientists whether, say, New Zealand, Laos or Saint Lucia is a state, you'd get a pretty resounding "yes".  Likewise, most people familiar with the subjects would agree that, say, Ancient Rome, the Shang Dynasty and the Inca Empire were states.

The problems come when you try to extract a set of criteria from the examples.  While Wikipedia defines a state as "a centralized political organization that imposes and enforces rules over a population within a territory" it goes on in the very next sentence to say "There is no undisputed definition of a state" (with two supporting references). Wikipedia does not claim to be an authoritative source on its own and I suppose it's possible that the page editors missed the One True Definition of "state", but it seems unlikely.  More likely there really isn't one.

Going with the "centralized political organization ..." definition for the moment, things get slippery when you try to pin down what it means to "impose and enforce rules".  For one thing, except (probably) in the smallest city-states, say Singapore or the Vatican, there is always a tension among various levels of government.

In the US, for example, the federal government is supreme over state and local governments, but in practice it's local laws that mostly determine where you can build a house, how fast you can drive your car on which streets and any of a number of other things that have more visibility in most people's day-to-day life than, say, federal standards for paraffin wax (I checked, there are several).

Certainly the supremacy clause of the Constitution means something, and few would disagree that the federal government imposes and enforces rules throughout the US, or that the US is a state, but on the other hand we also call 50 constituent parts of the US "states" and they impose and enforce their own rules within their boundaries.  Is the State of Wyoming a "state", then, in the sense given above?  If so, is the city of Cheyenne?

This may seem like splitting hairs over definitions, but consider something like the Roman Empire.  It could take weeks or months to get a message from the center of government to the far-flung provinces.  The people in those provinces often didn't speak the official language and largely practiced their local religions and customs, and the local power structure was largely still in place, though with some sort of governor, who may or may not have been Roman, nominally in charge.  It's a legitimate question what it might mean to be "part of the Roman Empire", or in what exact territory the imperial state could actually impose and enforce rules at any particular time.

If all you have to go on is excavated ruins without any written records, it's harder still to say what might or might not be a state.  There are monumental constructions going back at least 10,000 years, that would have required cooperation among fairly large numbers of people over years or decades, but that doesn't necessarily mean there was (or wasn't) a centralized government.  So far, no one has found any strong indication that there was.  It's possible that ancient monuments were built at the command of a centralized leadership, but again, there doesn't seem to be any strong evidence to support that, as there definitely is for, say, the Egyptian pyramids.

Likewise for cities.  It's hard to tell by looking at the ruins of a city whether there was a centralized government.  One of the earliest cities known, Çatalhöyük, shows no obvious signs of, say, a City Hall or anything other than a collection of mud-brick houses packed together, though the houses themselves have their own fascinating details.  But then again, neither would any number of large villages / small towns today show obvious signs of a central government.  There may have been some sort of centralized government, somewhere, imposing and enforcing rules on Çatalhöyük, but there could very well not have been.  Current thinking seems to be there wasn't.

Empires like the Mongol or Macedonian ones built cities, but most cities in these empires already existed and were brought into the empire by conquest.  If we didn't have extensive written records, it would be much harder to determine that, say, present-day Uch Sharīf, Pakistan, was (possibly) founded by Alexander as part of the Macedonian Empire and was later (definitely) invaded by the Mongols.  While it's a fairly small city of around 20,000 people, it contains a variety of tombs, monuments and places of worship.  If it were suddenly deserted and all writing removed from it, and everything else in the surrounding area were covered in dirt, an archeologist who didn't know the history of the surrounding regions would have a lot of work to do to figure out just what went on when.

Present-day archeologists trying to understand human culture from 10,000 or more years ago are up against a similar situation.  What sites have been discovered are often isolated and what survives has a lot more to do with what sorts of things, like stonework and pottery, are likely to endure for millennia than what was actually there.

In addition, while there were cities thousands of years before Mesopotamian civilization, it's pretty clear most people didn't live in them, but in the surrounding areas, whether nomadically or in villages, and whatever traces they left behind are going to be much harder to find, if they can be found at all.  There's probably at least some selection bias, in that until perhaps recently, there has been more focus on finding signs of civilization, that is, cities, than on looking for signs of villages or nomadic peoples.

The result is that we really just don't know that much about how Neolithic people organized themselves.  There are some interesting clues, like the existence of "culture regions" where the same technologies and motifs turn up over and over again across large areas, but it's hard to say whether that's the result of a central government or just large-scale trade and diffusion of ideas (current thinking seems to be that it's probably trade and diffusion).

One of the basic assumptions in talking about civilizations is that civilization requires stable and abundant food supplies so that people can remain in one place over the course of years and at least some people have time to do things besides procuring food.  The converse isn't true, though.  You can have stable and abundant food supplies, and at least the opportunity for people to develop specialized roles, without civilization developing, and that seems to be what actually happened.

Rice was domesticated somewhere between 8,000 and 14,000 years ago, and wheat somewhere in the same range.  Permanent settlements (more technically, sedentism) are at least as old, and there were cultures, such as the Natufian, that settled down thousands of years before showing signs of deliberate agriculture.  Overall, there is good evidence of

  • Permanent settlements without signs of agriculture over periods of millennia (Natufian culture)
  • Large-scale organization without signs of agriculture or permanent settlements (monuments at Göbekli Tepe about 10,000 years old, not to mention later examples such as Stonehenge)
  • Cities without writing, or signs of centralized government (Çatalhöyük, about 9,000 years ago at its peak)
  • Agriculture without large-scale cities, over periods of millennia (domestication of rice and wheat)
  • Food surpluses without grain farming
  • Large-scale trade without evidence of states

Putting this all together

  • There's not really a widely-accepted single definition of what civilization is, particularly since there's no widely-accepted single definition of what concepts like "state" and "social stratification" mean
  • It's hard to say for sure how people organized themselves 10,000 years ago because there's no written record and the physical evidence is scattered and incomplete
  • There are clear signs, particularly monumental structures, that they did organize themselves, at least some of the time
  • There are clear signs that they interacted with each other, whether directly or indirectly, over large areas
  • The various elements of what we now call civilization, particularly agriculture and permanent settlements, didn't arise all at once in one place, but appeared in various combinations over large areas and long periods of time

In other words, there was no particular time and place that civilization began, and questions like the ones I gave at the beginning aren't really meaningful.

Human knowledge has continually evolved and diffused over time.  People have been busy figuring out the world around them for as long as there have been people, and as far as we can tell, people's cognitive abilities haven't changed significantly over the past few tens of thousands of years.

Overall, we've become more capable, because, overall, knowledge tends to accumulate over time.  The ability to create what we now call civilization has been part of that, but there was no particular technological change, and certainly no genetic change, that brought about the shift from foraging societies to civilization, because it's not even accurate to talk about "the shift".  There wasn't some pivotal change.  There have been continual changes over large areas and long periods of time that have affected different groups of people in different ways.  We can choose to draw lines around those now, but the results may say more about how we draw lines than about how people lived.

None of this is to say that terms like "civilization" or "state" are meaningless, or that civilizations and states are inherently bad (or good).  Rather, it seems more useful to talk about particular behaviors of particular groups of people and less useful to argue over which groups had "advanced technology" or were "civilized", or to try to say when some group of people crossed some magical boundary between "uncivilized" and "civilized" or when some collection of settlements "became a state".

Among other things, this helps avoid a certain kind of circular reasoning, such as asserting that the people who built Stonehenge must have had an advanced society because only an advanced society could build something like Stonehenge.  What's an advanced society?  It's something that can build monuments like Stonehenge.  I don't think this really represents the current thinking of people who study such things, but such arguments have been made, nearly as baldly.  Better, though, to try to understand how Stonehenge was built and how the people who built it lived and then try to see what led to what.

This also helps avoid a particular kind of narrative that comes up quite a bit, that there is a linear progression from "early, primitive" humanity to "modern, advanced societies".  In the beginning, people lived in a state of nature.  Then agriculture was discovered, and now that people had food surpluses, they could settle down.  Once enough people settled down, they developed the administrative structures that became the modern nation-state as we know it, and so forth.

None of those assertions is exactly false, leaving aside what exactly a "state of nature" might be.  Agriculture did develop, over periods of time and in several places.  Eventually, it enabled higher population densities and larger centers of population, and, in practice, that has involved more elaborate administrative structures.

But that isn't all that happened.  People raised domesticated plants, and eventually animals, and otherwise modified their environments to their advantage, for hundreds or thousands of years at a stretch without building large cities.  Cities arose, but for almost all of human history, as in prehistory, most people didn't live in them -- having most people live in cities is a very recent development.

One problem with this kind of linear narrative is that it can give the impression that there was a sort of dark age, before civilization happened, where people weren't doing much of anything.  If we put the origins of modern humans at, say, 70,000 years ago -- again, at least to some extent this is a matter of where we choose to draw lines, but it couldn't have been much later than that -- then why did it take so long to get from early origins to civilization?  As far as anyone knows, that's a span of over 60,000 years.  What were we doing all that time?

If you require a sharp dividing line between "nothing much going on" and "civilization", this seems like a mystery.  If you don't need such a line, the answer seems pretty mundane, because we were doing pretty much the same thing all the way through:  steadily developing culture, including technology and art.  Eventually, at various times and places, what we now call civilization becomes possible, and some time after that, at some smaller number of times and places, it happens.


One note: This post draws fairly extensively from points made in The Dawn of Everything.  Along with discussing human history, that book explores what implications deep human history might have for how present-day societies might be structured.  I'm not trying to promote or refute any of that here.  Here, I'm more interested in deep human history itself, the stories we tend to build around what we know about it, and how the two can differ.