Sunday, June 18, 2023

AI seems to be back. What is it now?

In one of the earliest posts here, several years ago, I mused, "What was AI?"  At the time, the term AI seemed to have fallen out of favor, at least in the public consciousness, even though there were applications, like phones with speech recognition, that were very much considered AI when they were in the research phase.  My conclusion was that some of this was fashion, but a lot of it was because the popular conception of AI was machines acting like humans.  After all, the examples we saw in pop culture, like, say, C-3PO in Star Wars, were written by humans and portrayed by humans.

There's a somewhat subtle distinction here: A phone with speech recognition is doing something a human can do, but it's not acting particularly like a human.  It's doing the same job as a human stenographer, whether well or badly, but most people aren't stenographers, and even stenographers don't spend most of their time taking dictation (or at least they shouldn't).

Recently, of course, there's been a new wave of interest in AI, and talk of things like "General Artificial Intelligence", which hadn't exactly been on most people's lips before ChatGPT-4 came out.  To avoid focusing too much on one particular example, I'll call things like ChatGPT-4 "LLM chatbots", LLM for Large Language Model.

By many measures, an LLM chatbot isn't a major advance.  As the "-4" part says, ChatGPT-4 is one in a series, and other LLM chatbots were developed incrementally as well.  Under the hood, an LLM chatbot is an application of neural net-based machine learning, which was a significant advance, to the particular problem of generating text in response to a prompt.

But goodness, do they produce plausible-sounding text.

A response from an LLM chatbot may contain completely made-up "facts", it may well break down on closer examination by followup questions or changing the particulars of the prompt, and it may have a disturbing tendency to echo widely-held preconceptions whether they're accurate or not, but if you just read through the response and give it the benefit of the doubt on anything you're not directly familiar with, something people are strongly inclined to do, then it sounds like the response of someone who knows what they're talking about.  The grammar is good, words are used like people would use them, the people and things mentioned are generally real and familiar, and so on.

In other words, when it comes to generating text, an LLM chatbot does a very good job of acting like a human.  If acting like a human is the standard for AI, then an LLM chatbot is definitely an AI, in a way that a speech-transcribing phone app or a model that can pick out supernovae from a mass of telescope images just isn't.

But our perception of whether something is acting intelligent in a human way is heavily tilted toward language use.  All kinds of animals can recognize images and many can respond to speech, but only we can produce large volumes of text in human languages in response to a prompt.  Until now, that is.

Since LLM chatbots are an obvious major advance in acting-like-a-human, it's natural to assume that they represent a major advance in understanding what intelligence is generally, but those are two very different things.  As far as I can tell, we're not really any closer to understanding what "general intelligence" might mean, or how to produce it, than we were before.

To be sure, LLMs have shown some interesting behaviors that may offer hints as to what intelligence might be.  Once the model gets big enough, it seems to get better at learning from small samples.  For example, if you train a model on a huge amount of English text and a little bit of Italian and a little bit of Python code, it seems that it can do a pretty good job of producing plausible Italian or plausible-looking Python.  The code might well not do anything useful, unless it's close to one of the examples that the model was trained on, and it might not even build, but it looks like Python someone wrote.  There's some question as to whether this ability arises suddenly once the model hits a certain size, or actually happens gradually, but it's definitely a thing.

It seems likely that, after enough training data and with enough depth to the underlying network, an LLM develops connections that in some way represent something general about the structure of languages, at least ones like Italian and Python that are reasonably close in structure to the English that makes up the bulk of the training data.  It would be interesting to explore that, because it may capture something about language structure that current theories of grammar miss.

It seems reasonable that a suitably deep model trained on larger amounts of language would be able to capture something useful about the structure of language.  Beyond that, though, is it possible that an LLM can capture something about the world that the body of text is talking about?  If the training data contains millions of words about water, and rivers, and thirst, and drinking water, and clouds, rain and so forth, will it in some sense know that water is wet and that rain falls from clouds and some of it drains into rivers that flow into the sea, and that living things need water and that if a person is thirsty they can drink water and not be thirsty any more?

From interacting with an LLM chatbot, it's certainly easy to conclude at first that the answer is yes.  For example, I just asked a chatbot "If I'm thirsty and I drink water, will I still be thirsty?" and got back

Generally, if you are thirsty and you drink water, you will no longer be thirsty. This is because thirst is a signal from your body that it needs water. When you drink water, your body's water levels will increase and the thirst signal will be turned off.

However, there are some cases where you might still be thirsty after drinking water ...

and so on.  That seems like case closed.  One good way to find out if someone knows something is to ask them questions about it and see if the answers make sense.  I did, and the answer made sense.

The problem is, LLM chatbots have a tendency to go down rabbit holes once you get into details.  Later, I asked "If I'm thirsty and I drink ambrosia, will I still be thirsty?"  The answer starts off reasonably:

In Greek mythology, ambrosia is the food of the gods. It is said to be a delicious and intoxicating beverage that grants immortality. If you were to drink ambrosia, you would no longer feel thirst, as it would quench your thirst permanently.

So far, so good.  Going on ...

However, ambrosia is not a real substance, so it is impossible to say for sure what would happen if you drank it.

This is a bit ... off.  If ambrosia is not a real substance, then you can't drink it, period.  It's not that it's impossible to say for sure what would happen; it's impossible to say, period.  The answer goes on to conclude:

However, it is an interesting question to ponder, and it is one that has been explored in many works of fiction.

OK, I'll bite.  "What are some works of fiction that explore the drinking of ambrosia?"  By "explore", I'd expect something like a storyline that features a person, or multiple people, drinking ambrosia in various amounts or circumstances and discusses the consequences of the various possibilities.  I wouldn't want something that just "features" or "includes" the drinking of ambrosia.

Nitpicking?  I mean, isn't it close enough just to mention some stories that include ambrosia and count that as exploring?

No.  If the chatbot has a true understanding of the concepts behind the words, it will catch this distinction.  If it has merely picked up from its training data that "explores" is a word often used when discussing fiction, it won't.  In my experience, LLM chatbots miss this sort of distinction over and over again.

So what works did it come up with?

  • The Odyssey by Homer: In this epic poem, the hero Odysseus drinks ambrosia and nectar while he is a guest of the gods on Mount Olympus. This gives him the strength and vitality to continue his journey home.
The first time through, I took this at face value.  It's been a while since I read the Odyssey.  But it doesn't look like this scene actually occurs.  Ambrosia is mentioned, briefly, in other contexts.  Even if this scene were in the poem, would it really be exploring the drinking of ambrosia?  Not really.  It just records the common understanding in his time and place of what ambrosia was and what it did.
  • The Bacchae by Euripides: In this tragedy, the god Dionysus gives ambrosia to his followers, which makes them immortal. However, the ambrosia also makes them insane, and they go on a rampage, killing people and destroying property.
That would count as an exploration in my book.  One problem, though: That's not the plot of The Bacchae.  From what I can make out, without carefully reading through the whole play, there's no mention at all of ambrosia in The Bacchae.
  • The Tempest by William Shakespeare: In this play, the magician Prospero uses ambrosia to create a magical elixir that can grant youth, beauty, and immortality.
Nope.  The word "ambrosia" is not even in the play.
  • The Last Olympian by Rick Riordan: In this book in the Percy Jackson series, the Greek gods give ambrosia to the demigod heroes to heal their wounds and give them strength.
That's probably about right -- I'm not going to take time to look, given the rest of the entries -- but again, not really exploring the question of what it would be like to drink ambrosia.
  • The Mortal Instruments by Cassandra Clare: In this series of fantasy novels, the main characters drink ambrosia to heal their wounds and give them strength.
As far as I can tell, this is not so.

OK, so maybe the chatbot understands water perfectly well, but it's not good with literature and finer points like what's in it?  Let's try something more direct.

"If I have a bell, a book, a candle and some water, how can I stack them?"

(I picked stacking things because I ran across a pretty impressive example in a news item, of a chatbot suggesting a solution the author hadn't thought of.  I have no doubt this happened, but I strongly suspect that the author took the impressive example as "proof" and didn't press on by, say, asking variants of the original question.)

  • There are a few ways to stack a bell, a book, a candle, and some water. Here are a few examples:
    • Bell, book, candle, water: This is the most straightforward way to stack the items. The bell would be on the bottom, followed by the book, the candle, and the water.
    • Water, bell, book, candle: This stack is a bit more challenging, as the water would need to be supported by the bell and the book [wait, what?]. However, it is possible to do this by placing the water in a small container, such as a cup or a bowl [so in the first stack, the water isn't in a container?].
    • ...
I think the easiest explanation for this one is that the chatbot doesn't really understand what physical objects and substances are and how they interact.

In some meaningful sense, the model understands something interesting about the structure of languages, or at least some languages, because it was trained on a mass of text.  It can also model associations among words, for example that "water" tends to be associated with words like "wet" and "drink" and "flow".

However, since its training had nothing to do with actual water, or anything else in the physical world, it's no surprise that it doesn't show any real understanding of how things interact in the physical world.  Since the training text doubtless included reviews and discussions of various works of fiction, it can imitate something talking about fiction, but it misses details like the meaning of "explore", or even that if you say a story explores something, that thing should actually appear in the story.
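To make the point about word associations concrete, here is a minimal sketch of how associations like "water" with "wet", "drink" and "flow" can be pulled out of text by counting alone.  This is not how an LLM works internally (a neural model learns far richer statistical structure), and the toy corpus, the stopword list and the function names are all made up for illustration.  The only point is that the associations come from the text, with nothing in the program that knows what water is.

```python
import re
from collections import Counter

# Toy corpus, invented for illustration only.
CORPUS = [
    "Rain falls from clouds and the water drains into rivers.",
    "If you are thirsty you can drink water.",
    "Water is wet and rivers flow into the sea.",
    "The bell rang and the candle burned beside the book.",
    "She put the book on the table near the candle.",
]

# Small hand-picked stopword list (an assumption, not a standard list).
STOPWORDS = {
    "the", "and", "is", "you", "are", "if", "can", "into",
    "from", "on", "a", "she", "put", "near", "beside",
}


def tokenize(sentence):
    """Lowercase a sentence and return its words, minus stopwords."""
    return [w for w in re.findall(r"[a-z]+", sentence.lower())
            if w not in STOPWORDS]


def associations(target, corpus):
    """Count the words that share a sentence with `target`."""
    counts = Counter()
    for sentence in corpus:
        words = tokenize(sentence)
        if target in words:
            counts.update(w for w in words if w != target)
    return counts


water = associations("water", CORPUS)
print(water.most_common(5))
```

In this toy corpus "rivers" turns up most often alongside "water", while "candle" never does.  The program can report that without any notion of what a river or a candle is, which is roughly the gap between modeling associations among words and understanding the things the words refer to.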

So, after that fairly long digression, how does all this fit together?
  • Except among people closely associated with the research, "AI" generally means "acting like a human" and "doing things that are particularly human", like using language
  • LLM chatbots are pretty good at acting human in that sense ...
  • ... including making up plausible-looking responses when they don't really know the answer
  • But language is just one part of the picture.
  • "General Intelligence" is not at all well-defined, but if it includes some sort of general understanding of the world and how to solve problems in it, then there's no real reason to think LLM chatbots have it, or are even close to acquiring it ...
  • ... even if they're sometimes good at looking that way

Saturday, June 17, 2023

Where did I put my car keys, and when did civilization begin?

Some mysteries, like "Where did I put my car keys?" can be solved by discovering new information.  Some of the more interesting ones, though, may resolve by realizing you were asking the wrong question in the first place.

For example, physicists spent a long time trying to understand the medium that light waves propagated in.  Just like ocean waves propagate in water and sound waves propagate in all kinds of material -- but not in a vacuum -- it seemed that light waves must propagate in some sort of medium.  "Luminiferous aether", they called it.

But that brought up questions of what happens to light if you're moving with respect to that medium.  Sounds in the air will sound higher-pitched if you're moving through still air toward the sound, or if the wind is blowing and you're downwind, and so on (examples of the Doppler effect).  There didn't seem to be a "downwind" with light.  The Earth orbits the Sun at about 0.01% of the speed of light, not much, but enough that a careful measurement should detect a change in frequency depending on which direction light is moving and where the Earth is in its orbit.  But it didn't, and people spent a lot of time trying to figure out what was happening with the aether until Einstein put forth a theory (special relativity) that started with the idea that there was no aether.
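The ratio of the Earth's orbital speed to the speed of light is easy to check with a quick calculation.  In this sketch the orbital speed is an approximate mean value that I'm assuming for illustration, and the variable names are my own; the speed of light is the exact defined value.

```python
# Rough check: Earth's orbital speed as a fraction of the speed of light.
# ORBITAL_SPEED_KM_S is an approximate mean value, assumed for illustration.
ORBITAL_SPEED_KM_S = 29.8
SPEED_OF_LIGHT_KM_S = 299_792.458  # exact, by definition of the metre

fraction = ORBITAL_SPEED_KM_S / SPEED_OF_LIGHT_KM_S
percent = fraction * 100

print(f"fraction of c: {fraction:.2e}")
print(f"as a percentage: {percent:.3f}%")
```

With these inputs the ratio comes out near one part in ten thousand, small but well within reach of a careful optical measurement.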

I just got done scanning through the older posts on this blog to see whether I'd discussed a question that comes up from time to time, in various forms, when discussing human prehistory: "What happened a few thousand years ago in human evolution, that enabled us to move from hunter-gatherer societies to full-blown civilization?"  The closest I could find was a comment at the end of a post on change in human technology:

How did civilization and technology develop in several branches of the human family tree independently, but not to any significant extent in others?

This is not quite the same question, but it's still not a great question because it's loaded with similar assumptions.   All societies have technology and rules of living together, so we're really talking about who has "advanced technology" or "higher forms of social organization" or whatever, which are not exactly the most objective designations.  But even taking those at face value, I think this is another "wrong question" like "What happens if you're moving with respect to the aether?"

Even if you try to stick to mostly objective criteria like whether or not there are cities (civilization ultimately derives from the same roots as Latin civitas -- city -- and civis -- citizen), or whether a particular group of people could smelt iron, there's a lot we don't know about what happened where and when once you go back a few thousand years, and even where we think we do know, the definitions are still a bit fuzzy.  How big does a settlement have to be to be considered a city?  How much iron do you have to smelt before you're in the "iron age"?  Any amount? Enough to make a sword?  Enough to manufacture swords by the hundred?

Wikipedia (at this writing) defines a civilization as "any complex society characterized by the development of a state, social stratification, urbanization, and symbolic systems of communication beyond natural spoken language (namely, a writing system)" with eight separate supporting citations.  I didn't check the page history, but one gets the impression that this definition evolved over time and much discussion.

By this definition, civilizations started appearing as soon as writing appeared.  In other words, writing is the limiting factor from the list above.  The first known examples (so far) of writing, Sumerian cuneiform and Egyptian hieroglyphs, are about 5400 years old.  By that time there had been cities for thousands of years.  Terms like "state" and "social stratification" are harder to pin down from hard archeological evidence, or even to define objectively in a way people can agree on, but it's pretty clear that, however you slice it, they came well before cuneiform and hieroglyphics.

It may be hard to pin down exactly what a state is, but it's not hard to find examples that people will agree are states.  Most of the world's population now lives in places that most people agree are states, even though there are disagreements about which people are subject to the rules of which state or whether a particular nation's government is effectively functioning as a state.  Nonetheless, if you asked most political scientists whether, say, New Zealand, Laos or Saint Lucia is a state, you'd get a pretty resounding "yes".  Likewise, most people familiar with the subjects would agree that, say, Ancient Rome or the Shang Dynasty or the Inca Empire were states.

The problems come when you try to extract a set of criteria from the examples.  While Wikipedia defines a state as "a centralized political organization that imposes and enforces rules over a population within a territory" it goes on in the very next sentence to say "There is no undisputed definition of a state" (with two supporting references). Wikipedia does not claim to be an authoritative source on its own and I suppose it's possible that the page editors missed the One True Definition of "state", but it seems unlikely.  More likely there really isn't one.

Going with the "centralized political organization ..." definition for the moment, things get slippery when you try to pin down what it means to "impose and enforce rules".  For one thing, except (probably) in the smallest city-states, say Singapore or the Vatican, there is always a tension among various levels of government.

In the US, for example, the federal government is supreme over state and local governments, but in practice it's local laws that mostly determine where you can build a house, how fast you can drive your car on which streets and any of a number of other things that have more visibility in most people's day-to-day life than, say, federal standards for paraffin wax (I checked, there are several).  Certainly the supremacy clause of the Constitution means something, and few would disagree that the federal government imposes and enforces rules throughout the US, or that the US is a state, but on the other hand we also call 50 constituent parts of the US "states" and they impose and enforce their own rules within their boundaries.  Is the State of Wyoming a "state", then, in the sense given above?  If so, is the city of Cheyenne?

This may seem like splitting hairs over definitions, but when you consider something like the Roman Empire, where it could take weeks or months to get a message from the center of government to the far-flung provinces, and the people in those provinces often didn't speak the official language and largely practiced their local religions and customs, and the local power structure was largely still in place, though with some sort of a governor, who may or may not have been Roman, nominally in charge, it's a legitimate question what it might mean to be "part of the Roman Empire" or in what exact territory the imperial state could actually impose and enforce rules at any particular time.

If all you have to go on is excavated ruins without any written records, it's harder still to say what might or might not be a state.  There are monumental constructions going back at least 10,000 years, that would have required cooperation among fairly large numbers of people over years or decades, but that doesn't necessarily mean there was (or wasn't) a centralized government.  So far, no one has found any strong indication that there was.  It's possible that ancient monuments were built at the command of a centralized leadership, but again, there doesn't seem to be any strong evidence to support that, as there definitely is for, say, the Egyptian pyramids.

Likewise for cities.  It's hard to tell by looking at the ruins of a city whether there was a centralized government.  One of the earliest cities known, Çatalhöyük, shows no obvious signs of, say, a City Hall or anything other than a collection of mud-brick houses packed together, though the houses themselves have their own fascinating details.  But then again, neither would any number of large villages / small towns today show obvious signs of a central government.  There may have been some sort of centralized government, somewhere, imposing and enforcing rules on Çatalhöyük, but there could very well not have been.  Current thinking seems to be there wasn't.

Empires like the Mongol or Macedonian ones built cities, but most cities in these empires already existed and were brought into the empire by conquest.  If we didn't have extensive written records, it would be much harder to determine that, say, present-day Uch Sharīf, Pakistan, was (possibly) founded by Alexander as part of the Macedonian Empire and was later (definitely) invaded by the Mongols.  While it's a fairly small city of around 20,000 people, it contains a variety of tombs, monuments and places of worship.  If it were suddenly deserted and all writing removed from it, and everything else in the surrounding area were covered in dirt, an archeologist who didn't know the history of the surrounding regions would have a lot of work to do to figure out just what went on when.

Present-day archeologists trying to understand human culture from 10,000 or more years ago are up against a similar situation.  What sites have been discovered are often isolated and what survives has a lot more to do with what sorts of things, like stonework and pottery, are likely to endure for millennia than what was actually there.

In addition, while there were cities thousands of years before Mesopotamian civilization, it's pretty clear most people didn't live in them, but in the surrounding areas, whether nomadically or in villages, and whatever traces they left behind are going to be much harder to find, if they can be found at all.  There's probably at least some selection bias, in that until perhaps recently, there has been more focus on finding signs of civilization, that is, cities, than on looking for signs of villages or nomadic peoples.

The result is that we really just don't know that much about how Neolithic people organized themselves.  There are some interesting clues, like the existence of "culture regions" where the same technologies and motifs turn up over and over again across large areas, but it's hard to say whether that's the result of a central government or just large-scale trade and diffusion of ideas (current thinking seems to be that it's probably trade and diffusion).

One of the basic assumptions in talking about civilizations is that civilization requires stable and abundant food supplies so that people can remain in one place over the course of years and at least some people have time to do things besides procuring food.  The converse isn't true, though.  You can have stable and abundant food supplies, and at least the opportunity for people to develop specialized roles, without civilization developing, and that seems to be what actually happened.

Rice was domesticated somewhere between 8,000 and 14,000 years ago, and wheat somewhere in the same range.  Permanent settlements (more technically, sedentism) are at least as old, and there were cultures, such as the Natufian, that settled down thousands of years before showing signs of deliberate agriculture.  Overall, there is good evidence of

  • Permanent settlements without signs of agriculture over periods of millennia (Natufian culture)
  • Large-scale organization without signs of agriculture or permanent settlements (monuments at Göbekli Tepe about 10,000 years old, not to mention later examples such as Stonehenge)
  • Cities without writing, or signs of centralized government (Çatalhöyük, about 9,000 years ago at its peak)
  • Agriculture without large-scale cities, over periods of millennia (domestication of rice and wheat)
  • Food surpluses without grain farming
  • Large-scale trade without evidence of states

Putting this all together

  • There's not really a widely-accepted single definition of what civilization is, particularly since there's no widely-accepted single definition of what concepts like "state" and "social stratification" mean
  • It's hard to say for sure how people organized themselves 10,000 years ago because there's no written record and the physical evidence is scattered and incomplete
  • There are clear signs, particularly monumental structures, that they did organize themselves, at least some of the time
  • There are clear signs that they interacted with each other, whether directly or indirectly, over large areas
  • The various elements of what we now call civilization, particularly agriculture and permanent settlements, didn't arise all at once in one place, but appeared in various combinations over large areas and long periods of time
In other words, there was no particular time and place that civilization began, and questions like the ones I gave at the beginning aren't really meaningful.

Human knowledge has continually evolved and diffused over time.  People have been busy figuring out the world around them for as long as there have been people, and as far as we can tell, people's cognitive abilities haven't changed significantly over the past few dozen millennia.

Overall, we've become more capable, because, overall, knowledge tends to accumulate over time.  The ability to create what we now call civilization has been part of that, but there was no particular technological change, and certainly no genetic change, that brought about the shift from foraging societies to civilization, because it's not even accurate to talk about "the shift".  There wasn't some pivotal change.  There have been continual changes over large areas and long periods of time that have affected different groups of people in different ways.  We can choose to draw lines around those now, but the results may say more about how we draw lines than about how people lived.

None of this is to say that terms like "civilization" or "state" are meaningless, or that civilizations and states are inherently bad (or good).  Rather, it seems more useful to talk about particular behaviors of particular groups of people and less useful to argue over which groups had "advanced technology" or were "civilized", or to try to say when some group of people crossed some magical boundary between "uncivilized" and "civilized" or when some collection of settlements "became a state".

Among other things, this helps avoid a certain kind of circular reasoning, such as asserting that the people who built Stonehenge must have had an advanced society because only an advanced society could build something like Stonehenge.  What's an advanced society?  It's something that can build monuments like Stonehenge.  I don't think this really represents the current thinking of people who study such things, but such arguments have been made, nearly as baldly.  Better, though, to try to understand how Stonehenge was built and how the people who built it lived and then try to see what led to what.

This also helps avoid a particular kind of narrative that comes up quite a bit, that there is a linear progression from "early, primitive" humanity to "modern, advanced societies".  In the beginning, people lived in a state of nature.  Then agriculture was discovered, and now that people had food surpluses, they could settle down.  Once enough people settled down, they developed the administrative structures that became the modern nation-state as we know it, and so forth.

None of those assertions is exactly false, leaving aside what exactly a "state of nature" might be.  Agriculture did develop, over periods of time and in several places.  Eventually, it enabled higher population densities and larger centers of population, and, in practice, that has involved more elaborate administrative structures.

But that isn't all that happened.  People raised domesticated plants, and eventually animals, and otherwise modified their environments to their advantage, for hundreds or thousands of years at a stretch without building large cities.  Cities arose, but for almost all of human history, as in prehistory, most people didn't live in them -- that's a very recent development.

One problem with this kind of linear narrative is that it can give the impression that there was a sort of dark age, before civilization happened, where people weren't doing much of anything.  If we put the origins of modern humans at, say, 70,000 years ago -- again, at least to some extent this is a matter of where we choose to draw lines, but it couldn't have been much later than that -- then why did it take so long to get from early origins to civilization?  As far as anyone knows, that's a span of over 60,000 years.  What were we doing all that time?

If you require a sharp dividing line between "nothing much going on" and "civilization", this seems like a mystery.  If you don't need such a line, the answer seems pretty mundane, because we were doing pretty much the same thing all the way through:  steadily developing culture, including technology and art.  Eventually, at various times and places, what we now call civilization becomes possible, and some time after that, at some smaller number of times and places, it happens.

One note: This post draws fairly extensively from points made in The Dawn of Everything.  Along with discussing human history, that book explores what implications deep human history might have for how present-day societies might be structured.  I'm not trying to promote or refute any of that here.  Here, I'm more interested in deep human history itself, the stories we tend to build around what we know about it, and how the two can differ.

Wednesday, December 28, 2022

Pushing back on AI (and pushing back on that)

A composer I know is coming to terms with the inevitable appearance of online AIs that can compose music based on general parameters like mood, genre and length, somewhat similar to AI image generators that can create an image based on a prompt (someone I know tried "Willem Dafoe as Ronald McDonald" with ... intriguing ... results).  I haven't looked at any of these in depth, but from what I have seen it looks like they can produce arbitrary amounts of passable music with very little effort, and it's natural to wonder about the implications of that.

A common reaction is that this will sooner or later put actual human composers out of business, and my natural reaction to that is "probably not".  Then I started thinking about my reasons for saying that and the picture got a bit more interesting.  Let me start with the hot takes, and then go on to the analysis.

  • This type of AI is generally built by training a neural network against a large corpus of existing music.  Neural nets are now pretty good at extracting general features and patterns and extrapolating from them, which is why the AI-generated music sounds a lot like stuff you've already heard.  That's good because the results sound like "real music", but it's also a limitation.
  • At least in its present form, using an AI still requires human intervention.  In theory, you could just set some parameters and go with whatever comes out, but if you wanted to provide, say, the soundtrack for a movie or video game, you'll need to actually listen to what's produced and decide what music goes well with what parts, and what sounds good after what, and so forth.  In other words, you'll still need to do some curation.
Along with this, I have a general opinion about the progress of AI as a whole: A few years back, there was a breakthrough.  Hardware got fast enough, thanks in part to special-purpose tensor-smashing chips, and new modeling techniques were developed, so that the overall approach of neural network-based machine learning (ML) models could solve interesting problems that had so far resisted solution.  We're now in the phase of working out the possibilities, with new applications turning up left and right.

One way to look at it is that there were a bunch of problem spaces out there that computers weren't well suited for before but are a good match for the new techniques, and we're in the process of identifying those.  Because there has been so much progress in applying the new ML, and because these models are based on the way actual brains work, it's tempting to think that they can handle anything that the human brain can handle, and/or that we've created "general intelligence", but that's not necessarily the case.

My strong hunch is that before too long the limitations will become clear and the flood of new applications will slow.  There may or may not be a new round of "failed promise of AI" proclamations and amnesia about how much progress has been made.  Researchers will keep working away, as they always have, and at some point there will be another breakthrough and another burst of progress.  Lather, rinse, repeat.

That's all well and good, but honestly those bullet-pointed arguments above aren't that great, and the more general argument doesn't even try to say where the limits are.

The bullet points amount to two arguments that go back to the beginnings of AI, if not before, to the first time someone built an automaton that looked like it was doing something human, and they have a long history of looking compelling in the short run but failing in the long run.
  • The first argument is basically that the automaton can only do what it was constructed or taught to do by its human creators, and therefore it cannot surpass them.  But just as a human-built machine can lift more than a human, a human-built AI can do things that no human can.  Chess players have known this for decades now (and I'm pretty sure chess wasn't the first such case).
  • The second argument assumes that there's something about human curation that can't be emulated by computers (though I was careful to say "at least in its present form").  The oldest form of this argument is that a human has a soul, or a "human spark of creativity" or something similar, while a machine doesn't, so there will always be some need for humans in the system.
The problem with that one is that when you try to pin down that human spark, it basically amounts to "whatever we can do that the machines can't ... yet", and over and over again the machines have eventually turned out to be able to do things they supposedly couldn't.  Chess players used to believe that computers could only play "tactical chess" and couldn't play "positional chess", until Deep Blue demonstrated that if you can calculate deeply enough, there isn't any real difference between the two.

As much as I would like to say that computers will never be able to compose music as well as humans, it's pretty certain that they eventually will, including composing pieces of sublime emotional intensity and inventing new paradigms of composition.  I don't expect that to happen very soon -- more likely there will be an extended period of computers cranking out reasonable facsimiles of popular genres -- but I do expect it to happen.

Where does that leave the composer?  I think a couple of points from the chess world are worth considering:
  • Computer chess did not put chess masters out of business.  The current human world champion would lose badly to the best computer chess player, which has been the case for decades, and we can expect it to be the case from here on out, but people still like to play chess and to watch the best human players play (watching computers play can also be fun).  People will continue to like to make music and to hear music by good composers and players.
  • Current human chess players spend a lot of time practicing with computers, working out variations and picking up new techniques.  I expect similar things will happen with music: at least some composers will get ideas from computer-generated music, or train models with music of their choosing and do creative things with the results, or do all sorts of other experiments.
There is also some relevant history from the music world:
  • Drum machines did not put drummers out of business.  People can now produce drum beats without hiring a drummer, including beats that no human drummer could play, and beats that sound like humans playing with "feel" on real instruments, but the effect of that has been more to expand the universe of people who can make music with drum beats than to reduce the need for drummers (I'm not saying that drummers haven't lost gigs, but there is still a whole lot of live performance going on with a drummer keeping the beat).
  • Algorithms have been a part of composition for quite a while now.  Again, this goes back to before computers, including common-practice techniques like inversion, augmentation and diminution and 20th-century serialism.  An aleatoric composition arguably is an algorithm, and electronic music has made use of sequencers since early days.  From this point of view, model-generated music is just one more tool in the toolbox.
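
To make the "algorithms in composition" point concrete, here is a minimal sketch of inversion, augmentation and diminution as plain list transformations.  The note encoding (MIDI-style pitch numbers paired with durations in beats) and the function names are assumptions made for this illustration, not anything taken from a particular music library.

```python
from fractions import Fraction

# A melody as (pitch, duration) pairs. Pitch is a MIDI-style note number
# (60 = middle C) and duration is in beats; this encoding is an assumption.
melody = [(60, Fraction(1)), (62, Fraction(1)), (64, Fraction(2)), (67, Fraction(1, 2))]

def invert(notes, axis=None):
    """Mirror every pitch around an axis pitch (default: the first note)."""
    if axis is None:
        axis = notes[0][0]
    return [(2 * axis - pitch, dur) for pitch, dur in notes]

def augment(notes, factor=2):
    """Lengthen every duration by the given factor."""
    return [(pitch, dur * factor) for pitch, dur in notes]

def diminish(notes, factor=2):
    """Shorten every duration by the given factor."""
    return [(pitch, dur / factor) for pitch, dur in notes]

print(invert(melody))    # rising steps become falling steps
print(augment(melody))   # same pitches, twice as slow
print(diminish(melody))  # same pitches, twice as fast
```

Each transformation is a one-line rule applied uniformly to the material, which is the sense in which these long-standing techniques are already algorithms.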

Humanity has had a complicated relationship with the machines it builds.  On the one hand, people generally build machines to enable them to do something they couldn't, or relieve them of burdensome tasks.  Computers are no different.  On the other hand, people have always been cautious about the potential for machines to disrupt their way of life, or their livelihood (John Henry comes to mind).  Both attitudes make sense.  Fixating on one at the expense of the other is generally a mistake.

Personally, having watched AI develop for decades now, I don't see any significant change in that dynamic.  We don't seem particularly closer to the Singularity than we ever were (and I argue in that post that's in part because the Singularity isn't really a well-defined concept).  But then, given the way these things are believed to work, we may not know different until it's too late.

If it does happen maybe someone, or something, will compose an epic piece to mark the event.

Friday, March 25, 2022

The house with the green shutters

Consider these two sentences:

  • I went around the house with the green shutters
In other words, the house has green shutters and I'm going around that house.
  • I went around the house with the green shutters to install
In other words, I have some green shutters I need to install on the house and I'm carrying them around the house.

These are considerably different meanings, and they have different structures from a grammatical point of view.  In the first sentence, with the green shutters is describing the house -- it has green shutters.  In the second, it is describing my going around the house -- I have the shutters with me as I go.

This second sentence might be considered a garden-path sentence, which is a sentence that you have to reinterpret midway through because the interpretation you started with stops working.  Wikipedia has three well-known examples:
  • The old man the boats
  • The complex houses married and single soldiers and their families
  • The horse raced past the barn fell
If your first reaction to those sentences is "Wait ... what?" like mine was, they might make more sense with a little more context:
  • The young stay on shore.  The old man the boats.
  • The complex was built by the Corps of Engineers.  The complex houses married and single soldiers and their families.
  • The horse led down the path was fine.  The horse raced past the barn fell.
or with a slight change in wording
  • The old people man the boats
  • The housing complex houses married and single soldiers and their families
  • The horse that was raced past the barn fell
While sentences like these do come up in real life, especially in headlines or other situations where it's common to leave out words like "that" or "who" which can provide valuable clues about the structure of a sentence, they also feel a bit artificial.  An editor would be well within their rights to suggest that an author rephrase any of the three, because they're hard to read, because the whole structure and meaning aren't what you think they are at first.  In each case, the reader has to re-parse the opening words:
  • the old man, with the adjective old modifying the noun man, changes to the old, a noun phrase made from an adjective, as the subject of the verb man.  The sentence fragment the old man becomes a complete sentence (though, granted, it's harder to leave the object off of man the boat than in a sentence like I read every day)
  • the complex houses, with the adjective complex modifying the noun houses, becomes the complex as the subject of the verb houses.  This is actually the same pattern as the first case, except that married can keep the game going (The complex houses married elements of the Rococo and Craftsman styles).  It may be worth noting that in this case, the two interpretations would generally sound distinct.  As a noun phrase, the complex houses would have the main stress on houses, while as a noun phrase and a verb, it would have the main stress on complex.
  • the horse raced, a complete sentence with raced in the simple past, becomes the horse modified by the past participle raced
Compared to these, I don't think either of the two "green shutters" sentences is particularly hard to understand.  While the change in meaning is significant, the change in structure isn't as great as in the garden-path examples.  The subject is still I.  The verb is still went, modified by around the house.  The only difference is in what with applies to.  Every word, except possibly with, is used in the same sense in both sentences.

In technical terms, this is a syntactic ambiguity.  What's uncertain is which particular words relate to which others.  The meanings of the words themselves are the same either way.  At the very least, with remains a preposition.  In the garden-path sentences, the senses of the words, and in particular their parts of speech, change when the sentence is reinterpreted -- a lexical ambiguity, one reason to think there's something different going on in the two cases.

This sort of thing is bread and butter for linguistics and cognitive science experiments where subjects are given sentences and asked to, say, pick the picture that best matches them, with the experimenters timing the responses and looking for differences that suggest that some structures require more processing than others.  In this case, I strongly suspect that the sentences I gave would take much less time for people to sort out than the garden-path sentences.

In short, while I think that there are some similarities, I also think different things are going on in the brain when dealing with the sentences I gave, as opposed to garden-path sentences.

Even without running the experiments or considering garden-path sentences, there are some clear implications just from considering sentences like the "green shutters" ones above:
  • On the one hand, our parsing of sentences is sequential in some strong sense.  At several points, we can stop and say "This is a sentence".  If you hear nothing more, you can still work out possible meanings
    • I went around the house
    • I went around the house with the green shutters
    • I went around the house with the green shutters to install
  • On the other hand, the structure of a sentence is provisional in some sense.  After hearing I went around the house with the green shutters and associating with the green shutters with house, we can then hear to install and fairly easily re-associate with with went around the house.
  • Semantics and context affect this process.  The sentence I went around the house with the green shutters is itself ambiguous.  You could read it the same way as the other sentence, meaning that I was carrying green shutters around the house, but the house with the green shutters is much more likely to refer to the house, so you probably don't.  Similarly, putting a context sentence before a garden-path sentence makes it more likely that the garden-path sentence will make sense without re-reading.
(That last point runs counter to Chomsky's assertion that "[T]he notion of 'probability of a sentence' is an entirely useless one, under any known interpretation of this term")

Assuming that there's some sort of re-structuring going on when you hear to install after I went around the house with the green shutters, it would be interesting to see how different theories of grammar handle it.

In a phrase structure grammar, the shift between the two sentences is from a structure like
  • I [went [around [the house [with the green shutters]]]]
(a full parse tree would have a lot more to it than this) to
  • I [went [around [the house]][with the green shutters [to install]]]
That is, with the green shutters goes from being a constituent of the noun phrase (the house with the green shutters) to a constituent of the verb phrase went around the house with the green shutters to install.  From a phrase-structure point of view, the two possible readings of I went around the house with the green shutters are examples of a bracketing ambiguity, since there are two ways to put the brackets.
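
The two bracketings can be written down as nested lists to show that they contain the same chunks in the same order and differ only in where the brackets go.  This is a rough sketch: the chunking follows the simplified brackets above rather than a full parse tree, and the helper names are made up for the example.

```python
# Each parse is a nested list: a list is a constituent, a string is a chunk
# left unanalyzed, mirroring the simplified bracketed examples in the text.
noun_attachment = ["I", ["went", ["around", ["the house", ["with the green shutters"]]]]]
verb_attachment = ["I", ["went", ["around", ["the house"]],
                         ["with the green shutters", ["to install"]]]]

def leaves(tree):
    """Flatten a parse back into its left-to-right sequence of chunks."""
    if isinstance(tree, str):
        return [tree]
    return [chunk for child in tree for chunk in leaves(child)]

def depth_of(tree, target, depth=0):
    """Count the brackets enclosing `target` (the whole sentence counts as one)."""
    if isinstance(tree, str):
        return depth if tree == target else None
    for child in tree:
        found = depth_of(child, target, depth + 1)
        if found is not None:
            return found
    return None

print(leaves(noun_attachment))
print(leaves(verb_attachment))
print(depth_of(noun_attachment, "with the green shutters"))  # 5
print(depth_of(verb_attachment, "with the green shutters"))  # 3
```

Up to the added to install, the word order is identical; only the nesting depth of with the green shutters changes between the two readings.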

You can look at this as lifting [with the green shutters] out of [the house [with the green shutters]] and putting it back next to [around the house].  In principle, the place that [with the green shutters] is lifted out of can be as deep as you want: [I went [down the path [around the house [with the green shutters]]]] and so forth.  You're still moving a chunk of the parse tree from one place to another, but as the nesting gets deeper, you have to navigate through more tree nodes to find what you're moving.

In a dependency grammar, the shift is pretty simple: with switches from a dependent of house to a dependent of went (if I understand correctly, with would be a syntactic dependency of house or went, but the semantic dependency is the other way around: house or went would be a semantic dependency of with -- but there's a good chance I don't understand correctly).  Saying that I went around the house with the green shutters is ambiguous is saying that there are two possible places that with could attach as a dependency.
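
As a sketch of the dependency view, each reading can be stored as a map from a word to its head.  The attachments below follow the description above (with depending on house or on went); other dependency schemes draw some of these arrows differently, so treat the specific heads, and the choice to fold the green into shutters, as assumptions for illustration.

```python
# Each reading maps a word to its head (the word it depends on).
# None marks the root of the sentence.
reading_on_house = {
    "went": None,
    "I": "went",
    "around": "went",
    "house": "around",
    "with": "house",      # the shutters are on the house
    "shutters": "with",
}

# The second reading is identical except for where "with" attaches.
reading_carrying = {**reading_on_house, "with": "went"}

def differing_attachments(a, b):
    """Return {word: (head in a, head in b)} for words whose heads differ."""
    return {word: (a[word], b[word]) for word in a if a[word] != b[word]}

print(differing_attachments(reading_on_house, reading_carrying))
# {'with': ('house', 'went')}
```

The whole ambiguity comes down to a single entry in the map, which is why the shift looks simpler here than in the bracketed version.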

Consider one more sentence
  • I went around the house with the green shutters to install the awning
After seeing the awning, the shutters are back on the house and we're back where we started (and the object of install is now awning).  The fact that we can handle any of the three sentences suggests that there's something in the brain that can track both possible structures, that is, both ways of associating with, whether as a constituent or a dependency or something else, and switch back and forth between them, or in some cases even end up in a state of "Wait a sec, did you mean the shutters are on the house, or you were carrying them?"

There ought to be experiments to run in order to test this, and I wouldn't be surprised if they've already been run, but I'll leave that to the real linguists.

Wednesday, October 27, 2021

Mortality by the numbers

The following post talks about life expectancy, which inevitably means talking about people dying, and mostly-inevitably doing it in a fairly clinical way.  If that's not a topic you want to get into right now, I get it, and I hope the next post (whatever it is) will be more appealing. 

Maybe I just need to fix my news feed, but in the past few days I've run across at least two articles stating that for most of human existence people only lived 25 years or so.

Well ... no.

It is true that life expectancy at birth has taken a large jump in recent decades.  It's also true that estimates of life expectancy from prehistory up to about 1900 tend to be in the range of 20-35 years, and that estimates for modern-day hunter-gatherer societies are in the same range.  As I understand it, that's not a complete coincidence since estimates for prehistoric societies are generally not based on archeological evidence, which is thin for all but the best-studied cases, or written records, which by definition don't exist.  Rather, they're based on the assumption that ancient people were most similar to modern hunter-gatherers, so there you go.

None of this means that no one used to live past 25 or 30, though.  The life expectancy of a group is not the age by which everyone will have died.  That's the maximum lifespan.  Now that life expectancies are in the 70s and 80s, it's probably easier to confuse life expectancy with maximum lifespan, and from there conclude that life expectancy of 25 means people didn't live past 25, but that's not how it works.  For example, in the US, based on 2018 data, the average life expectancy was 78.7 years, but about half the population could expect to still be alive at age 83, and obviously there are lots of people in the US older than 78.7 years.  The story is similar for any real-world calculation of life expectancy.

A life expectancy of 25 years means that if you looked at everyone in the group you're studying, say, everyone born in a certain place in a given year, then counted up the total number of years everyone lived and divided that by the number of people in your group, you'd get 25 years.  For example, if your group includes ten people, three of them die as infants and the rest live 10, 15, 30, 35, 40, 50 and 70 years, that's 250 person-years.  Dividing that by ten people gives 25 years.

No matter what particular numbers you use, the only way the life expectancy can equal the maximum lifespan is if everybody lives to exactly that age.  If some people in a particular group died younger than the life expectancy, that means that someone else lived longer. 

Sadly, the example above is likely a plausible distribution for most times and places.  Current thinking is that for most of human existence, infant mortality has been much higher than it is now.  If you survived your first year, you had a good chance of making it to age 15, and if you made it that far, you had a good chance of living at least into your forties and probably your fifties.  In the made-up sample above, the people who made it past 15 lived to an average age of 45.  However, there was also a tragically high chance that a newborn wouldn't survive that first year.
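
The arithmetic for the made-up group of ten is easy to check directly.  In this sketch the three infant deaths are counted as 0 years each, which is an assumption, though it is the one that matches the 250 person-year total.

```python
# Ages at death for the made-up group of ten people.
# Counting the three infant deaths as 0 years is an assumption.
ages_at_death = [0, 0, 0, 10, 15, 30, 35, 40, 50, 70]

# Life expectancy at birth: total years lived divided by number of people.
life_expectancy_at_birth = sum(ages_at_death) / len(ages_at_death)

# Average age at death among those who lived past 15.
past_15 = [age for age in ages_at_death if age > 15]
mean_age_past_15 = sum(past_15) / len(past_15)

print(life_expectancy_at_birth)  # 25.0
print(mean_age_past_15)          # 45.0
print(max(ages_at_death))        # 70, the maximum lifespan in this sample
```

Half the group outlives the "life expectancy" of 25, and the longest-lived member reaches nearly three times that age.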

Life expectancies in the 20s and 30s are mostly a matter of high infant mortality, and to a lesser extent high child mortality, not a matter of people dying in their mid 20s.  For the same reason, the increase in life expectancy in the late 20th century was largely a matter of many more people surviving their first year and of more children surviving into adulthood (even then, the rise in life expectancy hasn't been universal).

In real environments where average life expectancy is 25, there will be many people considerably older, and a 24-year-old has a very good chance of making it to 25, and then to 26 and onward.  The usual way of quantifying this is with age-specific mortality, which is the chance at any particular birthday that you won't make it to the next one (this is different from age-adjusted mortality, which accounts for age differences when comparing populations).

At any given age, you can use age-specific mortality rates to calculate how much longer a person can expect to live.  By itself, "life expectancy" means "life expectancy at birth", but you can also calculate life expectancy at age 30, or 70 or whatever.  From the US data above, a 70-year old can expect to live to age 86 (85.8 if you want to be picky).  A 70-year-old has a significantly higher chance of living to be 86 than someone just born, just because they've already lived to 70, whether or not infant mortality is low and whether the average life expectancy is in the 70s or 80s or in the 20s or 30s.  They also have a 100% chance of living past 25.

Looking at it from another angle, anyone who makes it to their first birthday has a higher life expectancy than the life expectancy at birth, anyone who makes it to their second birthday has a higher life expectancy still, and so forth.  Overall, the number of years you can expect to live beyond your current age goes down each year, because there's always a chance, even if it's small, that you won't live to see the next year.  However, it goes down by less than a year each year, because that chance isn't 100%.  Even as your expected number of years left decreases, your expected age of death increases, but more and more slowly as you age.
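
Here is a small sketch of how remaining life expectancy falls out of age-specific mortality.  The mortality curve is invented for the illustration (it starts low and rises steadily with age) and is not real data; the function names are hypothetical as well.

```python
def mortality(age):
    """Made-up chance of dying before the next birthday (not real data)."""
    return min(1.0, 0.00075 * 1.085 ** (age - 20))

def remaining_years(age, max_age=130):
    """Expected number of further birthdays reached, starting from `age`."""
    expected, alive = 0.0, 1.0
    for a in range(age, max_age):
        alive *= 1.0 - mortality(a)  # chance of reaching birthday a + 1
        expected += alive
    return expected

# Columns: current age, expected years left, expected age at death.
for age in (20, 40, 60, 80, 100):
    left = remaining_years(age)
    print(age, round(left, 1), round(age + left, 1))
```

With a curve like this, the middle column shrinks by less than one for each year survived, while the last column keeps creeping upward, which is the pattern described above.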

Past a certain point in adulthood, age-specific mortality tends to increase exponentially.  Since the chances of dying at, say, age 20 are pretty low, and the doubling period is pretty long, around 8-10 years, and the maximum for any probability is 100%, this doesn't produce the hockey-stick graph that's usually associated with exponential growth, but it's still exponential.  Every year, your chance of dying is multiplied by a fairly constant factor of around 1.08 to 1.09, or 8-9% annual growth, compounded.  Again from the US data, at age 20 you have about a 0.075% chance of dying that year.  At age 87, it's about 10%.  At age 98, it's about 30%.
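
The link between the annual growth factor and the doubling period is just compound growth.  A quick sketch (the function name is made up for the example):

```python
import math

def doubling_time(annual_factor):
    """Years for a quantity growing by `annual_factor` per year to double."""
    return math.log(2) / math.log(annual_factor)

print(round(doubling_time(1.08), 1))  # 9.0
print(round(doubling_time(1.09), 1))  # 8.0
```

So factors of 1.08 to 1.09 correspond to doubling every 8 to 9 years, at the shorter end of the range quoted above.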

This isn't a law of nature, but an empirical observation, and it doesn't seem to quite hold up at the high end.  For example, CDC data for the US shows a pretty plausibly exponential increase up to age 99, where the table stops, but extrapolating, the chance of death would become greater than 100% somewhere around age 110, even though people in the US have lived longer than that.
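
The extrapolation can be roughed out with the rounded figure quoted earlier (about a 30% chance of dying at age 98) rather than the CDC table itself.  The growth factors tried here are assumptions, so the result is only a ballpark.

```python
import math

def age_when_certain(age, rate, annual_factor):
    """Age at which a rate growing by annual_factor per year would hit 100%."""
    return age + math.log(1 / rate) / math.log(annual_factor)

# Start from roughly 30% at age 98 and try a few plausible growth factors.
for factor in (1.08, 1.09, 1.10):
    print(factor, round(age_when_certain(98, 0.30, factor), 1))
```

Each of these lands in the low 110s, in the same neighborhood as the figure above and well short of the oldest verified ages, which is why the exponential pattern can't hold all the way up.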

It's been predicted that at some point, thanks to advances in medicine and other fields, life expectancy will start to increase by more than one year per year, and as a consequence anyone young enough when this starts to happen will live forever.  Life expectancy doesn't work that way, either.  There could be a lot of reasons for life expectancy in some population to go up by more than a year in any given year.

Again, the important measure is age-specific mortality.  If the chances of living to see the next year increase just a bit for people from, say, 20 to 50, life expectancy could increase by a year or more, but that just means that more people are going to make it into old age.  It doesn't mean that they'll live longer once they get there.

The key to extending the maximum lifespan is to increase the chances that an old person will live longer, not to increase the chances that someone will live to be old.   If, somehow, anyone 100 or older, but only them, suddenly had a steady 99% chance of living to their next birthday, then a 100-year-old would have about even odds of living to 169, and the average age at death for that group would be close to 200.  This wouldn't have much effect on overall life expectancy, though, because there aren't that many 100-year-olds to begin with.
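
For the hypothetical steady 99% survival rate, the figure of about 169 is the age that half of the 100-year-olds would reach.  A short sketch of that calculation, along with the mean, which comes out higher because a few people would live a very long time:

```python
import math

# Assumed steady chance of reaching the next birthday, from age 100 on.
survival = 0.99

# Years until only half of the original 100-year-olds are still alive.
median_extra_years = math.log(0.5) / math.log(survival)

# Mean number of further birthdays reached (geometric series p + p^2 + ...).
mean_extra_years = survival / (1 - survival)

print(round(100 + median_extra_years))  # 169
print(round(100 + mean_extra_years))    # 199
```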

What are the actual numbers, once you get past, say, 100?  It's hard to tell, because there aren't very many people that old.  How many people live to a certain age depends not only on age-specific mortality, but on how many people are still around at what younger ages.  This may seem too obvious to state, but it's easy to lose track of this if you're only looking at overall probabilities.

Currently there's no verified record of anyone living to 123 and only one person has been verified to live past 120.  No man has been verified to live to 117, and only one has been verified to have lived to 116.  Does that mean that no one could live to, say, 135?  Not necessarily.  Does it mean that women inherently live longer than men?  Possibly, but again not necessarily.  Inference from rare events is tricky, and people who do this for a living know a lot more about the subject than I do, but in any case we're looking at handfuls out of however many people have well-verified birth dates in the early 1900s.

Suppose, for the sake of illustration, that after age 100 you have a steady 50/50 chance of living each subsequent year.  Of the people who live to 100, only 1/2 will live to 101, 1/4 to 102, then 1/8, 1/16 and so forth.  Only 1 in 1024 will live to be 110 and only 1 in 1,048,576 -- call it one in a million -- will live to 120.
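
The halving is easy to tabulate exactly.  This sketch uses the illustrative 50/50 figure from the paragraph above; the function name is made up.

```python
from fractions import Fraction

def fraction_reaching(age, start_age=100, yearly_survival=Fraction(1, 2)):
    """Fraction of people alive at start_age who are still alive at `age`."""
    return yearly_survival ** (age - start_age)

for age in (101, 102, 110, 120):
    print(age, fraction_reaching(age))
```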

If there are well under a million 100-year-olds to start with, the odds are against any of them living to 120, but they're not zero.  At any given point, you have to look at the ages of the people who are actually alive, and (your best estimate of) their odds of living each additional year.  If there are a million 100-year-olds now and each year is a 50/50 proposition, you'd expect about one 120-year-old in twenty years, with a fair chance (a bit under 40%) that there won't be any, but if there does happen to be a 119-year-old after 19 years, there's a 50% chance there will be a 120-year-old a year later.  By the same reasoning, it's less likely that there were any 120-year-olds a thousand years ago, not only because age-specific mortality was very likely higher, but because there were simply fewer people around, so there were fewer 100-year-olds with a chance to turn 101, and so forth.
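
Under the same made-up 50/50 model, and assuming each person's survival is independent of everyone else's, the chance that at least one of N starters reaches 120 is 1 - (1 - 2^-20)^N.  A quick sketch (the function name is hypothetical) shows how strongly that depends on N; with a million starters the expected number of 120-year-olds is just under one.

```python
def chance_of_any(starters, years=20, yearly_survival=0.5):
    """Chance that at least one of `starters` people survives `years` years."""
    chance_one_person = yearly_survival ** years
    return 1 - (1 - chance_one_person) ** starters

for starters in (10_000, 100_000, 1_000_000, 10_000_000):
    print(starters, round(chance_of_any(starters), 3))
```

The number of people in a position to try matters as much as the odds each of them faces, which is the point about smaller populations in the past.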

In real life, a 100-year-old has a much better than 50% chance of living to be 101, but we don't really know if age-specific mortality ever levels off.  We know that it's less than 100% at age 121, because someone lived to be 122, but that just indicates that at some point there's no longer an exponential increase in age-specific mortality (else it would hit 100% before then, based on the growth curve at ages where we do have a lot of data).  It doesn't mean that the mortality rate levels off.  It might still be increasing to 100%, but slowly enough that it doesn't actually hit 100% until sometime after age 121.

It may well be that there's some sort of mechanism of human biology that prevents anyone from living past 122 or thereabouts, and some mechanism of female human biology in particular that sets the limit for women higher than for men.  On the other hand, it may be that there aren't any 123-year-olds because so far only one person has made it to 122, and their luck ran out.

Similarly, there may not have been any 117-year-old men because not enough men made it to, say, 80, for there to be a good chance of any of them making it to 116.  That in turn might be a matter of men being more likely to die younger, for example in the 20th-century wars that were fought primarily by men.  I'm sure that professionals have studied this and could probably confirm or refute this idea.  The main point is that after a certain point the numbers thin out and it becomes very tricky to sort out all the possible factors behind them.

On the other hand, even if it's luck of the draw that no one has lived to 123, there could still be an inherent limit, whether it's 124, 150 or 1,000, just that no one's been lucky enough to get there.

Along with the difference between life expectancy and lifespan, and the importance of age-specific mortality, it's important to keep in mind where the numbers come from in the first place.  Life expectancy is calculated from age-specific mortality, and age-specific mortality is measured by looking at people of a given age who are currently alive.  If you're 25 now, your age-specific mortality is based on the population of 25-year-olds from last year and what proportion of them survived to be 26.  Except in exceptional circumstances like a pandemic, that will be a pretty good estimate of your own chances for this year, but it's still based on a group you're not in, because you can only measure things that have happened in the past.

If you're 25 and you want to calculate how long you can expect to live, you'll need to look at the age-specific mortalities for age 25 on up.  The higher the age you're looking at, the more out-of-date it will be when you reach that age.  Current age-specific mortality for 30-year-olds is probably a good estimate of what yours will be at age 30, but current age-specific mortality at 70 might or might not be.  There's a good chance that 45 years from now we'll be significantly better at making sure a 70-year-old lives to be 71.  

Even if medical care doesn't change, a current 70-year-old is more likely to have smoked, or been exposed to high levels of carcinogens, or any of a number of other risk factors, than someone who's currently 25 will have been when they're 70.  Diet and physical activity have also changed over time, not necessarily for the better or worse, and it's a good bet they will continue to change.  There's no guarantee that our future 70-year-old's medical history will include fewer risk factors than a current 70-year-old's, but it will certainly be different.

For those and other reasons, the further into the future you go, the more uncertain the age-specific mortality becomes.  On the other hand, it also becomes less of a factor.  Right now, at least, it won't matter to most people whether age-specific mortality at 99 is half what it is now, because, unless mortality in old age drops by quite a bit, most people alive today are unlikely to live to be 99.

Sunday, May 2, 2021

Things are more like they are now than they have ever been

I hadn't noticed until I looked at the list, but it looks like this is post 100 for this blog.  As with the other blog, I didn't start out with a goal of writing any particular number of posts, or even on any particular schedule.  I can clearly remember browsing through a couple dozen posts early on and feeling like a hundred would be a lot.  Maybe I'd get there some day or maybe I wouldn't.  In any case, it's a nice round number, in base 10 anyway, so I thought I'd take that as an excuse to go off in a different direction from some of the recent themes like math, cognition and language.

The other day, a colleague pointed me at Josh Bloch's A Brief, Opinionated History of the API (disclaimer: Josh Bloch worked at Google for several years, and while he was no longer at Google when he made the video, it does support Google's position on the Google v. Oracle suit).  What jumped out at me, probably because Bloch spends a good portion of the talk on it, was just how much the developers of EDSAC, generally considered "the second electronic digital stored-program computer to go into regular service", anticipated, in 1949.

Bloch argues that its subroutine library -- literally a file cabinet full of punched paper tapes containing instructions for performing various common tasks -- could be considered the first API (Application Program Interface), but the team involved also developed several other building blocks of computing, including a form of mnemonic assembler (a notation for machine instructions designed for people to read and write without having to deal with raw numbers) and a boot loader (a small program whose purpose is to load larger programs into the computer memory).  For many years, their book on the subject, Preparation of Programs for Electronic Digital Computers, was required reading for anyone working with computers.

This isn't the first "Wow, they really thought of everything" moment I've had in my field of computing.  Another favorite is Ivan Sutherland's Sketchpad (which I really thought I'd already blogged about, but apparently not), generally considered the first fully-developed example of a graphical user interface.  It also laid foundations for object-oriented programming and offers an early example of constraint-solving as a way of interacting with computers.  Sutherland wrote it in 1963 as part of his PhD work.

These two pioneering achievements lie either side of the 1950s, a time that Americans often tend to regard as a period of rigid conformity and cold-war paranoia in the aftermath of World War II (as always, I can't speak for the rest of the world, and even when it comes to my own part, my perspective is limited). Nonetheless, it was also a decade of great innovation, both technically and culturally.  The Lincoln TX-2 computer that Sketchpad ran on, released in 1958, had over 200 times the memory EDSAC had in 1949 (it must also have run considerably faster, but I haven't found the precise numbers).  This development happened in the midst of a major burst of progress throughout computing.  To pick a few milestones:

  • In 1950, Alan Turing wrote the paper that described the Imitation Game, now generally referred to as the Turing test.
  • In 1951, Remington Rand released the UNIVAC-I, the first general-purpose production computer in the US.  The transition from one-offs to full production is a key development in any technology.
  • In 1951, the solid-state transistor was developed.
  • In 1952, Grace Hopper published her first paper on compilers. The terminology of the time is confusing, but she was specifically talking about translating human-readable notation, at a higher level than just mnemonics for machine instructions, into machine code, exactly what the compilers I use on a daily basis do.  Her first compiler implementation was also in 1952.
  • In 1953, the University of Manchester prototyped its Transistor Computer, the world's first transistorized computer, beginning a line of development that includes all commercial computers running today (as of this writing ... I'm counting current quantum computers as experimental).
  • In 1956, IBM prototyped the first hard drive, a technology still in use (though it's on the way out now that SSDs are widely available).
  • In 1957, the first FORTRAN compiler appeared.  In college, we loved to trash FORTRAN (in fact "FORTRASH" was the preferred name), but FORTRAN played a huge role in the development of scientific computing, and is still in use to this day.
  • In 1957, the first COMIT compiler appeared, developed by Victor Yngve et al.  While the language itself is quite obscure, it begins a line of development in natural-language processing, one branch of which eventually led to everyone's favorite write-only language, Perl.
  • In 1958, John McCarthy developed the first LISP implementation.  LISP is based on Alonzo Church's lambda calculus, a computing model equivalent in power to the Turing/Von Neumann model that CPU designs are based on, but much more amenable to mathematical reasoning.  LISP was the workhorse of much early research in AI and its fundamental constructs, particularly lists, trees and closures, are still in wide use today (Java officially introduced lambda expressions in 2014).  Its explicit treatment of programs as data is foundational to computer language research.  Its automatic memory management, colloquially known as garbage collection, came along a bit later, but is a key feature of several currently popular languages (and explicitly not a key feature of some others). For my money, LISP is one of the two most influential programming languages, ever.
  • Also in 1958, the ZMMD group gave the name ALGOL to the programming language they were working on.  The 1958 version included "block statements", which supported what at the time was known as structured programming and is now so ubiquitous no one even notices there's anything special about it.  The shift from "do this action, now do this calculation and go to this step in the instructions if the result is zero (or negative, etc.)" to "do these things as long as this condition is true" was a major step in moving from a notation for what the computer was doing to a notation specifically designed for humans to work with algorithms.  Two years later, Algol 60 codified several more significant developments from the late 50s, resulting in a language famously described as "an improvement on its predecessors and many of its successors".  Most if not all widely-used languages -- for example Java, C/C++/C#, Python, JavaScript/ECMAScript, Ruby ... -- can trace their control structures and various other features directly back to Algol, making it, for my money, the other of the two most influential programming languages, ever.
  • In 1959, the CODASYL committee published the specification for COBOL, based on Hopper's work on FLOW-MATIC from 1955-1959.  As with FORTRAN, COBOL is now the target for widespread derision, and its PICTURE clauses turned out to be a major issue in the notorious Y2K panic.  Nonetheless, it has been hugely influential in business and government computing and until not too long ago more lines of code were written in COBOL than anything else (partly because COBOL infamously requires more lines of code than most languages to do the same thing).
  • In 1959, Tony Hoare wrote Quicksort, still one of the fastest ways to sort a list of items, the subject of much deep analysis and arguably one of the most widely-implemented and influential algorithms ever written.
This is just scratching the surface of developments in computing, and I've left off one of the great and needless tragedies of the field, Alan Turing's suicide in 1954.  On a different note, in 1958, the National Advisory Committee for Aeronautics became the National Aeronautics and Space Administration and disbanded its pool of computers, that is, people who performed computations, and Katherine Johnson began her career in aerospace technology in earnest.

It wasn't just a productive decade in computing.  Originally, I tried to list some of the major developments elsewhere in the sciences, and in art and culture in general in 1950s America, but I eventually realized that there was no way to do it without sounding like one of those news-TV specials and also leaving out significant people and accomplishments through sheer ignorance.  Even in the list above, in a field I know something about, I'm sure I've left out a lot, and someone else might come up with a completely different list of significant developments.

As I was thinking through this, though, I realized that I could write much the same post about any of a number of times and places.  The 1940s and 1960s were hardly quiet.  The 1930s saw huge economic upheaval in much of the world.  The Victorian era, also often portrayed as a period of stifling conformity, not to mention one of the starkest examples of rampant imperialism, was also a time of great technical innovation and cultural change.  The idea of the Dark Ages, where supposedly nothing of note happened between the fall of Rome and the Renaissance, has largely been debunked, and so on and on.

All of the above is heavily tilted toward "Western" history, not because it has a monopoly on innovation, but simply because I'm slightly less ignorant of it.  My default assumption now is that there has pretty much always been major innovation affecting large portions of the world's population, often in several places at once, and the main distinction is how well we're aware of it.

While Bloch's lecture was the jumping-off point for this post, it didn't take too long for me to realize that the real driver was one of the recurring themes from the other blog: not-so-disruptive technology.  That in turn comes from my nearly instinctive tendency to push back against "it's all different now" narratives and particularly the sort of breathless hype that, for better or worse, the Valley has excelled in for generations.

It may seem odd for someone to be both a technologist by trade and a skeptical pourer-of-cold-water by nature, but in my experience it's actually not that rare.  I know geeks who are eager first adopters of new shiny things, but I think there are at least as many who make a point of never getting version 1.0 of anything.  I may or may not be more behind-the-times than most, but the principle is widely understood: Version 1.0 is almost always the buggiest and generally harder to use than what will come along once the team has had a chance to catch a breath and respond to feedback from early adopters.  Don't get me wrong: if there weren't early adopters, hardly anything would get released at all.  It's just not in my nature to be one.

There are good reasons to put out a version 1.0 that doesn't do everything you'd want it to and doesn't run as reliably as you'd like.  The whole "launch and iterate" philosophy is based on the idea that you're not actually that good at predicting what people will like or dislike, so you shouldn't waste a lot of time building something based on your speculation.  Just get the basic idea out and be ready to run with whatever aspect of it people respond to.

Equally, a startup, or a new team within an established company, will typically only command a certain amount of resources (and giving a new team or company carte blanche often doesn't end well).  At some point you have to get more resources in, either from paying customers or from whoever you can convince that yes, this is really a thing.  Having a shippable if imperfect product makes that case much better than having a bunch of nice-looking presentations and well-polished sales pitches.  Especially when dealing with paying customers.

But there's probably another reason to put things out in a hurry.  Everyone knows that tech, and software in general, moves fast (whether or not it also breaks stuff).  In other words, there's a built-in cultural bias toward doing things quickly whether it makes sense or not, and then loudly proclaiming how fast you're moving and, therefore, how innovative and influential you must be.  I think this is the part I tend to react to.  It's easy to confuse activity with progress, and after seeing the same avoidable mistakes made a few times in the name of velocity, the eye can become somewhat jaundiced.

As much as I may tend toward that sort of reaction, I don't particularly enjoy it.  A different angle is to go back and study, appreciate, even celebrate, the accomplishments of people who came before.  The developments I mentioned above are all significant advances.  They didn't appear fully-formed out of a vacuum.  Each of them builds on previous developments, many just as significant but not as widely known.

Looking back and focusing on achievements, one doesn't see the many false starts and oversold inventions that went nowhere, just the good bits, the same way that we remember and cherish great music from previous eras and leave aside the much larger volume of unremarkable or outright bad.

Years from now, people will most likely look back on the present era much the same and pick out the developments that really mattered, leaving aside much of the commotion surrounding it.  It's not that the breathless hype is all wrong, much less that everything important has already happened, just that from the middle of it all it's harder to pick out what's what.  Not that there's a lack of opinions on the matter.

The quote in the title has been attributed to several people, but no one seems to know who really said it first.

Monday, September 14, 2020

How real are real numbers?

There is always one more counting number.

That is, no matter how high you count, you can always count one higher.  Or at least in principle.  In practice you'll eventually get tired and give up.  If you build a machine to do the counting for you, eventually the machine will break down or it will run out of capacity to say what number it's currently on.  And so forth.  Nevertheless, there is nothing inherent in the idea of "counting number" to stop you from counting higher.

In a brief sentence, which after untold work by mathematicians over the centuries we now have several ways to state completely rigorously, we've described something that can exceed the capacity of the entire observable universe as measured in the smallest units we believe to be measurable.  The counting numbers (more formally, the natural numbers) are infinite, but they can be defined not only by finite means, but fairly concisely.

There are levels of infinity beyond the natural numbers.  Infinitely many, in fact.  Again, there are several ways to define these larger infinities, but one way to define the most prominent of them, based on the real numbers, involves the concept of continuity or, more precisely, completeness in the sense that the real numbers contain any number that you can get arbitrarily close to.

For example, you can list fractions that get arbitrarily close to the square root of two: 1.4 (14/10) is fairly close, 1.41 (141/100) is even closer, 1.414 (1414/1000) is closer still, and if I asked for a fraction within one one-millionth, or trillionth, or within 1/googol, that is, one divided by ten to the hundredth power, no problem.  Any number of libraries you can download off the web can do that for you.
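As a minimal sketch of what such a library might be doing, here is one way to produce those fractions in Python using only the standard library.  The name sqrt2_within is made up for illustration, and truncating the decimal expansion via an integer square root is just one of many possible approaches.

```python
from fractions import Fraction
from math import isqrt

def sqrt2_within(digits):
    """Return a fraction just below sqrt(2), within 1/10**digits of it."""
    scale = 10 ** digits
    # isqrt gives the largest integer whose square is at most 2 * scale**2,
    # that is, sqrt(2) truncated to `digits` decimal places, times scale.
    return Fraction(isqrt(2 * scale * scale), scale)

# 14/10, 141/100 and 1414/1000 (Fraction prints them in lowest terms)
for digits in (1, 2, 3):
    print(sqrt2_within(digits))

googol = 10 ** 100
x = sqrt2_within(100)
# Exact rational arithmetic: x is below sqrt(2) and x + 1/googol is above it.
print(x * x < 2 < (x + Fraction(1, googol)) ** 2)
```

Because Fraction does exact arithmetic, the final comparison is a genuine check that the fraction is within 1/googol of the square root of two, not a floating-point approximation of one.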

Nonetheless, the square root of two is not itself the ratio of two natural numbers, that is, it is not a rational number (more or less what most people would call a fraction, but with a little more math in the definition).  The earliest widely-recognized recorded proof of this goes back to the Pythagoreans.  It's not clear exactly who else also figured it out when, but the idea is certainly ancient.  No matter how closely you approach the square root of two with fractions, you'll never find a fraction whose square is exactly two.

OK, but why shouldn't the square root of two be a number?  If you draw a right triangle with legs one meter long, the hypotenuse certainly has some length, and by the Pythagorean theorem, that length squared is two.  Surely that length is a number?

Over time, there were some attempts to sweep the matter under the rug by asserting that, no, only rational numbers are really numbers and there just isn't a number that squares to two.  That triangle? Dunno, maybe its legs weren't exactly one meter long, or it's not quite a right triangle?

This is not necessarily as misguided as it might sound.  In real life, there is always uncertainty, and we only know the angles and the lengths of the sides approximately.  We can slice fractions as finely as we like, so is it really so bad to say that all numbers are rational, and therefore you can't ever actually construct a right triangle with both legs exactly the same length, even if you can get as close as you like?

Be that as it may, modern mathematics takes the view that there are more numbers than just the rationals and that if you can get arbitrarily close to some quantity, well, that's a number too.  Modern mathematics also says there's a number that squares to negative one, which has its own interesting consequences, but that's for some imaginary other post (yep, sorry, couldn't help myself).

The result of adding all these numbers-you-can-get-arbitrarily-close-to to the original rational numbers (every rational number is already arbitrarily close to itself) is called the real numbers.  It turns out that (math-speak for "I'm not going to tell you why", but see the post on counting for an outline) in defining the real numbers you bring in not only infinitely many more numbers, but so infinitely many more numbers that the original rational numbers "form a set of measure zero", meaning that the chances of any particular real number being rational are zero (as usual, the actual machinery that allows you to apply probabilities here is a bit more involved).

To recap, we started with the infinitely many rational numbers -- countably infinite since it turns out that you can match them up one-for-one with the natural numbers* -- and now we have an uncountably infinite set of numbers, infinitely too big to match up with the naturals.

But again we did this with a finite amount of machinery.  We started with the rule "There is always one more counting number", snuck in some rules about fractions and division, and then added "if you can get arbitrarily close to something with rational numbers, then that something is a number, too".  More concisely, limits always exist (with a few stipulations, since this is math).

One might ask at this point how real any of this is.  In the real world we can only measure uncertainly, and as a result we can generally get by with only a small portion of even the rational numbers, say just those with a hundred decimal digits or fewer, and for most purposes probably those with just a few digits (a while ago I discussed just how tiny a set like this is).  By definition anything we, or all of the civilizations in the observable universe, can do is literally as nothing compared to infinity, so are we really dealing with an infinity of numbers, or just a finite set of rules for talking about them?

One possible reply comes from the world of quantum mechanics, a bit ironic since the whole point of quantum mechanics is that the world, or at least important aspects of it, is quantized, meaning that a given system can only take on a specific set of discrete states (though, to be fair, there are generally a countable infinity of such states, most of them vanishingly unlikely).  An atom is made of a discrete set of particles, each with an electric charge that's either 1, 0 or -1 times the charge of the electron, the particles of an atom can only have a discrete set of energies, and so forth (not everything is necessarily quantized, but that's a discussion well beyond my depth).

All of this stems from the Schrödinger equation.  The discrete nature of quantum systems comes from there only being a discrete set of solutions to that equation for a particular set of boundary conditions.  This is actually a fairly common phenomenon.  It's the same reason that you can only get a certain set of tones by blowing over the opening of a bottle (at least in theory).

The equation itself is a partial differential equation defined over the complex numbers, which have the same completeness property as the real numbers (in fact, a complex number can be expressed as a pair of real numbers).  This is not an incidental feature, but a fundamental part of the definition in at least two ways: Differential equations, including the Schrödinger equation, are defined in terms of limits, and this only works for numbers like the reals or the complex numbers where the limits in question are guaranteed to exist.  Also, it includes π, which is not just irrational, but transcendental, which more or less means it can only be defined as a limit of an infinite sequence.

In other words, the discrete world of quantum mechanics, our best attempt so far at describing the behavior of the world under most conditions, depends critically on the kind of continuous mathematics in which infinities, both countable and uncountable, are a fundamental part of the landscape.  If you can't describe the real world without such infinities, then they must, in some sense, be real.

Of course, it's not actually that simple.

When I said "differential equations are defined in terms of limits", I should have said "differential equations can be defined in terms of limits."  One facet of modern mathematics is the tendency to find multiple ways of expressing the same concept.  There are, for example, several different but equivalent ways of expressing the completeness of the real numbers, and several different ways of defining differential equations.

One common technique in modern mathematics (a technique is a trick you use more than once) is to start with one way of defining a concept, find some interesting properties, and then switch perspective and say that those interesting properties are the actual definition.

For example, if you start with the usual definition of the natural numbers: zero and an "add one" operation to give you the next number, you can define addition in terms of adding one repeatedly -- adding three is the same as adding one three times, because three is the result of adding one to zero three times.  You can then prove that addition gives the same result no matter what order you add numbers in (the commutative property).  You can also prove that adding two numbers and then adding a third one is the same as adding the first number to the sum of the other two (the associative property).

Then you can turn around and say "Addition is an operation that's commutative and associative, with a special number 0 such that adding 0 to a number always gives you that number back."  Suddenly you have a more powerful definition of addition that can apply not just to natural numbers, but to the reals, the complex numbers, the finite set of numbers on a clock face, rotations of a two-dimensional object, orderings of a (finite or infinite) list of items and all sorts of other things.  The original objects that were used to define addition -- the natural numbers 0, 1, 2 ... -- are no longer needed.  The new definition works for them, too, of course, but they're no longer essential to the definition.
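To make the switch in perspective concrete, here is a small Python sketch.  All of the function names are invented for illustration, and treating 12 on the clock face as 0 is my own assumption.  It spot-checks the three properties on a handful of values, which is of course not a proof, just a sanity check that the same laws hold for two quite different kinds of "addition".

```python
from itertools import product

def successor(n):
    return n + 1

def add_by_counting(a, b):
    """Add b to a by applying 'add one' b times."""
    result = a
    for _ in range(b):
        result = successor(result)
    return result

def add_on_clock(a, b):
    """Addition on a 12-hour clock face, with 12 written as 0."""
    return (a + b) % 12

def obeys_addition_laws(add, zero, elements):
    """Check identity, commutativity and associativity on the given elements."""
    elements = list(elements)
    identity = all(add(a, zero) == a and add(zero, a) == a for a in elements)
    commutative = all(add(a, b) == add(b, a)
                      for a, b in product(elements, repeat=2))
    associative = all(add(add(a, b), c) == add(a, add(b, c))
                      for a, b, c in product(elements, repeat=3))
    return identity and commutative and associative

print(obeys_addition_laws(add_by_counting, 0, range(10)))
print(obeys_addition_laws(add_on_clock, 0, range(12)))
```

The checker never looks at how add is implemented, only at how it behaves, which is the point of the property-based definition.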

You can do the same thing with a system like quantum mechanics.  Instead of saying that the behavior of particles is defined by the Schrödinger equation, you can say that quantum particles behave according to such-and-such rules, which are compatible with the Schrödinger equation the same way the more abstract definition of addition in terms of properties is compatible with the natural numbers.

This has been done, or at least attempted, in a few different ways (of course).  The catch is these more abstract systems depend on the notion of a Hilbert Space, which has even more and hairier infinities in it than the real numbers as described above.

How did we get from "there is always one more number" to "more and hairier infinities"?

The question that got us here was "Are we really dealing with an infinity of numbers, or just a finite set of rules for talking about them?"  In some sense, it has to be the latter -- as finite beings, we can only deal with a finite set of rules and try to figure out their consequences.  But that doesn't tell us anything one way or another about what the world is "really" like.

So then the question becomes something more like "Is the behavior of the real world best described by rules that imply things like infinities and limits?"  The best guess right now is "yes", but maybe the jury is still out.  Maybe we can define a more abstract version of quantum physics that doesn't require infinities, in the same way that defining addition doesn't require defining the natural numbers.  Then the question is whether that version is in some way "better" than the usual definition.

It's also possible that, as well-tested as quantum field theory is, there's some discrepancy between it and the real world that's best explained by assuming that the world isn't continuous and therefore the equations to describe it should be based on a discrete number system.  I haven't the foggiest idea how that could happen, but I don't see any fundamental logical reason to rule it out.

For now, however, it looks like the world is best described by differential equations like the Schrödinger equation, which is built on the complex numbers, which in turn are derived from the reals, with all their limits and infinities.  The (provisional) verdict then: the real numbers are real.

* One crude way to see that the rational numbers are countable is to note that there are no more rational numbers than there are pairs of numerator and denominator, each a natural number.    If you can count the pairs of natural numbers, you can count the rational numbers, by leaving out the pairs that have zero as the denominator and the pairs that aren't in lowest terms.  There will still be infinitely many rational numbers, even though you're leaving out an infinite number of (numerator, denominator) pairs, which is just a fun fact of dealing in infinities.  One way to count the pairs of natural numbers is to put them in a grid and count along the diagonals: (0,0), (1,0), (0,1), (2,0), (1,1), (0, 2), (3,0), (2,1), (1,2), (0,3) ... This gets every pair exactly once.

All of this is ignoring negative rational numbers like -5/42 or whatever, but if you like you can weave all those into the list by inserting a pair with a negative numerator after any pair with a non-zero numerator: (0,0), (1,0), (-1,0), (0,1), (2,0), (-2,0), (1,1), (-1,1), (0,2), (3,0), (-3,0), (2,1), (-2,1), (1,2), (-1,2), (0,3) ... Putting it all together, leaving out the zero denominators and not-in-lowest-terms, you get (0,1), (1,1), (-1,1), (2,1), (-2,1), (1,2), (-1,2), (3,1), (-3,1), (1,3), (-1,3) ...
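The diagonal walk is easy to turn into a short Python sketch.  The generator names here are my own, and this is just one straightforward way to code it up: walk the (numerator, denominator) grid along diagonals, skip zero denominators and pairs not in lowest terms, and follow each non-zero numerator with its negative.

```python
from itertools import islice
from math import gcd

def diagonal_pairs():
    """Yield (numerator, denominator) pairs of naturals, one diagonal at a time."""
    total = 0
    while True:
        for denominator in range(total + 1):
            yield (total - denominator, denominator)
        total += 1

def rationals():
    """Yield every rational number exactly once as (numerator, denominator)."""
    for numerator, denominator in diagonal_pairs():
        # gcd(n, 0) is n and gcd(0, d) is d, so this one test also drops
        # (0, 2), (0, 3), ... which all duplicate (0, 1).
        if denominator == 0 or gcd(numerator, denominator) != 1:
            continue
        yield (numerator, denominator)
        if numerator != 0:
            yield (-numerator, denominator)

print(list(islice(diagonal_pairs(), 10)))
print(list(islice(rationals(), 11)))
```

Since the generators never terminate, islice is what takes a finite prefix of the (infinite) list.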

Another, much more interesting way of counting the rational numbers is via the Farey Sequence.

Sunday, September 13, 2020

Entropy and time's arrow

When contemplating the mysteries of time ... what is it, why is it how it is, why do we remember the past but not the future ... it's seldom long before the second law of thermodynamics comes up.

In technical terms, the second law of thermodynamics states that the entropy of a closed system increases over time.  I've previously discussed what entropy is and isn't.  The short version is that entropy is a measure of uncertainty about the internal details of a system.  This is often shorthanded as "disorder", and that's not totally wrong, but it probably leads to more confusion than understanding.  This may be in part because uncertainty and disorder are both related to the more technical concept of symmetry, which may not mean what you might expect.  At least, I found some of this surprising when I first went over it.

Consider an ice cube melting.  Is a puddle of water more disordered than an ice cube?  One would think.  In an ice cube, each atom is locked into a crystal matrix, each atom in its place.  An atom in the liquid water is bouncing around, bumping into other atoms, held in place enough to keep from flying off into the air but otherwise free to move.

But which of the two is more symmetrical?  If your answer is "the ice cube", you're not alone.  That was my reflexive answer as well, and I expect that it would be for most people.  Actually, it's the water.  Why?  Symmetry is a measure of what you can do to something and still have it look the same.  The actual mathematical definition is, of course, a bit more technical, but that'll do for now.

An irregular lump of coal looks different if you turn it one way or another, so we call it asymmetrical.  A cube looks the same if you turn it 90 degrees in any of six directions, or 180 degrees in any of three directions, so we say it has "rotational symmetry" (and "reflective symmetry" as well).  A perfect sphere looks the same no matter which way you turn it, including, but not limited to, all the ways you can turn a cube and have the cube still look the same.  The sphere is more symmetrical than the cube, which is more symmetrical than the lump of coal.  So far so good.

A mass of water molecules bouncing around in a drop of water looks the same no matter which way you turn it.  It's symmetrical the same way a sphere is.  The crystal matrix of an ice cube only looks the same if you turn it in particular ways.  That is, liquid water is more symmetrical, at the microscopic level, than frozen water.  This is the same as saying we know less about the locations and motions of the individual molecules in liquid water than those in frozen water.  More uncertainty is the same as more entropy.

Geometrical symmetry is not the only thing going on here.  Ice at -100C has lower entropy than ice at -1C, because molecules in the colder ice have less kinetic energy and a narrower distribution of possible kinetic energies (loosely, they're not vibrating as quickly within the crystal matrix and there's less uncertainty about how quickly they're vibrating).  However, if you do see an increase in geometrical symmetry, you are also seeing an increase in uncertainty, which is to say entropy. The difference between cold ice and near-melting ice can also be expressed in terms of symmetry, but a more subtle kind of symmetry.  We'll get to that.

As with the previous post, I've spent more time on a sidebar than I meant to, so I'll try to get to the point by going off on another sidebar, but one more closely related to the real point.

Suppose you have a box with, say, 25 little bins in it arranged in a square grid.  There are five marbles in the box, one in each bin on the diagonal from upper left to lower right.  This arrangement has "180-degree rotational symmetry".  That is, you can rotate it 180 degrees and it will look the same.  If you rotate it 90 degrees, however, it will look clearly different.

Now put a lid on the box, give it a good shake and remove the lid.  The five marbles will have settled into some random assortment of bins (each bin can only hold one marble).  If you look closely, this random arrangement is very likely to be asymmetrical in the same way a lump of coal is: If you turn it 90 degrees, or 180, or reflect it in a mirror, the individual marbles will be in different positions than if you didn't rotate or reflect the box.

However, if you were to take a quick glimpse at the box from a distance, then have someone flip a coin and turn the box 90 degrees if the coin came up heads, then take another quick glimpse, you'd have trouble telling if the box had been turned or not.  You'd have no trouble with the marbles in their original arrangement on the diagonal.  In that sense, the random arrangement is more symmetrical than the original arrangement, just like the microscopic structure of liquid water is more symmetrical than that of ice.

[I went looking for some kind of textbook exposition along the lines of what follows but came up empty, so I'm not really sure where I got it from.  On the one hand, I think it's on solid ground in that there really is an invariant in here, so the math degree has no objections, though I did replace "statistically symmetrical" with "symmetrical" until I figure out what the right term, if any, actually is.

On the other hand, I'm not a physicist, or particularly close to being one, so this may be complete gibberish from a physicist's point of view.  At the very least, any symmetries involved have more to do with things like phase spaces, and "marbles in bins" is something more like "particles in quantum states".]

The magic word to make this all rigorous is "statistical".  That is, if you have a big enough grid and enough marbles and you just measure large-scale statistical properties, and look at distributions of values rather than the actual values, then an arrangement of marbles is more symmetrical if these rough measures don't change when you rotate the box (or reflect it, or shuffle the rows or columns, or whatever -- for brevity I'll stick to "rotate" here).

For example, if you count the number of marbles on each diagonal line (wrapping around so that each line has five bins), then for the original all-on-one-diagonal arrangement, there will be a sharp peak: five marbles on the main diagonal, one on each of the diagonals that cross that main diagonal, and zero on the others.  Rotate the box, and that peak moves.  For a random arrangement, the counts will all be more or less the same, both before and after you rotate the box.  A random arrangement is more symmetrical, in this statistical sense.
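Here is a rough Python sketch of that diagonal count for the 5x5 box.  The function names and the (row, column) representation of a marble's bin are my own choices for illustration, not anything standard.

```python
import random

N = 5  # the box is an N x N grid of bins

def diagonal_counts(marbles, n=N):
    """Count marbles on each wrap-around diagonal, in both directions."""
    down = [0] * n  # diagonals running upper left to lower right
    up = [0] * n    # diagonals running lower left to upper right
    for row, col in marbles:
        down[(col - row) % n] += 1
        up[(col + row) % n] += 1
    return down, up

def rotate90(marbles, n=N):
    """Turn the whole box a quarter turn."""
    return {(col, n - 1 - row) for row, col in marbles}

main_diagonal = {(i, i) for i in range(N)}
print(diagonal_counts(main_diagonal))            # sharp peak of five
print(diagonal_counts(rotate90(main_diagonal)))  # the peak has moved

# A shaken box: five marbles in five distinct, randomly chosen bins.
bins = [(r, c) for r in range(N) for c in range(N)]
shaken = set(random.sample(bins, N))
print(diagonal_counts(shaken))
print(diagonal_counts(rotate90(shaken)))
```

For the shaken box the output varies from run to run, but the counts usually come out fairly flat both before and after the rotation, which is the statistical sense of "symmetrical" used here.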

The important thing here is that there are many more symmetrical arrangements than not.  For example, there are ten wrap-around diagonals in a 5x5 grid (five in each direction) so there are ten ways to put five marbles in that kind of arrangement.  There are 53,130 total ways to put 5 marbles in 25 bins, so there are approximately 5,000 times as many more-symmetrical, that is, higher-entropy, arrangements.  Granted, some of these are still fairly unevenly distributed, for example four marbles on one diagonal and one off it, but even taking that into account, there are many more arrangements that look more or less the same if you rotate the box than there are that look significantly different.
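These counts are small enough to check by brute force.  The following Python sketch (variable and function names are mine) simply walks through every way of choosing 5 of the 25 bins and counts how many choices fill a single wrap-around diagonal.

```python
from itertools import combinations
from math import comb

N = 5
bins = [(row, col) for row in range(N) for col in range(N)]

def on_one_diagonal(marbles):
    """True if all marbles share one wrap-around diagonal, in either direction."""
    down = {(col - row) % N for row, col in marbles}
    up = {(col + row) % N for row, col in marbles}
    return len(down) == 1 or len(up) == 1

total = 0
diagonal = 0
for marbles in combinations(bins, N):
    total += 1
    if on_one_diagonal(marbles):
        diagonal += 1

print(total, comb(25, 5))  # both give the total number of arrangements
print(diagonal)            # arrangements that fill one diagonal
print(total / diagonal)    # how many times rarer the diagonal ones are
```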

This is a toy example.  If you scale up to, say, the number of molecules in a balloon at room temperature, "many more" becomes "practically all".  Even if the box has 2500 bins in a 50x50 grid and 50 marbles, still ridiculously small compared to the trillions of trillions of molecules in a typical system like a balloon, or a vase, or a refrigerator or whatever, the odds that all of the marbles line up on a diagonal are less than one in a googol (that's ten to the hundredth power, not the search engine company).  You can imagine all the molecules in a balloon crowding into one particular region, but for practical purposes it's not going to happen, at least not by chance in a balloon at room temperature.
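The googol claim can be checked with exact integer arithmetic.  This sketch assumes 50 marbles in the 50x50 box (enough to fill exactly one diagonal) and 100 wrap-around diagonals, 50 in each direction, by analogy with the 5x5 case.

```python
from fractions import Fraction
from math import comb

side = 50
bins = side * side    # 2500 bins
marbles = side        # assumption: just enough marbles to fill one diagonal
diagonals = 2 * side  # 50 wrap-around diagonals in each direction

arrangements = comb(bins, marbles)  # ways to place 50 marbles in 2500 bins
p_diagonal = Fraction(diagonals, arrangements)
googol = 10 ** 100

print(p_diagonal < Fraction(1, googol))  # True
print(len(str(arrangements)))            # number of digits in the total count
```

Python integers are arbitrary precision, so there's no rounding involved in the comparison.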

If you start with the box of marbles in a not-very-symmetrical state and shake it up, you'll almost certainly end up with a more symmetrical state, simply because there are many more ways for that to happen.  Even if you only change one part of the system, say by taking out one marble and putting it back in a random empty bin adjacent to its original position, there are still more cases than not in which the new arrangement is more symmetrical than the old one.
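For the smallest kind of change, you can simply list every possibility.  This sketch starts from the all-on-one-diagonal arrangement and tries every move of one marble to an empty adjacent bin.  Treating "adjacent" as up, down, left or right with wrap-around, and using the largest diagonal count as the measure of lopsidedness, are both assumptions of mine.

```python
N = 5  # 5x5 grid with wrap-around

def peak(marbles):
    """Largest number of marbles on any one wrap-around diagonal."""
    counts = {}
    for r, c in marbles:
        for key in (("down", (c - r) % N), ("up", (c + r) % N)):
            counts[key] = counts.get(key, 0) + 1
    return max(counts.values())

def single_moves(marbles):
    """Yield each arrangement reachable by moving one marble to an empty adjacent bin."""
    occupied = set(marbles)
    for r, c in sorted(occupied):
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            target = ((r + dr) % N, (c + dc) % N)
            if target not in occupied:
                yield (occupied - {(r, c)}) | {target}

start = {(i, i) for i in range(N)}
outcomes = [peak(state) for state in single_moves(start)]
lower = sum(1 for p in outcomes if p < peak(start))

print(len(outcomes), lower)  # 20 20
```

From this particular starting point every one of the 20 possible moves breaks up the peak.  From a less extreme starting point some moves will go the other way, which is why the claim is "more cases than not" rather than "all".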

If you continue making more random changes, whether large or small, the state of the box will get more symmetrical over time.  Strictly speaking, this is not an absolute certainty, but for anything we encounter in daily life the numbers are so big that the chances of anything else happening are essentially zero.  This will continue until the system reaches its maximum entropy, at which point large or small random changes will (essentially certainly) leave the system in a state just as symmetrical as it was before.
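To watch this happen, here's a rough simulation of shaking the box gently: at each step, one of the currently allowed single-marble moves is picked uniformly at random.  The move rule, the seed, the step count and the use of the largest diagonal count as the measure are all assumptions for illustration.

```python
import random
from collections import Counter

N = 5  # 5x5 grid with wrap-around

def peak(marbles):
    """Largest number of marbles on any one wrap-around diagonal."""
    counts = Counter()
    for r, c in marbles:
        counts[("down", (c - r) % N)] += 1
        counts[("up", (c + r) % N)] += 1
    return max(counts.values())

def random_step(marbles, rng):
    """Pick one allowed move (marble to empty adjacent bin) uniformly at random."""
    occupied = set(marbles)
    moves = []
    for r, c in sorted(occupied):  # sorted so a given seed is reproducible
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            target = ((r + dr) % N, (c + dc) % N)
            if target not in occupied:
                moves.append(((r, c), target))
    source, target = rng.choice(moves)
    return (occupied - {source}) | {target}

rng = random.Random(0)
state = {(i, i) for i in range(N)}  # start with all marbles on one diagonal
history = [peak(state)]
for _ in range(10_000):
    state = random_step(state, rng)
    history.append(peak(state))

print(history[:2])                       # [5, 4]: the first move always breaks the diagonal
print(sorted(Counter(history).items()))  # how often each peak value turned up
```

The tally on the last line shows how rarely the box wanders back to a sharply peaked arrangement once it has left one.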

That's the second law -- as a closed system evolves, its entropy will essentially never decrease, and if it starts in a state of less than maximum entropy, its entropy will essentially always increase until it reaches maximum entropy.

And now to the point.

The second law gives a rigorous way to tell that time is passing.  In a classic example, if you watch a film of a vase falling off a table and shattering on the floor, you can tell instantly if the film is running forward or backward: if you see the pieces of a shattered vase assembling themselves into an intact vase, which then rises up and lands neatly on the table, you know the film is running backwards.  Thus it is said that the second law of thermodynamics gives time its direction.

As compelling as that may seem, there are a couple of problems with this view.  I didn't come up with any of these, of course, but I do find them convincing:

  • The argument is only compelling for part of the film.  In the time between the vase leaving the table and it making contact with the floor, the film looks fine either way.  You either see a vase falling, or you see it rising, presumably having been launched by some mechanism.  Either one is perfectly plausible, while the vase assembling itself from its many pieces is totally implausible.  But the lack of any obvious cue like pottery shards improbably assembling themselves doesn't stop time from passing.
  • If your recording process captured enough data, beyond just the visual image of the vase, you could in principle detect that the entropy of the contents of the room increases slightly if you run the film in one direction and decreases in the other, but that doesn't actually help because entropy can decrease locally without violating the second law.  For example, you can freeze water in a freezer or by leaving it out in the cold.  Its entropy decreases, but that's fine because entropy overall is still increasing, one way or another (for example, a refrigerator produces more entropy by dumping heat into the surrounding environment than it removes in cooling its contents).  If you watch a film of ice melting, there may not be any clear cues to tell you that you're not actually watching a film of ice freezing, running backward.  But time passes regardless of whether entropy is increasing or decreasing in the local environment.
  • Most importantly, though, in an example like a film running, we're only able to say "That film of a vase shattering is running backward" because we ourselves perceive time passing.  We can only say the film is running backward because it's running at all.  By "backward", we really mean "in the other direction from our perception of time".  Likewise, if we measure the entropy of a refrigerator and its contents, we can only say that entropy is increasing as time, as we perceive it, increases.

In other words, entropy increasing is a way that we can tell time is passing, but it's not the cause of time passing, any more than a mile marker on a road makes your car move.  In the example of the box of marbles, we can only say that the box went from a less symmetrical to more symmetrical state because we can say it was in one state before it was in the other.

If you printed a diagram of each arrangement of marbles on opposite sides of a piece of paper, you'd have two diagrams on a piece of paper.  You couldn't say one was before the other, or that time progressed from one to the other.  You can only say that if the state of the system undergoes random changes over time, then the system will get more symmetrical over time, and in particular the less symmetrical arrangement (almost certainly) won't happen after the more symmetrical one.  That is, entropy will increase.

You could even restate the second law as something like "As a system evolves over time, all state changes allowed by its current state are equally likely" and derive increasing entropy from that (strictly speaking you may have to distinguish identical-looking potential states in order to make "equally likely" work correctly -- the rigorous version of this is the ergodic hypothesis).  This in turn depends on the assumptions that systems have state, and that state changes over time.  Time is a fundamental assumption here, not a by-product.

In short, while you can use the second law to demonstrate that time is passing, you can't appeal to the second law to answer questions like "Why do we remember the past and not the future?"  It just doesn't apply.