Friday, March 25, 2022

The house with the green shutters

Consider these two sentences:

  • I went around the house with the green shutters
In other words, the house has green shutters and I'm going around that house.
  • I went around the house with the green shutters to install
In other words, I have some green shutters I need to install on the house and I'm carrying them around the house.

These are considerably different meanings, and they have different structures from a grammatical point of view.  In the first sentence, with the green shutters is describing the house -- it has green shutters.  In the second, it is describing my going around the house -- I have the shutters with me as I go.

This second sentence might be considered a garden-path sentence, which is a sentence that you have to reinterpret midway through because the interpretation you started with stops working.  Wikipedia has three well-known examples:
  • The old man the boats
  • The complex houses married and single soldiers and their families
  • The horse raced past the barn fell
If your first reaction to those sentences is "Wait ... what?" like mine was, they might make more sense with a little more context:
  • The young stay on shore.  The old man the boats.
  • The complex was built by the Corps of Engineers.  The complex houses married and single soldiers and their families.
  • The horse led down the path was fine.  The horse raced past the barn fell.
or with a slight change in wording
  • The old people man the boats
  • The housing complex houses married and single soldiers and their families
  • The horse that was raced past the barn fell
While sentences like these do come up in real life, especially in headlines or other situations where it's common to leave out words like "that" or "who", which can provide valuable clues about the structure of a sentence, they also feel a bit artificial.  An editor would be well within their rights to suggest that an author rephrase any of the three, because they're hard to read: the whole structure and meaning aren't what you think they are at first.
  • the old man, with the adjective old modifying the noun man, changes to the old, a noun phrase made from an adjective, as the subject of the verb man.  The sentence fragment the old man becomes a complete sentence (though, granted, it's harder to leave the object off of man than it is in a sentence like I read every day)
  • the complex houses, with the adjective complex modifying the noun houses, becomes the complex as the subject of the verb houses.  This is actually the same pattern as the first case, except that married can keep the game going (The complex houses married elements of the Rococo and Craftsman styles).  It may be worth noting that in this case, the two interpretations would generally sound distinct.  As a noun phrase, the complex houses would have the main stress on houses, while as a noun phrase and a verb, it would have the main stress on complex.
  • the horse raced, a complete sentence with raced in the simple past, becomes the horse modified by the past participle raced
Compared to these, I don't think either of the two "green shutters" sentences is particularly hard to understand.  While the change in meaning is significant, the change in structure isn't as great as in the garden-path examples.  The subject is still I.  The verb is still went, modified by around the house.  The only difference is in what with applies to.  Every word, except possibly with, is used in the same sense in both sentences.

In technical terms, this is a syntactic ambiguity.  What's uncertain is which particular words relate to which others.  The meanings of the words themselves are the same either way.  At the very least, with remains a preposition.  In the garden-path sentences, the senses of the words, and in particular their parts of speech, change when the sentence is reinterpreted -- a lexical ambiguity, one reason to think there's something different going on in the two cases.

This sort of thing is bread and butter for linguistics and cognitive science experiments where subjects are given sentences and asked to, say, pick the picture that best matches them, with the experimenters timing the responses and looking for differences that suggest that some structures require more processing than others.  In this case, I strongly suspect that the sentences I gave would take much less time for people to sort out than the garden-path sentences.

In short, while I think that there are some similarities, I also think different things are going on in the brain when dealing with the sentences I gave, as opposed to garden-path sentences.

Even without running the experiments or considering garden-path sentences, there are some clear implications just from considering sentences like the "green shutters" ones above:
  • On the one hand, our parsing of sentences is sequential in some strong sense.  At several points, we can stop and say "This is a sentence".  If you hear nothing more, you can still work out possible meanings:
    • I went around the house
    • I went around the house with the green shutters
    • I went around the house with the green shutters to install
  • On the other hand, the structure of a sentence is provisional in some sense.  After hearing I went around the house with the green shutters and associating with the green shutters with house, we can then hear to install and fairly easily re-associate with with went around the house.
  • Semantics and context affect this process.  The sentence I went around the house with the green shutters is itself ambiguous.  You could read it the same way as the other sentence, meaning that I was carrying green shutters around the house, but the house with the green shutters is much more likely to refer to the house, so you probably don't.  Similarly, putting a context sentence before a garden-path sentence makes it more likely that the garden-path sentence will make sense without re-reading.
(That last point runs counter to Chomsky's assertion that "[T]he notion of 'probability of a sentence' is an entirely useless one, under any known interpretation of this term")

Assuming that there's some sort of re-structuring going on when you hear to install after I went around the house with the green shutters, it would be interesting to see how different theories of grammar handle it.

In a phrase structure grammar, the shift between the two sentences is from a structure like
  • I [went [around [the house [with the green shutters]]]]
(a full parse tree would have a lot more to it than this) to
  • I [went [around [the house]][with the green shutters [to install]]]
That is, with the green shutters goes from being a constituent of the noun phrase (the house with the green shutters) to a constituent of the verb phrase went around the house with the green shutters to install.  From a phrase-structure point of view, the two possible readings of I went around the house with the green shutters are examples of a bracketing ambiguity, since there are two ways to put the brackets.
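The two bracketings can be written down directly as nested lists -- a cartoon of the parse trees, not a full parse.  The groupings are the ones from the post; the flattening helper is just for illustration:

```python
# The two bracketings as nested lists.  Only the grouping differs;
# flattening either one recovers the words in order.
pp = ["with", ["the", "green", "shutters"]]

# ...the house [with the green shutters]: the PP sits inside the noun phrase
reading_1 = ["I", ["went", ["around", [["the", "house"], pp]]]]

# ...[around [the house]] [with the green shutters [to install]]: the PP
# has moved up to be a constituent of the verb phrase
reading_2 = ["I", ["went", ["around", ["the", "house"]],
                   pp + [["to", "install"]]]]

def leaves(tree):
    """Flatten a nested bracketing back into its list of words."""
    if isinstance(tree, str):
        return [tree]
    return [word for subtree in tree for word in leaves(subtree)]
```

Flattening both readings gives back the same words in the same order, which is the point of a bracketing ambiguity: the difference is entirely in the grouping.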

You can look at this as lifting [with the green shutters] out of [the house [with the green shutters]] and putting it back next to [around the house].  In principle, the place that [with the green shutters] is lifted out of can be as deep as you want: [I went [down the path [around the house [with the green shutters]]]] and so forth.  You're still moving a chunk of the parse tree from one place to another, but as the nesting gets deeper, you have to navigate through more tree nodes to find what you're moving.

In a dependency grammar, the shift is pretty simple: with switches from a dependent of house to a dependent of went (if I understand correctly, with would be a syntactic dependency of house or went, but the semantic dependency is the other way around: house or went would be a semantic dependency of with -- but there's a good chance I don't understand correctly).  Saying that I went around the house with the green shutters is ambiguous is saying that there are two possible places that with could attach as a dependency.
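As a sketch, with each word mapped to its head (the arcs here are simplified and made up for illustration, and the toy map collapses the repeated the into one entry), the two readings differ in exactly one head assignment:

```python
# Each word maps to its head.  Simplified, invented arcs -- a real
# dependency analysis would be more careful -- but the point survives:
# the two readings share every arc except the head of "with".
shared = {
    "I": "went",
    "around": "went",
    "house": "around",
    "the": "house",
    "green": "shutters",
    "shutters": "with",
}

noun_reading = {**shared, "with": "house"}  # the house has green shutters
verb_reading = {**shared, "with": "went"}   # the shutters travel with me

# The reanalysis is a single change of head:
reattached = {w for w in noun_reading if noun_reading[w] != verb_reading[w]}
```

Compared with lifting a subtree out of an arbitrarily deep bracketing, the dependency version of the shift touches one arc no matter how deeply the phrase is nested.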

Consider one more sentence:
  • I went around the house with the green shutters to install the awning
After seeing the awning, the shutters are back on the house and we're back where we started (and the object of install is now awning).  The fact that we can handle any of the three sentences suggests that there's something in the brain that can track both possible structures, that is, both ways of associating with, whether as a constituent or a dependency or something else, and switch back and forth between them, or in some cases even end up in a state of "Wait a sec, did you mean the shutters are on the house, or you were carrying them?"
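One way to picture that tracking, as a deliberately crude sketch: keep both attachments alive and let each continuation prune or restore them.  The three sentences are the ones above; the string matching is a made-up stand-in for whatever the brain actually does:

```python
def live_readings(sentence):
    """Which attachments of "with the green shutters" survive?"""
    if sentence.endswith("shutters to install"):
        # "to install" needs an object, so the shutters must be what
        # I'm carrying: only the verb attachment survives
        return {"verb"}
    if sentence.endswith("to install the awning"):
        # "install" has its own object again, so the shutters can go
        # back on the house; both readings are live once more
        return {"noun", "verb"}
    # plain "...with the green shutters": ambiguous, noun reading preferred
    return {"noun", "verb"}
```

The interesting property is that the set can shrink and then grow back as more words arrive, which is exactly the back-and-forth the three sentences demonstrate.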

There ought to be experiments to run in order to test this, and I wouldn't be surprised if they've already been run, but I'll leave that to the real linguists.

Wednesday, October 27, 2021

Mortality by the numbers

The following post talks about life expectancy, which inevitably means talking about people dying, and mostly-inevitably doing it in a fairly clinical way.  If that's not a topic you want to get into right now, I get it, and I hope the next post (whatever it is) will be more appealing. 

Maybe I just need to fix my news feed, but in the past few days I've run across at least two articles stating that for most of human existence people only lived 25 years or so.

Well ... no.

It is true that life expectancy at birth has taken a large jump in recent decades.  It's also true that estimates of life expectancy from prehistory up to about 1900 tend to be in the range of 20-35 years, and that estimates for modern-day hunter-gatherer societies are in the same range.  As I understand it, that's not a complete coincidence since estimates for prehistoric societies are generally not based on archeological evidence, which is thin for all but the best-studied cases, or written records, which by definition don't exist.  Rather, they're based on the assumption that ancient people were most similar to modern hunter-gatherers, so there you go.

None of this means that no one used to live past 25 or 30, though.  The life expectancy of a group is not the age by which everyone will have died.  That's the maximum lifespan.  Now that life expectancies are in the 70s and 80s, it's probably easier to confuse life expectancy with maximum lifespan, and from there conclude that life expectancy of 25 means people didn't live past 25, but that's not how it works.  For example, in the US, based on 2018 data, the average life expectancy was 78.7 years, but about half the population could expect to still be alive at age 83, and obviously there are lots of people in the US older than 78.7 years.  The story is similar for any real-world calculation of life expectancy.

A life expectancy of 25 years means that if you looked at everyone in the group you're studying, say, everyone born in a certain place in a given year, then counted up the total number of years everyone lived and divided that by the number of people in your group, you'd get 25 years.  For example, if your group includes ten people, three of them die as infants and the rest live 10, 15, 30, 35, 40, 50 and 70 years, that's 250 person-years.  Dividing that by ten people gives 25 years.
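The arithmetic in that example, spelled out (the ages are the made-up cohort from the paragraph above):

```python
# The toy cohort: three infant deaths (counted here as 0 years) and
# seven people living 10, 15, 30, 35, 40, 50 and 70 years.
ages_at_death = [0, 0, 0, 10, 15, 30, 35, 40, 50, 70]

def life_expectancy(ages):
    """Life expectancy at birth: total person-years over people."""
    return sum(ages) / len(ages)

at_birth = life_expectancy(ages_at_death)                        # 250 / 10 = 25.0
past_15 = life_expectancy([a for a in ages_at_death if a > 15])  # 225 / 5 = 45.0
```

The second number is the one that comes up again below: the members of this cohort who made it past 15 lived to an average age of 45, even though life expectancy at birth was 25.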

No matter what particular numbers you use, the only way the life expectancy can equal the maximum lifespan is if everybody lives to exactly that age.  If some people in a particular group died younger than the life expectancy, that means that someone else lived longer. 

Sadly, the example above is likely a plausible distribution for most times and places.  Current thinking is that for most of human existence, infant mortality has been much higher than it is now.  If you survived your first year, you had a good chance of making it to age 15, and if you made it that far, you had a good chance of living at least into your forties and probably your fifties.  In the made-up sample above, the people who made it past 15 lived to an average age of 45.  However, there was also a tragically high chance that a newborn wouldn't survive that first year.

Life expectancies in the 20s and 30s are mostly a matter of high infant mortality, and to a lesser extent high child mortality, not a matter of people dying in their mid 20s.  For the same reason, the increase in life expectancy in the late 20th century was largely a matter of many more people surviving their first year and of more children surviving into adulthood (even then, the rise in life expectancy hasn't been universal).

In real environments where average life expectancy is 25, there will be many people considerably older, and a 24-year-old has a very good chance of making it to 25, and then to 26 and onward.  The usual way of quantifying this is with age-specific mortality, which is the chance at any particular birthday that you won't make it to the next one (this is different from age-adjusted mortality, which accounts for age differences when comparing populations).

At any given age, you can use age-specific mortality rates to calculate how much longer a person can expect to live.  By itself, "life expectancy" means "life expectancy at birth", but you can also calculate life expectancy at age 30, or 70 or whatever.  From the US data above, a 70-year-old can expect to live to age 86 (85.8 if you want to be picky).  A 70-year-old has a significantly higher chance of living to be 86 than someone just born, just because they've already lived to 70, whether or not infant mortality is low and whether the average life expectancy is in the 70s or 80s or in the 20s or 30s.  They also have a 100% chance of living past 25.

Looking at it from another angle, anyone who makes it to their first birthday has a higher life expectancy than the life expectancy at birth, anyone who makes it to their second birthday has a higher life expectancy still, and so forth.  Overall, the number of years you can expect to live beyond your current age goes down each year, because there's always a chance, even if it's small, that you won't live to see the next year.  However, it goes down by less than a year each year, because that chance isn't 100%.  Even as your expected number of years left decreases, your expected age of death increases, but more and more slowly as you age.
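Those claims -- expected years remaining shrink, but expected age at death keeps creeping up -- can be checked against a toy mortality table.  The hazard numbers below are invented for illustration, not real CDC data, and the check is run from adulthood on (with very high infant mortality, surviving infancy can actually raise the expected years remaining):

```python
# A made-up hazard curve: 3% infant mortality, then an exponential
# (Gompertz-style) rise.  Invented numbers, not a real life table.
def hazard(age):
    return 0.03 if age == 0 else min(1.0, 0.0003 * 1.085 ** age)

def expected_age_at_death(current_age, max_age=120):
    """Expected age at death, given survival to current_age."""
    alive, total = 1.0, 0.0
    for age in range(current_age, max_age):
        q = hazard(age)
        total += alive * q * (age + 0.5)  # deaths spread over the year
        alive *= 1 - q
    return total + alive * max_age        # any survivors lumped at max_age

def years_remaining(age):
    return expected_age_at_death(age) - age
```

With this table, each later birthday comes with a higher expected age at death and fewer expected years remaining -- and since the expected age at death rises, the years remaining fall by less than one per year.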

Past a certain point in adulthood, age-specific mortality tends to increase exponentially.  Since the chances of dying at, say, age 20 are pretty low, and the doubling period is pretty long, around 8-10 years, and the maximum for any probability is 100%, this doesn't produce the hockey-stick graph that's usually associated with exponential growth, but it's still exponential.  Every year, your chance of dying is multiplied by a fairly constant factor of around 1.08 to 1.09, or 8-9% annual growth, compounded.  Again from the US data, at age 20 you have about a 0.075% chance of dying that year.  At age 87, it's about 10%.  At age 98, it's about 30%.
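The growth rate and the doubling period quoted above are two views of the same number.  A quick check, using 1.085 (the midpoint of the 8-9% range) and the 0.075%-at-age-20 figure:

```python
import math

# An 8.5% annual increase doubles in log(2)/log(1.085) years
doubling_period = math.log(2) / math.log(1.085)   # about 8.5 years

# A stylized curve starting from 0.075% at age 20.  Real life-table
# values wander around this, but the shape is the point: tiny for
# decades, then large, with no sudden hockey-stick moment in between.
def chance_of_dying(age, base=0.00075, factor=1.085):
    return min(1.0, base * factor ** (age - 20))
```

By construction, moving two doubling periods up the age scale multiplies the chance of dying by four, and so on -- compounding quietly until the numbers finally get big late in life.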

This isn't a law of nature, but an empirical observation, and it doesn't seem to quite hold up at the high end.  For example, CDC data for the US shows a pretty plausibly exponential increase up to age 99, where the table stops, but extrapolating, the chance of death would become greater than 100% somewhere around age 110, even though people in the US have lived longer than that.

It's been predicted that at some point, thanks to advances in medicine and other fields, life expectancy will start to increase by more than one year per year, and that as a consequence anyone young enough when this starts to happen will live forever.  Life expectancy doesn't work that way, either.  There could be a lot of reasons for life expectancy in some population to go up by more than a year in any given year.

Again, the important measure is age-specific mortality.  If the chances of living to see the next year increase just a bit for people from, say, 20 to 50, life expectancy could increase by a year or more, but that just means that more people are going to make it into old age.  It doesn't mean that they'll live longer once they get there.

The key to extending the maximum lifespan is to increase the chances that an old person will live longer, not to increase the chances that someone will live to be old.   If, somehow, anyone 100 or older, but only them, suddenly had a steady 99% chance of living to their next birthday, then the average 100-year-old could look forward to living to about 169.  This wouldn't have much effect on overall life expectancy, though, because there aren't that many 100-year-olds to begin with.  
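One way to reproduce that 169 under the same hypothetical: it's (roughly) the age by which half of such 100-year-olds would have died.  (The mean under this assumption would come out higher, since a lucky few would last a very long time.)

```python
import math

# With a steady 99% chance of surviving each year past 100, the
# fraction still alive n years later is 0.99 ** n.  Half are gone
# when 0.99 ** n = 0.5:
median_extra_years = math.log(0.5) / math.log(0.99)   # about 69

median_age_at_death = 100 + median_extra_years        # about 169
```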

What are the actual numbers, once you get past, say, 100?  It's hard to tell, because there aren't very many people that old.  How many people live to a certain age depends not only on age-specific mortality, but on how many people are still around at what younger ages.  This may seem too obvious to state, but it's easy to lose track of this if you're only looking at overall probabilities.

Currently there's no verified record of anyone living to 123 and only one person has been verified to live past 120.  No man has been verified to live to 117, and only one has been verified to have lived to 116.  Does that mean that no one could live to, say, 135?  Not necessarily.  Does it mean that women inherently live longer than men?  Possibly, but again not necessarily.  Inference from rare events is tricky, and people who do this for a living know a lot more about the subject than I do, but in any case we're looking at handfuls out of however many people have well-verified birth dates in the early 1900s.

Suppose, for the sake of illustration, that after age 100 you have a steady 50/50 chance of living each subsequent year.  Of the people who live to 100, only 1/2 will live to 101, 1/4 to 102, then 1/8, 1/16 and so forth.  Only 1 in 1024 will live to be 110 and only 1 in 1,048,576 -- call it one in a million -- will live to 120.

If there are fewer than a million 100-year-olds to start with, the odds are against any of them living to 120, but they're not zero.  At any given point, you have to look at the ages of the people who are actually alive, and (your best estimate of) their odds of living each additional year.  If there are a million 100-year-olds now and each year is a 50/50 proposition, there probably won't be any 120-year-olds in twenty years, but if there does happen to be a 119-year-old after 19 years, there's a 50% chance there will be a 120-year-old a year later.  By the same reasoning, it's less likely that there were any 120-year-olds a thousand years ago, not only because age-specific mortality was very likely higher, but because there were simply fewer people around, so there were fewer 100-year-olds with a chance to turn 101, and so forth.

In real life, a 100-year-old has a much better than 50% chance of living to be 101, but we don't really know if age-specific mortality ever levels off.  We know that it's less than 100% at age 121, because someone lived to be 122, but that just indicates that at some point there's no longer an exponential increase in age-specific mortality (else it would hit 100% before then, based on the growth curve at ages where we do have a lot of data).  It doesn't mean that the mortality rate levels off.  It might still be increasing to 100%, but slowly enough that it doesn't actually hit 100% until sometime after age 121.

It may well be that there's some sort of mechanism of human biology that prevents anyone from living past 122 or thereabouts, and some mechanism of female human biology in particular that sets the limit for women higher than for men.  On the other hand, it may be that there aren't any 123-year-olds because so far only one person has made it to 122, and their luck ran out.

Similarly, there may not have been any 117-year-old men because not enough men made it to, say, 80, for there to be a good chance of any of them making it to 116.  That in turn might be a matter of men being more likely to die younger, for example in the 20th-century wars that were fought primarily by men.  I'm sure that professionals have studied this and could probably confirm or refute this idea.  The main point is that after a certain point the numbers thin out and it becomes very tricky to sort out all the possible factors behind them.

On the other hand, even if it's luck of the draw that no one has lived to 123, there could still be an inherent limit, whether it's 124, 150 or 1,000, just that no one's been lucky enough to get there.

Along with the difference between life expectancy and lifespan, and the importance of age-specific mortality, it's important to keep in mind where the numbers come from in the first place.  Life expectancy is calculated from age-specific-mortality, and age-specific mortality is measured by looking at people of a given age who are currently alive.  If you're 25 now, your age-specific mortality is based on the population of 25-year-olds from last year and what proportion of them survived to be 26.  Except in exceptional circumstances like a pandemic, that will be a pretty good estimate of your own chances for this year, but it's still based on a group you're not in, because you can only measure things that have happened in the past.

If you're 25 and you want to calculate how long you can expect to live, you'll need to look at the age-specific mortalities for age 25 on up.  The higher the age you're looking at, the more out-of-date it will be when you reach that age.  Current age-specific mortality for 30-year-olds is probably a good estimate of what yours will be at age 30, but current age-specific mortality at 70 might or might not be.  There's a good chance that 45 years from now we'll be significantly better at making sure a 70-year-old lives to be 71.  

Even if medical care doesn't change, a current 70-year-old is more likely to have smoked, or been exposed to high levels of carcinogens, or any of a number of other risk factors, than someone who's currently 25 will have been when they're 70.  Diet and physical activity have also changed over time, not necessarily for the better or worse, and it's a good bet they will continue to change.  There's no guarantee that our future 70-year-old's medical history will include fewer risk factors than a current 70-year-old's, but it will certainly be different.

For those and other reasons, the further into the future you go, the more uncertain the age-specific mortality becomes.  On the other hand, it also becomes less of a factor.  Right now, at least, it won't matter to most people whether age-specific mortality at 99 is half what it is now, because, unless mortality in old age drops by quite a bit, most people alive today are unlikely to live to be 99.

Sunday, May 2, 2021

Things are more like they are now than they have ever been

I hadn't noticed until I looked at the list, but it looks like this is post 100 for this blog.  As with the other blog, I didn't start out with a goal of writing any particular number of posts, or even on any particular schedule.  I can clearly remember browsing through a couple dozen posts early on and feeling like a hundred would be a lot.  Maybe I'd get there some day or maybe I wouldn't.  In any case, it's a nice round number, in base 10 anyway, so I thought I'd take that as an excuse to go off in a different direction from some of the recent themes like math, cognition and language.

The other day, a colleague pointed me at Josh Bloch's A Brief, Opinionated History of the API (disclaimer: Josh Bloch worked at Google for several years, and while he was no longer at Google when he made the video, it does support Google's position on the Google v. Oracle suit).  What jumped out at me, probably because Bloch spends a good portion of the talk on it, was just how much the developers of EDSAC, generally considered "the second electronic digital stored-program computer to go into regular service", anticipated, in 1949.

Bloch argues that its subroutine library -- literally a file cabinet full of punched paper tapes containing instructions for performing various common tasks -- could be considered the first API (Application Program Interface), but the team involved also developed several other building blocks of computing, including a form of mnemonic assembler (a notation for machine instructions designed for people to read and write without having to deal with raw numbers) and a boot loader (a small program whose purpose is to load larger programs into the computer memory).  For many years, their book on the subject, Preparation of Programs for Electronic Digital Computers, was required reading for anyone working with computers.

This isn't the first "Wow, they really thought of everything" moment I've had in my field of computing.  Another favorite is Ivan Sutherland's Sketchpad (which I really thought I'd already blogged about, but apparently not), generally considered the first fully-developed example of a graphical user interface.  It also laid foundations for object-oriented programming and offers an early example of constraint-solving as a way of interacting with computers.  Sutherland wrote it in 1963 as part of his PhD work.

These two pioneering achievements lie on either side of the 1950s, a time that Americans often tend to regard as a period of rigid conformity and cold-war paranoia in the aftermath of World War II (as always, I can't speak for the rest of the world, and even when it comes to my own part, my perspective is limited). Nonetheless, it was also a decade of great innovation, both technically and culturally.  The Lincoln Laboratory TX-2 computer that Sketchpad ran on, operational in 1958, had over 200 times the memory EDSAC had in 1949 (it must also have run considerably faster, but I haven't found the precise numbers).  This development happened in the midst of a major burst of progress throughout computing.  To pick a few milestones:

  • In 1950, Alan Turing wrote the paper that described the Imitation Game, now generally referred to as the Turing test.
  • In 1951, Remington Rand released the UNIVAC-I, the first general-purpose production computer in the US.  The transition from one-offs to full production is a key development in any technology.
  • In 1951, Bell Labs announced the junction transistor, a more practical development of the point-contact transistor demonstrated in 1947.
  • In 1952, Grace Hopper published her first paper on compilers. The terminology of the time is confusing, but she was specifically talking about translating human-readable notation, at a higher level than just mnemonics for machine instructions, into machine code, exactly what the compilers I use on a daily basis do.  Her first compiler implementation was also in 1952.
  • In 1953, the University of Manchester prototyped its Transistor Computer, the world's first transistorized computer, beginning a line of development that includes all commercial computers running today (as of this writing ... I'm counting current quantum computers as experimental).
  • In 1956, IBM prototyped the first hard drive, a technology still in use (though it's on the way out now that SSDs are widely available).
  • In 1957, the first FORTRAN compiler appeared.  In college, we loved to trash FORTRAN (in fact "FORTRASH" was the preferred name), but FORTRAN played a huge role in the development of scientific computing, and is still in use to this day.
  • In 1957, the first COMIT compiler appeared, developed by Victor Yngve et al.  While the language itself is quite obscure, it begins a line of development in natural-language processing, one branch of which eventually led to everyone's favorite write-only language, Perl.
  • In 1958, John McCarthy developed the first LISP implementation.  LISP is based on Alonzo Church's lambda calculus, a computing model equivalent in power to the Turing/Von Neumann model that CPU designs are based on, but much more amenable to mathematical reasoning.  LISP was the workhorse of much early research in AI and its fundamental constructs, particularly lists, trees and closures, are still in wide use today (Java officially introduced lambda expressions in 2014).  Its explicit treatment of programs as data is foundational to computer language research.  Its automatic memory management, colloquially known as garbage collection, came along a bit later, but is a key feature of several currently popular languages (and explicitly not a key feature of some others). For my money, LISP is one of the two most influential programming languages, ever.
  • Also in 1958, the ZMMD group gave the name ALGOL to the programming language they were working on.  The 1958 version included "block statements", which supported what at the time was known as structured programming and is now so ubiquitous no one even notices there's anything special about it.  The shift from "do this action, now do this calculation and go to this step in the instructions if the result is zero (or negative, etc.)" to "do these things as long as this condition is true" was a major step in moving from a notation for what the computer was doing to a notation specifically designed for humans to work with algorithms.  Two years later, Algol 60 codified several more significant developments from the late 50s, resulting in a language famously described as "an improvement on its predecessors and many of its successors".  Most if not all widely-used languages -- for example Java, C/C++/C#, Python, JavaScript/ECMAScript, Ruby ... can trace their control structures and various other features directly back to Algol, making it, for my money, the other of the two most influential programming languages, ever.
  • In 1959, the CODASYL committee published the specification for COBOL, based on Hopper's work on FLOW-MATIC from 1950-1959.  As with FORTRAN, COBOL is now the target for widespread derision, and its PICTURE clauses turned out to be a major issue in the notorious Y2K panic.  Nonetheless, it has been hugely influential in business and government computing and until not too long ago more lines of code were written in COBOL than anything else (partly because COBOL infamously requires more lines of code than most languages to do the same thing)
  • In 1959, Tony Hoare wrote Quicksort, still one of the fastest ways to sort a list of items, the subject of much deep analysis and arguably one of the most widely-implemented and influential algorithms ever written.
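As an aside, the idea of Quicksort fits in a few lines.  This is a sketch of the scheme, not Hoare's original in-place partitioning (which swaps elements between two scanning indices rather than building new lists):

```python
def quicksort(items):
    """Sort by partitioning around a pivot, then recursing on each side."""
    if len(items) <= 1:
        return list(items)
    pivot, *rest = items
    smaller = [x for x in rest if x < pivot]
    larger = [x for x in rest if x >= pivot]
    return quicksort(smaller) + [pivot] + quicksort(larger)

quicksort([3, 1, 4, 1, 5, 9, 2, 6])  # [1, 1, 2, 3, 4, 5, 6, 9]
```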
This is just scratching the surface of developments in computing, and I've left off one of the great and needless tragedies of the field, Alan Turing's suicide in 1954.  On a different note, in 1958, the National Advisory Committee on Aeronautics became the National Aeronautics and Space Administration and disbanded its pool of computers, that is, people who performed computations, and Katherine Johnson began her career in aerospace technology in earnest.

It wasn't just a productive decade in computing.  Originally, I tried to list some of the major developments elsewhere in the sciences, and in art and culture in general in 1950s America, but I eventually realized that there was no way to do it without sounding like one of those news-TV specials and also leaving out significant people and accomplishments through sheer ignorance.  Even in the list above, in a field I know something about, I'm sure I've left out a lot, and someone else might come up with a completely different list of significant developments.

As I was thinking through this, though, I realized that I could write much the same post about any of a number of times and places.  The 1940s and 1960s were hardly quiet.  The 1930s saw huge economic upheaval in much of the world.  The Victorian era, also often portrayed as a period of stifling conformity, not to mention one of the starkest examples of rampant imperialism, was also a time of great technical innovation and cultural change.  The idea of the Dark Ages, where supposedly nothing of note happened between the fall of Rome and the Renaissance, has largely been debunked, and so on and on.

All of the above is heavily tilted toward "Western" history, not because it has a monopoly on innovation, but simply because I'm slightly less ignorant of it.  My default assumption now is that there has pretty much always been major innovation affecting large portions of the world's population, often in several places at once, and the main distinction is how well we're aware of it.

While Bloch's lecture was the jumping-off point for this post, it didn't take long for me to realize that the real driver was one of the recurring themes from the other blog: not-so-disruptive technology.  That in turn comes from my nearly instinctive tendency to push back against "it's all different now" narratives and particularly the sort of breathless hype that, for better or worse, the Valley has excelled in for generations.

It may seem odd for someone to be both a technologist by trade and a skeptical pourer-of-cold-water by nature, but in my experience it's actually not that rare.  I know geeks who are eager early adopters of new shiny things, but I think there are at least as many who make a point of never getting version 1.0 of anything.  I may or may not be more behind-the-times than most, but the principle is widely understood: version 1.0 is almost always the buggiest and generally harder to use than what will come along once the team has had a chance to catch its breath and respond to feedback from early adopters.  Don't get me wrong: if there weren't early adopters, hardly anything would get released at all.  It's just not in my nature to be one.

There are good reasons to put out a version 1.0 that doesn't do everything you'd want it to and doesn't run as reliably as you'd like.  The whole "launch and iterate" philosophy is based on the idea that you're not actually that good at predicting what people will like or dislike, so you shouldn't waste a lot of time building something based on your speculation.  Just get the basic idea out and be ready to run with whatever aspect of it people respond to.

Equally, a startup, or a new team within an established company, will typically only command a certain amount of resources (and giving a new team or company carte blanche often doesn't end well).  At some point you have to get more resources in, either from paying customers or from whoever you can convince that yes, this is really a thing.  Having a shippable if imperfect product makes that case much better than having a bunch of nice-looking presentations and well-polished sales pitches.  Especially when dealing with paying customers.

But there's probably another reason to put things out in a hurry.  Everyone knows that tech, and software in general, moves fast (whether or not it also breaks stuff).  In other words, there's a built-in cultural bias toward doing things quickly whether it makes sense or not, and then loudly proclaiming how fast you're moving and, therefore, how innovative and influential you must be.  I think this is the part I tend to react to.  It's easy to confuse activity with progress, and after seeing the same avoidable mistakes made a few times in the name of velocity, the eye can become somewhat jaundiced.

As much as I may tend toward that sort of reaction, I don't particularly enjoy it.  A different angle is to go back and study, appreciate, even celebrate, the accomplishments of people who came before.  The developments I mentioned above are all significant advances.  They didn't appear fully-formed out of a vacuum.  Each of them builds on previous developments, many just as significant but not as widely known.

Looking back and focusing on achievements, one doesn't see the many false starts and oversold inventions that went nowhere, just the good bits, the same way that we remember and cherish great music from previous eras and leave aside the much larger volume of unremarkable or outright bad.

Years from now, people will most likely look back on the present era much the same and pick out the developments that really mattered, leaving aside much of the commotion surrounding it.  It's not that the breathless hype is all wrong, much less that everything important has already happened, just that from the middle of it all it's harder to pick out what's what.  Not that there's a lack of opinions on the matter.

The quote in the title has been attributed to several people, but no one seems to know who really said it first.

Monday, September 14, 2020

How real are real numbers?

There is always one more counting number.

That is, no matter how high you count, you can always count one higher.  Or at least in principle.  In practice you'll eventually get tired and give up.  If you build a machine to do the counting for you, eventually the machine will break down or it will run out of capacity to say what number it's currently on.  And so forth.  Nevertheless, there is nothing inherent in the idea of "counting number" to stop you from counting higher.

In a brief sentence, which after untold work by mathematicians over the centuries we now have several ways to state completely rigorously, we've described something that can exceed the capacity of the entire observable universe as measured in the smallest units we believe to be measurable.  The counting numbers (more formally, the natural numbers) are infinite, but they can be defined not only by finite means, but fairly concisely.
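That finite description is short enough to write down directly.  A Python sketch of the rule "start at zero; there is always one more":

```python
def naturals():
    """The natural numbers, defined by finite means: start at zero,
    and there is always one more.  No finite machine can exhaust
    this generator, but the rule itself fits in four lines."""
    n = 0
    while True:
        yield n
        n += 1

gen = naturals()
print([next(gen) for _ in range(5)])  # [0, 1, 2, 3, 4]
```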

There are levels of infinity beyond the natural numbers.  Infinitely many, in fact.  Again, there are several ways to define these larger infinities, but one way to define the most prominent of them, based on the real numbers, involves the concept of continuity or, more precisely, completeness in the sense that the real numbers contain any number that you can get arbitrarily close to.

For example, you can list fractions that get arbitrarily close to the square root of two: 1.4 (14/10) is fairly close, 1.41 (141/100) is even closer, 1.414 (1414/1000) is closer still, and if I asked for a fraction within one one-millionth, or trillionth, or within 1/googol, that is, one divided by ten to the hundredth power, no problem.  Any number of libraries you can download off the web can do that for you.
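As a concrete illustration (a simple bisection sketch of my own, not any particular library's method), exact fractions can be driven as close to the square root of two as you ask:

```python
from fractions import Fraction

def sqrt2_within(tolerance):
    """Bisect between 1 and 2 to find a fraction within `tolerance`
    of the square root of two.  Every value handled here is an exact
    ratio of integers; only the target is irrational."""
    lo, hi = Fraction(1), Fraction(2)
    while hi - lo > tolerance:
        mid = (lo + hi) / 2
        if mid * mid < 2:
            lo = mid
        else:
            hi = mid
    return lo

approx = sqrt2_within(Fraction(1, 10**12))
print(float(approx))         # 1.4142135623...
print(approx * approx == 2)  # False -- and it always will be
```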

Nonetheless, the square root of two is not itself the ratio of two natural numbers, that is, it is not a rational number or fraction.  The earliest widely-recognized recorded proof of this goes back to the Pythagoreans.  It's not clear exactly who else also figured it out when, but the idea is certainly ancient.  No matter how closely you approach the square root of two with fractions, you'll never find a fraction whose square is exactly two.

OK, but why shouldn't the square root of two be a number?  If you draw a right triangle with legs one meter long, the hypotenuse certainly has some length, and by the Pythagorean theorem, that length squared is two.  Surely that length is a number?

Over time, there were some attempts to sweep the matter under the rug by asserting that, no, only rational numbers are really numbers and there just isn't a number that squares to two.  That triangle? Dunno, maybe its legs weren't exactly one meter long, or it's not quite a right triangle?

This is not necessarily as misguided as it might sound.  In real life, there is always uncertainty, and we only know the angles and the lengths of the sides approximately.  We can slice fractions as finely as we like, so is it really so bad to say that all numbers are rational, and therefore you can't ever actually construct a right triangle with both legs exactly the same length, even if you can get as close as you like?

Be that as it may, modern mathematics takes the view that there are more numbers than just the rationals and that if you can get arbitrarily close to some quantity, well, that's a number too.  Modern mathematics also says there's a number that squares to negative one, which has its own interesting consequences, but that's for some imaginary other post (yep, sorry, couldn't help myself).  The result of adding all these numbers-you-can-get-arbitrarily-close-to to the original rational numbers (every rational number is already arbitrarily close to itself) is called the real numbers.

It turns out that (math-speak for "I'm not going to tell you why", but see the post on counting for an outline) in defining the real numbers you bring in not only infinitely many more numbers, but so infinitely many more numbers that the original rational numbers "form a set of measure zero", meaning that the chances of any particular real number being rational are zero (as usual, the actual machinery that allows you to apply probabilities here is a bit more involved).

To recap, we started with the infinitely many rational numbers -- countably infinite since it turns out that you can match them up one-for-one with the natural numbers* -- and now we have an uncountably infinite set of numbers, infinitely too big to match up with the naturals.

But again we did this with a finite amount of machinery.  We started with the rule "There is always one more counting number", snuck in some rules about fractions and division, and then added "if you can get arbitrarily close to something with rational numbers, then that something is a number, too".  More concisely, limits always exist (with a few stipulations, since this is math).

One might ask at this point how real any of this is.  In the real world we can only measure uncertainly, and as a result we can generally get by with only a small portion of even the rational numbers, say just those with a hundred decimal digits or fewer, and for most purposes probably those with just a few digits (a while ago I discussed just how tiny a set like this is).  By definition anything we, or all of the civilizations in the observable universe, can do is literally as nothing compared to infinity, so are we really dealing with an infinity of numbers, or just a finite set of rules for talking about them?

One possible reply comes from the world of quantum mechanics, a bit ironic since the whole point of quantum mechanics is that the world, or at least important aspects of it, is quantized, meaning that a given system can only take on a specific set of discrete states (though, to be fair, there are generally a countable infinity of such states, most of them vanishingly unlikely).  An atom is made of a discrete set of particles, each with an electric charge that's either 1, 0 or -1 times the charge of the electron, the particles of an atom can only have a discrete set of energies, and so forth (not everything is necessarily quantized, but that's a discussion well beyond my depth).

All of this stems from the Schrödinger equation.  The discrete nature of quantum systems comes from there being only a discrete set of solutions to that equation for a particular set of boundary conditions.  This is actually a fairly common phenomenon.  It's the same reason that you can only get a certain set of tones by blowing over the opening of a bottle (at least in theory).
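The bottle (or a guitar string) makes the point without any quantum machinery.  A sketch, using an idealized string fixed at both ends (the length is an arbitrary choice of mine):

```python
# An idealized string of length L fixed at both ends can only support
# standing waves whose wavelength fits the boundary conditions:
# wavelength = 2L/n for n = 1, 2, 3, ...  A discrete set of solutions
# falls out of a continuous equation, just as in the quantum case.
L = 0.65  # meters, roughly a guitar string's scale length
allowed_wavelengths = [2 * L / n for n in range(1, 6)]
print(allowed_wavelengths)  # [1.3, 0.65, 0.433..., 0.325, 0.26]
```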

The equation itself is a partial differential equation defined over the complex numbers, which have the same completeness property as the real numbers (in fact, a complex number can be expressed as a pair of real numbers).  This is not an incidental feature, but a fundamental part of the definition in at least two ways: Differential equations, including the Schrödinger equation, are defined in terms of limits, and this only works for numbers like the reals or the complex numbers where the limits in question are guaranteed to exist.  Also, it includes π, which is not just irrational, but transcendental, which more or less means it can only be defined as a limit of an infinite sequence.

In other words, the discrete world of quantum mechanics, our best attempt so far at describing the behavior of the world under most conditions, depends critically on the kind of continuous mathematics in which infinities, both countable and uncountable, are a fundamental part of the landscape.  If you can't describe the real world without such infinities, then they must, in some sense, be real.

Of course, it's not actually that simple.

When I said "differential equations are defined in terms of limits", I should have said "differential equations can be defined in terms of limits."  One facet of modern mathematics is the tendency to find multiple ways of expressing the same concept.  There are, for example, several different but equivalent ways of expressing the completeness of the real numbers, and several different ways of defining differential equations.

One common technique (a technique is a trick you use more than once) is to start with one way of defining a concept, find some interesting properties, and then switch perspective and say that those interesting properties are the actual definition.

For example, if you start with the usual definition of the natural numbers: zero and an "add one" operation to give you the next number, you can define addition in terms of adding one repeatedly -- adding three is the same as adding one three times, because three is the result of adding one to zero three times.  You can then prove that addition gives the same result no matter what order you add numbers in (the commutative property).  You can also prove that adding two numbers and then adding a third one is the same as adding the first number to the sum of the other two (the associative property).

Then you can turn around and say "Addition is an operation that's commutative and associative, with a special number 0 such that adding 0 to a number always gives you that number back."  Suddenly you have a more powerful definition of addition that can apply not just to natural numbers, but the reals, the complex numbers, the finite set of numbers on a clock face, rotations of a two-dimensional object, orderings of a (finite or infinite) list of items and all sorts of other things.  The original objects that were used to define addition -- the natural numbers 0, 1, 2 ... -- are no longer needed.  The new definition works for them, too, of course, but they're no longer essential to the definition.
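To illustrate (a toy Python sketch, with names of my own invention): any operation with those properties supports the same "summing up", regardless of what's being added.

```python
from functools import reduce

def total(add, zero, items):
    """'Addition' in the abstract: any associative, commutative
    operation `add` with identity `zero` will do."""
    return reduce(add, items, zero)

# The natural numbers fit the definition...
print(total(lambda a, b: a + b, 0, [1, 2, 3]))          # 6
# ...but so do the numbers on a clock face...
print(total(lambda a, b: (a + b) % 12, 0, [7, 8]))      # 3
# ...and rotations of a two-dimensional object (degrees mod 360)
print(total(lambda a, b: (a + b) % 360, 0, [270, 180])) # 90
```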

You can do the same thing with a system like quantum mechanics.  Instead of saying that the behavior of particles is defined by the Schrödinger equation, you can say that quantum particles behave according to such-and-such rules, which are compatible with the Schrödinger equation the same way the more abstract definition of addition in terms of properties is compatible with the natural numbers.

This has been done, or at least attempted, in a few different ways (of course).  The catch is that these more abstract systems depend on the notion of a Hilbert space, which has even more and hairier infinities in it than the real numbers as described above.

How did we get from "there is always one more number" to "more and hairier infinities"?

The question that got us here was "Are we really dealing with an infinity of numbers, or just a finite set of rules for talking about them?"  In some sense, it has to be the latter -- as finite beings, we can only deal with a finite set of rules and try to figure out their consequences.  But that doesn't tell us anything one way or another about what the world is "really" like.

So then the question becomes something more like "Is the behavior of the real world best described by rules that imply things like infinities and limits?"  The best guess right now is "yes", but maybe the jury is still out.  Maybe we can define a more abstract version of quantum physics that doesn't require infinities, in the same way that defining addition doesn't require defining the natural numbers.  Then the question is whether that version is in some way "better" than the usual definition.

It's also possible that, as well-tested as quantum field theory is, there's some discrepancy between it and the real world that's best explained by assuming that the world isn't continuous and therefore the equations to describe it should be based on a discrete number system.  I haven't the foggiest idea how that could happen, but I don't see any fundamental logical reason to rule it out.

For now, however, it looks like the world is best described by differential equations like the Schrödinger equation, which is built on the complex numbers, which in turn are derived from the reals, with all their limits and infinities.  The (provisional) verdict then: the real numbers are real.

* One crude way to see that the rational numbers are countable is to note that there are no more rational numbers than there are pairs of numerator and denominator, each a natural number.    If you can count the pairs of natural numbers, you can count the rational numbers, by leaving out the pairs that have zero as the denominator and the pairs that aren't in lowest terms.  There will still be infinitely many rational numbers, even though you're leaving out an infinite number of (numerator, denominator) pairs, which is just a fun fact of dealing in infinities.  One way to count the pairs of natural numbers is to put them in a grid and count along the diagonals: (0,0), (1,0), (0,1), (2,0), (1,1), (0, 2), (3,0), (2,1), (1,2), (0,3) ... This gets every pair exactly once.

All of this is ignoring negative rational numbers like -5/42 or whatever, but if you like you can weave all those into the list by inserting a pair with a negative numerator after any pair with a non-zero numerator: (0,0), (1,0), (-1,0), (0,1), (2,0), (-2,0), (1,1), (-1,1), (0,2), (3,0), (-3,0), (2,1), (-2,1), (1,2), (-1,2), (0,3) ... Putting it all together, leaving out the zero denominators and the pairs not in lowest terms, you get (0,1), (1,1), (-1,1), (2,1), (-2,1), (1,2), (-1,2), (3,1), (-3,1), (1,3), (-1,3) ...
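The diagonal walk described above is easy to mechanize.  A sketch for the non-negative rationals (negatives could be woven in as described):

```python
from fractions import Fraction
from math import gcd

def rationals():
    """Enumerate the non-negative rationals by walking the diagonals
    of the (numerator, denominator) grid, skipping zero denominators
    and pairs not in lowest terms."""
    diagonal = 0
    while True:
        for num in range(diagonal, -1, -1):  # on this diagonal: num + den == diagonal
            den = diagonal - num
            if den > 0 and gcd(num, den) == 1:
                yield Fraction(num, den)
        diagonal += 1

gen = rationals()
print([next(gen) for _ in range(8)])
# [Fraction(0, 1), Fraction(1, 1), Fraction(2, 1), Fraction(1, 2),
#  Fraction(3, 1), Fraction(1, 3), Fraction(4, 1), Fraction(3, 2)]
```

Every rational appears exactly once, and each shows up after finitely many steps -- which is all "countable" means.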

Another, much more interesting way of counting the rational numbers is via the Farey Sequence.

Sunday, September 13, 2020

Entropy and time's arrow

When contemplating the mysteries of time ... what is it, why is it how it is, why do we remember the past but not the future ... it's seldom long before the second law of thermodynamics comes up.

In technical terms, the second law of thermodynamics states that the entropy of a closed system increases over time.  I've previously discussed what entropy is and isn't.  The short version is that entropy is a measure of uncertainty about the internal details of a system.  This is often shorthanded as "disorder", and that's not totally wrong, but it probably leads to more confusion than understanding.  This may be in part because uncertainty and disorder are both related to the more technical concept of symmetry, which may not mean what you might expect.  At least, I found some of this surprising when I first went over it.

Consider an ice cube melting.  Is a puddle of water more disordered than an ice cube?  One would think.  In an ice cube, each atom is locked into a crystal matrix, each atom in its place.  An atom in the liquid water is bouncing around, bumping into other atoms, held in place enough to keep from flying off into the air but otherwise free to move.

But which of the two is more symmetrical?  If your answer is "the ice cube", you're not alone.  That was my reflexive answer as well, and I expect that it would be for most people.  Actually, it's the water.  Why?  Symmetry is a measure of what you can do to something and still have it look the same.  The actual mathematical definition is, of course, a bit more technical, but that'll do for now.

An irregular lump of coal looks different if you turn it one way or another, so we call it asymmetrical.  A cube looks the same if you turn it 90 degrees in any of six directions, or 180 degrees in any of three directions, so we say it has "rotational symmetry" (and "reflective symmetry" as well).  A perfect sphere looks the same no matter which way you turn it, including, but not limited to, all the ways you can turn a cube and have the cube still look the same.  The sphere is more symmetrical than the cube, which is more symmetrical than the lump of coal.  So far so good.

A mass of water molecules bouncing around in a drop of water looks the same no matter which way you turn it.  It's symmetrical the same way a sphere is.  The crystal matrix of an ice cube only looks the same if you turn it in particular ways.  That is, liquid water is more symmetrical, at the microscopic level, than frozen water.  This is the same as saying we know less about the locations and motions of the individual molecules in liquid water than those in frozen water.  More uncertainty is the same as more entropy.

Geometrical symmetry is not the only thing going on here.  Ice at -100C has lower entropy than ice at -1C, because molecules in the colder ice have less kinetic energy and a narrower distribution of possible kinetic energies (loosely, they're not vibrating as quickly within the crystal matrix and there's less uncertainty about how quickly they're vibrating).  However, if you do see an increase in geometrical symmetry, you are also seeing an increase in uncertainty, which is to say entropy. The difference between cold ice and near-melting ice can also be expressed in terms of symmetry, but a more subtle kind of symmetry.  We'll get to that.

As with the previous post, I've spent more time on a sidebar than I meant to, so I'll try to get to the point by going off on another sidebar, but one more closely related to the real point.

Suppose you have a box with, say, 25 little bins in it arranged in a square grid.  There are five marbles in the box, one in each bin on the diagonal from upper left to lower right.  This arrangement has "180-degree rotational symmetry".  That is, you can rotate it 180 degrees and it will look the same.  If you rotate it 90 degrees, however, it will look clearly different.

Now put a lid on the box, give it a good shake and remove the lid.  The five marbles will have settled into some random assortment of bins (each bin can only hold one marble).  If you look closely, this random arrangement is very likely to be asymmetrical in the same way a lump of coal is: If you turn it 90 degrees, or 180, or reflect it in a mirror, the individual marbles will be in different positions than if you didn't rotate or reflect the box.

However, if you were to take a quick glimpse at the box from a distance, then have someone flip a coin and turn the box 90 degrees if the coin came up heads, then take another quick glimpse, you'd have trouble telling if the box had been turned or not.  You'd have no trouble with the marbles in their original arrangement on the diagonal.  In that sense, the random arrangement is more symmetrical than the original arrangement, just like the microscopic structure of liquid water is more symmetrical than that of ice.

[I went looking for some kind of textbook exposition along the lines of what follows but came up empty, so I'm not really sure where I got it from.  On the one hand, I think it's on solid ground in that there really is an invariant in here, so the math degree has no objections, though I did replace "statistically symmetrical" with "symmetrical" until I figure out what the right term, if any, actually is.

On the other hand, I'm not a physicist, or particularly close to being one, so this may be complete gibberish from a physicist's point of view.  At the very least, any symmetries involved have more to do with things like phase spaces, and "marbles in bins" is something more like "particles in quantum states".]

The magic word to make this all rigorous is "statistical".  That is, if you have a big enough grid and enough marbles and you just measure large-scale statistical properties, looking at distributions of values rather than the actual values, then an arrangement of marbles is more symmetrical if these rough measures don't change when you rotate the box (or reflect it, or shuffle the rows or columns, or whatever -- for brevity I'll stick to "rotate" here).

For example, if you count the number of marbles on each diagonal line (wrapping around so that each line has five bins), then for the original all-on-one-diagonal arrangement, there will be a sharp peak: five marbles on the main diagonal, one on each of the diagonals that cross that main diagonal, and zero on the others.  Rotate the box, and that peak moves.  For a random arrangement, the counts will all be more or less the same, both before and after you rotate the box.  A random arrangement is more symmetrical, in this statistical sense.

The important thing here is that there are many more symmetrical arrangements than not.  For example, there are ten wrap-around diagonals in a 5x5 grid (five in each direction) so there are ten ways to put five marbles in that kind of arrangement.  There are 53,130 total ways to put 5 marbles in 25 bins, so there are approximately 5,000 times as many more-symmetrical, that is, higher-entropy, arrangements.  Granted, some of these are still fairly unevenly distributed, for example four marbles on one diagonal and one off it, but even taking that into account, there are many more arrangements that look more or less the same if you rotate the box than there are that look significantly different.
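The counts above are quick to check:

```python
from math import comb

total = comb(25, 5)       # ways to place 5 marbles in 25 bins
on_diagonal = 2 * 5       # wrap-around diagonal arrangements (5 each way)
print(total)              # 53130
print(total / on_diagonal)  # 5313.0 -- "approximately 5,000 times as many"
```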

This is a toy example.  If you scale up to, say, the number of molecules in a balloon at room temperature, "many more" becomes "practically all".  Even if the box has 2500 bins in a 50x50 grid, still ridiculously small compared to the trillions of trillions of molecules in a typical system like a balloon, or a vase, or a refrigerator or whatever, the odds that all of the marbles line up on a diagonal are less than one in a googol (that's ten to the hundredth power, not the search engine company).  You can imagine all the molecules in a balloon crowding into one particular region, but for practical purposes it's not going to happen, at least not by chance in a balloon at room temperature.

If you start with the box of marbles in a not-very-symmetrical state and shake it up, you'll almost certainly end up with a more symmetrical state, simply because there are many more ways for that to happen.  Even if you only change one part of the system, say by taking out one marble and putting it back in a random empty bin adjacent to its original position, there are still more cases than not in which the new arrangement is more symmetrical than the old one.

If you continue making more random changes, whether large or small, the state of the box will get more symmetrical over time.  Strictly speaking, this is not an absolute certainty, but for anything we encounter in daily life the numbers are so big that the chances of anything else happening are essentially zero.  This will continue until the system reaches its maximum entropy, at which point large or small random changes will (essentially certainly) leave the system in a state just as symmetrical as it was before.
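A toy simulation of the box (my own sketch, using the single-marble moves described above) shows the drift away from the special diagonal arrangement:

```python
import random

random.seed(42)  # any seed will do; the drift off the diagonal is near-certain

N = 5
marbles = {(i, i) for i in range(N)}  # start on the main diagonal

def shake_once(marbles):
    """Move one random marble to a random adjacent empty bin, if any."""
    r, c = random.choice(sorted(marbles))
    moves = [(r + dr, c + dc)
             for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1))
             if 0 <= r + dr < N and 0 <= c + dc < N
             and (r + dr, c + dc) not in marbles]
    if moves:
        marbles.remove((r, c))
        marbles.add(random.choice(moves))

for _ in range(1000):
    shake_once(marbles)

on_diagonal = sum(1 for r, c in marbles if r == c)
print(len(marbles), on_diagonal)  # still 5 marbles; few, if any, on the diagonal
```

Nothing in the rules favors any particular bin; the diagonal loses out simply because there are vastly more arrangements off it than on it.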

That's the second law -- as a closed system evolves, its entropy will essentially never decrease, and if it starts in a state of less than maximum entropy, its entropy will essentially always increase until it reaches maximum entropy.

And now to the point.

The second law gives a rigorous way to tell that time is passing.  In a classic example, if you watch a film of a vase falling off a table and shattering on the floor, you can tell instantly if the film is running forward or backward: if you see the pieces of a shattered vase assembling themselves into an intact vase, which then rises up and lands neatly on the table, you know the film is running backwards.  Thus it is said that the second law of thermodynamics gives time its direction.

As compelling as that may seem, there are a couple of problems with this view.  I didn't come up with any of these, of course, but I do find them convincing:

  • The argument is only compelling for part of the film.  In the time between the vase leaving the table and it making contact with the floor, the film looks fine either way.  You either see a vase falling, or you see it rising, presumably having been launched by some mechanism.  Either one is perfectly plausible, while the vase assembling itself from its many pieces is totally implausible.  But the lack of any obvious cue like pottery shards improbably assembling themselves doesn't stop time from passing.
  • If your recording process captured enough data, beyond just the visual image of the vase, you could in principle detect that the entropy of the contents of the room increases slightly if you run the film in one direction and decreases in the other, but that doesn't actually help because entropy can decrease locally without violating the second law.  For example, you can freeze water in a freezer or by leaving it out in the cold.  Its entropy decreases, but that's fine because entropy overall is still increasing, one way or another (for example, a refrigerator produces more entropy by dumping heat into the surrounding environment than it removes in cooling its contents).  If you watch a film of ice melting, there may not be any clear cues to tell you that you're not actually watching a film of ice freezing, running backward.  But time passes regardless of whether entropy is increasing or decreasing in the local environment.
  • Most importantly, though, in an example like a film running, we're only able to say "That film of a vase shattering is running backward" because we ourselves perceive time passing.  We can only say the film is running backward because it's running at all.  By "backward", we really mean "in the other direction from our perception of time".  Likewise, if we measure the entropy of a refrigerator and its contents, we can only say that entropy is increasing as time as we perceive it increases.
In other words, entropy increasing is a way that we can tell time is passing, but it's not the cause of time passing, any more than a mile marker on a road makes your car move.  In the example of the box of marbles, we can only say that the box went from a less symmetrical to more symmetrical state because we can say it was in one state before it was in the other.

If you printed a diagram of each arrangement of marbles on opposite sides of a piece of paper, you'd have two diagrams on a piece of paper.  You couldn't say one was before the other, or that time progressed from one to the other.  You can only say that if the state of the system undergoes random changes over time, then the system will get more symmetrical over time, and in particular the less symmetrical arrangement (almost certainly) won't happen after the more symmetrical one.  That is, entropy will increase.

You could even restate the second law as something like "As a system evolves over time, all state changes allowed by its current state are equally likely" and derive increasing entropy from that (strictly speaking you may have to distinguish identical-looking potential states in order to make "equally likely" work correctly -- the rigorous version of this is the ergodic hypothesis).  This in turn depends on the assumptions that systems have state, and that state changes over time.  Time is a fundamental assumption here, not a by-product.

In short, while you can use the second law to demonstrate that time is passing, you can't appeal to the second law to answer questions like "Why do we remember the past and not the future?"  It just doesn't apply.

Saturday, September 12, 2020

What part of consciousness is social?

I think a lot of questions about consciousness fall into one of two categories:

  • What is it, that is, what features does it have, what states of consciousness are there, what are reasonable tests of whether something is conscious or not (given that we can't directly experience any consciousness but our own)?
  • How does it happen, that is, what causes things (like us, for example) to have conscious experiences?
Reading that over, I'm not sure it really captures the distinction I want to make.  The first item deals in experiments people know how to do right now, and there has been quite a lot of exciting work on the first type of question, falling under rubrics like "cognitive science" and "neural correlates of consciousness".

I mean for the second item to represent "the hard problem of consciousness", the "Why does anyone experience anything at all?" kind of question.  It's not clear whether one can conduct experiments about questions like this at all and, as far as I know, no one has an answer that isn't ultimately circular.

For example, "We have consciousness because we have a soul" by itself doesn't answer "What is a soul?" or "How does it give us consciousness?", or clearly suggest an experiment that could confirm or refute it.  Instead, it states a defining property (typically among others): A soul is something which gives us consciousness.  The discussion doesn't necessarily end there, but if there's an answer to "How does consciousness happen?" in it, it's not in the mere assertion that souls give us consciousness.

Similarly, if we substitute more mechanistic terms like "quantum indeterminacy" or "chaos of non-linear systems" or whatever else for "soul" in "We have consciousness because ...", we haven't explained why that leads to the subjective experience of consciousness or provided a way to test the assertion.  We may well be able to demonstrate that some aspect or another of consciousness is associated with some structure -- some collection of neurons, one might expect -- where quantum indeterminacy or chaos plays a significant role, but that doesn't explain why that structure correlates with consciousness rather than being just another structure along with the gall bladder, earlobe or whatever else.

If we were able to pinpoint some complex of neural circuits that fire exactly when a person is conscious, or perhaps more realistically, in a particular state of waking consciousness, or consciousness of a particular experience, it would be tempting, then, to say "Aha! We've found the neural circuits that cause consciousness," but that's not really accurate, for a couple of reasons.

First, correlation doesn't imply cause, which is why we speak of neural correlates of consciousness, not causes.  Second, even if there's a good case that the neural pattern we locate really is a cause -- for example, maybe it can be demonstrated that if the pattern is disrupted the person loses consciousness, as opposed to the other way around -- we still don't know what is causing a person to have the subjective experience of consciousness.  We can talk with some confidence about patterns of neurons firing, or even of subjects reporting particular experiences, but we can't speak with confidence about people actually experiencing things.

If we didn't already know that subjective experiences existed (or, at least, I know my subjective experiences exist), there's nothing about the experiment that would tell us that they did, much less why.  All we know is that if neurons are firing in such-and-such a state, the subject reports conscious experiences.

Since we do experience consciousness, it's blindingly obvious to us that the subject must be as well, but again that just shifts the problem back a level: We're convinced that we have found something that causes the subject to experience what we experience, but that doesn't explain why we experience anything to begin with.  If we were all "philosophical zombies" that exhibited all the outward signs of consciousness without actually experiencing it, the experiment would run exactly the same -- except that no one would actually experience it happening.

That's more than I meant to say about the second bullet point.  I actually meant to explore the first one, so let's try that.

Suppose you're hanging out in your hammock on a pleasant afternoon (note to self: how did I let the summer go by without that?).  You hear the wind in the trees, maybe birds chirping or dogs barking or kids playing, or cars going by, or whatever.  You are alone with your own thoughts, but for a while even those die down and you're just ... being.  Are you conscious?  Unless you've actually drifted off to sleep, I think most people would answer yes.  If someone taps you on your shoulder or shouts your name, you'll probably respond, though you might be a bit slow to come back up to speed.  If it starts to rain, you'll feel it.  If something makes a loud noise and you manage to regain your meditative state, you're still liable to remember the noise.

On the other hand, it's something of a different state of consciousness than much of our usual existence.  There's nothing verbal going on.  There's no interaction with other people, none of the constant evaluation (much of which we're generally not aware of) concerning what people might be thinking, or whether they heard or understood you, or whether you're understanding them, or what their motives might be, or their opinions of you or others around, or what they might be aware of or unaware of.  You're not having an inner conversation with yourself or that jerk who cut you off at the intersection, and there's little to no self-consciousness, if you're only focusing on the sensory experience of the moment (indeed, this is a major reason people actively seek such a meditative state).

I've become more and more convinced over time that we often underestimate how conscious other beings are.  I don't subscribe to the sort of literal panpsychism that holds that a brick has a consciousness, that "It is something (to a brick) to be a brick".  I doubt this is a particularly widely held position anyway, so much as the anchor at one end of a spectrum between it and "nothing is actually conscious at all".  However, I am open to the idea that anything with a certain minimum complement of capabilities which can be measured fairly objectively, including particularly senses and memory, has some sort of consciousness, and, as a corollary, that there are many different kinds or components of consciousness that different things have at different times.

For example, a hawk circling over a field waiting for a mouse to pop out of its burrow likely has some sort of experience of doing this, and if it spots a mouse, it has some sort of awareness of there now being prey to pursue with the goal of eating it or, if there are no mice, an awareness of being hungry.  This wouldn't be awareness on the verbal, reflective level we experience when we notice we are hungry and tell someone about it, but something more akin to that "I'm relaxing in a hammock and things are just happening" kind of awareness.  I also wouldn't claim that this awareness is serving any particular purpose.  Rather, it's a side effect of having the sort of mental circuitry a hawk has and being embodied in a universe where time exists -- another mystery that may well be deeply connected to the hard problem of consciousness.

I think this is in some sense the simplest hypothesis, given that we have the same general kind of neural machinery as hawks and that we can experience things happening.  It still presupposes that there's some sort of structural difference between things with at least some subjective experiences and things with no such experiences at all, but that "something" becomes a fairly general and widely-shared capacity for sensing the world and retaining some memory of it rather than a specialized facility unique to us.  The difference between us and a hawk is not that we're conscious and hawks aren't, but that we have a different set of experiences from hawks.  For the most part this would be a larger set of experiences, but, if you buy the premise of hawks having experiences at all, there are almost certainly some that they have but we don't.

Which leads me back to the title of this post.

I suspect that if you polled a bunch of people about consciousness in other animals, you'd see more "yes" answers to "is a chimpanzee conscious" or "is a dog conscious" than to "is a hawk conscious" or "is a salmon conscious".  Some of this is probably due to our concept of intelligence in other animals.  Most people probably think that chimps and dogs are "smart animals", while hawks and salmon are "just regular animals".

However, I think our judgment of that is strongly colored by chimps and dogs being more social animals than hawks or fish (even fish that school are probably not social in the same way we are -- I'd go into why I think that, but this post is already running a bit long).  It doesn't take much observation of chimps and dogs interacting with their own species and with humans to conclude that they have some awareness of individual identities and social structure, the ability to persuade others to do what they want (or at least try), and other aspects of behavior that are geared specifically toward interaction with those around them.  Other animals do interact with each other, but social animals like chimps, dogs and humans normally do so on a daily basis as a central part of life.

This social orientation produces its own set of experiences beyond "things are happening in the physical world" experiences like hunger and an awareness that some potential food just popped out of a burrow.  I think it's this particular kind of experience that we tend to gravitate toward when we think of conscious experience.  More specifically, self-awareness is often held out as the hallmark of "true consciousness", and I think there's a good case that self-awareness is closely connected to the sort of "what is that one over there thinking and what do they want" calculation that comes of living as a social animal.

To some extent this is a matter of definition.  If you define consciousness as self-awareness, then it's probably relatively rare, even if several species are able to pass tests like the mirror test (Can the subject tell that the animal in the mirror is itself?).  However, if you define consciousness as the ability to have subjective experiences, then I think it's hard to argue that it's not widespread.  In that formulation, self-awareness is a particular kind of subjective experience limited to relatively few kinds of being, but only one kind of experience among many.

Tuesday, March 10, 2020

Memory makes you smarter

Another sidebar working up to talking about the hide-and-seek demo.

Few words express more exasperation than "I just told you that!", and -- fairly or not -- there are few things that can lower someone's opinion of another person's cognitive function faster than not remembering simple things.

Ironically for systems that can remember much more data much more permanently and accurately than we ever could, computers often seem to remember very little.  For example, I just tried a couple of online AI chatbots, including one that claimed to have passed a Turing test.  The conversations went something like this:
Me: How are you?
Bot: I'm good.
Me: That's great to hear.  My name is Fred.  My cousin went to the store the other day and bought some soup.
<a bit of typical AI bot chat, pattern-matching what I said and parroting it back, trying stock phrases etc.>
Me: By the way, I just forgot my own name.  What was it?
<some dodge, though one did note that it was a bit silly to forget one's own name>
Me: Do you remember what my cousin bought the other day?
<some other dodge with nothing to do with what I said>
The bots are not even trying to remember the conversation, even in the rudimentary sense of scanning back over the previous text.  They appear to have little to no memory of anything before the last thing the human typed.
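For contrast, even the most rudimentary memory changes things.  Here's a toy sketch -- the patterns and phrasings are invented for illustration, not taken from any real bot -- of a "bot" that simply keeps a transcript and scans back over it:

```python
# A toy illustration of the rudimentary memory the bots lack: keep a
# transcript and scan back over it when asked about earlier statements.
import re

transcript = []

def respond(user_text):
    transcript.append(user_text)
    # Scan the whole conversation so far for a "my name is ..." statement.
    m = re.search(r"[Mm]y name is (\w+)", " ".join(transcript))
    if "forgot my own name" in user_text.lower() and m:
        return f"You told me your name is {m.group(1)}."
    return "Tell me more."

respond("How are you? My name is Fred.")
print(respond("I just forgot my own name. What was it?"))
# prints: You told me your name is Fred.
```

There's nothing anyone would call AI here either, but as with cookies and autofill, it's enough to move the needle from "dumb computer" toward "not so bad".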

Conversely, web pages suddenly got a lot smarter when sites started using cookies to remember state between visits and again when browsers started to be able to remember things you'd typed in previously.  There's absolutely nothing anyone would call AI going on, but it still makes the difference between "dumb computer" and "not so bad".

When I say "memory" here, I mean the memory of things that happen while the program is running.  Chess engines often incorporate "opening books" of positions that have occurred in previous games, so they can play the first few moves of a typical game without doing any calculation.  Neural networks go through a training phase (whether guided by humans or not).  One way or another, that training data is incorporated into the weightings that determine the network's behavior.

In some sense, those are both a form of memory -- they certainly consume storage on the underlying hardware -- but they're both baked in beforehand.  A chess engine in a tournament is not updating its opening book.  As I understand it, neural network-based chess engines don't update their weights while playing in a tournament, but can do so between rounds (but if you're winning handily, how much do you really want to learn from your opponents' play?).

Likewise, a face recognizer will have been trained on some particular set of faces and non-faces before being set loose on your photo collection.  For better or worse, its choices are not going to change until the next release (unless there's randomization involved).

Chess engines do use memory to their advantage in one way: they tend to remember a "cache" of positions they've already evaluated in determining previous moves.  If you play a response that the engine has already evaluated in detail, it will have a head start in calculating its next move.  This is standard in AB engines, at least (though it may be turned off during tournaments).  I'm not sure how much it applies for NN engines.   To the extent it does apply, I'd say this absolutely counts as "memory makes you smarter".
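The mechanism is easy to sketch.  Here's a toy transposition table with a stand-in evaluation function and a stand-in move generator (nothing here resembles a real engine); the point is only that work cached while thinking about one move answers part of the next calculation for free:

```python
# Toy transposition table: cache evaluations keyed by (position, depth)
# so that analysis done for one move gives a head start on the next.
cache = {}
calls = 0   # count how many positions we actually evaluate

def evaluate(position, depth):
    global calls
    if (position, depth) in cache:
        return cache[(position, depth)]
    calls += 1
    if depth == 0:
        score = hash(position) % 100   # stand-in for a real evaluation
    else:
        # Stand-in move generator: two successor "positions" per node.
        score = max(evaluate(position + m, depth - 1) for m in "ab")
    cache[(position, depth)] = score
    return score

evaluate("start", 10)   # think about our move
first = calls
evaluate("starta", 9)   # opponent plays a line we already examined...
assert calls == first   # ...so the answer comes entirely from the cache
```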

Overall, though, examples of what we would typically call memory seem to be fairly rare in AI applications.  Most current applications can be framed as processing a particular state of the world without reference to what happened before.  Recognizing a face is just recognizing a face.

Getting a robot moving on a slippery surface is similar, as I understand it.  You take a number of inputs regarding the position and velocity of the various members and whatever visual input you have, and from that calculate what signals to send to the actuators.  There's (probably?) a buffer remembering a small number of seconds worth of inputs, but beyond that, what's past is past (for that matter, there's some evidence that what we perceive as "the present" is basically a buffer of what happened in the past few seconds).

Translating speech to text works well enough a word or phrase at a time, even if remembering more context might (or might not) help with sorting out homonyms and such.   In any case, translators that I'm familiar with clearly aren't gathering context from previous sentences.  It's not even clear they can remember all of the current sentence.

One of the most interesting things about the hide-and-seek demo is that its agents are capable of some sort of more sophisticated memory.  In particular, they can be taught some notion of object permanence, usually defined as the ability to remember that objects exist even when you can't see them directly, as when something is moved behind an opaque barrier.  In purely behavioral terms, you might analyze it as the ability to change behavior in response to objects that aren't directly visible, and the hide-and-seek agents can definitely do that.  Exactly how they do that and what that might imply is what I'm really trying to get to here ...

Sunday, March 1, 2020

Intelligence and intelligence

I've been meaning for quite a while to come back to the hide-and-seek AI demo, but while mulling that over I realized something about a distinction I'd made in the first post.  I wanted to mention that brief(-ish-)ly in its own post, since it's not directly related to what I wanted to say about the demo itself.

In the original post, I distinguished between internal notions of intelligence, concerning what processes are behind the observed behavior, and external notions which focus on the behavior itself (note to self: find out what terms actual cogsci/AI researchers use -- or maybe structural and functional would be better?).

Internal definitions on the order of "Something is intelligent if it's capable of learning and dealing with abstract concepts" seem satisfying, even self-evident, until you try to pin down exactly what is meant by "learning" or "abstract concept".  External definitions are, by construction, more objective and measurable, but often seem to call things "intelligent" that we would prefer not to call intelligent at all, or call intelligent in a very limited sense.

The classic example would be chess (transcribing speech and recognizing faces would be others).  For quite a while humans could beat computers at chess, even though even early computers could calculate many more positions than a human, and the assumption was that humans had something -- abstract reasoning, planning, pattern recognition, whatever -- that computers did not have and might never have.  Therefore, humans would always win until computers could reason abstractly, plan, recognize patterns or whatever else it was that only humans could do. In other words, chess clearly required "real intelligence".

Then Deep Blue beat Kasparov through sheer calculation, playing a "positional" style that only humans were supposed to be able to play.  Clearly a machine could beat even the best human players at chess without having anything one could remotely call "learning" or "abstract concepts".  As a corollary, top-notch chess-playing is not a behavior that can be used to define the kind of intelligence we're really interested in.

This is true even with the advent of Alpha Zero and similar neural-network driven engines*. Even if we say, for the sake of the argument, that neural networks are intelligent like we are, the original point still holds.  Things that are clearly unintelligent can play top-notch chess, so "plays top-notch chess" does not imply "intelligent like we are".  If neural networks are intelligent like we are, it won't be because they can play chess, but for other reasons.

The hide-and-seek demo is exciting because on the one hand, it's entirely behavior based.  The agents are trained on the very simple criterion of whether any hiders are visible to the seekers.  On the other hand, though, the agents can develop capabilities, particularly object permanence, that have been recognized as hallmarks of intelligence since before there were computers (there's a longer discussion behind this, which is exactly what I want to get to in the next post on the topic).

In other words, we have a nice, objective external definition that matches up well with internal definitions.  Something that can
  • Start with only basic knowledge and capabilities (in this case some simple rules about movement and objects in the simulated environment)
  • Develop new behaviors in a competition against agents with the same capabilities
is pretty clearly intelligent in some meaningful sense, even if it doesn't seem as intelligent as us.

If we want to be more precise about "develop new behaviors", we could either single out particular behaviors, like fort building or ramp jumping, or just require that any new agent we're trying to test starts out by losing heavily to the best agents from this demo but learns to beat them, or at least play competitively.

This says nothing about what mechanisms such an agent is using, or how it learns.  This means we might some day run into a situation like chess where something beats the game without actually appearing intelligent in any other sense, maybe some future quantum computer that can simultaneously try out a huge variety of possible strategies.  Even then, we would learn something interesting.

For now, though, the hide-and-seek demo seems like a significant step forward, both in defining what intelligence might be and in producing it artificially.

* I've discussed Alpha Zero and chess engines in general at length elsewhere in this blog.  My current take is that the ability of neural networks to play moves that appear "creative" to us and to beat purely calculation based (AB) engines doesn't imply intelligence, and that the ability to learn the game from nothing, while impressive, doesn't imply anything like what we think of as human intelligence, even though it's been applied to a number of different abstract games.  That isn't a statement about neural networks in general, just about these particular networks being applied to the specific problem of chess and chess-like games.  There's a lot of interesting work yet to be done with neural networks in general.

Sunday, February 23, 2020

What good is half a language?

True Wit is Nature to advantage dress'd
What oft was thought, but ne'er so well express'd
-- Alexander Pope

How did humans come to have language?

There is, to put it mildly, a lot we don't know about this.  Apart from the traditional explanations from various cultures, which are interesting in their own right, academic fields including evolutionary biology, cognitive science and linguistics have had various things to say about the question, so why shouldn't random bloggers?

In what follows, please remember that the title of this blog is Intermittent Conjecture.  I'm not an expert in any of those three fields, though I've had an amateur interest in all three for years and years.  Real research requires careful gathering of evidence and checking of sources, detailed knowledge of the existing literature, extensive review and in general lots of time and effort.  I can confidently state that none of those went into this post, and anything in here should be weighed accordingly.  Also, I'm not claiming any original insight.  Most likely, all the points here have already been made, and better made, by someone else already.

With that said ...

In order to talk about how humans came to have language, the first question to address is what it means to have language at all.  Language is so pervasive in human existence that it's surprisingly hard to step back and come up with an objective definition that captures the important features of language and doesn't directly or indirectly amount to "It's that thing people do when they talk (or sign, or write, or ...) in order to communicate information."

We want to delimit, at least roughly, something that includes all the ways we use language, but excludes other activities, including things that we sometimes call "language", but that we somehow know aren't "really" language, say body language, the language of flowers or, ideally, even computer languages, which deliberately share a number of features with human natural languages.

Since language is often considered something unique to humans, or even something that makes us human, it might be tempting to actively try to exclude various ways that other animals communicate, but it seems better to me just to try to pin down what we mean by human language and let the chips fall where they may when it comes to other species.

For me, some of the interesting features of language are
  • It can communicate complex, arbitrary structures from one mind to another, however imperfectly.
  • It is robust in the face of noise and imperfection (think of shouting in a loud music venue or talking with someone struggling with a second language).
  • It tolerates ambiguity, meaning that (unlike in computer languages and other formal systems) ambiguity doesn't bring a conversation to a halt.  In some cases it's even a useful feature.
  • Any given language provides multiple ways to express the same basic facts, each with its own particular connotations and emphasis.
  • Different languages often express the same basic facts in very different ways.
  • Related to these, language is fluid across time and populations.  Usage changes over time and varies across populations.
  • It can be communicated by a variety of means, notably speech, signing and writing.
  • From an evolutionary point of view, it has survival value.
I'd call these functional properties, meaning that they relate mainly to what language does without saying anything concrete about how it does it.  Structurally (from here on I'll tend to focus on spoken/written language, with the understanding that it's not the whole story),
  • Language is linear.
That is, whatever the medium, words are produced and received one at a time, though there can be a number of "side channels" such as pitch and emphasis, facial expressions and hand gestures.
  • The mapping between a word and its meaning is largely arbitrary (though you can generally trace a pretty elaborate history involving similar words with similar meanings).
  • Vocabulary is extensible.
We can coin words for new concepts.  This is true only for certain kinds of words, but where it can happen it happens easily.
  • Meaning is also extensible.
We can apply existing words in new senses and again this happens easily.
  • The forms used adjust to social conditions.
You speak differently with your peers after work than you would to your boss at work, or to your parents as a child, or to your prospective in-laws, and so forth.
  • The forms used adjust to particular needs of the conversation, for example which details you want to emphasize (or obscure).
  • Some concepts seem to be more tightly coupled to the structure of a particular language than others.
For example, when something happened or will happen in relation to when it is spoken of is generally part of the grammar, or marked by a small, closed set of words, or both.
  • On the other hand, there is wide variety in exactly how such things are expressed.
Different languages emphasize different distinctions.  For example, some languages don't specially mark singular/plural, or past/present, though of course they can still express that there was more than one of something or that something happened yesterday rather than today.  Different languages use different devices to convey basic information like when something happened or what belongs to whom.
  • Syntax, in the form of word order and inflection (changing the forms of words, as with changing dog to dogs or bark to barked or barking), collectively seem to matter in all languages, but the exact way in which they matter, and the degree to which each matters, seem to be unique to any given language.  Even closely related languages generally differ in the exact details.
There are plenty of other features that could each merit a separate post, such as honorifics (Mr. Hull) and diminutives (Davey), or how accent and vocabulary are such devastatingly effective in-group markers, or how metaphors work, or what determines when and how we choose to move words around to focus on a topic, or why some languages build up long words that equate to whole sentences of short words in other languages, or why in some languages directional words like to and of take on grammatical meaning, or why different languages break down verb tenses in different ways, or can use different words for numbers depending on what's being counted, and so on and so on ...

Many of these features of language have to do with the interplay between cognition -- how we think -- and language -- how we express thoughts.  The development of cognition must have been both a driver and a limiting factor in the development of language, but we are almost certainly still in the very early stages of understanding this relationship.

For example, languages generally seem to have a way of nesting one clause inside another, as in The fence that went around the house that was blue was red.  How would this arise?  In order to understand such a sentence, we need some way of setting aside The fence while we deal with that went around the house that was blue and then connecting was red back to the fence in order to understand that the fence is red and the house is blue.  To a compugeek, this means something like a stack, a data structure for storing and retrieving things such that the last thing stored is the first thing retrieved.
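Here's a toy illustration of that stack at work.  The bracketed encoding is invented for the example (a real parser would have to discover the clause boundaries itself), but the push on entering a clause and the pop on leaving it are exactly the "set aside and pick back up" mechanism:

```python
# Toy stack-based reading of "The fence that went around the house
# that was blue was red".  Nouns are bare words; predicates are
# hyphenated; "[" and "]" mark clause boundaries explicitly.
tokens = "fence [ went-around house [ was-blue ] ] was-red".split()

stack = []       # nouns whose clauses are still open
facts = []       # (noun, predicate) pairs recovered
current = None   # the noun most recently introduced

for tok in tokens:
    if tok == "[":
        stack.append(current)           # set the outer noun aside...
    elif tok == "]":
        current = stack.pop()           # ...and pick it back up later
    elif "-" not in tok:
        current = tok                   # a new noun
    else:
        facts.append((current, tok))    # a predicate about the current noun

print(facts)
# prints: [('fence', 'went-around'), ('house', 'was-blue'), ('fence', 'was-red')]
```

The last-stored, first-retrieved discipline is what lets was red find its way back to the fence rather than the house.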

Cognitively, handling such a sentence is like veering off a path on some side trip and returning to pick up where you left off, or setting aside a task to handle some interruption and then returning to the original task.  Neither of these abilities is anywhere near unique to humans, so they must be older than humanity, even though we are the only animals that we know of that seem to use them in communication.

These cognitive abilities are also completely separate from a large number of individual adaptations of our vocal apparatus, which do seem to be unique to us, notably fine control of breathing and of the position of the tongue and shape of the mouth.  While these adaptations are essential to our being able to speak as fluently as we do, they don't have anything to do with what kinds of sentences we can express, just how well we can do so using spoken words.  Sign languages get along perfectly well without them.

In other words, it's quite possible we were able to conceive of structures like "I saw that the lion that killed the wildebeest went around behind that hill over there" without being able to put them into words, and that ability only came along later.  There's certainly no shortage, even in modern humans, of things that are easy to think but hard to express (I'd give a few examples, but ...).  The question here, then, is not "How did we develop the ability to think in nested clauses?" but "How did we come to use the grammatical structures we now see in languages to communicate such thoughts?"

There's a lot to evolution, and it has to be right up there with quantum mechanics as far as scientific theories that are easy to oversimplify, draw unwarranted conclusions from, or to get outright wrong, so this next bit is even less precise than what I've already said.  For example, I'm completely skirting around major issues of population genetics -- how a gene (useful or not) spreads (or doesn't) in a population.

Let's try to consider vocabulary in an evolutionary context.  I pick vocabulary to start with because it's clearly distinct from grammar.  Indeed one of the useful features of a grammar is that you can plug an arbitrary set of words into it.  Conversely, one requirement for developing language as we know it is the ability to learn and use a large and expandable vocabulary.  Without that, and regardless of the grammatical apparatus, we do not account for the way people actually use language.

Suppose some animal has the ability to make one kind of call when it spots a particular predator and a different call for another predator, in such a way that its conspecifics (animals of the same species) can understand and react appropriately.  That's two calls (three if you count not making any call) and it's easy to see how that could be useful in not getting eaten.  Again, this is far from unique to us (see here, and search for "vervets" in the post, for example).

Now suppose some particular animal is born with the ability to make a third call for some other hazard, say a large branch falling (this is more than a bit contrived, but bear with me).  A large branch falls, the animal cries out ... and no one does anything.  The ability to make new calls isn't particularly useful without the ability to understand new calls.  But suppose that nobody did anything because they didn't know what the new call meant, but they were able to connect "that oddball over there made a funny noise" with "a big branch fell".  The next time a big branch falls and our three-call-making friend cries out, everyone looks out and scatters to safety.  Progress.

I'm more than a bit skeptical that the ability to make three calls rather than two would arise by a lucky mutation, but I think there are still two valid points here:

First, the ability to comprehend probably runs ahead of the ability to express, and certainly new ways to express are much less likely to catch on if no one understands what they mean.  Moreover, comprehension is useful in and of itself.  Whether or not my species is able to make calls that signal specific scenarios, being able to understand other species' calls is very useful (when a vervet makes a predator call, other species will take appropriate action as well), as is the ability to match up new calls with their meanings from context and examples.

In other words, the ability to understand a large vocabulary is liable to develop even without the ability to express a large vocabulary.  For a real-life example, at least some domestic dogs can understand many more human words than, as far as anyone can tell, they can produce distinct barks and similar sounds, and certainly more human words than they can produce themselves.

Second, this appears to be a very common pattern in evolution.  Abilities that are useful in one context (distinguishing the different calls of animals around you) become useful in other contexts (developing a system of specialized calls within your own species).  The general pattern is known as exaptation (or cooption, or formerly and more confusingly as pre-adaptation).

Let's suppose that the local population of some species can potentially understand, say, dozens of distinct calls (whether their own or those of other species), but its ability to produce distinct calls is limited.  If some individual comes along with the gift of being able to produce more distinct calls, then that will probably increase that individual's chances of surviving -- because its conspecifics will learn the new calls and so increase everyone's chance of survival -- and at least potentially its chances of reproducing, if only because there will be more potential mates around if fewer of them get eaten. 

If that particular individual fails to survive and reproduce, the conditions are still good for some other individual to come along with the ability to produce a bigger vocabulary, perhaps through some entirely different mechanism.  This in turn doesn't preclude some future individual from being born with the ability to produce a larger vocabulary through yet a third mechanism, or either of the original two.  If there are multiple mechanisms for doing something advantageous, the chances of it taking hold in the long run are better (I'm pretty sure, but I don't know whether an actual biologist would agree; also, this isn't particular to vocabulary).

If the community as a whole develops the tendency to find larger vocabularies attractive, so much the better, though the math starts to get hairy at this point.  Sexual selection is a pretty good way of driving traits to extremes -- think peacocks and male walruses -- so it's quite plausible that a species that starts to develop larger and larger vocabularies of calls could take this quite far, past the point of immediate usefulness.  You then have a population with a large vocabulary ready for an environment where it makes more of a difference.

In short, even some ability to produce distinct calls for different situations is useful, and it's no surprise many animals have it.  The ability to produce a large and expandable variety of distinct calls for different situations also looks useful, but seems harder to evolve, judging by how rare it is.  Taking this a step further, we appear to be unique in our ability to produce and distinguish thousands of distinct vocabulary items, though as always there's quite a bit we still don't know about communication in other species.

It's clear that other animals can distinguish, and in some cases produce, non-trivial vocabularies, even if it's not particularly common.  How do you get from there to our as-far-as-we-know-unique abilities?  The usual answer for how complex traits evolve is "a piece at a time".

In order to find a (very hypothetical) evolutionary pathway from an extensible collection of specialized calls to what we call language today, we want to find a series of small steps that each add something useful to what's already there without requiring major restructuring.  Some of those, in no strict order except where logically necessary, might be:
  • The ability to refer to a class of things without reference to a particular instance
This is one aspect of what one might call "abstract concepts".  As such, it doesn't require any new linguistic machinery beyond the ability to make and distinguish a reasonably large set of calls (which I'll call words from here on out), but it does require a cognitive shift.  The speaker has to be able to think of, say, wolf without referring to a particular wolf trying to sneak up.  The listener has to realize that someone saying "wolf" may not be referring to a wolf currently sneaking up on them. Instead, if the speaker is pointing to a set of tracks it might mean "a wolf went here", or if pointing in a particular direction, maybe "wolves come from over there".

This may seem completely natural to us, but it's not clear who, if anyone besides us, can do this.  Lots of animals can distinguish different types of things, but being able to classify is different from being aware that classes exist.  An apple-sorting machine can sort big from small without understanding "big" or "small".  I say "it's not clear" because devising an experiment to tell if something does or doesn't understand some aspect of abstraction is difficult, in no small part because there's a lot of room for interpretation of the results.

[Re-reading this and taking Earl's comment into account, I think I've conflated two different kinds of abstraction.  Classifying "wolf" as opposed to "a wolf" is probably more basic than I have it here.  For example, a vervet will give the call for a particular kind of predator.  It doesn't have to develop a call for each individual animal, and a good thing, because that would only work for individuals that it had seen before.  A behaviorist would probably say this is all stimulus-response based on particular characteristics -- give the leopard call in response to the smell or sound of a leopard, and so on, and fair enough.

Classification, then, would be more a matter of connecting particular attributes -- sound, smell, shape or whatever -- to physical things with those attributes.  The progression, as Earl suggests, would be leopard smell/sound --> leopard --> the leopard that just disappeared into the trees over there.  That is, it requires more cognitive machinery to be able to conceive of leopard as any animal that smells this way and/or makes this sound and/or has pointy ears and a tail or whatever, and it requires a different piece of machinery  -- a sort of object permanence -- to conceive of an absent leopard and connect that thing that was here but isn't any more to leopard and get that leopard that was here but isn't any more -- D.H 28 Oct. 2021]
  • The ability to designate a quality such as "big" or "red" without reference to any particular thing with that quality.
This is similar to the previous item, but for adjectives rather than nouns.  From a language standpoint it's important because it implies that you can mix and match qualities and things (adjectives and nouns).  A tree can be big, a wolf can be big and a wolf can be gray without needing a separate notion of "big tree", "big wolf" and "gray wolf".  An adjective is a predicate that applies to something rather than standing alone as a noun does.

As I understand it, the widely-recognized stages of language development in humans are babbling, single words, two-word sentences and "all hell breaks loose".  A brain that can handle nouns and predicates is ready for two-word sentences consisting of a predicate and something it applies to.  This is a very significant step in communication and it appears to be quite rare, but linguistically it's nearly trivial.  A grammar to describe it has one rule and no recursion (rules that refer, directly or indirectly, to themselves).

As a practical matter, producing a two-word sentence means signifying a predicate and an object that it applies to (called an argument).  Understanding it means understanding the predicate, understanding the argument and, crucially, understanding that the predicate applies to the argument.  If you can distinguish predicates from objects, order doesn't even matter.  "Big wolf!" is just as good as "Wolf big!" or even a panicked sequence of "Wolf wolf big wolf big big wolf!" (which, to be fair, would require recursion to describe in a phrase-structure grammar).

From a functional point of view, the limiting factor to communicating such concepts is not grammar but the ability to form and understand the concepts in the first place.
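To make the order-independence concrete, here's a toy sketch (my own illustration, with made-up word lists, not anything from the linguistics literature): if every word is known in advance to be either a predicate or an argument, pairing them up needs no word order at all.

```python
# Toy vocabularies for the sketch -- entirely made up.
PREDICATES = {"big", "red", "hairy"}   # quality words
ARGUMENTS = {"wolf", "tree", "dog"}    # thing words

def interpret(utterance):
    """Return (argument, predicate) pairs from a stream of words,
    pairing each predicate with the most recently mentioned argument."""
    pairs = []
    predicate = argument = None
    for word in utterance.split():
        if word in PREDICATES:
            predicate = word
        elif word in ARGUMENTS:
            argument = word
        if predicate and argument:
            pairs.append((argument, predicate))
            predicate = None  # keep the argument around as the topic
    return pairs

# "Big wolf!" and "Wolf big!" convey the same predication:
print(interpret("big wolf"))    # [('wolf', 'big')]
print(interpret("wolf big"))    # [('wolf', 'big')]
# The panicked repetition just asserts it over and over:
print(interpret("wolf wolf big wolf big big wolf"))
```

The point of the sketch is that the listener's classification of words does all the work; no syntax beyond "these words occurred together" is needed.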

Where do we go from predicate/argument sentences to something resembling what we now call language?  Some possible next steps might be
  • Predicates with more than one argument.
The important part here is that you need a way to distinguish the arguments.  In wolf big, you know that big is the predicate and wolf is the argument and that's all you need, but in see rabbit wolf, where see is the predicate and rabbit and wolf are arguments, how do we tell whether the wolf sees the rabbit or the rabbit sees the wolf?  There are two solutions, given that you're limited to putting words together in some particular order:

Either the order of words matters, so see rabbit wolf means one thing and see wolf rabbit means the other, or there's a way of marking words according to what role they play, so for example see wolf-at rabbit means the rabbit sees the wolf and see wolf rabbit-at means the wolf sees the rabbit.  There are lots of possible variations, and the two approaches can be combined.  Actual languages do both, in a wide variety of ways.

From a linguistic point of view, word order and inflection (ways of marking words) are the elements of syntax, which (roughly speaking) provides structure on top of a raw stream of words.  Languages apply syntax in a number of ways, allowing us to put together complex sentences such as this one, but you need the same basic tools even for simple three-word sentences.  Turning that around, if you can solve the problem of distinguishing the meaning of a predicate and two arguments, you have a significant portion of the machinery needed for more complex sentences.
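The two strategies can be sketched side by side (again my own toy example, with "-at" as an invented suffix marking the thing being seen):

```python
def parse_by_order(sentence):
    """Strategy 1, fixed word order: predicate, then seer, then seen."""
    pred, agent, patient = sentence.split()
    return {"predicate": pred, "agent": agent, "patient": patient}

def parse_by_marking(sentence):
    """Strategy 2, role marking: whichever word carries the invented
    '-at' suffix is the patient (the one being seen); order is free."""
    words = sentence.split()
    pred = words[0]
    agent = patient = None
    for w in words[1:]:
        if w.endswith("-at"):
            patient = w[:-3]   # strip the marker
        else:
            agent = w
    return {"predicate": pred, "agent": agent, "patient": patient}

# All three mean "the wolf sees the rabbit":
print(parse_by_order("see wolf rabbit"))
print(parse_by_marking("see wolf rabbit-at"))
print(parse_by_marking("see rabbit-at wolf"))  # order no longer matters
```

Notice that the marking parser doesn't care where the marked word appears, which is exactly the freedom that heavily inflected languages enjoy.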
  • Pronouns, that is, a way to designate a placeholder for something without saying exactly what that something is, and connect it with a specific meaning separately.
Cognitively, pronouns imply some form of memory beyond the scope of a simple sentence. Linguistically, their key property is that their meaning can be redefined on the fly.  A noun like wolf might refer to different specific wolves at different times, but it will always refer to some wolf.  A pronoun like it is much less restrained.  It could refer to any noun, depending on context.

Pronouns allow for more compact sentences, which is useful in itself since you don't have to repeat some long descriptive phrase every time you want to say something new about, say, the big red house across the street with the oak tree in the yard.  You can just say that house or just it if the context is clear enough.

More than this, though, by equating two things in separate sentences they allow linear sequences of words to describe non-linear structures, for example I see a wolf and it sees me.  By contrast, in I see a wolf and a wolf sees me it's not clear whether it's the same wolf and we don't necessarily have the circular structure of two things seeing each other.
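The "redefined on the fly" property is simple enough to sketch in a few lines (my own toy resolver, using a made-up noun list): each it is rebound to the most recently mentioned noun.

```python
# Toy noun inventory for the sketch -- made up for illustration.
NOUNS = {"wolf", "rabbit", "house"}

def resolve_pronouns(words):
    """Replace each 'it' with the most recently mentioned noun."""
    resolved = []
    last_noun = None
    for w in words:
        if w == "it" and last_noun:
            resolved.append(last_noun)
        else:
            resolved.append(w)
            if w in NOUNS:
                last_noun = w
    return resolved

print(resolve_pronouns("i see a wolf and it sees me".split()))
# 'it' resolves to 'wolf', recovering the circular two-way structure
```

Real pronoun resolution is far subtler than "most recent noun", of course, but even this crude rule shows how a linear word stream can point back at an earlier referent.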
  • The ability to stack up arbitrarily many predicates: big dogbig red dogbig red hairy dog, etc.
I left this for last because it leads into a bit of a rabbit hole concerning the role of nesting and recursion in language.  I'm going to dig into that a bit here by way of arguing that some of the analytic tools commonly used in analyzing language may not be particularly relevant to its development.  Put another way, "how did language develop" is not the same question as "how did the structures we work with in analyzing language develop".

A common analysis of phrases like big red hairy dog uses a recursive set of rules like

a noun phrase can be a noun by itself, or
a noun phrase can be an adjective followed by a noun phrase

This is much simpler than a full definition of noun phrase in a real grammar, and it's not the only way to analyze noun phrases, but it shows the recursive pattern that's often used in such an analysis.  The second definition of noun phrase refers to noun phrase recursively.  The noun phrase on the right-hand side will be smaller, since it has one less adjective, so there's no infinite regress.  The example, big red hairy dog, breaks down to big modifying red hairy dog, which breaks down to red modifying hairy dog, which breaks down to hairy modifying dog, and dog is a noun phrase by itself.  In all there are four noun phrases, one by the first rule and three by the second.
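The two rules above transcribe almost directly into code.  Here's a small sketch (my own, with toy word lists) that applies them recursively and enumerates the noun phrases in big red hairy dog:

```python
# Toy vocabularies for the sketch.
ADJECTIVES = {"big", "red", "hairy"}
NOUNS = {"dog"}

def noun_phrases(words):
    """Return every noun phrase in a list of words, by the two rules:
    NP -> Noun, and NP -> Adjective NP (the recursive case)."""
    if len(words) == 1 and words[0] in NOUNS:
        return [words]                        # rule 1: a noun by itself
    if words and words[0] in ADJECTIVES:
        inner = noun_phrases(words[1:])       # rule 2: adjective + smaller NP
        if inner:
            return [words] + inner
    return []

for np in noun_phrases("big red hairy dog".split()):
    print(" ".join(np))
# big red hairy dog / red hairy dog / hairy dog / dog -- four in all
```

The recursion bottoms out at the bare noun, matching the count of four noun phrases: one from the first rule and three from the second.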

On the other hand, if you can conceive of a dog being big, red and hairy at the same time, you can just as well express this with two-word sentences and a pronoun:  dog big. it red. it hairy.  The same construction could even make sense without the pronouns: dog big. red. hairy.  Here a listener might naturally assume that red and hairy have to apply to something, and the last thing we were talking about was a dog, so the dog must be red and hairy as well as big.

This is not particularly different from someone saying I saw the movie about the duck.  Didn't like it, where the second sentence clearly means I didn't like it and you could even just say Didn't like and still be clearly understood, even if Didn't like by itself sounds a bit odd.

From a grammatical standpoint (at least for a constituency grammar) these all seem quite different.  In big red hairy dog, there's presumed to be a nested structure of noun phrases.  In dog big.  it red. it hairy you have three sentences with a simple noun-verb structure and in dog big. red. hairy. you have one two-word sentence and two fragments that aren't even sentences.

However, from the point of view of "I have some notion of predicates and arguments, and multiple predicates can apply to the same argument, now how do I put that in words?", they seem pretty similar.  In all three cases you say the argument and the predicates that apply to it and the listener understands that the predicates apply to the argument because that's what predicates do.

I started this post with the idea of exploring how language as we now know it could develop from simpler pieces such as those we can see in other animals.  The title is a nod to the question of What good is half an eye? regarding the evolution of complex eyes such as we see in several lineages, including our own and (in a different form) in cephalopods.  In that case, it turns out that there are several intermediate forms which provide an advantage even though they're not what we would call fully-formed eyes, and it's not hard to trace a plausible pathway from basic light-sensitive "eye spots" to what we and many other animals have.

The case of language seems similar.  I think the key points are
  • Cognition is crucial.  You can't express what you can't conceive of.
  • The ability to understand almost certainly runs ahead of the ability to express.
  • There are plausibly a number of intermediate stages between simple calls and complex language (again, the account above is completely speculative and I don't claim to have identified the actual steps precisely or completely).
  • Full grammar, in the sense of nested structures described by recursive rules, may not be a particularly crucial step.
  • A purely grammatical analysis may even obscure the picture, both by failing to make distinctions (as with the jump from "wolf" meaning "this wolf right there" to it meaning "wolf" in the abstract) and by drawing distinctions that aren't particularly relevant (as with the various forms of big red hairy dog).