Saturday, August 10, 2013

Not in our lifetime vs. never in a million years

The great physicist Enrico Fermi once asked "Where is everybody?", by which he meant "It seems quite likely that there are other civilizations in the universe, so why haven't we seen convincing evidence?"

Without going into detail, I agree with Fermi that there's no convincing evidence that there are other civilizations in the universe.  However, despite the lack of smoking-gun evidence, I'm pretty well convinced there is life elsewhere in the universe, even in our own galaxy.  It seems reasonably likely that there is life within our neighborhood, and not out of the question that there is some form of life elsewhere in our solar system.

I also think it's pretty likely that there are intelligent civilizations (leaving aside exactly what that means) in our galaxy, and almost inevitable that there are such civilizations somewhere in the universe besides here.  So again, why haven't we heard from them?

When we use terms like "in our neighborhood", it's easy to forget that, when talking about astronomy, "neighborhood" is a very relative concept.  Here, I'll use "in our neighborhood" to mean "within 50 light-years, give or take a few percent".  That's close enough that we could send a signal and get a response in something on the order of a human lifetime.  It's also vastly farther than we have ever travelled, or could hope to travel with any kind of technology we know.  Within this 50 light-year radius there are about 2,000 stars.

Compared to our galaxy, this is a pretty cozy little corner.  Our galaxy is much bigger, on the order of 100,000 light-years with hundreds of billions of stars.  The observable universe is much, much bigger still, with some hundreds of billions of galaxies, depending on how you define "galaxy" and "the universe", each with a huge number of stars.

Summarizing: If you talk about "the galaxy", you are talking about on the order of a hundred million times more stars than our neighborhood, and if you're talking about "the universe" you're talking about on the order of a hundred billion times more stars than the galaxy, or on the order of ten quintillion times more stars than our neighborhood.  If only one in a million stars harbors an intelligent civilization, then there are almost certainly no others in our neighborhood, but some hundreds of thousands in the galaxy, and tens of quadrillions in the universe.
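
Just to make the arithmetic above concrete, here it is as a few lines of Python (the star counts are this post's rough orders of magnitude, not precise astronomy, and the one-in-a-million figure is pure illustration):

neighborhood = 2_000             # stars within ~50 light-years
galaxy = neighborhood * 10**8    # hundreds of billions of stars
universe = galaxy * 10**11       # stars in the observable universe

odds = 1 / 1_000_000             # one in a million, purely for illustration
for name, stars in [("neighborhood", neighborhood),
                    ("galaxy", galaxy),
                    ("universe", universe)]:
    print(name, stars * odds)
# neighborhood 0.002  -- almost certainly none
# galaxy 200000.0     -- some hundreds of thousands
# universe 2e+16      -- tens of quadrillions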



That one in a million figure is just for the sake of illustration.  At this point, we really don't know how likely or unlikely life is, if only because we don't have a lot of data points to go on.  We know for sure there's life on Earth.  It looks pretty unlikely that there's life on the Moon, or Mercury.  If there's life on the surface of Venus, it's got to be pretty bad-ass, but the best guess is probably not.  The jury's still out on Mars; we're pretty sure it had liquid water, but not at all sure either way about the life part.  Quite possibly there used to be but isn't any more.

There are a couple of other possibilities.  Jupiter's moon Europa probably has considerably more liquid water than we do, Saturn's moon Enceladus appears to have a large subsurface ocean, and Saturn's moon Titan has a dense atmosphere and pools of liquid.  Methane and ethane, that is, at a temperature of about -180C (-292F).  Could life develop in either of those environments?  We really don't know, but, maybe.  Certainly not a definite "no way", particularly since we've discovered life on earth in all kinds of extremely harsh environments where we used to think life had no business being.

It's also possible that there is some form of life in the clouds of the gas giants or floating in Venus's thick atmosphere, or on some less likely-looking moon than Enceladus, Europa or Titan, but at this point, Mars, Enceladus, Europa and Titan look like the best bets.

By that reckoning we have, in our solar system, one place that definitely has life and four others that plausibly might have, or might have had.  From what we know, our sun is not a particularly unusual star for our purposes here.  There are plenty of other main sequence stars of similar mass and age, and from recent discoveries, it looks like there are plenty of planets outside our solar system.  There are also plenty of stars not like our sun, but still with planets that might plausibly hold life.

Again, we don't know what the real odds are, but we can try to break things down more finely.  We might consider that any planet or moon with a large amount of liquid water is "favorable to life".  In our solar system, that would mean us, Europa or Enceladus (so far ... the jury is still out on Jupiter's moons Ganymede and Callisto).  Before too long we might have a good guess at how common such situations are.  For the sake of the argument, let's say that one in ten stars has such places.  Likewise, we could guess that there's a 50% chance that a place favorable to life actually develops life.  So that's one star in 20, or about 100 in our neighborhood.   And so forth.

This exercise of taking wild guesses at probabilities and multiplying them together goes by the formal name of the Drake equation (though it probably originates with Fermi).  Writing a formal equation doesn't reduce the wide error bars on our guesses about, say, how likely life is to develop or what portion of planets have favorable conditions, but it does give a well-defined framework for talking about such things.   That's helpful, but if you hear a statement like "According to the Drake equation there are N other technological civilizations in our galaxy," whether N is zero or a million ... um, no.  All that means is that under someone's particular set of guesses, there would be N technological civilizations.
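
To make the "someone's particular set of guesses" point concrete, here is a toy Drake-style calculation in Python.  Every factor below is an invented guess, not data; the only honest reading of the output is "N, under these guesses":

stars = 2_000            # stars in our 50-light-year neighborhood
f_favorable = 1 / 10     # guess: fraction with a liquid-water world
f_life = 0.5             # guess: chance such a place develops life
f_intelligent = 0.01     # wild guess: chance that life turns technological
f_detectable = 0.1       # wild guess: chance it's detectable right now

n = stars * f_favorable * f_life * f_intelligent * f_detectable
print(n)                 # 0.1 -- change any guess and N changes with it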

You could make a reasonable argument that we're not just guessing at the numbers to plug into the Drake equation, we're guessing about what some of the terms even mean.  What is life, after all?  Does "technology" mean essentially the same thing for all possible kinds of life?



I suppose at some point I should explain the title of this post.  Why not start now?

Suppose that there's a planet fifty light-years away orbiting a star identical to the sun and with the exact same history and technology as us.  Could we detect signs of intelligent life from it?

The Earth (and therefore Twin Earth) has been pumping out radio signals for about a century.  This means it's at least physically possible that we could pick up Twin Earth's broadcast signals from fifty light-years away.  Right now we would be picking up Twin Earth radio and TV shows from the early 1960s.

Before getting too excited about that, keep in mind that Twin Earth's radio signal is going to be very, very faint at that distance and right next to a much brighter radio source, namely Twin Sun (which is still pretty faint compared to most things we can pick up with radio telescopes).  Radio telescopes, even the really big lots-of-dishes-hooked-together kind, have considerably lower resolution than optical telescopes, and as far as I know we're not even close to being able to distinguish Twin Earth from Twin Sun in the radio frequencies, even if they were similarly bright.

Maybe we could, with a few more advances in technology and after careful observations, figure out that something unusual was going on around Twin Sun, but we're not just going to point a radio dish at Twin Earth and tune in to The Beverly Hillbillies.  Which may be just as well.

At this stage in our development, we are at the beginning of being able to contemplate detecting something like a civilization similar to ours orbiting a star in our immediate neighborhood.  Our telescopes (optical and radio) will doubtless improve, and we'll figure out ways of squeezing more and more information out of the signals they provide, but at the same time something else is going on: Earth, and therefore Twin Earth, is liable to go dark.

I'm not talking about civilization ending in the near future, or humanity morphing into some sort of cyber-species with no need for physical bodies.  Whatever you think the odds of those things may be, we're probably not going to spend too much more time spewing radio waves into empty space, simply because it's wasteful -- power and spectrum spent on no one in particular.  Even now, we can listen to the radio and watch TV over land-based connections.  That's probably just going to get more and more prevalent.  There's a good chance that through sheer technological progress we'll stop sending out whatever faint signal we've been sending.

So say that we, and thus Twin Earth, spend about 200 years sending out a radio signal indicating intelligence.  There are other ways we might detect Twin Earth and deduce that it has life, but only through a structured signal like our radio transmissions would it be clear that it was intelligent life -- OK, we could also look for Dyson spheres and such, but let's not go there just now.  It's also possible that Twin Earth could decide to deliberately send out a signal, permanently, to every star in its neighborhood, after it stops using high-powered radio broadcasts.  But "permanently" to creatures such as us is "momentarily" on planetary time scales.

To get a feel for what that last statement means, suppose that Twin Earth is like us in every way, except that it formed just a little bit earlier or later.  Say a tenth of a percent earlier or later.  Earth is about 4.54 billion years old.  A tenth of a percent of that is 4.54 million years.  If Twin Earth formed a tenth of a percent earlier than us, then we're several million years too late to pick up the brief flash of detectable signals of intelligent life it put out.  If it formed a tenth of a percent later, we won't have a chance to detect its hairless, tool-making social primates for millions of years yet.

This is why Drake's equation has a factor for how long we guess that an intelligent civilization would put out a detectable signal.  Obviously, the Twin Earth scenario is a gross simplification compared to the possibilities of life in the universe, but I think it sheds some light on the Fermi paradox.  There's a decent chance that everybody's out there, or will be at some point in the future, but we just plain missed them or they're not even here yet.
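
A couple of lines of arithmetic show how brutal the timing problem is.  Suppose (all invented numbers, in the spirit of the guesses above) that a thousand nearby worlds each broadcast a 200-year detectable window at some random point in a four-billion-year history:

window = 200             # years a civilization stays detectable
span = 4_000_000_000     # years over which such windows might occur
civs = 1_000             # worlds that ever broadcast -- pure invention

p_one = window / span                 # chance one window covers "now"
p_any = 1 - (1 - p_one) ** civs       # chance at least one does
print(p_one, p_any)                   # 5e-08 and ~5e-05: even with a
                                      # thousand broadcasters, about a
                                      # 1-in-20,000 chance of catching one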



Electromagnetic radiation, which is all we currently know how to detect from other star systems, follows the inverse square law.  Twice as far away, a signal is four times fainter, three times farther away, nine times fainter, and so on.  That means that a star system 20 light-years away would have to produce a signal four times as strong as one 10 light-years away in order to be equally detectable.

Earth has broadcast some sort of radio signal into space for the past hundred years or so, but at first that signal was very weak -- just a single small transmitter.  Eventually it grew, and quite likely it will eventually fade out.  The amount of time we've spent transmitting our brightest signal is shorter than the time we've spent broadcasting half that signal, and so forth.

Just so, the amount of time that a given civilization spends putting out a signal that we could detect decreases with distance.  At some point, probably well within our galaxy, it becomes effectively zero.  There could be civilizations 1,000 light-years away (again, the Milky Way is about 100,000 light-years across) that never have and never will put out a signal bright enough for us to detect.

In short, the search for extraterrestrial intelligent life probably boils down to watching a few thousand star systems in our immediate neighborhood for signatures of intelligence.  It seems quite plausible that some number of those systems harbor life, and that a not-too-much-smaller number have developed or will develop intelligent life that, at some point in its history, and for a brief time, puts out something we could detect.

If that's the case, then some portion of them have already passed their detectable phase.  Maybe there are a dozen out there that have yet to announce themselves.  As always, a wild guess.  There might be none at all.  There might be hundreds, but probably not much more -- there are only so many stars close enough.  If there are such a dozen, and we can keep listening, we'll eventually spot one.  But "eventually" here likely means millions of years, not decades.

Are we likely to detect signals from civilizations around other stars?  Not in our lifetime, I'd say.  But some time in the next million years?  Maybe.



Postscript: I forget where I saw this mentioned, but another problem is that as our radio communications get more and more efficient, the signal we put out gets harder and harder to tell from noise.  Faint as it may be, a Morse code radiotelegraph signal is clearly non-random and statistically unlike anything known to be naturally produced.  A compressed digital television signal looks much like random noise, which can come from any number of sources.  Mix together all the digital television signals currently broadcast on earth and you get something even more like noise.  Probably the only way we could detect a signal from Twin Earth and tell that it was a signal from something intelligent would be for them to be sending a signal directly at us.  But if they're like us, they're only beaming signals to a few stars, and for relatively short periods of time.

Post-postscript: Randall Munroe's What If makes most of the same points as here, with a lot fewer words (and a couple of pictures).

Wednesday, August 7, 2013

Colonies and organisms

Is an ant colony an organism?  Strictly speaking, no.  Individual ants are organisms.  An ant colony is ... something else, a something else called a "colony" or "superorganism" or some similar term.

Why even ask?  My purpose here is to try to pin down what "colony" and "organism" might mean.  As with most terms, there are quite a few choices, once you start looking.

If someone speaks of, say, a city as an organism, there's a strong element of metaphor.  Yes, a city can be said to collectively eat, and breathe, and even make decisions, but a city isn't actually an organism.  It just has enough of the features of one to make for interesting comparisons and analogies.

On the other end, there are stands of aspen (and other plant species) that appear to be individual organisms, but are actually connected by a common root system and are genetically identical.  Technically, this is a clonal colony, but we would generally think of each tree as an individual organism.

We might similarly think of a cluster of mushrooms as consisting of several organisms, but in fact mushrooms are just reproductive organs.  It's the mycelium, a web of root-like structures in the soil, that carries on the day-to-day activities of a fungus, whether or not any mushrooms are evident.  Since mushrooms are temporary structures, analogous to flowers on plants, and don't survive on their own, it seems reasonable to think of the mycelium and any attached mushrooms taken together as an organism.

On the other hand, trees are permanent structures and it's normal (depending on the species) to find a tree living independently, or next to other trees of the same species that aren't genetically identical.  This probably makes it less intuitive to say that a stand of aspen is a single organism, so we hedge and say clonal colony.

Banyan trees are an interesting case.  As their branches spread, they drop aerial roots, which eventually grow into the soil and support the further spread of the branches.  Banyans can grow to cover several hectares (or several acres, if you prefer).  Since everything is connected in plain sight, it's easy to speak of a single large tree, even though it may not be immediately obvious that all the "trunks" in what might seem to be a grove of youngish trees are actually roots of a single tree.  If the aerial roots and the low branches they drop from were below the ground, though, would it then be a clonal colony?

To a large extent this is just a matter of nomenclature.  What matters more is whether the pieces are connected or not, and whether they are genetically identical or not.  All four combinations are possible:
  • A banyan tree or aspen grove is connected, and the parts are genetically identical
  • The trees in an apple orchard are separate but genetically identical.  That is, they are clones (strictly speaking they collectively make up a clone -- we've been genetically engineering plants for millennia).  It's also possible for a clonal colony like an aspen grove to be split into disconnected parts.
  • Lichen -- which Wikipedia calls a "compound organism" -- is a symbiosis of a fungus and a photosynthetic partner, generally either an alga or a cyanobacterium.  They are physically intertwined and the one could not survive without the other, but they are quite different genetically.
  • Typical stands of forest consist of physically and genetically distinct trees, and this is the normal pattern for plants and animals that we distinguish as individuals.
Where does that leave our ant colony?  Clearly ants are physically distinct.  Genetically, the picture is a bit more complex.  Ants, along with the other hymenopterans (bees, wasps and sawflies) and a few other groups, are haplodiploid.  Males carry only one set of chromosomes, rather than the usual two, while females carry both, because males develop from unfertilized eggs.  Further, the queen of a colony generally mates with only one male over a given time period, and only one female in a colony (the queen) is fertile (or at least only a small portion of females are fertile).  This has a number of interesting consequences:
  • A male gets 100% of his genes from his mother
  • A male has no father and cannot have sons, but does have a grandfather and can have grandsons (This one is worth working through in slow motion.  All the clues are in the paragraph above)
  • A female gets 50% of her genes from her mother and 50% from her father, as usual, but has 75% of her genes from the same source as her sisters and only 25% from the same source as her brothers.
  • Lethal and highly harmful genes get weeded out quickly, since they'll kill off the males that carry them.  With only one set of chromosomes, there's no place to hide.
"From the same source" is distinct from "the same".  If the mother and father carry the same version of a gene -- the same allele -- then it doesn't matter which source it comes from.  But if (to take a human example) mom has blond hair and dad has brown hair but a blond mother, then on average half the kids will have mom's blond hair, with a blond gene from both parents, and half will have dad's brown hair, with a blond gene from mom.  They all have dad as the source of one of their sets of hair genes, but they don't all have the same hair genes from dad.

Selection cares about the variations, so it will tend to act the same on genetically identical individuals, and more and more differently on less related individuals.  Workers in an ant colony are much closer to identical than ordinary siblings.  This probably helps explain why ants and related species tend to be eusocial, that is, so socially cooperative that individuals will routinely act against their direct self-interest.

In particular, eusocial species typically have entire castes of sterile individuals.  This makes no sense narrowly, in terms of individuals competing to pass on their genes, but more sense when you look at the overall picture of which genes are liable to survive.  It's not as simple as a sterile soldier ant dying to save two of her sisters, though.  If the sisters are also sterile, this makes no direct difference to which genes ultimately survive.

Probably being 75% related to one's sister makes it more likely that an altruistic behavior will take hold.  That is, an instinct to protect the queen and eggs is more likely to work if one's relatives in the colony share it.  Seems plausible, but the details are complex, and I haven't looked up what real biologists have to say on the topic.  The question here is: If a fertile female has a large number of offspring, significantly more closely related than normal siblings, under what conditions are the queen's genes (and her consort's) more likely to be passed on by children who mostly forego reproducing in favor of one or a few fertile siblings, as opposed to by children who look after themselves?

In any case, haplodiploid genetics don't explain naked mole rats, which are genetically normal rodents, but eusocial nonetheless.   But there can be multiple causes for the same effect.  Naked mole rats are among the very few known eusocial mammals (or eusocial non-insects of any kind).  Perhaps they just happened to be among the few ordinary diploid animals that developed eusocial behavior far enough for it to remain stable.

Besides being head-hurtingly counter-intuitive to reason about, haplodiploidy, or anything that tends to make behavior more uniform and focused on protecting a small group of fertile individuals and their eggs, tends to make the group look less like a bunch of individuals and more like a single organism.  And I think that's probably where we have to leave the original question.  An ant colony is just that: a colony of individuals which, collectively, has some qualities analogous to those of an organism, and has more of those qualities than groups in many other species.  It is not, however, an organism per se.



But just what is an organism?  In particular, what is a multi-celled organism?  Leaving aside the question of the microbiome -- the microbes living on and inside us that are nearly as different from us genetically as can be, and that rival or outnumber our own cells in sheer count -- a multicellular organism is a collection of individual cells, genetically identical (with exceptions like the germ cells -- sperm and egg -- which have a single set of chromosomes instead of a pair).

Most individual cells have specialized roles, and most of these cells are limited reproductively.  In most cases they can divide and reproduce, but not without limit, or at least not in a healthy organism.  Real reproduction, at least in sexually-reproducing organisms, is handled by a small set of germ cells which the other cells, it may be said, act to protect.

You don't have to squint very hard to see this as similar to the case of a eusocial colony.  To be sure, there are some important differences.  Cells in a multicellular organism are basically 100% related.  They are generally unable to survive on their own for any significant length of time.  They tend to reproduce in a fairly well-established pattern.  That is, the organism grows coherently, and consistently from generation to generation.

Should we consider a multi-celled organism really to be a colony of one-celled organisms?  Well, that's one way to look at it, but because those cells act so coherently and consistently, and because they're simple units (Shh!  Don't mention mitochondria and other organelles!), and they're not viable on their own, and I'm sure for a number of other reasons, it's not useful to push this too far, much less claim that's "really" what's going on.

Nonetheless, I think it's still a useful comparison to study.

Saturday, April 6, 2013

Big Numbers

Warning: This post is fairly long and contains a lot more mathematical notation and jargon than I generally like to include, even when the topic is mathematical.  In this particular case it seems pretty near unavoidable, so I'm hoping that the extra work for the reader is worth it.

A wise man once said of numbers "There are too many of them" (or words to that effect).  To which I'd add "and most of them are too big."

In our minds, we have some concept of numbers beyond merely seeing two apples and two oranges and knowing there is one apple for each orange and vice versa.  If I showed you a pile of coins, say, and asked if there were a dozen or so, a hundred or so, or a thousand or so in the pile, you could probably make a reasonable guess as to which was the case, even without knowing the exact number.  Along with a concept of numbers, we have a rough concept of their size.  Which makes sense.

Mathematicians have made the concept of number rigorous in a variety of ways.  I've described some of them, particularly the natural numbers, in a previous post on counting.  The natural numbers are 0, 1, 2 and so forth on and on forever.  They answer the question "How many (of some kind of discrete object)?", which requires a whole number that may be zero but can't be negative.  They generally can't answer "How much (of some substance)?", which need not come out to a whole number of whatever unit you're using, or "What was my score for the first nine holes?" which will be a whole number of strokes but might be negative.

In some sense, the naturals are the simplest kind of numbers.  All other numbers can be defined in terms of them.  They certainly seem friendly and familiar at first blush.  I aim here to show that the natural numbers we're comfortable dealing with -- 0, 1, 2, 42, 1024, 14 trillion or whatever -- are an insignificant part of the mathematical picture.  Almost all numbers are far, far too big for us to comprehend in any meaningful way.

Big numbers for puny humans

Let's start with human-sized big numbers.  A thousand might seem like a big number, but if you think about it, it's pretty small, even in puny human terms:
  • You can count to 1000 in a few minutes
  • Crowds of a thousand are commonplace
  • You've probably met more than a thousand people in your life
  • Communities of a thousand or more are commonplace
  • If you can read this, you're almost certainly more than a thousand days old
  • This post has over a thousand words
and so forth.

A million is big, but still not all that big.  It is, though, probably the biggest round number that most of us can relate to directly, even if it may take a little effort.
  • If you spray a reasonably large picture window with water, there are likely a million or so droplets visible on it.
  • You can see an individual pixel on a megapixel display.
  • Crowds of a million people or more have gathered on many occasions.
  • Many people, though certainly not most, will make $1 million in their lifetimes.
  • $1 million in $100 bills doesn't take a lot of space.  Even in dollar bills, it will fit in one room of a house.
  • Many people have practiced some basic skill a million times.  For example, a typical professional basketball player has almost certainly made a million baskets (counting practice shots); I once met a baseball scout who quite plausibly claimed to have driven a million miles; I've probably written millions of words, all told.
A billion is probably not readily comprehensible, but examples are not rare
  • Human population is around seven billion.  Two countries have populations over a billion.
  • A billion dollars is about ten dollars per US household.
  • RAM is currently measured in gigabytes.
A trillion is probably the biggest with well-known commonplace examples
  • Large economies are measured in trillions of dollars.
  • You can buy terabyte disks at the store.
  • You have trillions of cells in your body (and even more bacterial cells).
It's perfectly possible to distinguish a million from billion from trillion, but it requires conscious computation ... oh, a gigabyte of disk will hold a thousand photos of a megabyte each.

A trillion, probably even a billion, is beyond the human scale.  We can speak of trillions of cells or bacteria in the human body, but these are abstractions.  No one has actually counted them, whereas a moderate team of people could count millions of objects (as happens during elections, for example).

Astronomical numbers

The physical universe goes well beyond this scale.  There's a reason we talk of "astronomical" numbers.
  • The observable universe (using "comoving coordinates") is around 1,000,000,000,000,000,000,000,000,000 meters in diameter (one octillion, in US terms as I'll use here, or one thousand quadrillion in "long scale" terms).  When dealing with numbers this big, we generally just count the digits, and say, for example, 10^27.
  • There are 6×10^23 (Avogadro's number) atoms in a gram of hydrogen (assuming it's purely the light isotope, which it won't be).
  • There are somewhere around 10^57 atoms in the sun.
  • The smallest distance that can possibly be measured, even in principle, according to quantum theory, is called the Planck length (after Max Planck).  In practice, no one has come even remotely close to directly measuring anything at that small a scale.  The volume of the observable universe in Planck volumes (cubic Planck lengths), that is, the biggest volume we know of measured in the smallest measurable volume units, is on the order of 10^184.
Consider that last number.  Ten is a nice, familiar number.  One hundred eighty-four is not intimidating.  So 10^184 can't be that big a deal, right?  Well, let's try to put it in terms we can understand.  I've claimed a million is about as big a number as we can really grasp, but maybe we can build up from there.  I have well over a thousand digital pictures, and on average there are a few million pixels in each, so it's not too hard to imagine what a billion pixels would look like.  You can find pictures of a million people gathering, so imagine that each one of them has a similar image collection.  That's a quadrillion pixels -- a thousand million million.  Not bad.

With a little more thought, one could probably put together images for, say, a million trillion or a billion billion, but even my image for a billion pixels is stretching.  I can plausibly say that if I'm looking at a computer monitor I can imagine that each of the pixels is its own unit (it's probably not coincidence that the human eye has megapixel resolution, more or less).  If I'm imagining a thousand photos, though, I'm not imagining each of their pixels as a separate unit.  The unit is photos, not pixels.  At best there's the implication that I could mentally switch scales and think of pixels, at which point the photos fade into abstraction.

If I'm imagining a crowd of a million people with a thousand megapixel photos each, it's harder to argue that I'm really keeping the whole image in my head.  The technique of imagining collections of collections of collections gets less and less useful as we approach the limits of human short-term memory, typically seven or so items.  Maybe, maybe, a person could claim to comprehend a million million million million million million million things.  Maybe.

But 10^184 is ten thousand million million million million million million million million million million million million million million million million million million million million million million million million million million million million million million.


Now we're ready to talk about some big numbers.

Archimedes' Sand Reckoner

In The Sand Reckoner, Archimedes set out to do the same kind of measurement as above, of the biggest  known volume in the smallest units.  He wanted to measure the number of sand grains it would take to fill the universe as he understood it.  This was a universe with the Sun (not the Earth) at the center, and the fixed stars in a sphere far enough out that they did not seem to move as the Earth went around the Sun.  In modern terms, it was about a light-year in radius, not too shabby coming from someone from a world with no telescopes or motorized transportation.

The number system in use at the time could conveniently express numbers up to a myriad, or 10,000, and, by counting in myriads, up to a myriad myriads, or one hundred million (10^8).  Archimedes called the numbers from 1 up to that limit the first numbers, and called the limit itself the unit of the second numbers.  The second numbers were that unit, two of that unit, three of that unit and so on, up to a myriad-myriad of them (10^16), which he called the unit of the third numbers.  Likewise the unit of the fourth numbers was a myriad-myriad of those, and so on up to the unit of the myriad-myriadth numbers.

This is a decent-sized number.  The myriad-myriadth numbers run up to a one with 800 million zeroes after it, much bigger than the size of the modern universe in Planck volumes.  But Archimedes went further yet.  He called the numbers up to this point the first period, and the last number itself the unit of the second period.  From that, of course, you can define the second period, and so on.  Archimedes went on to the myriad-myriadth period, ending with 10^(8×10^16), or a one with 80 quadrillion zeroes after it.  That's a huge number, but you could still print it if you had, say, enough paper to cover the surface of the Earth (and a printer to match).
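
None of these numbers could ever be written out in full, so the practical way to check the zero counts is to do the bookkeeping on the exponents instead, as in this little Python sketch:

MYRIAD = 10_000
UNIT = MYRIAD * MYRIAD     # 10^8, the unit of the second numbers

zeroes_first_period = 8 * UNIT       # first period ends at 10^(8 * 10^8)
zeroes_final = 8 * UNIT**2           # myriad-myriadth period: 10^(8 * 10^16)

print(f"{zeroes_first_period:,}")    # 800,000,000
print(f"{zeroes_final:,}")           # 80,000,000,000,000,000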

As it turns out, this was overkill.  Archimedes calculated the volume of his universe in sand grains to be 10^63, a number so small I can write it right here:

1,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000

Archimedes was trying to make a point.  The grains of sand are not beyond reckoning, even if no one person could possibly count them.  They are not infinite in number.

Building big numbers

To build his number system, Archimedes used a simple but powerful approach:
  • Start with some set of numbers
  • Define a rule for making bigger numbers
  • Apply that rule repeatedly to get a larger set of numbers
  • Repeat, using the new, larger set as a starting point
A slightly different way of looking at this is
  • Start with some set of numbers
  • Define a rule for making bigger numbers
  • Define a new rule, namely applying the first rule repeatedly, to get a second rule
  • Define a third rule from the second rule, and so on, repeatedly
For example, start with the numbers from 1 to N, and the simple rule of adding one
  • Starting with N and adding 1 N times gives you N + N, so the new rule is "add N"
  • Starting with N and adding N N times gives you N x N, so the new rule is "multiply by N"
  • Starting with N and multiplying by N N times gives you N^N, so the new rule is "take to the Nth power"
There are various names and notations for what happens next, repeatedly taking to the Nth power, but it will get you very big numbers from small numbers very quickly.  You can actually do this two different ways:
  • First take N^N, then take that to the Nth power, and so forth. If N is 3, you get (3^3)^3, which is 27^3, or 19683
  • Take N^N, and then take N to that power, and so forth. If N is 3, then you get 3^(3^3), which is 3^27, which is 7625597484987
Since the second way gets bigger numbers faster, let's do it that way.

4^(4^4) is

13,407,807,929,942,597,099,574,024,998,205,846,127,479,365,820,592,393,377,723,561,443,721,764,030,073,546,976,801,874,298,166,903,427,690,031,858,186,486,050,853,753,882,811,946,569,946,433,649,006,084,096

That's big, but 4^4^4^4 -- that is, 4^(4^(4^4)) -- is 4 times itself that many times. The number of digits in the result is itself a number with 154 digits.  This is already much, much, much bigger than Archimedes' big number.  Much too big to write down, even if we used the universe for paper and subatomic particles for ink.  And this is just from applying the third rule on the list to a small number.

Let's not even talk about 5^5^5^5^5, or trying to do the same thing with myriad.

[In an earlier version, I'd taken the first option.  It still ended up with big numbers, but not as dramatically.  (4^4)^4 is 256^4, or 4,294,967,296, rather than the monster above, and 4,294,967,296^4 is 340,282,366,920,938,463,463,374,607,431,768,211,456.  This is astronomically large, but not too-big-to-fit-in-the-known-universe large.  Even (((5^5)^5)^5)^5 has only 437 digits --D.H. Mar 2022]
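
Python integers are arbitrary-precision, so the smaller of these towers can be checked directly, and for the bigger ones we can at least count the digits of the digit counts.  A quick sketch:

import math

print((3 ** 3) ** 3)             # 19683
print(3 ** (3 ** 3))             # 7625597484987
print(len(str(4 ** (4 ** 4))))   # 155 -- the digits of the monster above

# 4^4^4^4 itself is hopeless, but the number of digits in its number of
# digits is easy: digits(4^(4^256)) is about 4^256 * log10(4), which has
digits_of_digit_count = math.floor(256 * math.log10(4)
                                   + math.log10(math.log10(4))) + 1
print(digits_of_digit_count)     # 154, as claimed above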

But of course, you can repeat the rule building process itself as much as you like.  What if I start with 4^4^4^4 -- let's call it Fred -- and apply the Fredth rule to it?  Call that number Barney.  Now apply the Barneyth rule to Barney.  And so on.  Just as you can always add one to a number to get a bigger one, you can repeat any hairy big-number-building process some hairy big number of times to get an unimaginably hairier big-number-building process.  Or rather, you can define a process for defining processes and so forth.  You could never, ever come close to actually writing out that process.

(If all this seems similar to the process for building ordinal numbers that I described in the counting post, that's because it is similar.)

Ackermann's function

Ackermann's function, generally denoted A(m,n), boils this whole assembly down to three very simple rules which can generate mind-bogglingly big numbers much more quickly than what I've described so far.  Since we're in full-on math mode in this post, here's the usual definition (other variants are also used, but this one is as monstrous as any of them):

A(m, n) =
  • n + 1, if m = 0
  • A(m - 1, 1), if m > 0 and n = 0
  • A(m - 1, A(m, n - 1)) otherwise
We're just adding and subtracting one.  How bad can it be?
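
Here is the definition transcribed directly into Python (a sketch: memoization keeps the small cases tractable, but nothing will save you much past m = 3):

import sys
from functools import lru_cache

sys.setrecursionlimit(100_000)

@lru_cache(maxsize=None)
def ackermann(m, n):
    if m == 0:
        return n + 1
    if n == 0:
        return ackermann(m - 1, 1)
    return ackermann(m - 1, ackermann(m, n - 1))

print(ackermann(1, 1), ackermann(2, 2), ackermann(3, 3))   # 3 7 61
# ackermann(4, 2) is already a 19,729-digit number; don't ask for A'(4).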

Ackermann's function normally takes two numbers and produces a number from them, but you can easily define A'(n) = A(n,n).  A'(4) is 2^2^2^65536 - 3, which is vastly bigger than any number I've mentioned so far in this post except for Barney.  Harvey Friedman (more from him below) has this to say about A (I've modified the notation slightly to match what's in this post):
I submit that A'(4) is a ridiculously large number, but it is not an incomprehensibly large number. One can imagine a tower of 2’s [that is, a tower of exponents] of a large height, where that height is 65,536, and 65,536 is not ridiculously large.
However, if we go much further, then a profound level of incomprehensibility emerges. The definitions are not incomprehensible, but the largeness is incomprehensible. These higher levels of largeness blur, where one is unable to sense one level of largeness from another.
For instance, A(4, 5) is an exponential tower of 2’s of height A'(4).  It seems safe to assert that, say, A'(5) = A(5, 5) is incomprehensibly large. We propose this number as a sort of benchmark. 
In other words, A'(4) is, as I've argued, far, far too big to comprehend, calculate fully or even write down, but at least we can more or less understand in principle how it could be constructed.  (Friedman's own version of the Ackermann function grows somewhat differently from the one defined above -- his A(4,4) really is a tower of 65,536 2's, while ours is "only" a tower of seven -- but both are absurdly large, and the point survives the difference.)  The recipe for constructing A'(5) contains so many levels of repetition that we can't even really understand how to construct it, much less the final result.

Combinatorial explosions

Everything I've mentioned so far is constructive.  That is, each number is defined by stating exactly how to construct it from smaller numbers by some set of operations, ultimately by starting with zero and adding one repeatedly.  It's also possible to specify numbers non-constructively, that is, without saying exactly how one might construct them.

The field of combinatorics is particularly notorious for defining huge numbers.  Combinatorics deals with questions such as enumerating objects with some particular set of properties.  A simple example would be "How many ways can two six-sided dice show seven spots?" For bonus points, list them exactly, or at least describe their general form.  In this case, it's easy.  There are six, assuming you can tell the two dice apart: {(1,6), (2,5), (3,4), (4,3), (5,2), (6,1)}.

A slightly different kind of combinatorial question is "How large can a collection of objects subject to some particular constraint get before it has to have some property?"  For example, how many natural numbers less than 5 can you put in a set before there must be two numbers in the set that differ by 3?  So {0, 1} is fine but {1, 4} isn't since 4 - 1 = 3.  The answer in this case is 3: {0, 1, 2} has no members that differ by three, but any larger set must.  In this case, we can easily check because there are only five sets of four natural numbers less than five.  For a larger version, say the same question but for the numbers less than 1000, there's no way to check all the combinations and you'll need a real proof.
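
For a set this small, we can let the computer do the checking.  A brute-force sketch:

from itertools import combinations

def largest_avoiding(limit, diff):
    """Biggest subset of range(limit) with no two members differing by diff."""
    for size in range(limit, 0, -1):
        for subset in combinations(range(limit), size):
            if all(b - a != diff for a, b in combinations(subset, 2)):
                return subset
    return ()

print(largest_avoiding(5, 3))   # (0, 1, 2): size 3, so any set of four
                                # must contain two numbers differing by 3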

Friedman gives a somewhat more involved example that produces large numbers astonishingly quickly.  Consider strings of letters from a given alphabet.  A string of length n has what I'll call Friedman's property if, for at least one i and j up to n/2, with i < j, the portion of the string from positions i to 2i appears in the portion from j to 2j.  In other words, cut the string into overlapping portions:
  • The first and second letters (two total)
  • The second through fourth letters (three total)
  • The third through sixth letters (four total) ...
  • ... and so forth, starting one letter later and going one letter longer each time
Friedman's property says at least one of those is contained in at least one of the later ones, and the question is, how long does a string have to be before you know it must have this property (for whatever reason, Friedman actually phrases the question as how long can a string be and not have this property, but this way seems clearer and is more in line with Ramsey Theory in general).

If the alphabet in question is just one letter, say a, then it's a simple problem:
  • Start with aa.  That's the one and only string with two letters, using our alphabet, and there is only one portion to look at (the whole string).
  • Add another a to get aaa.  There's still only the one portion, so we're still good.
  • Add another a to get aaaa.  Now there are two portions to look at 1-2 (aa) and 2-4 (aaa).  The first one is contained in the second, so we're done.  Any four-letter string (that is, the one and only four-letter string), using one letter, must have Friedman's property.
If the alphabet is just the two letters a and b, then
  • ababba, for example, has the property, because the portion from position 1 to 2 (ab) appears in the portion from 2 to 4 (bab)
  • (taking an example with three letters) aabcabbca also has the property, because abc from positions 2 to 4 appears in abbca from positions 5 to 9.  The letters in the first string don't have to be consecutive in the second one, but they do have to all appear in order.
  • abbbaaaaaaa on the other hand, does not have Friedman's property: ab doesn't appear in bbb, bbaa, baaaa or aaaaaa, and more generally, none of those portions appears in any of the ones after it.
  • If you add either an a or a b to the end of abbbaaaaaaa, though, the result has to have Friedman's property.  If you add an a then aaaaaa (positions 5-10) appears in aaaaaaa (positions 6-12).  If you add a b, then ab (positions 1-2) appears in aaaaaab (positions 6-12).
  • With some more fiddling (at the worst, trying out all 4096 12-letter strings), you can determine that any string, using two letters, 12 or more letters long has Friedman's property.
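
Checking examples like these by hand gets fiddly, so here's a brute-force checker, using 1-indexed positions as in the description above and treating "appears in" as "is a subsequence of":

def is_subsequence(a, b):
    """True if a appears in b in order, not necessarily consecutively."""
    it = iter(b)
    return all(ch in it for ch in a)

def has_friedman_property(s):
    """Is some block s[i..2i] a subsequence of a later block s[j..2j]?"""
    n = len(s)
    return any(is_subsequence(s[i - 1:2 * i], s[j - 1:2 * j])
               for j in range(2, n // 2 + 1)
               for i in range(1, j))

for s in ["aaaa", "ababba", "abbbaaaaaaa", "abbbaaaaaaaa", "abbbaaaaaaab"]:
    print(s, has_friedman_property(s))
# aaaa and ababba come out True, abbbaaaaaaa False, and both 12-letter
# extensions of abbbaaaaaaa come out True, matching the checks above.
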
What if we use three letters, say a, b and c?  You need a somewhat longer string before you can no longer avoid Friedman's property.  How long?  Friedman finds a lower bound (the real answer may be bigger) of A(7198, 158386).  Recall that we've already established that A(5,5) is deeply incomprehensible.  Adding one to either parameter of A just kicks things up another unfathomable notch, and here we're doing so hundreds of thousands of times.

For four letters, Friedman gives a lower bound obtained by applying A' over and over again, starting with A'(1) = 3.  How many times?  A'(187196) times.

Friedman goes on to define a similar construction on trees (groups of objects with parent/child/sibling relationships, more or less like family trees but without fun stuff like double cousins).  Again, things start small but then go completely haywire.  If n is the number of letters you can use in labeling the objects in a tree, and Tree(n) is the longest sequence of trees you can have (subject to a couple of restrictions) before there must be at least one tree that's contained in a later one, then
  • Tree(1) = 1
  • Tree(2) = 3
  • Tree(3) is much, much larger than the result we got above for strings using three letters.  It's much, much larger than the result above for four letters.  It's so large that you have to dive deeply into the foundations of mathematics to be able even to describe it.  Maybe you can.  I can't.
And, of course, once you've got an out-of-control function like Tree, you can use it as grist for the constructive mill.  What's A'(Tree(Tree(Barney))) taken to its own power Fred times?  Can't tell you, but it's certainly big, and we can keep this tomfoolery up all day.

(How do we know that there is a limit on how big a set of strings or trees can get before it has to have Friedman's property?  Friedman gives a proof.  It's a pretty standard proof, but I didn't understand it at a glance and didn't take the time to dive in and figure it out.)


It's all just so ... big

As Friedman says, such gigantic numbers (whether defined constructively or non-constructively) are not at all like the numbers we're familiar with.  I can easily tell you that 493 has a remainder of one when divided by three (4+9+3 = 16, 1+6 = 7, 7 divided by three has one left over).  I can't tell you so quickly what remainder Fred leaves.  I know there will be a remainder, since at no point in constructing Fred did we multiply by three, but to figure out which remainder we'd have to examine the definition of Fred and track the remainders all the way through (for Fred that tracking happens to be easy -- 4 leaves a remainder of 1 when divided by three, so any power of 4 does too -- but a number built by mixing several different operations offers no such shortcut).

Likewise I know that 17 is a prime number and 39 isn't.  With a little effort I can figure out that 14,351 isn't prime (it's 113x127), but is A'(4) - Fred a prime number?  No idea.  Probably not, as the odds of any number that size being prime are very small (though not quite zero), but I have no idea how to go about proving it.

For that matter, it's not always immediately obvious whether one huge number is larger than another.  If they were constructed by different means -- say, one uses Ackermann's function and the other came from stacking up exponents in some particular way and repeatedly applying three different big-making functions in turn -- then there may be no feasible way to compare them.  It may even be possible to prove that the number of steps needed to make the comparison would itself be huge.


We're used to simplifying numbers when we do calculations with them.  If I tell you a room is four meters by three meters with a one meter by two meter closet cut out of it, you can easily tell me the area of the room is ten square meters.  If I did a similar calculation with four arbitrary huge numbers, quite likely all anyone can say is the answer is a×b - c×d, if a, b, c and d are the numbers in question.

Nonetheless, such numbers are just as mathematically real as any others.  If you make a claim about "all natural numbers", you're not just saying it's true for numbers you can easily manipulate.  You're saying it's true for really big ones as well, and keep in mind that for every huge number we can describe there are vastly many more of similar size that we have no hope of ever describing.  Fermat's last theorem doesn't just say there are no cases where a^3 + b^3 = c^3.  It says that there are no cases where a^Barney + b^Barney = c^Barney, and so forth for every ridiculous number I've mentioned, and all the others as well.

It's also important to point out that there is nothing particularly special about most of these numbers, unless, like Tree(3), they are the solution to some specific problem.  The number of numbers we know to be special for one reason or another is limited by our human capacity to designate things as special, which runs out long before we get to the astronomical realm, to say nothing of the territory Friedman explores.

If a number is too big to relate to our everyday experience, or to compare with other numbers, or to comprehend how it might be constructed, about the only thing we can really say about it is that it's a number, and it's big.

And that's all we can say about almost all numbers.

Tuesday, January 29, 2013

We're #1. So what?

On the radio today I heard that a certain statistic was at its highest (or lowest) level in seventeen months.  Certainly sounds impressive, but what does it mean?  Without having followed the history of the statistic, I'd have no way of knowing.

For example, if it's 100 now, and it was 99 seventeen months ago and 98 for the other months (including last month), it may not mean much at all.  On the other hand, if the sequence had been more like 99, 82, 64, 57, 43, 51, 46 ... 54, 47, 100, that jump from 47 to 100 might be very significant, particularly if the original fall from 99 to the 40s and 50s had been significant.


Suppose I'm part of a community of gamers in which each gamer has a numerical rating.  Last month I had the 1523rd-highest rating.  This month I'm 1209th.  I've just rocketed 314 places up the rankings. Pretty awesome, huh?

Well, maybe.  Suppose there are 704 people with a rating of 98, 314 people with a rating of 99 and 1208 people with higher ratings.  The top rating is 106.  Last month my rating was 98, so I was one of the 704 tied for 1523rd - 2226th.  This month, by virtue of a one-point improvement, I'm now one of the proud 314 tied for 1209th - 1522nd.  Last month I was good, though not quite as good as the best.  This month I got a little closer to the top.  Maybe not so impressive.

On the other hand, suppose there are three million or so players.  Most of them have fairly unremarkable ratings, but once you get to the top ranks the ratings start to increase dramatically.  The 1523rd-best rating is 12,096, the 1209th-best is 451,903 and the top player has an unbelievable 75,419,223.  I've made really amazing strides in the last month, but I'm still far, very far, from the top.


Ok, that's a lot of made-up numbers for just four paragraphs.  What's going on here?

First, any measurement is meaningless without context.  I originally said "a statistic" instead of "measurement", but the whole point of statistics, that is, pulling (abstracting) concise metrics out of a pile of data, is to provide context.  If I say that the mass of a sample is 153 grams, that doesn't tell me much, but if you tell me that the average (mean) mass of past samples is 75 grams and the standard deviation is 8 grams, I know I'm dealing with an extremely rare high-mass sample -- nearly ten standard deviations above the mean.  Or my scale is broken, or I'm actually measuring a completely different kind of sample, or something else significant is going on.  The mean and standard deviation statistics provide context for knowing what I'm dealing with.

Simply saying "highest in seventeen months" or "jumped 314 places in the rankings" doesn't provide any meaningful context.  Either or both of those could be highly significant, or nothing in particular.

Second, citing rankings like highest, 1209th and so forth implies that something noteworthy about a ranking is also noteworthy about the underlying measurement that's being ranked.  But this is misleading.  Depending on how the rating is distributed, a large change in rating could mean a small change in ranking, or a large one, and likewise for "highest in N time periods."  Technically speaking, the mapping from measurement to ranking can be highly non-linear.

Rankings are not entirely useless.  For example, there have been many more record high temperatures than record low temperatures in recent decades.  Given that short term temperature fluctuations over more than a few days are fairly random (or at least, chaotic), this strongly suggests that temperatures overall are rising.  More sophisticated measurements bear this out, but the simple comparison of record highs versus record lows quickly suggests a trend in the climate as a whole.  Even then, though, it's the careful measurement of the temperatures themselves that tells what's really going on.  Looking at record highs and lows just points us in a useful direction.

In general, when someone cites a ranking or a record extreme, it's good to ask what's going on with the quantity being ranked.

Wednesday, December 12, 2012

Adventures in hyperspace

The hypercube.  A geekly rite of passage, at least for geeks of a certain age.  The tesseract.   The four-dimensional cube.  Because what could be cooler than three dimensions?  Four dimensions!  Cue impassioned discussion over whether time is "the" fourth dimension, or such.

Cool concept, but can you visualize one, really "see" a hypercube in your mind's eye?  We can get hints, at least.  It's possible to draw or build a hypercube unfolded into ordinary three-dimensional space, just as you can unfold a regular cube flat into two-dimensional space.  Dali famously depicted such an unfolded 4-cube.  You can also depict the three-dimensional "shadow" of a 4-cube, and even -- using time as an extra dimension -- animate how that shadow would change as the 4-cube rotated in 4-space (images courtesy of Wikipedia, of course).

That's all well and good, but visualizing shadows is not the same as visualizing the real thing.  For example, imagine an L-shape of three equal-sized plain old 3D cubes.  Now another L.  Lay one of them flat and rotate the other so that it makes an upside-down L with one cube on the bottom and the other two arranged horizontally on the layer above it.  Fit the lower cube of that piece into the empty space inside the L of the first piece, so that the first piece is also fitting into the empty space of that piece.

What shape have you made?  Depending on how natural such mental manipulation is for you and how clear my description was, you may be able to answer "A double-wide L" or something similar.  Even if such things make your head hurt, you probably had little trouble at least imagining the two individual pieces.

Now do the analogous thing with 4-cubes.  What would the analogue of an L-shape even be in 4-space?  How many pieces would we need? Two?  Three?  Four?  Very few people, I expect, could answer a four-dimensional version of the question above, or even coherently describe the process of fitting the pieces together.

Our brains are not abstract computing devices.  They are adapted to navigating a three-dimensional world which we perceive mainly (but not exclusively) by processing a two-dimensional visual projection of it.  Dealing with a four-dimensional structure is not a simple matter of allocating more mental space to accommodate the extra information.  It's a painstaking process of working through equations and trying to make sense of their results.

That's not to say we're totally incapable of comprehending 4-space.  We can reason about it to a certain extent.  People have even developed four-dimensional, and even up to seven-dimensional (!) Rubik's Cubes using computer graphics.  It's not clear if anyone has ever solved a 7-cube, but a 3x3x3x3x3 cube definitely has been solved.

Even so, it's pretty clear that the solvers are not mentally rotating cube faces in four or five dimensions, but dealing with a (two-dimensional representation of) a three-dimensional collection of objects that move in prescribed, if complicated, ways.

From a mathematical point of view, on the other hand, dealing in four or five or more dimensions is just a matter of adding another variable.  Instead of (x,y) coordinates or (x,y,z), you have (w,x,y,z) or (v,w,x,y,z) coordinates and so forth.  Familiar formulas generally apply, with appropriate modifications.  For example, the distance between two points in 5-space is given by

d^2 = v^2 + w^2 + x^2 + y^2 + z^2

if v, w, etc. are the distances in each of the dimensions.  This is just the result of applying the Pythagorean theorem repeatedly.
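
In code, the generalization really is just "add another coordinate".  A minimal sketch:

import math

def distance(p, q):
    """Euclidean distance between two points in any number of dimensions."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

print(distance((0, 0), (3, 4)))                     # 5.0, the familiar case
print(distance((0, 0, 0, 0, 0), (1, 2, 3, 4, 5)))   # sqrt(55) in 5-space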

Abstractly, we can go much, much further.  There are 10-dimensional spaces, million-dimensional spaces, and so on for any finite number.  There are infinite-dimensional spaces.  There are uncountably infinite-dimensional spaces (I took a stab at explaining countability in this post).

Whatever intuition we may have in dealing with 3- or 4-space can break down completely when there are many dimensions.  For example, if you imagine a 3-dimensional landscape of hills and valleys, and a hiker who tries to get to the highest point by always going uphill when there is a chance to and never going downhill, it's easy to imagine that hiker stuck on the top of a small hill, unwilling to go back down, never reaching the high point.  If the number of dimensions is large, though, there will almost certainly be a path the hiker could take from any given point to the high point (glossing over what "high" would mean).  Finding it, of course, is another matter.

You can't even depend on things to follow a consistent trend as dimensions increase, as we can in the case of a path being more and more likely to exist as the number of dimensions increases.  A famous example is the problem of counting the differentiable structures on a sphere.

Since we can meaningfully define distance in any finite number of dimensions, it's easy to define a sphere as all points a given distance from a given center point (it's also possible to do likewise in infinite dimensions).  If you really want to know what a differentiable structure is, have fun finding out.  Suffice it to say that the concepts involved are not too hard to visualize in two or three dimensions.  Indeed, the whole field they belong to has a lot to do with making intuitive concepts like "smooth" and "connected" mathematically rigorous.   Even without knowing any of the theory (I've forgotten what little I knew years ago), it's not hard to see something odd is going on if I tell you there is:
  • exactly one way to define a differentiable structure on a 1-sphere (what most of us would call a circle)
  • likewise on a 2-sphere (what most of us would just call a sphere)
  • and the 3-sphere (the three-dimensional surface of a four-dimensional ball, what some would call a hypersphere)
  • and the 5-sphere (never mind)
  • and the 6-sphere
Oh ... did I leave out the 4-sphere?  Surely there can only be one way for that one too, right?

Actually no one knows.  There is at least one.  There may be more.  There may even be an infinite number (countable or uncountable).

Fine.  Never mind that.  What happens after 6 dimensions?
  • there are 28 ways on a 7-sphere
  • 2 on an 8-sphere
  • 8 on a 9-sphere
  • 6 on a 10-sphere
  • 992 on an 11-sphere
  • exactly one on a 12-sphere
  • then 3, 2, 16256, 2, 16, 16, 523264, and 24 as we go up to 20 dimensions
See the pattern?  Neither do I, nor does anyone else as far as I know. [The pattern of small, small, small, big-and-(generally)-getting-bigger continues at least up to 64 dimensions, but the calculations become exceedingly hairy, and even the three-dimensional case required solving one of the great unsolved problems in mathematics (the Poincaré conjecture).  See here for more pointers, but be prepared to quickly be hip-deep in differential topology and such.]  In the similar question of differentiable structures on Euclidean spaces, there is essentially only one for every number of dimensions except four: four-dimensional Euclidean space carries uncountably many distinct differentiable structures.  So much for geometric intuition.

It's worth pondering to what extent we can really understand results like these.  Certainly not in the same way that we understand how simple machines work, or that if you try to put five marbles in four jars, at least one jar will have more than one marble in it.

Statements like "there are 992 differentiable structures on an 11-sphere" are purely formal statements, saying essentially that if you start with a given set of definitions and assumptions, there are 992 ways to solve a particular problem.  The proofs of such statements may use various structures that we can visualize,  but that's not the same as being able to visualize an 11-dimensional differentiable structure.  Even if we happen to be able to apply this result to something in our physical world, we're really just mechanically applying what the theorems say should happen in the real world.   Doing so doesn't give us a concrete understanding of an eleven-dimensional differentiable structure.  

That, we're just not cut out to do.  In fact, we most likely don't even visualize three complete dimensions.  We're fairly finely tuned to judging how big things are, how far away they are, what's behind what (including things we can't see at the moment) and what's moving in what direction how fast, but we don't generally visualize things like the color of surfaces we can't see.  A truly three-dimensional mental model would include that, but ours don't.  Small wonder a hypercube is a mind-boggling structure, to say nothing of some of the oddities listed above.


Monday, November 19, 2012

If language isn't an instinct, what is it?

Steven Pinker's The Language Instinct makes the case that humans, and so far as we know only humans, have an innate ability to acquire language in the sense we generally understand it.  Further, Pinker asserts that using this ability does not require conscious effort.  A child living with a group of people will normally come to learn their language, regardless of whether the child's parents spoke that language, or what particular language it is.  This is not limited to spoken languages.  A child living among sign language users will acquire the local sign language.  There are, of course, people who are unable to learn languages, but they are the rare exceptions, just as there are people who are completely unable to see colors.

There is, on the other hand, no innate ability to speak any particular language. A child of Finnish-speaking parents will not spontaneously speak Finnish if there is no one around speaking Finnish, and the same can be said of any human language.

This is noteworthy, even if it might seem obvious, because human languages vary to an impressive degree.  Some have dozens of distinct sounds, some only a handful.  Some have rich systems of inflections, allowing a single word to take thousands of different forms.  Some (like English, and Mandarin even more so) have very little inflection.  Some have large vocabularies and some don't (though any language can easily extend its vocabulary).  The forms used to express common concepts like "is", or whether something happened yesterday, is happening now or might never happen, can be completely different in different languages.

At first glance this variation may seem completely arbitrary, but it isn't.  There are rules, even if our understanding of them is very incomplete.  There are no known languages where, say, repeating a word five times always means the opposite of repeating that word four times.  There's no reason in principle there couldn't be such a language, but there aren't, and the probable reasons aren't hard to guess.

There's a more subtle point behind this: There is no one such thing as "communication", "signaling" or "language".  Rather, there are various frameworks for communication.  For example, "red means stop and green means go" is a system with two signals, each with a fixed meaning.  Generalizing this a bit, "define a fixed set of signs each with a fixed meaning" is a simple framework for communication.  A somewhat more complex framework would allow for defining new signs with fixed meanings -- start with "red means stop and green means go", but now add "yellow means caution".
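
In programming terms, the first framework is nothing more than a fixed lookup table, and the second is a lookup table you're allowed to extend.  A toy Python sketch (my own, purely illustrative):

```python
# Framework 1: a fixed set of signs, each with a fixed meaning.
TRAFFIC = {"red": "stop", "green": "go"}

# Framework 2: the same, plus the ability to learn new sign/meaning pairs.
class SignSystem:
    def __init__(self, signs):
        self.signs = dict(signs)

    def learn(self, sign, meaning):
        self.signs[sign] = meaning

    def interpret(self, sign):
        return self.signs.get(sign, "(unknown sign)")

system = SignSystem(TRAFFIC)
system.learn("yellow", "caution")  # extending the system with a new sign
print(system.interpret("yellow"))  # caution
```

Note what neither framework has: any notion of the order in which signs appear.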

Many animals communicate within one or the other of these frameworks.  Many, many species can recognize a specific set of signs.  Dogs and great apes, among others, can learn new signs.  Human language, though, requires a considerably more complex framework.  We pay attention not only to particular signs, but to the order in which they are communicated.  In English "dog bites man" is different from "man bites dog".  Even in languages with looser word order, order still matters.

Further, nearly all, if not all, human languages have a concept of "subordinate clause", that is, the ability to fold a sentence like "The boy is wearing a red shirt" into a sentence like "The boy who is wearing the red shirt kicked the ball."  These structures can nest deeply, apparently limited by the short-term memory of the speaker and listener and not by some inherent rule.  Thus we can understand sentences like I know you think I said he saw her tell him that.  As far as we can tell, no other animal can do this sort of thing.
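
The "limited by short-term memory" point is easy to demonstrate mechanically.  A throwaway Python sketch (mine, not from Pinker) that nests clauses to arbitrary depth:

```python
def nest(clauses):
    """Fold a list of clause stems into one right-nested sentence, e.g.
    nest(["I know", "he left"]) -> "I know (he left)"."""
    if len(clauses) == 1:
        return clauses[0]
    return clauses[0] + " (" + nest(clauses[1:]) + ")"

print(nest(["I know", "you think", "I said", "he saw", "her tell him that"]))
# I know (you think (I said (he saw (her tell him that))))
```

The rule happily produces a ten-deep nesting, too; it's our short-term memory, not the rule, that gives out.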

This is not to say that communication in other animals is simple.  Chimpanzee gestures, for example, are quite elaborate, and we're only beginning to understand how dolphins and other cetaceans communicate.  Nonetheless, there is reasonable evidence that we're not missing anything on the order of human language.  It's possible in principle that, say, the squeaks of mice carry elaborate messages we haven't yet learned how to decode, but mice don't show signs of behavior that could only be explained by a sophisticated signaling system.  Similarly, studies of dolphin whistles suggest that their structure is fundamentally less complex than human language -- though dolphins are able to understand ordered sets of commands.

In short, human languages are built on a framework unique to us, and we have an innate, species-universal, automatic ability to learn and use human languages within that framework.  Thus the title The Language Instinct.   Strictly speaking The Instinct to Acquire and Use Language would be more precise, but speaking strictly generally doesn't sell as many books.


This all seems quite persuasive, especially as Pinker puts it forth, but primatologist and developmental psychologist Michael Tomasello argues otherwise in his review of Pinker, straightforwardly titled Language is not an Instinct  (Pinker's book titles seem to invite this sort of response).  Tomasello is highly respected in his fields and knows a great deal about how human and non-human minds work.  I cited him as an authority in a previous post on theories of mind, for example.  Linguistics is not his area of specialization, but he is clearly more than casually familiar with the literature of the field.

Tomasello agrees that people everywhere develop languages, and that human languages are distinct from other animal communication systems, albeit perhaps not quite so distinct as we would like to think.  However, he argues that there does not need to be any language-specific capability in our genes in order for this to be so.  Instead, the simplest explanation is that language falls out as a natural consequence of other abilities, such as the ability to reason in terms of objects, actions and predicates.

To this end, he cites Elizabeth Bates' analogy that, while humans eat mostly with their hands, this does not mean there is an innate eating-with-hands capability.  People need to eat, eating involves moving food around and our hands are our tool of choice for moving things around in general.  Just because everyone does it doesn't mean that there is a particular instinct for it.  Similarly, no other species is known to cook food, but cooking food is clearly something we learn, not something innate.  Just because only we do it doesn't mean that we have a particular instinct for it.

This is a perfectly good point about logical necessity.  If all we know is that language is universal to humans and specific to humans, we can't conclude that there is a particular instinct for it.  But Tomasello goes further to assert that, even when you dig into the full evidence regarding human language, not only is there no reason to believe that there is a particular language instinct, but language is better explained as a result of other instincts we do have.


So how would we pick between these views?  Tomasello's review becomes somewhat unhelpful here.  First, it veers into criticism of Pinker personally, and of linguists of his school of thought in general, as being unreceptive to contrary views, prone to asserting their views as "correct" and "scientific" when other supportable views exist, and overly attached to the specialized jargon of their field.  A certain amount of this seems valid.  Pinker is skilled in debate, a useful skill that can cut both ways, and this can give an air of certainty regardless of how certain things actually are.  There is also mention of Pinker's former advisor, the famed linguistic pioneer and polemicist Noam Chomsky, but Pinker's views on cognition and language are not necessarily those of Chomsky.

Second, and one would have to assume as a result of the first point, the review takes on what looks suspiciously like a strawman.   In Tomasello's view Pinker, and those claiming a "language instinct" that is more than the natural result of human cognition and the general animal ability to signal, are generally concerned with mathematical elegance, and in particular the concept of generative grammar.

Generative grammar breaks language down into sentences which are in turn composed of a noun phrase and a verb phrase, which may in turn be composed of smaller parts in an orderly pattern of nesting.  This is basically the kind of sentence diagramming you may have learned in school [when I wrote this I didn't realize that there are several ways people are taught to analyze sentences, so I wrote "you learned in school", assuming everyone had had the same experience.  But of course there are several ways.  In some schemes the results look more like dependency graphs than parse trees, which sent me down a fairly eye-opening path.  So, sorry about that, but at least I ended up learning something].

Linguistic theories along these lines generally add to this some notion of "movement rules" that allow us to convert, say, The man read the book into The book was read by the man.  Such systems are generally referred to as transformational generative grammars, to emphasize the role of the movement rules, but I'll go with Tomasello here and drop the "transformational" part.  Keep in mind, though, that if a field is "basically" built on some familiar concept, that's just the tip of the iceberg.

A generative grammar, by itself, is purely syntactic.  If you call flurb a noun and veem a verb, then "Flurbs veem." is a grammatically correct sentence (at least according to English grammar) regardless of what, if anything, flurb and veem might actually mean.  Likewise, you can transform Flurbs veem into Veeming is done by flurbs and other such forms purely by moving grammatical parts around.
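
To see how purely formal this is, here is a toy generative grammar in Python (my own illustration; grammars in the linguistics literature are far richer).  It will cheerfully emit "flurbs veem" without any idea what a flurb is or whether veeming is something flurbs do:

```python
import random

# A tiny generative grammar: each symbol rewrites to one of its expansions.
GRAMMAR = {
    "S":   [["NP", "VP"]],
    "NP":  [["N"], ["Det", "N"]],
    "VP":  [["V"], ["V", "NP"]],
    "N":   [["flurbs"], ["dogs"], ["men"]],
    "V":   [["veem"], ["bite"]],
    "Det": [["the"]],
}

def generate(symbol="S"):
    """Recursively expand a symbol using a randomly chosen rule."""
    if symbol not in GRAMMAR:
        return [symbol]  # a plain word; nothing left to expand
    expansion = random.choice(GRAMMAR[symbol])
    return [word for part in expansion for word in generate(part)]

print(" ".join(generate()))  # e.g. "flurbs veem" or "the dogs bite men"
```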

Tomasello questions whether the structures predicted by generative grammar even exist in all languages.  Generative grammar did happen to work well when first applied to English, but that's to be expected.  The techniques behind it, which come from Latin "grammar school" grammar by way of computing theory, were developed to analyze European languages, of which English is one.  Likewise, much of the early work in generative grammar was focused on a handful of the world's thousands of languages, though not necessarily only European ones.  There is an obvious danger in such situations that someone familiar with generative grammar will tend to find signs of it whether it is there or not.  If all you have is a hammer, the whole world looks like a nail.

From what I know of the evidence though, all known languages display structures that can be analyzed reasonably well in traditional generative grammar terms.  Tomasello asserts, for example, that Lakota (spoken by tribes in the Dakotas and thereabouts) has "no coherent verb phrase".   A linguist whom I consulted, who is familiar with the language, tells me this is simply not true.  The Jesuit Eugene Buechel was apparently also unaware of this when he wrote A Grammar of Lakota in 1939.

But perhaps we're a bit off in the weeds at this point.  What we really have here, I believe, is a set of interrelated assertions:
  • Human language is unique and universal to humans.  This is not in dispute.
  • Humans acquire language naturally, independent of the language.  Also not in dispute.
  • Human languages vary significantly.  Again, not in dispute.
  • Human language is closely related to human cognition.  This is one of Tomasello's main points, but I doubt that Pinker would dispute it, even though Tomasello seems to think so.
  • Generative grammar predicts structures that are actually seen in all known languages.  Tomasello disputes this while Pinker asserts it.  I think Pinker has the better case.
  • Generative grammar describes the actual mechanisms of human language.
That last is subtly different from the one before it.  Just because we see noun phrases and verb phrases, and the same sentence can be expressed in different forms, doesn't mean that the mind actually generates parse trees (the mathematical equivalent of sentence diagrams) or that in order to produce "The book was read by the man" the mind first produces "The man read the book" and then transforms it into passive voice.  To draw an analogy, computer animators have models that can generate realistic-looking plants and animals, but no one claims that this explains how plants and animals develop.

Personally, I've never been convinced that generative grammars are fundamental to language.  Attempts to write language-processing software based on this theory have ended in tears, which is not a good sign.  Generative grammar is an extremely good fit for computers.  Computer languages are in fact based on a tamer version of it, and the same concepts turn up repeatedly elsewhere in computer science.  If it were also a good fit for natural languages, natural language processing ought to be considerably further along than it is.  There have been significant advances in language processing, but they don't look particularly like pure generative grammar rendered in code.  Peter Norvig has a nice critique of this.

Be that as it may, I don't see that any of this has much bearing on the larger points:
  • Human language has features that are distinct from other human cognitive functions.
  • These features (or some of them) are instinctive.
In putting forth an alternative to generative grammar, drawn from work elsewhere in the linguistic community, Tomasello appears to agree on the second point, if not the first.  In the alternative view, humans have a number of cognitive abilities, such as the ability to form categories, to distinguish objects, actions and actors and to define a focus of attention.  There is evolutionary value in being able to communicate, and a basic constraint that communication consists of signals laid out sequentially in time (understanding that there can be multiple channels of communication, for example saying "yes" while nodding one's head).

In this view, there are only four basic ways of encoding what's in the mind into signals to be sent and received:
  • Individual symbols (words)
  • Markers on symbols (for example, prefixes and suffixes -- "grammatical morphology")
  • Ordering of symbols (syntactic distinctions like "dog bites man" vs. "man bites dog")
  • Prosody (stress and intonation)
Language, then, would be the natural result of trying to communicate thoughts under these constraints.



It's quite possible that our working within these constraints would result in something that looks a lot like generative grammar, which is another way of saying that even if language looks like it can be described by generative grammar, generative grammar may not describe what's fundamentally going on.

On the other hand, this sort of explanation smacks of Stephen Jay Gould's notion that human intelligence could be the result of our having evolved a larger brain as a side-effect of something else.  While evolution can certainly act in such roundabout ways, this pretends that intelligence isn't useful and adaptive on its own, and it glosses over the problem of just how a bigger brain is necessarily a smarter brain, as opposed to, say, a brain that can control a larger body without any sophisticated reasoning, or a brain less likely to be seriously injured by a blow to the head.

Likewise, we can't assume that our primate ancestors, having vocal cords, problem-solving ability and the need to communicate, would necessarily develop, over and over again, structurally similar ways of putting things into words.  Speaking, one could argue, is a significantly harder problem than eating with one's hands, and might require some further, specialized ability beyond sheer native intelligence.

There could well have been primates with sophisticated thoughts to express, who would have hunted more effectively and generally survived better had they been able to communicate these thoughts, but nonetheless just couldn't do it.  This would have given, say, a group of siblings that had a better way of voicing their thoughts a significant advantage, and so we're off to the races.  Along those lines, it's quite possible that some or all of the four encoding methods are both instinctive and, in at least some aspects, specific to language as opposed to other things the brain does.


Looking at the list of basic ways of encoding:

Associating words with concepts seems similar to the general problem of slotting mental objects into schemas, for example having a "move thing X from point A to point B" schema that can accept arbitrary X, A and B.  Clearly we and other animals have some form of this.

However, that doesn't seem quite the same as associating arbitrary sounds or gestures with particular meanings.  In the case of "move thing X from point A to point B", there will only be one particular X, A or B at any given time.  Humans are capable of learning hundreds of thousands of "listemes" (in Pinker's terminology), that is, sign/meaning pairs.  This seems completely separate from the ability to discern objects, or fit them into schemas.  Lots of animals can do that, but it appears that only a few can learn new associations between signs and meanings, and only humans can handle human-sized vocabularies.

Likewise, morphology -- the ability to modify symbols in arbitrary, conventional ways -- seems very much language-specific, particularly since we all seem to distinguish words and parts of words without being told how to do so.  The very idea of morphology assumes that he sings is two words, not he and sing and s.
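
To make "parts of words" concrete, here's a toy suffix-stripper in Python (entirely my own illustration; real morphology is vastly messier, but the flavor is there):

```python
# A few English suffixes for the toy to recognize.
SUFFIXES = ["ing", "ed", "s"]

def segment(word):
    """Toy morphological split: peel off one known suffix, if any."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) > len(suffix):
            return (word[:-len(suffix)], suffix)
    return (word, "")

print(segment("sings"))   # ('sing', 's')
print(segment("walked"))  # ('walk', 'ed')
```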

Ordering of symbols is to some extent a function of having to transmit signals linearly and both sides having limited short-term memory.  Related concepts will tend to be nearby in time, for example.  This is not a logical necessity but a practical one.  One could devise schemes where, say, all the nouns from a group of five sentences are listed together, followed by all the verbs with some further markers linking them up, but this would never work for human communication.

But to untangle a sentence like I know you think I said he saw her tell him that, it's not enough to know that, say I next to know implies that it's me doing the knowing.  We have to make a flat sequence of words into a nested sequence of clauses, something like I know (you think (I said (he saw (her tell him that)))).  Different languages do this differently, and it can be done different ways in the same language, depending on which wording we choose.  (He saw (her tell him that)), I know (you think (I said)).
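
As a toy illustration of that untangling (a deliberately dumb scheme of my own, nothing like a real parser), one can treat each clause-introducing verb as opening a new nested clause:

```python
# Verbs that, in this toy, introduce a subordinate clause.
CLAUSE_VERBS = {"know", "think", "said", "saw"}

def bracket(words):
    """Turn a flat word list into right-nested clauses by opening a new
    clause after each clause-introducing verb."""
    for i, word in enumerate(words):
        if word in CLAUSE_VERBS and i + 1 < len(words):
            return words[:i + 1] + [bracket(words[i + 1:])]
    return words

print(bracket("I know you think I said he saw her tell him that".split()))
# ['I', 'know', ['you', 'think', ['I', 'said', ['he', 'saw',
#   ['her', 'tell', 'him', 'that']]]]]
```

Real languages, of course, mark clause boundaries in many different ways; the point is only that some such flat-to-nested mapping has to happen.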

Finally, prosody is probably closely tied to expressions of emotion.  Certainly SHOUTING WHEN ANGRY is related to other displays of aggression, and so forth.  Nonetheless, prosody can also be purely informational, as in distinguishing "The White House is large" from "The white house is large."  This informational use of prosody might well be specific to language use.

In each of these cases, it's a mistake to equate some widely-shared capability with a particular facility in human language.  There is more to vocabulary, morphology, syntax and prosody than simply distinguishing objects, changing the form of symbols, putting symbols in a linear sequence or speaking more or less loudly, and this, I believe, is where Tomasello's argument falls down.


Similarly, the ability to map some network of ideas (she told him that, you think I said it, etc.) into a sequence of words seems distinct from the ability to conceive such thoughts.  At the very least, there would need to be some explanation of how the mind is able to make such a mapping.  Perhaps that mechanism can be built from parts used elsewhere.  It wouldn't be the first time such a re-purposing has evolved.  Or it might be unique to language.

Most likely, then, language is not an instinct, per se, but a combination of pieces, some of which are general-purpose, some of which are particular to animal language, and some of which may well be unique to human language.  The basic pieces are innate, but the particular way they fit together to form a given language is not.

Try fitting that on a book cover.

Tuesday, October 2, 2012

What is or isn't a theory of mind? Part 2: Theories of mind and theories of mind

I said a while ago I'd return to theories of mind after reading up a bit more on primate cognition.  I have, now, and so I shall.  But first, the one that got away.

When I first got started on this thread, I ran across a paper discussing what kind of experiment might show other primates to have a theory of mind.  There were two points: First, that existing experiments (at the time) could be explained away by simple rules like "Non-dominant chimps can follow the gaze of their dominants and associate taking food while the dominant is looking with getting the snot beat out of them."  Second, that there were experiments, in principle, in which positive results could not be plausibly explained away by such rules.

I haven't been able to dig up the paper, but as I recall the gist of the improved experiments was to produce, as I mentioned before, a combinatorial explosion of possibilities, which could be explained more simply by the primate subject having a theory of mind than by a large number of ad-hoc rules.  "Explained more simply" are, of course, three magic words in science.

The scenario involved a dominant chimp, a non-dominant one, food to be stolen, and various doors and windows by which the non-dominant chimp would be able to see what the dominant saw, with or without the dominant seeing that it saw it.

Um, yeah.

For example, the two chimps are in separate rooms, with a row of compartments between them.  The compartments have doors on both sides.  The experimenter places the food in the third of five compartments.  Both doors are open, both chimps can see the food and they can see each other.  The doors are closed (the dominant's first), and the food is moved to compartment one.  The non-dominant sees its door to compartment one open and then the other door.  Does it rush for the food or hang back?  What if it sees that the other door to compartment one was already open?

If we try several different randomly-chosen scenarios with such a setup, a subject with a theory of mind should behave differently from one without, and due to the sheer number of possibilities a "mindless" explanation would have to be hopelessly contrived.

Something like that.  I probably have the exact setup garbled, but the point was to go beyond "Did I see the dominant looking at the food?" to "Where does the dominant think the food is?"  Interesting stuff.


While trying to track that paper down again, I ran across a retrospective by Josep Call and Michael Tomasello entitled Does the chimpanzee have a theory of mind? 30 years later.  I'm not familiar with Call, but Tomasello turns up again and again in research on developmental psychology and the primate mind.  He also has quite a bit to say about language and its development in humans, but that's for a different post (or several).

As with well-written papers in general, the key points of the retrospective are in the abstract:
On the 30th anniversary of Premack and Woodruff’s seminal paper asking whether chimpanzees have a theory of mind, we review recent evidence that suggests in many respects they do, whereas in other respects they might not. Specifically, there is solid evidence from several different experimental paradigms that chimpanzees understand the goals and intentions of others, as well as the perception and knowledge of others. Nevertheless, despite several seemingly valid attempts, there is currently no evidence that chimpanzees understand false beliefs. Our conclusion for the moment is, thus, that chimpanzees understand others in terms of a perception–goal psychology, as opposed to a full-fledged, human-like belief–desire psychology.
In other words, chimps appear to understand that others (both other chimps and those funny-looking furless creatures) can have goals and knowledge, but they don't appear to understand that, behind those goals, others hold beliefs based on their own knowledge.  For example, if a human is reaching for something inaccessible, a chimp will reach it down for them (at least if the human has been a reliable source of goodies).  If a human flips a light switch with a foot -- an unusual act for a creature lacking prehensile toes -- a chimp is likely to try the same, unless the human's hands were full at the time, suggesting that the chimp is aware that the human had to use their foot in the second case but wanted to in the first.

Naturally, any of the ten cases given in the paper could be explained by other means.  Call and Tomasello argue that the simplest explanation for the results taken together is that chimps understand intention.

Likewise, there is good evidence that chimps understand that others see and know things.  For example, they are more likely to gesture when someone (again, normal or furless) is looking, and they will take close account of who is looking where when trying to steal food.

On the other hand, chimps fail several tests that young humans pass in largely similar form.  For example, if there are two boxes that may contain food, and a dominant and a non-dominant subject both see food placed in one of them, the subject will not try to take food from the box with food in it.  Makes sense.  The subject knows the dominant knows where the food is.

If they both see the food moved to a second box, the subject will still leave it alone.  If the dominant doesn't see the food moved but the subject does, the subject ought to know that the dominant thinks the food is still in the first box and that it's safe to go for the second (the experiment is set up so that the dominant can effectively only guard one box).

However, it doesn't go for the box with the food in it.  This and other experiments suggest that the subject doesn't know that the dominant doesn't know what it knows.

Um, yeah.

In other words, the subject appears to assume that, because it knows the food is in the second box, so does the dominant.  "Because" might be a bit strong here ... try again: Chimps understand that others may have their own intentions different from theirs, and that others can know things, but not that others can have knowledge different from theirs.


Call and Tomasello conclude:
It is time for humans to quit thinking that their nearest primate relatives only read and react to overt behavior.   Obviously, chimpanzees’ social understanding begins with the observation of others’ behavior, as it does for humans, but it does not end there. Even if chimpanzees do not understand false beliefs, they clearly do not just perceive the surface behavior of others and learn mindless behavioral rules as a result. All of the evidence reviewed here suggests that chimpanzees understand both the goals and intentions of others as well as the perception and knowledge of others. 
[...]
In a broad construal of the phrase ‘theory of mind’, then, the answer to Premack and Woodruff’s pregnant question of 30 years ago is a definite yes, chimpanzees do have a theory of mind. But chimpanzees probably do not understand others in terms of a fully human-like belief–desire psychology in which they appreciate that others have mental representations of the world that drive their actions even when those do not correspond to reality [I'd argue for "their own perception of reality" here].  And so in a more narrow definition of theory of mind as an understanding of false beliefs, the answer to Premack  and Woodruff’s question might be no, they do not.
I suppose this might sound wishy-washy (chimps have a theory of mind, except no, they don't), but for my money it's insightful, not just concerning the minds of chimps, but the notion that there can be more than one kind of theory of mind.