
Thursday, March 27, 2025

Losing my marbles over entropy

In a previous post on Entropy, I offered a garbled notion of "statistical symmetry." I'm currently reading Carlo Rovelli's The Order of Time, and chapter two laid out the idea that I was grasping at concisely, clearly and -- because Rovelli is an actual physicist -- correctly.

What follows is a fairly long and rambling discussion of the same toy system as the previous post, of five marbles in a square box with 25 compartments. It does eventually circle back to the idea of symmetry, but it's really more of a brain dump of me trying to make sure I've got the concepts right. If that sounds interesting, feel free to dive in. Otherwise, you may want to skip this one.


In the earlier post, I described a box split into 25 little compartments with marbles in five of the compartments. If you start with, say, all the marbles on one row (originally I said on one diagonal, but that just made things a bit messier) and give the box a good shake, the odds that the marbles all end up in the same row that they started in are low, about one in 50,000 for this small example. So far, so good.

But this is really true for any starting configuration -- if there are twenty-five compartments in a five-by-five grid, numbered from left to right then top to bottom, and the marbles start out in, say, compartments 2, 7, 8, 20 and 24, the odds that they'll still be in those compartments after you shake the box are exactly the same, about one in 50,000.
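If you want to check that figure yourself, a couple of lines of Python will do it (this is just my quick sketch of the arithmetic, nothing more):

```python
import math

# Number of ways to place 5 indistinguishable marbles in 25 compartments
microstates = math.comb(25, 5)
print(microstates)                        # 53130

# The chance of landing back in any one fixed configuration after a shake
print(f"about 1 in {microstates:,}")      # about 1 in 53,130
```

That 53,130 is where "about one in 50,000" comes from.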

On the one hand, it seems like going from five marbles in a row to five marbles in whatever random positions they end up in is making the box more disordered. On the other hand, if you just look at the positions of the individual marbles, you've gone from a set of five numbers from 1 to 25 ... to a set of numbers from 1 to 25, possibly the one you started with. Nothing special has happened.

This is why the technical definition of entropy doesn't mention "disorder". The actual definition of entropy is in terms of microstates and macrostates. A microstate is a particular configuration of the individual components of a system, in this case, the positions of the marbles in the compartments. A macrostate is a collection of microstates that we consider to be equivalent in some sense.

Let's say there are two macrostates: call any microstate with all five marbles in the same row lined-up, and any other microstate scattered.  In all there are 53,130 microstates (25 choose 5). Of those, five have all the marbles in a row (one for each row), and the other 53,125 don't. That is, there are five microstates in the lined-up macrostate and 53,125 in the scattered macrostate.

The entropy of a macrostate is related to the number of microstates consistent with that macrostate (for more context, see the earlier post on entropy, which I put a lot more care into). Specifically, it is the logarithm of the number of such states, multiplied by a factor called the Boltzmann constant to make the units come out right and to scale the numbers down, because in real systems the numbers are ridiculously large (though not as large as some of these numbers), and even their logarithms are quite large. Boltzmann's constant is 1.380649×10⁻²³ Joules per Kelvin.

The natural logarithm of 5 is about 1.6 and the natural logarithm of 53,125 is about 10.9. Multiplying by Boltzmann's constant doesn't change their relative size: The scattered macrostate has about 6.8 times the entropy of the lined-up macrostate.
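In Python, using the numbers from the toy example above (my quick check, not anything deeper):

```python
import math

k_B = 1.380649e-23                  # Boltzmann constant, J/K

lined_up = 5                        # microstates in the lined-up macrostate
scattered = 53125                   # microstates in the scattered macrostate

s_lined = k_B * math.log(lined_up)
s_scattered = k_B * math.log(scattered)

# The constant cancels out of the ratio
print(s_scattered / s_lined)        # ~6.76, i.e. about 6.8 times the entropy
```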

If you start with the marbles in the low-entropy lined-up macrostate and give the box a good shake, 10,625 times out of 10,626 you'll end up in the higher-entropy scattered macrostate. Five marbles in 25 compartments is a tiny system, considering that there are somewhere around 33,000,000,000,000,000,000,000 molecules in a milliliter of water. In any real system, except cases like very low-temperature systems with handfuls of particles, the differences in entropy are large enough that "10,625 times out of 10,626" turns into "always" for all intents and purposes.


This distinction between microstates and macrostates gives a rigorous basis for the intuition that going from lined-up marbles to scattered-wherever marbles is a significant change, while going from one particular scattered state to another isn't.

In both cases, the marbles are going from one microstate to another, possibly but very rarely the one they started in. In the first case, the marbles go from one macrostate to another. In the second, they don't. Macrostate changes are, by definition, the ones we consider significant, in this case, between lined-up and scattered. Because of how we've defined the macrostates, the first change is significant and the second isn't.


Let's slice this a bit more finely and consider a scenario where only part of a system can change at any given time. Suppose you don't shake up the box entirely. Instead, you take out one marble and put it back in a random position, including, possibly, the one it came from. In that case, the chance of going from lined-up to scattered is 20 in 21, since out of the 21 positions the marble can end up in, only one, its original position, has the marbles all lined up, and in any case it doesn't matter which marble you choose.

What about the other way around? Of the 53,125 microstates in the scattered macrostate, only 500 have four of the five marbles in one row. For any microstate, there are 105 different ways to take one marble out and replace it: Five marbles times 21 empty places to put it, including the place it came from.

For the 500 microstates with four marbles in a row, only one of those 105 possibilities will result in all five marbles in a row: Remove the lone marble that's not in a row and put it in the only empty place in the row of four. For the other 52,625 microstates in the scattered macrostate, there's no way at all to end up with five marbles lined up by moving only one marble.

So there are 500 cases where the scattered macrostate becomes lined-up, 500*104 cases where it might but doesn't, and 52,625*105 cases where it couldn't possibly. In all, that means that the odds are 11,155.25 to one against scattered becoming lined-up by removing and replacing one marble randomly.
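Here's a quick Python check of that arithmetic, treating each of the 105 moves from each scattered microstate as equally likely:

```python
scattered = 53125                 # microstates in the scattered macrostate
near_miss = 500                   # four in a row plus one stray marble
moves = 5 * 21                    # marbles times possible destinations = 105

total = scattered * moves         # equally likely (microstate, move) pairs
to_lined_up = near_miss * 1       # only one repair move per near-miss state

odds_against = (total - to_lined_up) / to_lined_up
print(odds_against)               # 11155.25
```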

Suppose that the marbles are lined up at some starting time, and every time the clock ticks, one marble gets removed and replaced randomly. After one clock tick, there is a 100 in 105 chance (the same 20 in 21 as before) that the marbles will be in the high-entropy scattered state. How about after two ticks? How about if we let the clock run indefinitely -- what portion of the time will the system spend in the lined-up macrostate?

There are tools to answer questions like this, particularly Markov chains and stochastic matrices (that's the same kind of Markov chain that can generate random text resembling an input text). I'll spare you the details, but the answer requires defining a few more macrostates, one for each way to represent the number five as a sum of whole numbers: [5], [4, 1], [3, 2], [3, 1, 1], [2, 2, 1], [2, 1, 1, 1] and [1, 1, 1, 1, 1].

The macrostate [5] comprises all microstates with five marbles in one row, the macrostate [4, 1] comprises all microstates with four marbles in one row and one in another row, the macrostate [2, 2, 1] comprises all microstates with two marbles in one row, two marbles in another row and one marble in a third one, and so forth.

Here's a summary:

Macrostate     Microstates   Entropy
[5]                      5       1.6
[4,1]                  500       6.2
[3,2]                2,000       7.6
[3,1,1]              7,500       8.9
[2,2,1]             15,000       9.6
[2,1,1,1]           25,000      10.1
[1,1,1,1,1]          3,125       8.0

The Entropy column is the natural logarithm of the Microstates column, without multiplying by Boltzmann's constant. Again, this is just to give a basis for comparison. For example, [2,1,1,1] is the highest-entropy state, and [2,2,1] has six times the entropy of [5].
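These counts are small enough to verify by brute force. Here's a short Python sketch (my own check on the table) that enumerates all 53,130 placements and tallies them by row pattern:

```python
from itertools import combinations
from collections import Counter

def macrostate(micro):
    # Row-occupancy pattern of a microstate, e.g. (3, 2) or (2, 1, 1, 1)
    rows = Counter(pos // 5 for pos in micro)
    return tuple(sorted(rows.values(), reverse=True))

# Tally every way of placing 5 marbles in compartments 0..24
tally = Counter(macrostate(m) for m in combinations(range(25), 5))

for state, count in sorted(tally.items(), key=lambda kv: -kv[1]):
    print(state, count)
```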

It's straightforward, but tedious, to count the number of ways one macrostate can transition to another. For example, of the 105 transitions for [3,2], 4 end up in [4,1], 26 end up back in [3,2] (not always by putting the removed marble back where it was), 30 end up in [3, 1, 1] and 45 end up in [2, 2, 1]. Putting all this into a matrix and taking the matrix to the 10th power (enough to see where this is converging) gives

Macrostate     % time   % microstates
[5]             .0094           .0094
[4,1]             .94             .94
[3,2]             3.8             3.8
[3,1,1]            14              14
[2,2,1]            28              28
[2,1,1,1]          47              47
[1,1,1,1,1]       5.9             5.9

The second column is the result of the tedious matrix calculations. The third column is just the size of the macrostate as the portion of the total number of microstates. For example, there are 500 microstates in [4,1], which is 0.94% of the total, which is also the portion of the time that the matrix calculation says the system will spend in [4, 1]. Technically, this means the system is ergodic, which means I didn't have to bother with the matrix and counting all the different transitions.
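Incidentally, the per-microstate transition counts quoted earlier for [3,2] (4, 26, 30 and 45 out of 105) are easy to check by brute force. They come out the same for any representative microstate of the macrostate, so one will do -- here's a quick Python sketch:

```python
from collections import Counter

def macrostate(micro):
    # Row-occupancy pattern of a microstate, e.g. (3, 2)
    rows = Counter(pos // 5 for pos in micro)
    return tuple(sorted(rows.values(), reverse=True))

# One representative [3,2] microstate: three marbles in row 0, two in row 1
micro = frozenset({0, 1, 2, 5, 6})

transitions = Counter()
for marble in micro:
    rest = micro - {marble}
    for dest in range(25):
        if dest not in rest:               # 21 destinations per marble
            transitions[macrostate(rest | {dest})] += 1

print(transitions)   # 4 to (4,1), 26 to (3,2), 30 to (3,1,1), 45 to (2,2,1)
```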

Even in this toy example, the system will spend very little of its time in the low-entropy lined-up state [5], and if it ever does end up there, it won't stay there for long.


Given some basic assumptions, a system that evolves over time, transitioning from microstate to microstate, will spend the same amount of time in any given microstate (as usual, that's not quite right technically), which means that the time spent in each macrostate is proportional to its size. Higher-entropy states are larger than lower-entropy states, and because entropy is a logarithm, they're actually a lot larger.

For example, the odds of an entropy decrease of one millionth of a Joule per Kelvin are about one in e^(10^17). That's a number with somewhere around 40 quadrillion digits. To a mathematician, the odds still aren't zero, but to anyone else they would be.
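That figure is just the entropy change divided by Boltzmann's constant, used as an exponent. A quick order-of-magnitude sketch (I'm rounding 7.2×10^16 up to 10^17, as the text does):

```python
import math

k_B = 1.380649e-23            # Boltzmann constant, J/K
delta_S = 1e-6                # a one-microjoule-per-kelvin entropy decrease

# Odds against: the ratio of microstate counts, e**(delta_S / k_B)
exponent = delta_S / k_B      # ~7.2e16, on the order of 10**17
digits = exponent / math.log(10)
print(f"e**{exponent:.3g}, a number with about {digits:.3g} decimal digits")
```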

For all but the tiniest, coldest systems, the chance of entropy decreasing even by a measurable amount is not just small, but incomprehensibly small. The only systems where the number of microstates isn't incomprehensibly huge are small collections of particles near absolute zero.

I'm pretty sure I've read about experiments where such a system can go from a higher-entropy state to a very slightly lower-entropy state and vice versa, though I haven't had any luck tracking them down. Even if no one's ever done it, such a system wouldn't violate any laws of thermodynamics, because the laws of thermodynamics are statistical (and there's also the question of definition over whether such a system is in equilibrium).

So you're saying ... there's a chance? Yes, but actually no, in any but the tiniest, coldest systems. Any decrease in entropy that could actually occur in the real world and persist long enough to be measured would be in the vicinity of 10⁻²³ Joules per Kelvin, which is much, much too small to be measured except under very special circumstances.

For example, if you have 1.43 grams of pure oxygen in a one-liter container at standard temperature and pressure, it's very unlikely that you know any of the variables involved -- the mass of the oxygen, its purity, the size of the container, the temperature or the pressure, to even one part in a billion. Detecting changes 100,000,000,000,000 times smaller than that is not going to happen.



But none of that is what got me started on this post. What got me started was that the earlier post tried to define some sort of notion of "statistical symmetry", which isn't really a thing, and what got me started on that was my coming to understand that higher-entropy states are more symmetrical. That in turn was jarring because entropy is usually taken as a synonym for disorder, and symmetry is usually taken as a synonym for order.

Part of the resolution of that paradox is that entropy is a measure of uncertainty, not disorder. The earlier post got that right, but evidently that hasn't stopped me from hammering on the point for dozens more paragraphs and a couple of tables in this one, using a slightly different marbles-in-compartments example.

The other part is that more symmetry doesn't really mean more order, at least not in the way that we usually think about it.

From a mathematical point of view, a symmetry of an object is something you can do to it that doesn't change some aspect of the object that you're interested in. For example, if something has mirror symmetry, that means that it looks the same in the mirror as it does ordinarily.

It matters where you put the mirror. The letter W looks the same if you put a mirror vertically down the middle of it -- it has one axis of symmetry. The letter X looks the same if you put the mirror vertically in the middle, but it also looks the same if you put it horizontally in the middle -- it has two axes of symmetry.

Another way to say this is that if you could draw a vertical line through the middle of the W and rotate the W out of the page around that line, and kept going for 180 degrees until the W was back in the page, but flipped over, it would still look the same. If you chose some other line, it would look different (even if you picked a different vertical line, it would end up in a different place). That is, if you do something to the W -- rotate it around the vertical line through the middle -- it ends up looking the same. The aspect you care about here is how the W looks.

To put it somewhat more rigorously: if f is the particular mapping that takes each point to its mirror image across the axis, then f takes the set of points in the W to the exact same set of points. Any point on the axis maps to itself, and any point off the axis maps to its mirror image, which is also part of the W. The map f is defined for every point on the plane and it moves all of them except for the axis. The aspect we care about, which f doesn't change, is whether a particular point is in the W.

If you look at all the things you can do to an object without changing the aspect you care about, you have a mathematical group. For a W, there are two things you can do: leave it alone and flip it over. For an X, you have four options: leave it alone, flip it around the vertical axis, flip it around the horizontal axis, or do both. Leaving an object alone is called the identity transformation, and it's always considered a symmetry, because math. An asymmetrical object has only that symmetry (its symmetry group is trivial).
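This "things you can do without changing what you care about" definition is concrete enough to code up. Here's a small Python sketch, with crude point-set stand-ins for W and X (the coordinates are my own toy choices, nothing canonical):

```python
# Model each letter as a set of points; a map is a symmetry of the shape
# if it sends the set of points to exactly the same set.
identity = lambda p: p
flip_v   = lambda p: (-p[0],  p[1])   # mirror across the vertical axis
flip_h   = lambda p: ( p[0], -p[1])   # mirror across the horizontal axis
rot_180  = lambda p: (-p[0], -p[1])   # both flips = 180-degree rotation

W = {(-2, 1), (-1, -1), (0, 1), (1, -1), (2, 1)}   # crude W-like shape
X = {(-1, 1), (1, 1), (0, 0), (-1, -1), (1, -1)}   # crude X-like shape

def is_symmetry(f, shape):
    return {f(p) for p in shape} == shape

print([is_symmetry(f, W) for f in (identity, flip_v, flip_h, rot_180)])
# W: only the identity and the vertical flip
print([is_symmetry(f, X) for f in (identity, flip_v, flip_h, rot_180)])
# X: all four
```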

In normal speech, saying something is symmetrical usually means it has the same symmetry group as a W -- half of it is a mirror image of the other half. Technically, it has bilateral symmetry. In some sense, though, an X is more symmetrical, since its symmetry group is larger, and a hexagon, which has 12 elements in its symmetry group, is more symmetrical yet.

A figure with 19 sides, each of which is the same lopsided squiggle, would have a symmetry group with 19 elements (rotate by 1/19 of a full circle, 2/19 ... 18/19, and also don't rotate at all). That would make it more symmetrical than a hexagon, and quite a bit more symmetrical than a W, but if you asked people which was most symmetrical, they would probably put the 19-sided squigglegon last of the three.

Our visual system is mostly trained to recognize bilateral symmetry. Except for special situations like reflections in a pond, pretty much everything in nature with bilateral symmetry is an animal, which is pretty useful information when it comes to eating and not being eaten. We also recognize rotational symmetry, which includes flowers and some sea creatures, also useful information.

It would make sense, then, that in day to day life, "more symmetrical" generally means "closer to bilateral symmetry". If a house has an equal number of windows at the same level on either side of the front door, we think of it as symmetrical, even though the windows may not be exactly the same, the door itself probably has a doorknob on one side or the other and so forth, so it's not quite exactly symmetrical. We'd still say it's pretty symmetrical, even though from a mathematical point of view it either has bilateral symmetry or it doesn't (and in the real world, nothing we can see is perfectly symmetrical).

That should go some way toward explaining why, along with so many other things, symmetry doesn't necessarily mean the same thing in its mathematical sense as it does ordinarily. The mathematical definition includes things that we don't necessarily think of as symmetry.

Continuing with shapes and their symmetries, you can think of each shape as a macrostate. You can associate a microstate with each mapping (technically, in this case, any rigid transformation of the plane) that leaves the shape unchanged. The macrostate W has two microstates: one for the identity transformation, which leaves the plane unchanged, and one for the mirror transformation around the W's axis.

The X macrostate has four microstates, one for the identity, one for the flip around the vertical axis, one for the flip around the horizontal axis, and one for flipping around one axis and then the other (in this case, it doesn't matter what order you do it in). The X macrostate has a larger symmetry group, which is the same as saying it has more entropy.

In this context, a symmetry is something you can do to the microstate without changing the macrostate. A larger symmetry group -- more symmetry -- means more microstates for the same macrostate, which means more entropy, and vice-versa. They're two ways of looking at the same thing.

In the case of the marbles in a box, a symmetry is any way of switching the positions of the marbles, including not switching them around at all. Technically, this is a permutation group.

For any given microstate, some of the possible permutations just switch the marbles around in their places (for example, switching the first two marbles in a lined-up row), and some of them will move marbles to different compartments. For a microstate of the lined-up macrostate [5], there are many fewer permutations that leave the marbles in the same macrostate (all in one row, though not necessarily the same row) than there are for [2, 1, 1, 1]. Five marbles in a row looks more symmetrical, since it happens to have bilateral visual symmetry, but it's actually a much less symmetrical macrostate than [2, 1, 1, 1], most of whose microstates will just look like a jumble.


In the real world, distributing marbles in boxes is really distributing energy among particles, generally a very large number of them. Real particles can be in many different states, many more than the marble/no marble states in the toy example, and different states can have the same energy, which makes the math a bit more complicated. Switching marbles around is really exchanging energy among particles, and there are all sorts of intricacies about how that happens.

Nonetheless, the same basic principles hold: Entropy is a measure of the number of microstates for a given macrostate, and a system will evolve toward the highest-entropy macrostate available -- equilibrium -- and stay there, simply because the probability of anything else happening is essentially zero.

And yeah, symmetry doesn't necessarily mean what you think it might.

Saturday, July 21, 2018

Fermi on the Fermi paradox

One of the pleasures of life on the modern web is that if you have a question about, say, the history of the Fermi paradox, there's a good chance you can find something on it.  In this case, it didn't take long (once I thought to look) to turn up E. M. Jones's "Where is Everybody?": An Account of Fermi's Question.

The article includes letters from Emil Konopinski, Edward Teller and Herbert York, who were all at lunch with Enrico Fermi at Los Alamos National Laboratory some time in the early 1950s when Fermi asked his question.  Fermi was wondering specifically about the possibility that somewhere in the galaxy some civilization had developed a viable form of interstellar travel and had gone on to explore the whole galaxy, and therefore our little blue dot out on one of the spiral arms.

Fermi and Teller threw a bunch of arguments at each other, arriving at a variety of probabilities.  Fermi eventually concluded that probably interstellar travel just wasn't worth the effort or perhaps no civilization had survived long enough to get to that stage (I'd throw in the possibility that they came by millions of years ago, decided nothing special was going on and left -- or won't come by for a few million years yet).

Along the way Fermi, very much in the spirit of "How many piano tuners are there in Chicago?", broke the problem down into a series of sub-problems such as "the probability of earthlike planets, the probability of life given an earthlike planet" and so forth.  Very much something Fermi would have done (indeed, this sort of exercise goes by the name "Fermi estimation"), and very similar to what we now call the Drake equation.

In other words, Fermi and company anticipated much of the subsequent discussion on the subject over lunch more than fifty years ago and then went on to other topics (and presumably coffee).  There's been quite a bit of new data on the subject, particularly the recent discovery that there are in fact lots of planets outside our solar system, but the theoretical framework hasn't changed much at all.

What's a Fermi paradox?

So far, we haven't detected strong, unambiguous signs of extraterrestrial intelligence.  Does that mean there isn't any?

The usual line of attack for answering this question is the Drake equation [but see the next post for a bit on its origins --D.H Oct 2018], which breaks the question of "How many intelligent civilizations are there in our galaxy?" down into a series of factors that can then be estimated and combined into an overall estimate.

Let's take a simpler approach here.

The probability of detecting extraterrestrial intelligence given our efforts so far is the product of:
  • The probability it exists
  • The probability that what we've done so far would detect it, given that it exists
(For any math geeks out there, this is just the definition of conditional probability)

Various takes on the Fermi paradox (why haven't we seen anyone, if we're pretty sure they're out there?) address these two factors:
  • Maybe intelligent life is just a very rare accident.  As far as we can tell, Earth itself has lacked intelligent life for almost all of its history (one could argue it still does, so feel free to substitute "detectable" for "intelligent").
  • Maybe intelligent life is hard to detect for most of the time it's around (See this post for an argument to that effect and this one for a bit on the distinction between "intelligent" and "detectable").  A particularly interesting take on this is the "dark forest" hypothesis, that intelligent civilizations soon figure out that being detectable is dangerous and deliberately go dark, hoping never to be seen again.  I mean to take this one on in a bit, but not here.
  • One significant factor when it comes to detecting signs of anything, intelligent or otherwise: as far as we know detectability drops with the square of distance, that is, twice as far away means four times harder to detect.  Stars are far away.  Other galaxies are really far away.
  • Maybe intelligent life is apt to destroy itself soon after it develops, so it's not going to be detectable for very long and chances are we won't have been looking when they were there.  This is a popular theme in the literature.  I've talked about it here and here.
  • Maybe the timing is just wrong.  Planetary time scales are very long.  Maybe we're one of the earlier ones and life won't develop on nearby planets for another million or billion years (basically low probability of detection again, but also an invitation to be more rigorous about the role of timing)

At first blush, the logic of the Fermi paradox seems airtight: Aliens are out there.  We'd see them if they were out there.  We haven't seen them.  QED.  But we're not doing a mathematical proof here.  We're dealing in probabilities (also math, but a different kind).  We're not trying to explain a mathematically impossible result.  We're trying to determine how likely it is that our observations are compatible with life being out there.

I was going to go into a longish excursion into Bayesian inference here, but ended up realizing I'm not very adept at it (note to self: get better at Bayesian inference).  So in the spirit of keeping it at least somewhat simple, let's look at a little table with, granted, a bunch of symbols that might not be familiar:


                     We see life (S)   We don't see life (¬S)
Life exists (L)      P(L ∧ S)          P(L ∧ ¬S)                P(L)
No life (¬L)         P(¬L ∧ S)         P(¬L ∧ ¬S)               P(¬L)
                     P(S)              P(¬S)                    100%

P is for probability.  P(L) is the probability that there's intelligent life out there we could hope to detect as such, at all.  P(S) is the probability that we see evidence strong enough that the scientific community (whatever we mean by that, exactly) agrees that intelligent life is out there.  The ¬ symbol means "not" and the ∧ symbol means "and".  The rows sum to the right, so
  • P(L ∧ S) + P(L ∧ ¬S) = P(L) (the probability life exists is the probability that life exists and we see it plus the probability it exists and we don't see it)
  • P(S) + P(¬S) = 100% (either we see life or we don't see it)
Likewise the columns sum downward.  Also, "and" means multiply, so P(L ∧ S) = P(L)×P(S | L), where P(S | L) is the chance of seeing life given that it's there (L and S aren't independent -- seeing life should be more likely if it's actually out there -- so it's the conditional probability that shows up here).  This all puts restrictions on what numbers you can fill in.  Basically you can pick any three and those determine the rest.

Suppose you think it's likely that life exists, and you think that it's likely that we'll see it if it's there.  That means you think P(L) is close to 100% and P(L ∧ S) is a little smaller but also close to 100% (see conditional probability for more details).  You get to pick one more.  It actually turns out not to matter that much, since we've already decided that life is both likely and likely to be detected.  One choice would be P(¬L ∧ S), the chance of a "false positive", that is, the chance that there's no life out there but we think we see it anyway.  Again, in this scenario we're assuming false positives should be unlikely overall, but choosing exactly how unlikely locks in the rest of the numbers.

It's probably worth calling out one point that kept coming up while I was putting this post together: The chances of finding signs of life depend on how much we've looked and how we've done it.  A lot of SETI has centered around radio waves, and in particular radio waves in a fairly narrow range of frequencies.  There are perfectly defensible reasons for this approach, but that doesn't mean that any actual ETs out there are broadcasting on those frequencies.  In any case we're only looking at a small portion of the sky at any given moment, our current radio dishes can only see a dozen or two light years out and there's a lot of radio noise from our own technological society to filter out.

I could model this as a further conditional probability, but it's probably best just to keep in mind that P(S) is the probability of having detected life after everything we've done so far, and so includes the possibility that we haven't really done much so far. 


To make all this concrete, let's take an optimistic scenario: Suppose you think there's a 90% chance that life is out there and a 95% chance we'll see it if it's out there.  If there's no chance of a false positive, then there's an 85.5% chance that we'll see signs of life and so a 14.5% chance we won't (as is presently the case, at least as far as the scientific community is concerned).  If you think there's a 50% chance of a false positive, then there's a 90.5% chance we'll see signs of life, including the 5% chance it's not out there but we see it anyway.  That means a 9.5% chance of not seeing it, whether or not it's actually there.
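The arithmetic here is just the law of total probability, easy enough to sketch in Python (the helper name is mine):

```python
def p_see(p_life, p_detect_given_life, p_false_positive):
    # Law of total probability: true detections plus false positives
    return (p_life * p_detect_given_life
            + (1 - p_life) * p_false_positive)

# The optimistic scenario from the text
print(round(p_see(0.9, 0.95, 0.0), 3))   # 0.855 -> 14.5% chance we see nothing
print(round(p_see(0.9, 0.95, 0.5), 3))   # 0.905 -> 9.5% chance we see nothing
```

The same helper reproduces the less optimistic scenario later in the post: p_see(0.1, 0.05, 0.01) comes out to 0.014, i.e. a 98.6% chance of not seeing clear signs of life.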

This doesn't seem particularly paradoxical to me.  We think life is likely.  We think we're likely to spot it.  So far we haven't.  By the assumptions above, there's about a 10% chance of that outcome.  You generally need 99.99994% certainty (five sigma) to publish a physics discovery, that is, a 0.00006% chance of being wrong.  A 9.5% chance isn't even close to that.

Only if you're extremely optimistic and you think that it's overwhelmingly likely that detectable intelligent life is out there, and that we've done everything possible to detect it do we see a paradox in the sense that our present situation seems very unlikely.  But when I say "overwhelmingly likely" I mean really overwhelmingly likely.  For example, even if you think both are 99% likely, then there's still about a 1-2% chance of not seeing evidence of life, depending on how likely you think false positives are.  If, on the other hand, you think it's unlikely that we could detect intelligent life even if it is out there, there's nothing like a paradox at all.


My personal guess is that we tend to overestimate the second of the two bullet points at the beginning.  There are good reasons to think that life on other planets is hard to detect, and our efforts so far have been limited.  In this view, the probability that detectably intelligent life is out there right now is fairly low, even if the chance of intelligent life being out there somewhere in the galaxy is very high and the chance of it being out there somewhere in the observable universe is near certain.

As I've argued before, there aren't a huge number of habitable planets close enough that we could hope to detect intelligent life on them, and there's a good chance that we're looking at the wrong time in the history of those planets -- either intelligent life hasn't developed yet or it has but for one reason or another it's gone dark.

Finding out that there are potentially habitable worlds in our own solar system is exciting, but probably doesn't change the picture that much.  There could well be a technological civilization in the oceans of Enceladus, but proving that based on what molecules we see puffing out of vents on the surface many kilometers above said ocean seems like a longshot.

With that in mind, let's put some concrete numbers behind a less optimistic scenario.  If there's a 10% chance of detectable intelligent life (as opposed to intelligent life we don't currently know how to detect), and there's a 5% chance we'd have detected it based on what we've done so far and a 1% chance of a false positive (that is, of the scientific community agreeing that life is out there when in fact it's not), then it's 98.6% likely we wouldn't have seen clear signs of life by now.   That seems fine.


While I'm conjecturing intermittently here, my own wild guess is that it's quite likely that some kind of detectable life is out there, something that, while we couldn't unequivocally say it was intelligent, would make enough of an impact on its home world that we could hope to say "that particular set of signatures is almost certainly due to something we would call life".   I'd also guess that it's pretty likely that in the next, say, 20 or 50 or 100 years we would have searched enough places with enough instrumentation to be pretty confident of finding something if it's there.  And it's reasonably likely that we'd get a false positive in the form of something that people would be convinced was a sign of life when in fact there wasn't one -- maybe we'd figure out our mistake in another 20 or 50 or 100 years.

Let's say life of some sort is 90% likely, there's a 95% chance of finding it in the next 100 years if it's there and a 50% chance of mistakenly finding life when it's not there, that is, a 50% chance that at some point over those 100 years we mistakenly convince ourselves we've found life and later turn out to be wrong.  Who knows?  False positives are based on the idea that there's no detectable life out there, which is another question mark.  But let's go with it.

I actually just ran those numbers a few paragraphs ago and came up with a 9.5% chance of not finding anything, even with those fairly favorable odds.
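Spelling that calculation out (again assuming a false positive only happens when there's no detectable life to find):

```python
# Chance of not finding anything -- neither a real detection nor a
# false positive -- over the next 100 years, on my reading of the setup.
p_life = 0.90       # life of some sort is out there
p_find = 0.95       # chance of finding it within 100 years, if it's there
p_false_pos = 0.50  # chance of mistakenly "finding" it, if it's not there

p_nothing = p_life * (1 - p_find) + (1 - p_life) * (1 - p_false_pos)
print(f"{p_nothing:.1%}")  # → 9.5%
```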

All in all, I'd say we're quite a ways from any sort of paradoxical result.


One final thought occurs to me:  The phrase "Fermi paradox" has been in the lexicon for quite a while, long enough to have taken on a meaning of its own.  Fermi himself, being one of the great physicists, was quite comfortable with uncertainty and approximation, so much so that the kind of "How many piano tuners are there in Chicago?" questions given to interview candidates are meant to be solved by "Fermi estimation".

I should go back and get Fermi's own take on the "Fermi paradox".  My guess was he wasn't too bothered by it and probably put it down to some combination of "we haven't really looked" and "maybe they're not out there".

If I find out I'll let you know.

[As noted above, I did in fact come across something --D.H Oct 2018]

Friday, July 6, 2018

Are we alone in the face of uncertainty?

I keep seeing articles on the Drake equation and the Fermi Paradox on my news feed, and since I tend to click through and read them, I keep getting more of them.  And since I find at least some of the ideas interesting, I keep blogging about them.  So there will probably be a few more posts on this topic.  Here's one.

One of the key features of the Drake equation is how little we know, even now, about most of the factors.  Along these lines, a recent (preprint) paper by Anders Sandberg, Eric Drexler and Toby Ord claims to "dissolve" the Fermi Paradox (with so many other stars out there, why haven't we heard from them?), finding "a substantial ex ante probability of there being no other intelligent life in our observable universe".

As far as I can make out, "ex ante" (from before) means something like "before we gather any further evidence by trying to look for life".  In other words, there's no particular reason to believe there should be other intelligent life in the universe, so we shouldn't be surprised that we haven't found any.

I'm not completely confident that I understand the analysis correctly, but to the extent I do, I believe it goes like this (you can probably skip the bullet points if math makes your head hurt -- honestly, some of this makes my head hurt):
  • We have very little knowledge of some of the factors in the Drake equation, particularly fl (the probability of life on a planet that might support life), fi (the probability of a planet with life developing intelligent life) and L (the length of time a civilization produces a detectable signal).
  • Estimates of those range over orders of magnitude.
    • Estimates for L range from 50 years to a billion or even 10 billion years.
    • The authors do some modeling and come up with a range of uncertainty of 50 orders of magnitude for fl.  That is, it might be close to 1 (that is, close to 100% certain), or it might be more like 1 in 100,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000.  Likewise they take fi to range over three orders of magnitude, from near 1 to 1 in 1,000.
  • Rather than assigning a single number to every term, as most authors do, it makes more sense to assign a probability distribution.  That is, instead of saying "the probability of life arising on a suitable planet is 90%" (or 0.01%, or whatever), assign a probability to each possible value (the actual math is a bit more subtle, but that should do for our purposes).  Maybe the most likely probability of life developing intelligence is 1 in 20, but there's a possibility, though not as likely, that it's actually 1 in 10 or 1 in 100, so take that into account with a probability distribution.
  • (Bear in mind that the numbers we're looking at are themselves probabilities, so we're assigning a probability that the probability is a given number -- this is the part that makes my head hurt a bit.)
  • Since we're looking at very wide ranges of values, a reasonable distribution is the "log normal" distribution -- basically, "the number of digits fits a bell curve".
  • These distributions have very long tails, meaning that if, say, 1 in a thousand is a likely value for the chance of life evolving into intelligent life, then (depending on the exact parameters) 1 in a million may be reasonably likely, 1 in a billion not too unlikely, and 1 in a trillion not out of the question.
  • The factors in the Drake equation multiply, following the rules of probability, so it's quite possible that the aggregate result is very small.
    • For example if it's reasonably likely that fl is 1 in a trillion and fi is 1 in a million, then we can't ignore the chance that the product of the two is 1 in a quintillion.
    • Numbers like that would make it unlikely that there is any other life among our galaxy's few hundred billion stars -- ours would just have happened to get lucky.
  • Putting it all together, they estimate that there's a significant chance that we're alone in the observable universe.
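To get a feel for how this plays out, here's a small Monte Carlo sketch.  The distributions and parameters below are my own made-up stand-ins, not the paper's actual fits -- they're chosen only to span comparably wide ranges:

```python
import math
import random

random.seed(42)

def sample_log10(mean, sd):
    """Draw log10 of one factor from a bell curve (so the factor itself
    is log-normal), capped at 0 since each factor is a probability."""
    return min(0.0, random.gauss(mean, sd))

STARS_LOG10 = math.log10(2.5e11)  # rough star count for the Milky Way

trials = 100_000
empty = 0
for _ in range(trials):
    log_n = (STARS_LOG10
             + sample_log10(-15.0, 15.0)   # fl: spans ~50 orders of magnitude
             + sample_log10(-1.5, 0.75)    # fi: spans ~3 orders of magnitude
             + sample_log10(-2.0, 2.0))    # everything else, lumped together
    if log_n < 0:  # expected number of other civilizations below one
        empty += 1

print(f"fraction of samples with an empty galaxy: {empty / trials:.0%}")
```

With uncertainties this wide, a sizable fraction of the sampled products comes out below one expected civilization for the whole galaxy, which is the gist of the paper's conclusion.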

I'm not sure how much of this I buy.

There are two levels of probability here.  The terms in the Drake equation represent what has actually happened in the universe.  An omniscient observer that knew the entire history of every planet in the universe (and exactly what was meant by "life" and "intelligent") could count the number of planets, the number that had developed life and so forth and calculate the exact values of each factor in the equation.

The probability distributions in the paper, as I understand it, represent our ignorance of these numbers.  For all we know, the portion of "habitable" planets with intelligent life is near 100%, or near 1 in a quintillion or even lower.  If that's the case, then the paper is exploring to what extent our current knowledge is compatible with there being no other life in the universe.  The conclusion is that the two are fairly compatible -- if you start with what (very little) we know about the likelihood of life and so forth, there's a decent chance that the low estimates are right, or even too optimistic, and there's no one but us.

Why?  Because low probabilities are more plausible than we think, and multiplying probabilities increases that effect.  Again, the math is a bit subtle, but if you have a long chain of contingencies, any one of them failing breaks the whole chain.  If you have several unlikely links in the chain, the chances of the chain breaking are even better.


The conclusion -- that for all we know life might be extremely rare -- seems fine.  It's the methodology that makes me a bit queasy.

I've always found the Drake equation a bit long-winded.  Yes, the probability of intelligent life evolving on a planet is the probability of life evolving at all multiplied by the probability of life evolving into intelligent life, but does that really help?

On the one hand, it seems reasonable to separate the two.  As far as we know it took billions of years to go from one to the other, so clearly they're two different things.

But we don't really know the extent of our uncertainty about these things.  If you ask for an estimate of any quantity like this, or do your own estimate based on various factors, you'll likely* end up with something in the wide range of values people consider plausible enough to publish (I'm hoping to say more on this theme in a future post).  No one is going to say "zero ... absolutely no chance" in a published paper, so it's a matter of deriving a plausible really small number consistent with our near-complete ignorance of the real number -- no matter what that particular number represents or how many other numbers it's going to be combined with.

You could almost certainly fit the results of surveying several good-faith attempts into a log-normal distribution.  Log-normal distributions are everywhere, particularly where the ordinary normal distribution doesn't fit because the quantity being measured has something exponential about it -- say, you're multiplying probabilities or talking about orders of magnitude.

If the question is "what is the probability of intelligent life evolving on a habitable planet?" without any hints as to how to calculate it, that is, one not-very-well-determined number rather than two, then the published estimates, using various methodologies, should range from a small fraction to fairly close to certainty depending on the assumptions used by the particular authors.  You could then plug these into a log normal distribution and get some representation of our uncertainty about the overall question, regardless of how it's broken down.

You could just as well ask "What is the probability of any self-replicating system arising on a habitable planet?", "What is the probability of a self-replicating system evolving into cellular life?"  "What is the probability of cellular life evolving into multicellular life?" and so forth, that is, breaking the problem down into several not-very-well-determined numbers.  My strong suspicion is that the distribution for any one of those sub-parts will look a lot like the distribution for the one-question version, or the parts of the two-question version, because they're basically the same kind of guess as any answer to the overall question.  The difference is just in how many guesses your methodology requires you to make.

In particular, I seriously doubt that anyone is going to cross-check that pulling together several estimates is going to yield the same distribution, even approximately, as what's implied by a single overall estimate.  Rather, the more pieces you break the problem into, the more likely really small numbers become, as seen in the paper.
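A toy calculation (with invented numbers) illustrates this: hold the overall best guess fixed, but split it into more independent log-normal pieces, each with the same spread a single overall guess might have.  The chance of a very small combined value grows with the number of pieces:

```python
import random

random.seed(0)

def tail_prob(n_pieces, sd_per_piece=2.0, total_mean=-3.0, threshold=-12.0):
    """Estimate P(log10 of the combined probability < threshold) when the
    estimate is split into n_pieces independent log-normal guesses whose
    means sum to total_mean, each with the same per-piece spread."""
    trials = 200_000
    hits = 0
    for _ in range(trials):
        total = sum(random.gauss(total_mean / n_pieces, sd_per_piece)
                    for _ in range(n_pieces))
        hits += total < threshold
    return hits / trials

print(tail_prob(1))  # one overall guess: 1-in-10**12 is vanishingly unlikely
print(tail_prob(4))  # four pieces, same best guess: noticeably more likely
```

Variances add in log space, so four pieces with spread 2 give a combined spread of 4 -- the best guess hasn't moved, but extreme values have become far more plausible.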


I think this is consistent with the view that the paper is quantifying our uncertainty.  If the methodology for estimating the number of civilizations requires you to break your estimate into pieces, each itself with high uncertainty, you'll get an overall estimate with very high uncertainty.  The conclusion "we're likely to be alone" will lie within that extremely broad range, and may even take up a sizable chunk of it.  But again, I think this says much more about our uncertainty than about the actual answer.

I suspect that if you surveyed estimates of how likely intelligent life is using any and all methodologies*, the distribution would imply that we're not likely to be alone, even if intelligent life is very rare.  If you could find estimates of fine-grained questions like "what is the probability of multicellular life given cellular life?" you might well get a distribution that implied we're an incredibly unlikely fluke and really shouldn't be here at all.  In other words, I don't think the approach taken in the paper is likely to be robust in the face of differing methodologies.  If it's not, it's hard to draw any conclusions from it about the actual likelihood of life.

I'm not even sure, though, how feasible it would be to survey a broad sample of methodologies.  The Drake formulation dominates discussion, and that itself says something.  What estimates are available to survey depends on what methods people tend to use, and that in turn depends on what's likely to get published.  It's not like anyone somehow compiled a set of possible ways to estimate the likelihood of intelligent life and prospective authors each picked one at random.

The more I ponder this, the more I'm convinced that the paper is a statement about the Drake equation and our uncertainty in calculating the left hand side from the right.  It doesn't "dissolve" the Fermi paradox so much as demonstrate that we don't really know if there's a paradox or not.  The gist of the paradox is "If intelligent life is so likely, why haven't we heard from anyone?", but we really have no clear idea how likely intelligent life is.


* So I'm talking about probabilities of probabilities about probabilities?

Thursday, October 24, 2013

Arising by chance

Suppose you had a billion dice.  How many times would you expect to roll them before you got all sixes?  That would be six to the billionth power, or about ten to the 780 millionth, that is, a one with 780 million zeroes after it.  As big numbers go, that's bigger than astronomical, but still something you could print out, if only in tiny digits on a very big sheet of paper.  It's smaller than the monstrously big numbers I've discussed previously.  Archimedes' system could have handled it (see this post on big numbers  for more details on all that).
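As a quick check on the size of that number: the expected number of rolls is 6 to the billionth power, and its order of magnitude is just a billion times the base-10 logarithm of 6:

```python
import math

# Order of magnitude of 6**(10**9): the number of digits is about
# 10**9 * log10(6).
order = 10**9 * math.log10(6)
print(f"about 10 to the {order:,.0f}")  # → about 10 to the 778,151,250
```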

"Bigger than astronomical" means that there's essentially no chance that anyone will ever see a billion dice randomly come up all sixes, even if, say, we set every person alive to rolling a die over and over again, and on through the generations, even if we somehow colonized the galaxy with hordes of dice-rolling humans.

Now suppose that instead of rolling all the dice repeatedly, we just re-roll the ones that didn't come up sixes.  In that case, a bit more than 100 rolls will do.  Why?  With the first roll, about a sixth of the dice -- around 167 million -- will come up sixes.  On the second roll, around a sixth of the 833 million or so remaining, or about 139 million, will come up sixes, leaving about 694 million.  Since we're rolling random dice here, these numbers won't be exact, but because we're rolling a whole bunch of dice, they'll be pretty close, percentage-wise.  With each roll there are about 5/6 as many dice left to roll as with the roll before.

At some point, you can no longer assume that close to 1/6 of the dice will come up sixes, but after 100 rolls you should be down to about a dozen, and it won't take too long to get the rest.
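The expected count left after each roll is easy to compute directly -- each non-six survives a roll with probability 5/6, so after k rolls about N·(5/6)^k non-sixes remain:

```python
import math

N = 10**9
# Expected non-sixes remaining after 100 rolls:
left_after_100 = N * (5 / 6) ** 100
print(round(left_after_100))  # → 12, i.e. about a dozen

# Rolls until the expected count drops below one:
k = math.ceil(math.log(N) / math.log(6 / 5))
print(k)  # → 114
```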

One more game before I explain what I'm up to:  Same billion dice, but this time, after an initial roll, you pick one die at random and roll it if it's not a six.  How many times do you have to do this pick-and-roll (sorry) before you have a complete set of sixes?

At the beginning, you have about 833 million non-sixes, and it will take about seven tries before you change one of them to a six.  As more and more dice get changed to sixes, it gets harder and harder to find one that isn't already a six.  The last die will take about 6 billion tries -- you'll need to roll it about six times, but you'll only get a chance at it once in a billion tries.  All told, summing the expected tries at each stage (Wolfram Alpha's sum calculator is handy for this), it will take about 130 billion tries before you get all your sixes.  That's not something you could do in an afternoon.  If you could do one try every second, it would take around four thousand years.  Not really feasible, but not unimaginable.
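The total comes from summing the expected tries at each stage: when m non-sixes remain, a try succeeds with probability (m/N)·(1/6), so that stage takes 6N/m tries on average, and the total is 6N times a harmonic number (approximated below, since summing 833 million terms directly is slow):

```python
import math

N = 10**9             # dice
M0 = 5 * N // 6       # expected non-sixes after the initial roll
GAMMA = 0.5772156649  # Euler-Mascheroni constant

# H(M0) = 1 + 1/2 + ... + 1/M0, via the standard asymptotic approximation
harmonic = math.log(M0) + GAMMA + 1 / (2 * M0)
total_tries = 6 * N * harmonic

print(f"about {total_tries / 1e9:.0f} billion tries")
print(f"about {total_tries / (365.25 * 24 * 3600):,.0f} years at one try per second")
```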


If we want to talk about something arising by a random process, it matters, and it matters a lot, what kind of random process we're talking about.  In a purely random process, where everything is re-done from scratch at every step, most interesting results will be completely, beyond-astronomically unlikely.  But a process can proceed randomly and still produce a highly-ordered result with very high probability, as long as there is some sort of state preserved from one step to the next.

For example, when sugar crystallizes out of sugar water to make rock candy, it is for all practical purposes completely random which sugar molecule sticks to which part of the growing crystal at any given point.  And yet, the crystal will grow, and grow in a highly, though not completely, predictable fashion, all without violating any laws of thermodynamics.

The end result will be something that would be completely implausible if sugar molecules behaved completely randomly, but they don't.  They behave essentially randomly when drifting around in solution, but not when near a regular surface of other sugar molecules that's already there.  With each molecule added to the crystal, it's that much easier for the next one to find a place to attach (until enough sugar has crystallized out that the system reaches equilibrium).


Put another way, there is no single thing that is "a random process".  There are infinitely many varieties of random process, some with more or less non-random state than others.  It's not meaningful to ask whether something could have arisen at random without specifying what kind of random process we're talking about.