Saturday, April 27, 2019

Notes on the EHT black hole image

Following on to the recent post on the other blog about the imaging of a black hole in a nearby galaxy, here are my lightly cleaned up notes from watching Katie Bouman's lecture on how the images were captured using the Event Horizon Telescope (EHT), which is actually a collection of several radio telescopes whose individual data sets are combined using very sophisticated image processing techniques to emulate a single Earth-sized telescope.

I have no great insights to offer, but there are some pretty interesting details in the original lecture.  I've thrown in some commentary here and there, without taking any real care to distinguish it from the source material. The full lecture is on Caltech's YouTube channel.  As always, any errors or distortions are mine.

Technically, the image gathered by the telescopes working in combination is in spatial frequency, basically how quickly bright changes to dark as you scan over the actual image.  For example, a square divided into a white rectangle and a black rectangle has a lower spatial frequency than one divided into thin black and white stripes.  It's possible to reconstruct a visual image from information about the spatial frequency, and the more information you have, the better the reconstruction.  This sort of reconstruction is widely used in many fields, including astronomy, medical imaging and the image compression that lets us stream video over the internet.

The exact spatial frequencies being measured depend on the distance (baseline) between the telescopes.  When the telescopes are spread across distant locations on the Earth, the image technique is called Very Long Baseline Interferometry (VLBI).  Interferometry refers to measuring how the signals from different telescopes interfere with each other, that is, whether they reinforce each other or cancel each other out when combined.  Very Long Baseline refers to, um ...

In this case there were eight radio telescopes at six different sites, taking data together on four different nights, scanning for a few minutes at a stretch.  This provides 21 different pairs of telescopes, which would mean 21 different baselines, that is, 21 different spatial frequencies to combine, except that the baselines change as the Earth rotates.  This is good, because it provides samples of more frequencies.  There is at least some talk of adding telescopes in low Earth orbit at some later stage.

As mentioned in the other post, each scope collected hundreds of terabytes (hundreds of trillions of bytes), enough data that it was easier to fly the drives with the data to the site where it would be processed, rather than try to send it over the internet (and the South Pole data had to be flown out in any case).

The actual signal, coming from about 50 million light-years away, is quite weak, and noisy.  There is a lot of matter at the center of a galaxy, radiating at various frequencies and blocking various frequencies.  The radio band was chosen to minimize these effects, but there is still noise to contend with, along with noise from a variety of other sources, including atmospheric noise and imperfections in the telescopes themselves.

The various telescopes are looking through different amounts of atmosphere, depending on whether the black hole is overhead from their location or on the horizon, and this also affects how long it takes the signal to arrive at each telescope.  It takes about 20 milliseconds longer for the signal to reach a scope that has the black hole on the horizon that it does for one where it's overhead.  For a 1GHz signal, that means a difference of 20 million cycles -- and again, this is varying as the earth rotates.

All this means there is a lot of calibration to be done to make sure you're lining up the right data from the various telescopes and accounting for all the various sources of noise and distortion.  In all, there were hundreds of thousands of calibration parameters to work with, though the team ended up using ten thousand or so.

The reconstructed image is meant to represent what an Earth-sized telescope would see, if one could be built, but there are actually infinitely many such images that would match up with the limited data that was collected from a few scattered points on Earth.   Fortunately, having multiple telescopes means you can cross-check and isolate errors.  There are also "closure quantities" that should be the same regardless of the exact details of calibration (I didn't really understand the details of closure quantities). 

The team used two different methods of calibration, and they split into four subteams, two for each method.  The subteams worked independently for several weeks to reconstruct an image from the raw data.  Bouman's own work involved getting the human-tuned method (CLEAN) to work without human tuning.

In addition to the work by the subteams, there was a lot of work done up front to try to make sure that the image processing algorithms they were tuning were robust.  Much of this involved taking test images, calculating what signals the telescopes would get and then reconstructing the test images from the calculated signals.  They did this not only with images of discs and rings that one would typically expect in astronomical images, but also with a variety of non-astronomical images.

The team actively tried to break the image processing algorithms, for example by tuning the parameters to favor a solid disc rather than a ring, and verifying that a ring still came out as a ring.  Python fans will be glad to know that at least some of the code involved was written in Python.

The public image is a combination of the images from the subteams.  Since it's meant to represent only what we can be confident about regarding the actual black hole, it's deliberately a bit blurry.  Even still, there are some significant results [I haven't yet re-checked the details here, so what follows may be garbled]:
  • There is, in fact, a ring as would be expected from a supermassive black hole.
  • The same ring was observable night after night.
  • Previously there had been two separate estimates of the mass of the black hole, one from the motions of stars around it (stellar dynamics) and one from the appearance of the gas flowing into the accretion disc.  The size of the ring in the image supports the first estimate.
  • One portion of the ring is brighter than the rest, due to the doppler shift of matter orbiting around the black hole.  This is to be expected, and the position of the bright spot matches up with other observations.
  • The position of the bright spot is shifting over time, consistent with a rotation period on the order of weeks.  Since the black hole in the milky way (Sgr A*) is much smaller, its rotation period is expected to be on the order of minutes.  Since the observation scans are also on the order of minutes, the image processing for Sgr A* will have to take this into account.
It's also possible to analyze the raw data directly as a cross-check.  A disk of such-and-such size should result in such-and-such radio signals arriving at the various telescopes, independent of any reconstructions produced from that data.  The raw signals are consistent with a ring of the size found in the visual reconstructions.  This is not a surprise, given how the reconstructions were done, but still good to know.

Despite the whole effort taking years and requiring expensive radio telescopes at widely dispersed locations, and employing dozens of people at various stages, the movie Interstellar still cost a lot more to make.  Its image of a black hole may be prettier (and, as I understand it, one of the parts of the movie where the science was actually plausible), the EHT image was both cheaper and more informative.

Again, really good stuff.

Sunday, April 14, 2019

Notes on watching (a chunk of) a computer chess tournament

A while ago I started a post trying to untangle what kind of a problem it is to play chess well, based on how well various kinds of players -- human, "traditional" computer chess engines and neural-network-based chess engines -- did against each other.  I still think it's an interesting question, but in the mean time I actually watched a bunch of computer chess and that was even more interesting.  In particular, I looked in on a fair bit of's "CCC 7: Blitz Bonanza".  Here are some rough impressions.

The Blitz Bonanza is one part of a long-running event comprising thousands of individual games in several different formats (the nice thing about chess engines is they don't get tired, so it's practical to have two of them play each other dozens of times in the course of a few days).  The "Deep Dive", with longer time controls and slightly different rules, is going on now and will be for several weeks and there may be more after that (much of this is from memory because I can only seem to find current results on and the summaries on other sites don't always contain the exact information I'm looking for).

In the Blitz Bonanza, each player has five minutes of time to start with, plus two seconds per move made.  There is no "opening book", meaning that the players must calculate every move.  This leads to the rather odd sight of a clock ticking down near or past the 4-minute mark on move one, which is pretty much inevitably something everyone plays anyway (1. d4 and 1. e4 predominate).

Play continues either to checkmate by one side (though it enters a "get it over with" or "bullet" mode if both engines evaluate the current position as highly favorable to one or the other side), to a draw by stalemate, threefold repetition, 50 moves without a pawn move or capture, or "adjudication", meaning that the position is in a standard database of positions with six or fewer pieces on the board that cannot be won if both sides play perfectly (this includes "insufficient force" scenarios like king and bishop against king).

This is rather different from human matches, where a player will typically resign if it becomes clear there's no way to avoid losing, and players typically agree to a draw when it's clear that neither of them has a realistic shot at winning unless the other one blunders.  It's sometimes fun to watch an actual checkmate.  The draws can be interesting in their own way.  More on that later.

Scoring was the usual one point for a win, no points for a loss and half a point to each player for a draw

The original field consisted of

  • The premier "traditional" (alpha/beta or AB) engine, Stockfish
  • Three neural network-based (NN) engines (Leela Chess Zero, aka LC0, Leelenstein and Antifish)
  • Everyone else
In the first round, each of these played each of the others several times, half of the time as white and half as black.  The top four would progress to the next round.  When the smoke cleared, the top four were LC0, the other two neural network engines, and Stockfish rounding out the field in fourth.

In the second round, each of the four played each of the others 100 times (50 as white and 50 as black), for a total of 600 games.  This went on around the clock for several days.   The final result had LC0 in first, Stockfish close behind, well ahead of Leelenstein, which was itself well ahead of Antifish.  LC0 had a small lead over Stockfish in their head-to-head games.

Looking at this with my somewhat defocused engineer's eyes, I'd say "LC0 and Stockfish were about equal, both were clearly better than Leelenstein, and Antifish was clearly the worst of the lot."  I'd consider LC0 and Stockfish essentially equal not only because of the small difference in score (5.5 points out of a possible 300), but because the margin between the two was steady for most of the play.  LC0 won some early games against Stockfish head-to-head, but after that each scored several wins against the other.  This looks much more like a statistical fluke than a real difference in strength.  The results in CCC 6, with slightly different rules and time controls, bear this out.  Stockfish beat LC0, but not by much, and, if I recall correctly, not head-to-head.

As to my long-running theme of "dumb is smarter", the "zero" in LC0 means that it was trained purely by playing against itself, without any external hints.  Antifish, which finished last and had a losing record against each of the others, including Stockfish, was trained specifically to exploit weaknesses in Stockfish.  If we take LC0's "no extra information to bias the results" approach as one kind of "dumb" and Stockfish's "bash out lots of positions using fairly straightforward (but highly-tuned) rules" as another, it's pretty clear that "dumb" dominated.

One other thing jumped out: Neural networks are not good at endgames, or at least they can be spectacularly bad at some endgames.

Endgames are interesting here because there are few pieces on the board and it's often possible to spell out an explicit plan for winning.  For example, if you have a rook and a king against a rook, you can use your rook and king in concert to pin the other king against the edge of the board and then checkmate with the rook.  In other cases, you want to promote a pawn and this hinges on getting your king to the right square and keeping the other king from stopping you.

Over and over I saw neural network engines flail about aimlessly before hitting the 50 move limit.  To be clear, there are a few endgames that can take more than 50 moves to win, but even in these you can see a clear progression.  To make up an example, after a series of ten checks the opposing king is now one square closer to the corner where it will eventually be checkmated.  This isn't what was happening.  In at one game I saw, the AB engine showed mate in 12, while the NN engine saw its winning chances as quite good.  The game continued

AB: Oh, now you've got mate in 14
NN: Yeah this looks good
AB: I don't see a mate, but I think you're going to promote that pawn (or whatever) and win

NN: Yeah this looks good
AB: Dang, mate in 14 again
NN: Yeah this looks good
AB: Mate in 13
NN: Yeah this looks good
AB: I don't see a mate any more, but ...
NN: Yeah this looks good

From the NN's point of view, it was going from one position with good winning chances to another.  But it wasn't actually winning.  When the game eventually ended in a draw, the NN gave away a half point that the AB engine or a good human player would have gotten, given the same position.

I was a little (but not a lot) surprised by this.  It doesn't seem like the problem is beyond the capability of neural networks.  They've discovered plenty of conventional tactics -- forks, skewers and such, or ladders in the game of go -- and they are clearly able to encode at least some of the usual winning patterns.  Give a neural network a king and queen against a queen and it will find a checkmate.  On the other hand, give it a queen, a pawn and a rook against a king and it will quite likely forcibly give away two of the three and win with what's left.  To a mathematician it makes sense (reducing to a previously-solved case), but to most chess players it looks like lunacy, if not deliberate trolling.

I found this behavior particularly interesting because neural networks are supposed to have more of that special sauce we call intelligence or understanding.  We can clearly say that Stockfish has no deep understanding of chess the way that a human grandmaster can intuitively say "White's pieces are active.  Go for a kingside attack," but at least it plays plausible moves, and often ones humans would miss.  NN engines often find wild moves that a human wouldn't consider and an AB might avoid because it doesn't show a clear win, leading us to feel it has some deeper understanding of chess.  I think that's a valid feeling, but there are cases where it completely falls apart.

Watching a neural network botch an endgame just looks like farce.  I'm perfectly comfortable saying that a program "understands" a concept in chess, or "has a concept of" pawn structure or whatever.  It doesn't mean that the program has some reflective layer thinking "Oh, that's a strong pawn chain, I like that", but that it behaves in such a way that, one way or another, it must recognize good and bad structures.

In cases like these, though, it's abundantly clear that the network has no concept, in any meaningful literal or metaphorical sense of the word, of how to win the position.  Nor is it going to be able to figure one out over the board, because the network only learns during the training phase (that's not necessarily a requirement, but it's how these engines work).

This doesn't seem like a permanent situation.  For example, you could have an NN engine train on endgame positions, and it would likely discover ways to win outright and ways to steer toward a win in positions where it now flails.  This is pretty much what humans do -- you learn how to promote pawns as one skill, how to checkmate with a rook and king as another, and so forth.

What's at least a bit surprising is that an engine can be frighteningly strong in most of the game and completely clueless in a few key spots.  This has to say something about intelligence and learning, or about what kind of a problem chess-playing is technically, or about all of the above, but I'm not quite sure what.

Sunday, December 30, 2018

Computer chess: Dumber and smarter at the same time?

[As usual, I've added inline comments for non-trivial corrections to the original text, but this one has way more than most.  I've come up to speed a bit on the current state of computer chess, so that's reflected here.  The result is not the most readable narrative, but I'm not going to try to rewrite it --D.H. Apr 2019]

One of the long-running tags on the other blog is "dumb is smarter".  The idea is that, at least in the world of software, it's perilous to build a lot of specialized knowledge into a system.  Specialized knowledge means specialized blind spots -- any biases or incorrect assumptions in the specialization carry over to the behavior of the system itself.  In cases like spam filtering, for example, a skilled adversary can exploit these assumptions to beat the system.

I will probably follow up on the other blog at some point on how valid the web.examples of this that I cited really were, and to what extent recent developments in machine learning have changed the picture (spoiler alert: at least some).  Here I'd like to focus on one particular area: chess.  Since this post includes Deep Mind, part of the Alphabet family, I should probably be clear that what follows is taken from public sources I've read.

For quite a while and up until recently, the most successful computer chess programs have relied mainly on brute force, bashing out millions and millions of possible continuations from a given position and evaluating them according to fairly simple rules [in actual tournament games, it's not unusual to see over a billion nodes evaluated for a single move].  There is a lot of sophistication in the programming, in areas like making good use of multiple processors, avoiding duplicate work, representing positions in ways that make good use of memory, generating possible moves efficiently and deciding how to allocate limited (though large) processing resources to a search tree that literally grows exponentially with depth, but from the point of view of a human chess player, there's nothing particularly deep going on, just lots and lots of calculation.

For a long time, a good human player could outplay a good computer program by avoiding tactical blunders and playing for positional advantages that could eventually be turned into a winning position that even perfect tactical play couldn't save.  If the human lost, it would generally be by missing a tactical trick that the computer was able to find by pure calculation.  In any case, the human was looking at no more than dozens of possible continuations and, for the most part, calculating a few moves ahead, while the computer was looking at vastly more positions and exploring many more of them in much greater depth than a person typically would.

The sort of positional play that could beat a computer -- having a feel for pawn structure, initiative and such -- comes reasonably naturally to people, but it's not easily expressed in terms of an evaluation function.  The evaluation function is a key component of a computer chess engine that reduces a position to a number that determines whether one position should get more attention than another.  Since there are millions of positions to evaluate, the evaluation function has to be fast.  A typical evaluation function incorporates a variety of rules of the kind you'll find in beginning chess books -- who has more material, who has more threats against the other's pieces, whose pieces have more squares (particularly central squares) to move to, are any pieces pinned or hanging, and so forth.

There's quite a bit of tuning to be done here -- is it more important to have a potential threat against the opponent's queen or to control two squares in the center? -- but once the tuning parameters are set, they're set, at least until the next release.  The computer isn't making a subtle judgment based on experience.  It's adding up numbers based on positions and alignments of pieces.

It's not that no one thought of writing a more subtle evaluation function, one that would allow the program to look at fewer, but better positions.  It's just that it never seemed to work as well.  Put a program that looks at basic factors but looks at scads and scads of positions against one that tries to distill the experience of human masters and only look at a few good moves, and the brute force approach has typically won.  The prime example here would be Stockfish, but I'm counting, engines such as earlier versions of Komodo as brute force since, as I understand it, they use the same alpha/beta search technique and examine large numbers of positions.  I'm having trouble finding exact numbers on that, though [If you look at the stats on's computer chess tournaments, it's very easy to tell who's who.  For a 15|5 time control, for example, you either see on the order of a billion nodes evaluated per move or on the order of a hundred thousand.]

[Houdini is another interesting example.  Its evaluation function puts more weight on "positional" factors such as active pieces and space, and it tries to be highly selective in which moves it devotes the most resources too.  This is, explicitly, trying to emulate the behavior of human players.  So it's not quite correct to say that programs that try to emulate humans have done worse than ones that just bash out continuations.  Houdini has done quite well.

However, from a software point of view, these programs are all largely similar.  There is an evaluation function that's explicitly coded as rules you can read, and this is used to decide how much attention to pay to what part of the search tree.

Alpha zero, by contrast, uses machine learning (aka neural network training, aka deep neural networks) to build an evaluation function that's totally opaque when considered purely as code.  There are techniques to examine what a neural network is doing, but they're not going to reduce to rules like "a bishop is worth three times as much as a pawn".  It also uses a Monte Carlo approach to examine the search tree probabilistically, which is a somewhat different way to use the evaluation function to guide the search.  As I understand it, this is not the usual practice, though it's certainly possible to incorporate a random element into a traditional chess engine.

In short, the narrative of "purely mechanical programs beat ones that try to emulate humans" is not really right, but Alpha Zero still represents a radically different approach, and one that is in some sense structurally more similar to what things-with-brains do.  --D.H. Feb 2019]

This situation held for years.  Computer hardware got faster, chess programs got more finely tuned and their ratings improved, but human grandmasters could still beat them by exploiting their lack of strategic understanding.  Or at least that was the typical explanation:  Humans had a certain je ne sais quoi that computers could clearly never capture, not that the most successful ones were particularly trying to.  Occasionally you'd hear a stronger version: computers could never beat humans since they were programmed by humans and could never know more than their programmers, or some such, but clearly there's a hole in that logic somewhere ...

Then, in 1997, Deep Blue (an IBM project not directly related to Deep Mind) beat world champion Garry Kasparov in a rematch, Kasparov having won the first match between the two.  It wasn't just that Deep Blue won the games.  It outplayed Kasparov positionally just like human masters had been outplaying computers, but without employing anything you could point at and call strategic understanding.  It just looked at lots and lots of positions in a fairly straightforward way.

This isn't as totally surprising as it might seem.  The ultimate goal in chess is to checkmate the opponent.  In practice, that usually requires winning a material advantage, or opening up lines of attack, or promoting a pawn in the endgame, or some other tactical consideration.  Getting a positional advantage is just setting things up to make that more likely.  Playing a simple tactical game but looking twenty plies (half-moves) ahead turns out to be quite a bit like playing a strategic game. [In fact, computer-computer chess games are almost entirely positional, since neither player is going to fall for a simple tactical trap]

Human-computer matches aren't that common.  At first, the contests were too one-sided.  There was no challenge in a human grandmaster playing a computer.  Then, as computers became stronger, few grandmasters wanted to risk being the first to lose to a computer (and Kasparov is to be commended for taking up the challenge anyway).  Now, once again, the contests are too one-sided.  Human players use computers for practice and analysis, but everyone loses to them head-to-head, even with odds given (the computer plays without some combination of pieces and pawns).

At this writing, current world champion Magnus Carlsen, by several measures the strongest player ever or at the very least in the top handful, stands less than a 2% chance of beating the current strongest computer program head to head.  With the question of "will computers ever beat the best human players at chess?" firmly settled, human players can now be compared on the basis of how often they play what the computer would play.  Carlsen finds the computer move over 90% of the time, but it doesn't take many misses to lose a game against such a strong player.

And now comes AlphaZero.

The "alpha" is carried over from its predecessor, AlphaGo which, by studying games between human players of the game of go and using machine learning to construct a neural network for evaluating positions, was able to beat human champion Lee Sedol.  This was particularly significant because a typical position in go has many more possible continuations than a typical chess position, making the "bash out millions of continuations" approach impractical.  Given that computer chess had only fairly recently reached the level of top human players and go programs hadn't been particularly close, it had seemed like a good bet that humans would continue to dominate go for quite a while yet.

AlphaGo, and machine learning approaches in general, use what could be regarded as a much more human approach, not surprising since they're based on an abstraction of animal nervous systems and brains rather than the classic Turing/Von Neumann model of computing, notwithstanding that they're ultimately still using the same computing model as everyone else.  That is, they run on ordinary computer hardware, though often with a specialized "tensor processor" for handling the math.  However, the algorithm for evaluating positions is completely different.  There's still an evaluation function, but it's "run the positions of the pieces through this assemblage of multilinear functions that the training stage churned out".  Unlike a traditional chess engine, there's nothing you can really point at and say "this is how it knows about pins" or "this is how it counts up material".

AlphaGo looks at many fewer positions [by a factor of around 10,000] than a conventional chess engine based program would, and it looks at them probabilistically, that is, it uses its evaluation function to decide how likely it is that a particular continuation is worth looking at, then randomly chooses the actual positions to look at based on that.  It's still looking at many more positions than a human would, but many fewer than a pure brute-force approach would.

The "zero" is because the training stage doesn't use any outside knowledge of the game.  Unlike AlphaGo, it doesn't look at games played by humans.  It plays against itself and accumulates knowledge of what works and what doesn't.  Very roughly speaking, AlphaZero in its training stage plays different versions of its network against each other, adjusts the parameters based on what did well and what did badly, and repeats until the results stabilize.

AlphaZero does this knowing only the rules of the game, that is, what a position looks like, what moves are possible, and how to tell if the game is won, lost, drawn (tied) or not over yet.  This approach can be applied to a wide range of games, so far go, chess and shogi (a chess-like game which originated in Japan).  In all three cases AlphaZero achieved results clearly better than the (previous) best computer players after a modest number of hours of training (though the Stockfish team makes a good case that AlphaZero had a hardware advantage and wasn't playing against the strongest configuration).  Notably, AlphaZero beat AlphaGo 60 games to 40.

In one sense, AlphaZero is an outstanding example of Dumb is Smarter, particularly in beating AlphaGo, which used nearly the same approach, but trained from human games.  AlphaZero's style of play has been widely described as unique.  Its go version has found opening ideas that had lain undiscovered for centuries.  Its chess version plays sacrifices (moves that give up material in hopes of a winning attack) that "classical" chess engines pass up because they can't prove that they're sound.  Being unbiased by exposure to human games or a human-developed evaluation function, AlphaZero can find moves that other programs would never play, and it turns out these moves are good enough to win, even against chess engines that never make tactical mistakes.

On the other hand, AlphaZero specifically avoids sheer brute force.  Rather than look at lots and lots of positions using a relatively simple evaluation function, it looks at many fewer, using a much more sophisticated evaluation function to sharply limit the number of positions it examines.  This is the same approach that had been tried in the past with limited success, but with two key differences:  The evaluation function is expressed as a neural network rather than a set of explicit rules, and that neural network is trained without any human input, based solely on what works in practice games against AlphaZero itself.

The Dumb is Smarter tag on Field Notes takes "dumb" to mean "no special sauce" and "smarter" to mean "gets better results".  The "smarter" part is clear.  The "dumb" part is more interesting.  There's clearly no special sauce in the training stage.  AlphaZero uses a standard machine learning approach to produce a standard neural network.

On the other hand, if you consider the program itself without knowing anything about the training stage, you have a generic engine, albeit one with a somewhat unusual randomized search algorithm, and an evaluation function that no one understands in detail.  It's all special sauce -- a set of opaque, magical parameters that somehow give the search algorithm the ability to find the right set of variations to explore.

I think it's this opaqueness that gives neural networks their particularly uncanny flavor (uncanny, etymologically, roughly means "unknowable").  The basic approach of taking some inputs and crunching some numbers on them to get an output is unequivocally dumb.  As I said above, "It's adding up numbers based on positions and alignments of pieces."  Except that those numbers are enabling it to literally make "a subtle judgment based on experience", a judgment we have no real choice but to call "smart".

Progress marches on.  At least one of the previous generation of chess engines (Komodo) has incorporated ideas from AlphaZero [Leela has open-sourced the neural network approach wholesale].  It looks like the resulting hybrid isn't dominant, at least not yet, but it does play very strong chess, and plays in a completely different, more human, style from the "classical" version.  That's interesting all by itself.

Saturday, October 6, 2018

Debugging servers and bodies

For quite a while I've wanted to write up some thoughts about the nature of cause and effect.  In actually trying to do so, I realized two things: First, I don't know that much about the main streams of thought on this, which go back millennia, and second, that this was material for several posts.  Rather than step back and formulate an overarching structure for a series, or deep dive into the philosophical literature, I decided to just start somewhere and call that Part I, with the intention of coming back to the subject, probably with unrelated posts in between, from time to time.   And then I realized that there was no need to call it Part anything.  I could just introduce a new tag, cause and effect and apply it where necessary.  So here we go ...

I press keys on the keyboard and words appear on the screen.  It seems pretty clear that one is the cause of the other.  I take a cough suppressant and my cough subsides.  Quite likely it subsided because of the medicine, but maybe I was getting better anyway.  I run an ad online and sales go up.  But they've been going up for months.  Did they go up more because of the ad?  (half of your advertising budget is wasted; the trick is to know which half).  I wear a lucky shirt and my team wins.  I may want to think the two are related, but realistically it's just a bit of fun.

I spend a lot of my time debugging, trying to figure out why something isn't working the way it was expected to, whether it's why the TV isn't working at home or why a server crashed at work.  If I'm trying to figure out what's wrong with something electronic at home, I generally just turn things off and on until everything's back in a sane state.  That's usually fine at home, but not such a good idea at work.  Even if restarting the server fixes the problem, you still want to know why, so next time you won't have to restart it.

How do you tell if one thing caused another?  Debugging is much older than computing.  For example, the practice of debugging human health, that is, medicine, developed a useful paradigm in 1884, known as Koch's postulates after Robert Koch, for determining whether a particular microorganism causes a particular disease:
  1. The microorganism must be found in abundance in all organisms suffering from the disease, but should not be found in healthy organisms.
  2. The microorganism must be isolated from a diseased organism and grown in pure culture.
  3. The cultured microorganism should cause disease when introduced into a healthy organism.
  4. The microorganism must be reisolated from the inoculated, diseased experimental host and identified as being identical to the original specific causative agent.
Koch actually abandoned the "should not be found" part of postulate 1 after discovering that some people could carry a disease without showing symptoms.

Abstracting a little bit with an eye toward applying these postulates to an ailing server, one might say
  1. The conditions you think are causing the problem should be present in affected servers but ideally not in healthy ones.  For example, the unhealthy servers are all deployed in sector A and the healthy ones aren't or, more commonly, the unhealthy ones are running one version of the code while the healthy ones are running the previous version.
  2. This is a bit tricky, but I think it boils down to: There has to be a well-defined way to induce the conditions you think are causing the problem.  For example, you can start a new instance of a server in sector A or update a healthy server to the new version you think is buggy.  It isn't always immediately obvious how to do this.  For example, if you think that the problem is some sort of bad request coming from random parts of the internet, you'll probably have to search through logs to find requests that correspond with problems.
  3. If you trigger the conditions, the problem occurs.
  4. Again doesn't apply directly, but in this context I think it means double-checking for evidence that the conditions you thought were triggered really were triggered.  When you brought up the test server and it fell over, was it actually in sector A?  When you sent the query-of-death you found in the logs, did the server that fell over actually log receiving it just before it fell over?
The kind of double-checking in postulate 4 is crucial in real debugging.  It's very common, for example, to think you restarted a server with a new setting that should cause or fix a given problem, only to find that you restarted a different instance, or accidentally restarted it with the old configuration instead of the new.  For example, as I was writing this paragraph I realized that the command I thought would send a problem request to an unwell server I'd been debugging had actually failed, explaining why I saw no evidence of the request having been handled.

There's also a distinction, in both medicine and software, between fixing the problem at hand -- curing the patient or getting the service back online -- and pinning down exactly what happened.  In my business a common course of action is to roll the production servers back to the last configuration that was known to work, then use a test setup to try to reproduce the problem without impacting the production system.  The ultimate goal is a "red test" that fails with the buggy code and then passes ("goes green") with the fix.

In medicine, as I understand it, the work of isolating causes and developing vaccines and drugs similarly goes on in a laboratory environment until everyone is quite certain that the proposed treatment will be safe, and hopefully effective, in real patients.  In the mean time, doctors mostly do their best with known treatments.

While Koch's postulates are fairly famous, the kind of thing you remember from high school biology years later, they're not actually what modern medicine goes by, just like modern economists don't consult The Wealth of Nations, influential though it was.  One more modern approach can be found in Hill's criteria, a set of nine criteria for determining if a given cause is responsible for a given effect, but there are many other, more recent paradigms.

Notably, Hill's criteria and its modern cousins are not nearly so crisp as Koch's postulates.  The very name "postulate" suggests that you can obtain a rigorous proof, while "criteria" suggests something more indirect: if you don't meet the criteria, then you don't have cause and effect.  The criteria themselves are of the form "more of this suggests causality is more likely", and the end result is an idea of the probability that something is causing something else.

As in many other areas, switching from a yes/no answer to a probability solves a lot of problems, particularly the problem of gray areas where there are reasons to say either yes or no.  It does so, however, at the cost of being able to say for certain "X caused Y".  In my world, you very often can say with confidence "The tweaks we made to parsing in change number 271828 caused the server to reject these requests", but in my world we have a high degree of control of the system.  I can roll the server back to just before and after change number 271828 and run it in a test environment where I can control exactly what data it's trying to parse (or just write a "unit test" that exercises the problem code directly without spinning up a server).

In the field of medicine, and much of the scientific world, however, that's generally not the case.  If we're trying to determine whether eating carrots for years causes freckles, we can't really make people eat carrots for years and count their freckles every week for the duration.  Medicine doesn't, and shouldn't, have the same level of control over patients as I have over a server.  That is, there's less control over the possible causes.

There's typically also less certainty about the effects.  Lots of people get freckles whether or not they eat carrots, so you need more subtle statistical techniques to see if there's even a correlation between carrot eating and freckle getting.  This sort of thing is a major reason that it's generally not a good idea to pin too much on any single medical study, even if it's careful with the data and its interpretation.

Nonetheless, medicine advances.  Some of this is because research has pinned down some causes and effects to a good degree of certainty.  There's no doubt that vaccines were effective against smallpox and continue to be effective against other diseases, albeit not always perfectly.  There's no doubt that antibiotics can be effective against bacterial infections, or that some bacteria have evolved defenses against them.  There's no realistic doubt that smoking causes a number of ill effects, up to and including lung cancer and emphysema.

But medicine is useful even in the absence of certainty.  If there's an 80% chance you've got condition X and the treatment is 90% effective with a 5% chance of major side effects, and condition X is 95% fatal if left untreated, you should probably go for the treatment.  If the treatment is 5% effective with a 90% chance of major side effects, and condition X is almost never fatal, you probably don't want to.  You don't need absolute certainty to make a decision like this, or even to know the exact causes and effects involved.

Sunday, September 9, 2018

When is a space elevator not a space elevator?

Canadian space and defense company Thoth Technology has recently announced plans for a "Space Elevator".  The idea is to use pressurized Kevlar to build a tower 15 kilometers high (nearly twice as high as Mount Everest), complete with hotel and observation decks, and to launch and land space planes using a rooftop runway.  It's an ambitious project, to say the least, but I wouldn't call it a space elevator -- thus the they-said-it-I-didn't quotation marks.

The term space elevator usually refers to a structure built around a very long cable with one end attached to a counterweight somewhere beyond geosynchronous orbit, around 36,000 kilometers above the surface of the Earth.  The cable remains taut due to centrifugal force*.  Climbers move up and down the structure, delivering payloads into space.  To be clear, we're not even close to being able to build such a thing.  For one thing, that cable would be under way more tension than any commercially available material can withstand (it's not theoretically impossible, just not practical with current technology).

One important point to keep in mind here is that to put something into orbit, it's not enough simply to lift it out of the atmosphere.  If you took a payload up a space elevator to an altitude of 200km, a typical distance for a low Earth orbit, and let it go, it would fall back to the ground.  It would start with a forward motion of around 50 km/h -- a typical speed limit for residential streets -- and lose most or all of it to wind resistance on the way down.  To go into orbit, it would need a forward motion of more like 28,000 km/h.  As Randall Munroe saysgetting to space is easy. The problem is staying there.

Since a space elevator cable is pulled taut by its counterweight, and it's rotating in sync with the Earth's rotation, the higher up you go, the faster that bit of the elevator is moving relative to the ground.  As a climber goes up the elevator, it will feel a slight force pushing it in the direction the cable is moving (technically a Coriolis force).  This is the force of the cable keeping the climber moving with it, faster and faster as you move farther out from Earth.

You also need less speed to go into orbit as you get higher -- the Moon orbits at less than 4000 km/h, for example.  By the time a climber got to about 36,000 kilometers, not much less distance than going around the world, it would finally have enough velocity (about 11,000 km/h at that point) that a payload would stay in orbit if released.  From the point of view of someone on the elevator it would just appear to stay where it was, at least for a bit.  In real life, it would tend to drift away over time due to factors like the gravity of the Moon and the Earth not being a perfect sphere.  For basically the same reason, the point where the cable reaches geosynchronous orbit would experience zero g (again neglecting secondary effects).  Points below would experience a pull toward Earth, however slight, and points above would experience a pull away from Earth.

Where does that leave us with Thoth's structure?

A 15km tower is nowhere near high enough to function as a space elevator.  The advantage to launching from there is not the difference in speed between the ground and the top, which would amount to a moderate walking pace, but being above about 90% of the atmosphere.  That's certainly helpful, but not enough to neglect the atmosphere entirely.  To do that, you would need to be somewhere above the Kármán line, somewhat arbitrarily defined as 100km, though the Wikipedia article asserts that 160km is the lowest point at which you can complete an orbit without further propulsion.

While Thoth's structure might be useful in launching thanks to bypassing much of the atmosphere, landing, on the other hand, seems a bit of a stretch.  If you're leaving low Earth orbit, you'll have to get rid of that 28,000 km/h somehow.  You could use rockets, the same as you used to get into orbit, but that means more fuel -- not just twice as much but a bunch more, because the extra fuel is just dead weight on the way up and you'll need more fuel to compensate for that.  And more fuel for that extra fuel, and so forth.

This is why actual reentry uses the Earth's atmosphere to turn kinetic energy into heat.  If you're going to do that, you might as well go all the way to the ground, where you can have a nice big runway, and safety crews and other amenities if you're a spaceplane, or at least your choice of an ocean or a large expanse of open ground to aim for if you're not.

Of course, if you have an actual space elevator, you can re-attach to the elevator at geosync and let the climber spill your kinetic energy gradually on the way down (neglecting the not-so-small matter of moving from your current orbit back to geosync).  But that's not what Thoth is proposing.

So the whole "space elevator" thing is marketing hype.  Is there anything left after you account for that?  Well, maybe.  It depends on the numbers.

The Really Tall Tower idea still has some value, I think.  Actually building it would turn up all kinds of interesting issues in building tall structures, from how to build a stable structure about 20 times taller than the current record holder to the logistics of supporting a crew well above the Everest death zone and even just getting materials to a construction site 15km up in the air.  At the end, though, you'd at least have a unique hotel property and, depending on how much load that pressurized Kevlar can take, potentially a whole lot of residential and office space.

Do you have a space elevator?  No.

Do you have a compelling value proposition for someone wanting to put things in orbit in a world where ground-launched rockets are pretty much the only game in town?  I really don't know enough to say, but my guess is that it would be better if the business model didn't depend on that happening.

(*) If you prefer, feel free to recast this in terms of centripetal force.  You'll get the same vectors, just with different names.

Saturday, July 28, 2018

The woods are dark and full of terrors

I wanted to "circle back" on a comment I'd made on the "Dark Forest" hypothesis, basically the idea that we don't see signs of alien life because everyone's hiding for fear of everyone else.  This will probably be all I want to say about the "Fermi Paradox", at least until the next time I feel like posting about it ...

I haven't read the book, but my understanding is that the Dark Forest hypothesis is based on the observation that when a relatively technologically developed society on Earth has made contact with a less developed society, the results have generally not been pretty.   "Technology" in this context particularly means "military technology."

If you're a civilization at the point of being able to explore the stars, you're probably aware of this first hand from your own history, and the next obvious observation is that you're just at the beginning of the process of exploring the stars.  Is it really prudent to assume that there's no one out there more advanced?

Now put yourself in the place of that hypothetical more advanced civilization.  They've just detected signs of intelligent life on your world.  You are now either a threat to them, or a potential conquest, or both.  Maybe you shouldn't be so eager to advertise your presence.

But you don't actually see anyone out there, so there's nothing to worry about, right?  Not so fast.  Everyone else out there is probably applying the same logic.  They might be hunkering down quietly, or they might already be on their way, quietly, in order to get the jump on you, but either way you certainly shouldn't assume that not detecting anyone is good news.

Follow this through and, assuming that intelligent life in general isn't too rare at any given point in time, you get a galaxy dotted with technological civilizations, each doing its best to avoid detection, detect everyone else and, ideally, neutralize any threats that may be out there.  Kind of like a Hunger Games scenario set in the middle of a dark forest.

This all seems disturbingly plausible, at least until you take scale into account.

There are two broad classes of scenarios:  Faster-than-light travel is possible, or it's not.  If anyone's figured out how to travel faster than light, then all bets are off.  The procedure in that case seems pretty simple: Send probes to as many star systems as you can.  Have them start off in the outer reaches, unlikely to be detected, scanning for planets, then scanning for life on those planets.  If you find anything that looks plunderable, send back word and bring in the troops.  Conquer.  Build more probes.  Repeat.

This doesn't require listening for radio waves.  Put a telescope and a camera on an asteroid with a suitable orbit and take pictures as it swings by your planet of choice.  Or whatever.  The main point is if there's anyone out there with that level of technology, our fate is sealed one way or another.

On the other hand, if the speed of light really is a hard and fast limit, then economics will play a significant role.  Traveling interstellar distances takes a huge amount of energy and not a little time (from the home planet's point of view -- less for the travelers, particularly if they manage to get near light speed).  By contrast, in the period of exploration and conquest from the late 1400s to the late 1700s it was not difficult to build a seaworthy ship and oceans could be crossed in weeks or months using available energy from the wind.  The brutal fact is that discovering and exploiting new territories on Earth at that time was economically profitable.

If your aim is to discover and exploit resources in other star systems, then you have to ask what they might have that you can't obtain on your home system using the very large amount of energy you would have to use to get to the other system.  The only sensible answer I can come up with is advanced technology, which assumes that your target is more advanced than you are, in which case you might want to rethink.

Even if your aim is just to conquer other worlds for the evulz or out of some mostly-instinctive drive, you're fighting an extremely uphill battle.  Suppose you're attacking a planet 10 light-years away.  Messages from the home planet will take 10 years to reach your expeditionary force, and any reply will take another 10 years, so they're effectively on their own.

You detected radio transmissions from your target ten years ago.  It takes you at least 10 years to reach their planet (probably quite a bit longer, but let's take the best case, for you at least).  They're at least 20 years more advanced than when the signal that led you to plot this invasion left their planet.

You've somehow managed to assemble and send a force of thousands, or tens of thousands, or a million.  You're still outnumbered by -- well, you don't really know until you get there, do you? -- but hundreds to one at the least and more likely millions to one.  You'd better have a crushing technological advantage.

I could come up with scenarios that might work.  Maybe you're able to threaten with truly devastating weapons that the locals have no way to counter.  The locals treat with you and agree to become your loyal minions.

Now what?

Unless your goal was just the accomplishment of being able to threaten another species from afar, you'll want to make some sort of physical contact.  Presumably you land your population on the planet and colonize, assuming the planet is habitable to you and the local microbes don't see you as an interesting host environment/lunch (or maybe you've mastered the art of fighting microbes, even completely unfamiliar ones).

You're now on unfamiliar territory to which you're not well adapted, outnumbered at least a hundred to one by intelligent and extremely resentful beings that would love to steal whatever technology you're using to maintain your position.  Help is ten years away and if you're in a position to need it, is the home planet really going to want to send another wave out?  By the time they get there, the locals will have had another twenty years to prepare since you sent your distress call, this time with access to at least some of your technology.

I'm always at least a little skeptical of the idea that other civilizations will think like we do.  Granted, it doesn't seem too unreasonable to assume that anyone who gets to the point that we would call them "technological" is capable of doing the same kind of cost/benefit analyses that we do.  On the other hand, it also seems reasonable to assume that they have the same sort of cognitive biases and blind spots that we do.

The "soft" sciences are a lot about how to model the aggregate behavior of not-completely rational individuals.  There's been some progress, but there's an awful lot we don't know even about our own species, which we have pretty good access to.  When it comes to hypothetical aliens, I don't see how we can say anything close to "surely they will do thus-and-such", even if there are practical limits on how bonkers you can be and still develop technology on a large scale.

In the context of the Dark Forest, the question is not so much how likely it is that alien species are actually a danger to us, but how likely is it that an alien species would think they were in danger from another alien species (maybe us) and act on that by actively going dark.

Our own case suggests that's not very likely.  There may be quite a few people who think that an alien invasion is a serious threat (or for that matter, that one has already happened), or who think that it's unlikely but catastrophic enough if it did happen that we should be prepared.  That doesn't seem to have stopped us from spewing radio waves into the universe anyway.  Maybe we're the fools and everyone else is smarter, but imagine the level of coordination it would take to keep the entire population of a planet from ever doing anything that would reveal their presence.  This seems like a lot to ask, even if the threat of invasion seems likely, which for all we know it's probably not.

Overall, it seems unlikely that every single technological civilization out there would conclude that staying dark was worth the trouble.  At most, I think, there would be fewer detectable civilizations, but I still think it's more likely that whatever civilizations there are, have been or will be out there are too far away for our present methods to detect (and may always be), and that the window of opportunity for detecting them is either long past or far in the future.

Saturday, July 21, 2018

Fermi on the Fermi paradox

One of the pleasures of life on the modern web is that if you have a question about, say, the history of the Fermi paradox, there's a good chance you can find something on it.  In this case, it didn't take long to turn up E. M. Jones's "Where is Everybody?" an  Account of Fermi's Question.

The article includes letters from Emil Konopinski, Edward Teller and Herbert York, who were all at lunch with Enrico Fermi at Los Alamos National Laboratory some time in the early 1950s when Fermi asked his question.  Fermi was wondering specifically about the possibility that somewhere in the galaxy some civilization had developed a viable form of interstellar travel and had gone on to explore the whole galaxy, and therefore our little blue dot out on one of the spiral arms.

Fermi and Teller threw a bunch of arguments at each other, arriving at a variety of probabilities.  Fermi eventually concluded that probably interstellar travel just wasn't worth the effort or perhaps no civilization had survived long enough to get to that stage (I'd throw in the possibility that they came by millions of years ago, decided nothing special was going on and left -- or won't come by for a few million years yet).

Along the way Fermi, very much in the spirit of "How many piano tuners are there in Chicago?" broke the problem down into a series of sub-problems such as "the probability of earthlike planets, the probability of life given an earthlike planet" and so forth.  Very much something Fermi would have done, (indeed, this sort of exercise goes by the name "Fermi estimation") and very similar to what we now call the Drake equation.

In other words, Fermi and company anticipated much of the subsequent discussion on the subject over lunch more than fifty years ago and then went on to other topics (and presumably coffee).  There's been quite a bit of new data on the subject, particularly the recent discovery that there are in fact lots of planets outside our solar system, but the theoretical framework hasn't changed much at all.

What's a Fermi paradox?

So far, we haven't detected strong, unambiguous signs of extraterrestrial intelligence.  Does that mean there isn't any?

The usual line of attack for answering this question is the Drake equation [but see the next post for a bit on its origins --D.H Oct 2018], which breaks the question of "How many intelligent civilizations are there in our galaxy?" down into a series of factors that can then be estimated and combined into an overall estimate.

Let's take a simpler approach here.

The probability of detecting extraterrestrial intelligence given our efforts so far is the product of:
  • The probability it exists
  • The probability that what we've done so far would detect it, given that it exists
(For any math geeks out there, this is just the definition of conditional probability)

Various takes on the Fermi paradox (why haven't we seen anyone, if we're pretty sure they're out  there?) address these two factors
  • Maybe intelligent life is just a very rare accident.  As far as we can tell, Earth itself has lacked intelligent life for almost all of its history (one could argue it still does, so feel free to substitute "detectable" for "intelligent").
  • Maybe intelligent life is hard to detect for most of the time it's around (See this post for an argument to that effect and this one for a bit on the distinction between "intelligent" and "detectable").  A particularly interesting take on this is the "dark forest" hypothesis, that intelligent civilizations soon figure out that being detectable is dangerous and deliberately go dark, hoping never to be seen again.  I mean to take this one on in a bit, but not here.
  • One significant factor when it comes to detecting signs of anything, intelligent or otherwise: as far as we know detectability drops with the square of distance, that is, twice as far away means four times harder to detect.  Stars are far away.  Other galaxies are really far away.
  • Maybe intelligent life is apt to destroy itself soon after it develops, so it's not going to be detectable for very long and chances are we won't have been looking when they were there .  This is a popular theme in the literature.  I've talked about it here and here.
  • Maybe the timing is just wrong.  Planetary time scales are very long.  Maybe we're one of the earlier ones and life won't develop on nearby planets for another million or billion years (basically low probability of detection again, but also an invitation to be more rigorous about the role of timing)

At first blush, the logic of the Fermi paradox seems airtight: Aliens are out there.  We'd see them if they were out there.  We haven't seen them.  QED.  But we're not doing a mathematical proof here.  We're dealing in probabilities (also math, but a different kind).  We're not trying to explain a mathematically impossible result.  We're trying to determine how likely it is that our observations are compatible with life being out there.

I was going to go into a longish excursion into Bayesian inference here, but ended up realizing I'm not very adept at it (note to self: get better at Bayesian inference).  So in the spirit of keeping it at least somewhat simple, let's look at a little badly-formatted table with, granted, a bunch of symbols that might not be familiar:

We see life (S) We don't see life (¬S)
Life exists (L) P(L ∧ S) P(L ∧ ¬S) P(L)
No life (¬L) P(¬L ∧ S) P(¬L ∧ ¬S) P(¬L)

P(S) P(¬S) 100%

P is for probability.  P(L) is the probability that there's intelligent life out there we could hope to detect as such, at all.  P(S) is the probability that we see evidence strong enough that the scientific community (whatever we mean by that, exactly) agrees that intelligent life is out there.  The ¬ symbol means "not" and the ∧ symbol means "and".  The rows sum to the right, so
  • P(L ∧ S) + P(L ∧ ¬S) = P(L) (the probability life exists is the probability that life exists and we see it plus the probability it exists and we don't see it)
  • P(S + ¬S) = 100% (either we see life or we don't see it)
Likewise the columns sum downward.  Also "and" means multiply (as long as the two probabilities are independent; they are here, since we allow for false positives), so P(L ∧ S) = P(L)×P(S).  This all puts restrictions on what numbers you can fill in.  Basically you can pick any three and those determine the rest.

Suppose you think it's likely that life exists, and you think that it's likely that we'll see it if it's there.  That means you think P(L) is close to 100% and P(L ∧ S) is a little smaller but also close to 100% (see conditional probability for more details) .  You get to pick one more.  It actually turns out not to matter that much, since we've already decided that life is both likely and likely to be detected.  One choice would be P(¬L ∧ S), the chance of a "false positive", that is, the chance that there's no life out there but we think we see it anyway.  Again, in this scenario we're assuming false positives should be unlikely overall, but choosing exactly how unlikely locks in the rest of the numbers.

It's probably worth calling out one point that kept coming up while I was putting this post together: The chances of finding signs of life depend on how much we've looked and how we've done it.  A lot of SETI has centered around radio waves, and in particular radio waves in a fairly narrow range of frequencies.  There are perfectly defensible reasons for this approach, but that doesn't mean that any actual ETs out there are broadcasting on those frequencies.  In any case we're only looking at a small portion of the sky at any given moment, our current radio dishes can only see a dozen or two light years out and there's a lot of radio noise from our own technological society to filter out.

I could model this as a further conditional probability, but it's probably best just to keep in mind that P(S) is the probability of having detected life after everything we've done so far, and so includes the possibility that we haven't really done much so far. 

To make all this concrete, let's take an optimistic scenario: Suppose you think there's a 90% chance that life is out there and a 95% chance we'll see it if it's out there.  If there's no chance of a false positive, then there's an 85.5% chance that we'll see signs of life and so a 14.5% chance we won't (as is presently the case, at least as far as the scientific community is concerned).  If you think there's a 50% chance of a false positive, then there's a 90.5% chance we'll see signs of life, including the 5% chance it's not out there but we see it anyway.  That means a 9.5% chance of not seeing it, whether or not it's actually there.

This doesn't seem particularly paradoxical to me.  We think life is likely.  We think we're likely to spot it.  So far we haven't.  By the assumptions above, there's about a 10% chance of that outcome.  You generally need 99.99994% certainty to publish a physics paper, that is, a 0.00006% chance of being wrong.  A 9.5% chance isn't even close to that

Only if you're extremely optimistic and you think that it's overwhelmingly likely that detectable intelligent life is out there, and that we've done everything possible to detect it do we see a paradox in the sense that our present situation seems very unlikely.  But when I say "overwhelmingly likely" I mean really overwhelmingly likely.  For example, even if you think both are 99% likely, then there's still about a 1-2% chance of not seeing evidence of life, depending on how likely you think false positives are.  If, on the other hand, you think it's unlikely that we could detect intelligent life even if it is out there, there's nothing like a paradox at all.

My personal guess is that we tend to overestimate the second of the two bullet points at the beginning.  There are good reasons to think that life on other planets is hard to detect, and our efforts so far have been limited.  In this view,  the probability that detectably intelligent life is out there right now is fairly low, even if the chance of intelligent life being out there somewhere in the galaxy is very high and the chance of it being out there somewhere in the observable universe is near certain.

As I've argued before, there aren't a huge number of habitable planets close enough that we could hope to detect intelligent life on them, and there's a good chance that we're looking at the wrong time in the history of those planets -- either intelligent life hasn't developed yet or it has but for one reason or another it's gone dark.

Finding out that there are potentially habitable worlds in our own solar system is exciting, but probably doesn't change the picture that much.  There could well be a technological civilizations in the oceans of Enceladus, but proving that based on what molecules we see puffing out of vents on the surface many kilometers above said ocean seems like a longshot.

With that in mind, let's put some concrete numbers behind a less optimistic scenario.  If there's a 10% chance of detectable intelligent life (as opposed to intelligent life we don't currently know how to detect), and there's a 5% chance we'd have detected it based on what we've done so far and a 1% chance of a false positive (that is, of the scientific community agreeing that life is out there when in fact it's not), then it's 98.6% likely we wouldn't have seen clear signs of life by now.   That seems fine.

While I'm conjecturing intermittently here, my own wild guess is that it's quite likely that some kind of detectable life is out there, something that, while we couldn't unequivocally say it was intelligent, would make enough of an impact on its home world that we could hope to say "that particular set of signatures is almost certainly due to something we would call life".   I'd also guess that it's pretty likely that in the next, say, 20 or 50 or 100 years we would have searched enough places with enough instrumentation to be pretty confident of finding something if it's there.  And it's reasonably likely that we'd get a false positive in the form of something that people would be convinced it was a sign of life when there in fact wasn't -- maybe we'd figure out our mistake in another 20 or 50 or 100 years.

Let's say life of some sort is 90% likely, there's a 95% chance of finding it in the next 100 years if it's there and a 50% chance of mistakenly finding life when it's not there, that is, a 50% chance that at some point over those 100 years we mistakenly convince ourselves we've found life and later turn out to be wrong.  Who knows?  False positives are based on the idea that there's no detectable life out there, which is another question mark.  But let's go with it.

I actually just ran those numbers a few paragraphs ago and came up with a 9.5% chance of not finding anything, even with those fairly favorable odds.

All in all, I'd say we're quite a ways from any sort of paradoxical result.

One final thought occurs to me:  The phrase "Fermi paradox" has been in the lexicon for quite a while, long enough to have taken on a meaning of its own.  Fermi himself, being one of the great physicists, was quite comfortable with uncertainty and approximation, so much so that the kind of "How many piano tuners are there in Chicago?" questions given to interview candidates are meant to be solved by "Fermi estimation".

I should go back and get Fermi's own take on the "Fermi paradox".  My guess was he wasn't too bothered by it and probably put it down to some combination of "we haven't really looked" and "maybe they're not out there".

If I find out I'll let you know.

[As noted above, I did in fact come across something --D.H Oct 2018]

Friday, July 6, 2018

Are we alone in the face of uncertainty?

I keep seeing articles on the Drake equation and the Fermi Paradox on my news feed, and since I tend to click through and read them, I keep getting more of them.  And since I find at least some of the ideas interesting, I keep blogging about them.  So there will probably be a few more posts on this topic.  Here's one.

One of the key features of the Drake equation is how little we know, even now, about most of the factors.  Along these lines, a recent (preprint) paper by Anders Sandberg, Eric Drexler and Toby Ord claims to "dissolve" the Fermi Paradox (with so many other stars out there why haven't we heard from them?), claiming to find "a substantial ex ante probability of there being no other intelligent life in our observable universe".

As far as I can make out, "ex ante" (from before) means something like "before we gather any further evidence by trying to look for life".  In other words, there's no particular reason to believe there should be other intelligent life in the universe, so we shouldn't be surprised that we haven't found any.

I'm not completely confident that I understand the analysis correctly, but to the extent I do, I believe it goes like this (you can probably skip the bullet points if math makes your head hurt -- honestly, some of this makes my head hurt):
  • We have very little knowledge of the some of the factors in the Drake equation, particularly fl (probability of life on a planet that might support life) fi (probability of a planet with life developing intelligent life) and L (the length of time a civilization produces a detectable signal)
  • Estimates of those range over orders of magnitude.
    • Estimates for L range from 50 years to a billion or even 10 billion years.
    • The authors do some modeling and come up with a range of uncertainty of 50 orders of magnitude for fl.  That is, it might be close to 1 (that is, close to 100% certain), or it might be more like 1 in 100,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000.  Likewise they take fi to range over three orders of magnitude, from near 1 to 1 in 1,000.
  • Rather than assigning a single number to every term, as most authors do, it makes more sense to assign a probability distribution.  That is, instead of saying "the probability of life arising on a suitable planet is 90%", or 0.01% or whatever, assign probability for each possible value (the actual math is a bit more subtle, but that should do for our purposes).  Maybe the most likely probability of life developing intelligence is 1 in 20, but there's a possibility, though not as likely, that it's actually 1 in 10 or 1 in 100, so take that into account with a probability distribution..
  • (bear in mind that the numbers were looking at are themselves probabilities, so we're assigning a probability that the probability is a given number -- this is the part that makes my head hurt a bit)
  • Since we're looking very wide ranges of values, a reasonable distribution is the "log normal" distribution -- basically "the number of digits fits a bell curve".
  • These distributions have very long tails, meaning that if, say, 1 in a thousand is a likely value for the chance of life evolving into intelligent life, then (depending on the exact parameters) 1 in a million may be reasonably likely, 1 in a billion not too unlikely and 1 in trillion is not out of the question.
  • The factors in the Drake equation multiply, following the rules of probability, so it's quite possible that the aggregate result is very small.
    • For example if it's reasonably likely that fl is 1 in a trillion and fi is 1 in a million, then we can't ignore the chance that the product of the two is 1 in a quintillion.
    • Numbers like that would make it unlikely that there is any life in our galaxy's few hundred billion stars and that ours just happened to get lucky.
  • Putting it all together, they estimate that there's a significant chance that we're alone in the observable universe.

I'm not sure how much of this I buy.

There are two levels of probability here.  The terms in the Drake equation represent what has actually happened in the universe.  An omniscient observer that knew the entire history of every planet in the universe (and exactly what was meant by "life" and "intelligent") could count the number of planets, the number that had developed life and so forth and calculate the exact values of each factor in the equation.

The probability distributions in the paper, as I understand it, represent our ignorance of these numbers.  For all we know, the portion of "habitable" planets with intelligent life is near 100%, or near 1 in a quintillion or even lower.  If that's the case, then the paper is exploring to what extent our current knowledge is compatible with there being no other life in the universe.  The conclusion is that the two are fairly compatible -- if you start with what (very little) we know about the likelihood of life and so forth, there's a decent chance that the low estimates are right, or even too optimistic, and there's no one but us.

Why?  Because low probabilities are more plausible than we think, and multiplying probabilities increases that effect.  Again, the math is a bit subtle, but if you have a long chain of contingencies, any one of them failing breaks the whole chain.  If you have several unlikely links in the chain, the chances of the chain breaking are even better.

The conclusion -- that for all we know life might be extremely rare -- seems fine.  It's the methodology that makes me a bit queasy.

I've always found the Drake equation a bit long-winded.  Yes, the probability of intelligent life evolving on a planet is the probability of life evolving at all multiplied by the probability of life evolving into intelligent life, but does that really help?

On the one hand, it seems reasonable to separate the two.  As far as we know it took billions of years to go from one to the other, so clearly they're two different things.

But we don't really know the extent of our uncertainty about these things.  If you ask for an estimate of any quantity like this, or do your own estimate based on various factors, you'll likely* end up with something in the wide range of values people consider plausible enough to publish (I'm hoping to say more on this theme in a future post).  No one is going to say "zero ... absolutely no chance" in a published paper, so it's a matter of deriving a plausible really small number consistent given our near-complete ignorance of the real number -- no matter what that particular number represents or how many other numbers it's going to be combined with.

You could almost certainly fit the results of surveying several good-faith attempts into a log-normal distribution.  Log-normal distributions are everywhere, particularly where the normal normal distribution doesn't fit because the quantity being measured has something exponential about it -- say, you're multiplying probabilities or talking about orders of magnitude.

If the question is "what is the probability of intelligent life evolving on a habitable planet?" without any hints as to how to calculate it, that is, one not-very-well-determined number rather than two, then the published estimates, using various methodologies, should range from a small fraction to fairly close to certainty depending on the assumptions used by the particular authors.  You could then plug these into a log normal distribution and get some representation of our uncertainty about the overall question, regardless of how it's broken down.

You could just as well ask "What is the probability of any self-replicating system arising on a habitable planet?", "What is the probability of a self-replicating system evolving into cellular life?"  "What is the probability of cellular life evolving into multicellular life?" and so forth, that is, breaking the problem down into several not-very-well-determined numbers.  My strong suspicion is that the distribution for any one of those sub-parts will look a lot like the distribution for the one-question version, or the parts of the two-question version, because they're basically the same kind of guess as any answer to the overall question.  The difference is just in how many guesses your methodology requires you to make.

In particular, I seriously doubt that anyone is going to cross-check that pulling together several estimates is going to yield the same distribution, even approximately, as what's implied by a single overall estimate.  Rather, the more pieces you break the problem into, the more likely really small numbers become, as seen in the paper.

I think this is consistent with the view that the paper is quantifying our uncertainty.  If the methodology for estimating the number of civilizations requires you to break your estimate into pieces, each itself with high uncertainty, you'll get an overall estimate with very high uncertainty.  The conclusion "we're likely to be alone" will lie within that extremely broad range, and may even take up a sizable chunk of it.  But again, I think this says much more about our uncertainty than about the actual answer.

I suspect that if you surveyed estimates of how likely intelligent life is using any and all methodologies*, the distribution would imply that we're not likely to be alone, even if intelligent life is very rare.  If you could find estimates of fine-grained questions like "what is the probability of multicellular life given cellular life?" you might well get a distribution that implied we're an incredibly unlikely fluke and really shouldn't be here at all.  In other words, I don't think the approach taken in the paper is likely to be robust in the face of differing methodologies.  If it's not, it's hard to draw any conclusions from it about the actual likelihood of life.

I'm not even sure, though, how feasible it would be to survey a broad sample of methodologies.  The Drake formulation dominates discussion, and that itself says something.  What estimates are available to survey depends on what methods people tend to use, and that in turn depends on what's likely to get published.  It's not like anyone somehow compiled a set of possible ways to estimate the likelihood of intelligent life and prospective authors each picked one at random.

The more I ponder this, the more I'm convinced that the paper is a statement about the Drake equation and our uncertainty in calculating the left hand side from the right.  It doesn't "dissolve" the Fermi paradox so much as demonstrate that we don't really know if there's a paradox or not.  The gist of the paradox is "If intelligent life is so likely, why haven't we heard from anyone?", but we really have no clear idea how likely intelligent life is.

* So I'm talking about probabilities of probabilities about probabilities?

Monday, June 18, 2018

Did clickbait kill the aliens?

Disclaimer: This post is on a darker topic than most.  I've tried to adjust the tone accordingly, but if anything leads you to ask "How can he possibly say that so casually?", rest assured that I don't think any of this is a casual matter.  It's just that if we're talking at the scale of civilizations and stars we have to zoom out considerably from the everyday human scale, to the point where a truly horrible cataclysm becomes just another data point.

As I've noted elsewhere, the Fermi paradox is basically "It looks likely that there's life lots of other places in the universe, so why haven't we been able to detect it -- or why haven't they made it easy by contacting us?"  Or, as Fermi put it, "Where is everybody?"

One easy answer, though something of a downer, is "They're all dead."*

This is the idea that once a species gets to a certain level of technological ability, it's likely to destroy itself.  This notion has been floated before, in the context of the Cold War: Once it became technically possible, it took shockingly little time for humanity to develop enough nuclear weapons to pose a serious threat to itself.  One disturbingly ready conclusion from that was that other civilizations hadn't contacted us because they'd already blown themselves up.

While this might conjure up images of a galaxy full of  the charred, smoking cinders of once vibrant, now completely sterile planets, that's not exactly what the hypothesis requires.  Before going into that in detail, it's probably worth reiterating here that most planets in the galaxy are much too far away to detect directly against the background noise, or to be able to carry on a conversation with (assuming that the speed of light is the cosmic speed limit we think it is).  In order to explain why we haven't heard from anyone, we're really trying to explain why we haven't heard from anyone within, say, a hundred light years.  I've argued elsewhere that that narrows the problem considerably (though maybe not).

A full-scale nuclear exchange by everyone with nuclear weapons would not literally kill all life on Earth.  There are a lot of fungi and bacteria, and a lot of faraway corners like hydrothermal vents for all kinds of life to hide.  It probably wouldn't even kill all of humanity directly, but -- on top of the indescribable death and suffering from the bombing itself -- it would seriously damage the world economy and make life extremely difficult even in areas that weren't directly affected by the initial exchange.  Behind the abstraction of the "world economy" is the flow of food, medicine, energy and other essentials.

There is an extensive literature concerning just how bad things would get under various assumptions, but at some point we're just quibbling over levels of abject misery.  In no realistic case is bombing each other better for anyone involved than not bombing each other.

For our purposes here, the larger point is clear: a species that engages in a full-scale nuclear war is very unlikely to be sending out interstellar probes or operating radio beacons aimed at other stars.  It may not even be sending out much in the way of stray radio signals at all.  It might well be possible for a species in another star system to detect life in such a case without detecting signs of a technological civilization, much less communicating with it.

So how likely is a full-scale nuclear war?  We simply don't know.  So far we've managed to survive several decades of the nuclear age without one, but, as I've previously discussed, that's no time at all when it comes to estimating the likelihood of finding other civilizations.  To totally make up some numbers, supposed that, once nuclear weapons are developed, a world will go an average of a thousand years without seriously using them and then, after the catastrophe, take a couple of centuries to get back to the level of being able to communicate with the rest of the universe.

Again, who knows?  We (fortunately) have very little data to go on here.  In the big picture, though, this would mean that a planet with nuclear weaponry (or something similarly dangerous) would be 10-20% less likely to be detected than one without.  We also have to guess what portion of alien civilizations would be subject to this, but how likely is it, really, that someone would develop the ability to communicate with the stars without also figuring out how to do anything destructive with its technology?

My guess is  that "able to communicate across interstellar distances" is basically the same as "apt to destroy that ability sooner or later".  This applies particularly strongly to anyone who could actually send an effective interstellar probe.  The  kinetic energy of any macroscopic object traveling close to light speed is huge.  It's hard to imagine being able to harness that level of energy for propulsion without also learning how to direct it toward destruction.

For purposes of calculation, it's probably best to assume a range of scenarios.  In the worst case, a species figures out how to genuinely destroy itself, and perhaps even life on its planet, and is never heard from.  In a nearly-as-bad case, a species spends most of its time recovering from the last major disaster and never really gets to the point of being able to communicate effectively across interstellar distances, and is never heard from.  The upshot is a reduction in the amount of time a civilization might produce a detectable signal (or, in a somewhat different formulation, the average expected signal strength over time).

Our own case is, so far, not so bad, and let's hope it continues that way.  However, along with any other reasons we might not detect life like us on other planets, we can add the possibility that they're too busy killing each other to say hello.

With all that as context, let's consider a recent paper modeling the possibility that a technological civilization ends up disrupting its environment with (from our point of view here, at least) pretty much the same result as a nuclear war.   The authors build a few models, crunch through the math and present some fairly sobering conclusions: Depending on the exact assumptions and parameters, it's possible for a (simulated) civilization to reach a stable equilibrium with its (simulated) environment, but several other outcomes are also entirely plausible: There could be a boom-and-bust that reduces the population to, say, 10% of its peak.  The population could go through a repeating boom/bust cycle.  It could even completely collapse and leave the environment essentially unlivable.

So what does this add to the picture?  Not much, I think.

The paper reads like as a proof-of-concept of the idea of modeling an alien civilization and its environment using the same mathematical tools (dynamical system theory) used to model anything from weather to blood chemistry to crowd behavior and cognitive development.  Fair enough.  There is plenty of well-developed math to apply here, but the math is only as good as the assumptions behind it.

The authors realize this and take care only to make the most general, uncontroversial assumptions possible.  They don't assume anything about what kind of life is on the planet in question, or what kind of resources it uses, or what exact effect using those resources has on the planet.  Their assumptions are on the order of "there is a planet", "there is life on it", "that life consumes resources" and so forth.

Relying on few assumptions means that any conclusions you do reach are very general.  On the other hand, if the assumptions support a range of conclusions, how do you pick from amongst them?  Maybe once you run through all the details, any realistic set of assumption leads to a particular outcome -- whether stability or calamity.  Maybe most of the plausible scenarios are in a chaotic region where the slightest change in inputs can make an arbitrarily large difference in outputs.  And so forth.

As far as I can make out, the main result of the paper is that planets, civilizations and their resources can be modeled as dynamical systems.  It doesn't say what particular model is appropriate, much less make any claims about what scenarios are most likely for real civilizations on real exoplanets.  How could it?   Only recently has there been convincing evidence that exoplanets even exist.  The case that there is life on at least some of them is (in my opinion) reasonably persuasive, but circumstantial.  It's way, way to early to make any specific claims about what might or might not happen to civilizations, or even life in general, on other planets.

To be clear, the authors don't seem to be making any such claims, just to be laying some groundwork for eventually making such claims.  That doesn't make a great headline, of course.  The article I used to find the paper gives a more typical take: Climate change killed the aliens, and it will probably kill us too, new simulation suggests.

Well, no.  We're still in the process of figuring out exactly what effect global warming and the resulting climate change will have on our own planet, where we can take direct measurements and build much more accurate models than the authors of the paper put forth.  All we can do for an alien planet is lay out the general range of possibilities, as the authors have done.  Trying to draw conclusions about our own fate from our failure (so far) to detect others like us seems quite premature, whether the hypothetical cause of extinction is war or or a ruined environment.

There's a familiar ring to all this.   When nuclear destruction was on everyone's mind, people saw an obvious, if depressing, answer to Fermi's question.  As I recall, papers were published and headlines written.  Now that climate-related destruction is on everyone's mind, people see an obvious, if depressing, answer to Fermi's question, with headlines to match.  It's entirely possible that fifty years from now, if civilization as we know it is still around (as I expect it will be) and we haven't heard directly from an alien civilization (as I suspect we won't), people will see a different obvious, if depressing, answer to Fermi's question.  Papers will be written about it, headlines will do what headlines do, and it will all speak more to our concerns at the time than to the objective state of any alien worlds out there.

I want to be clear here, though.  Just because headlines are overblown doesn't mean there's nothing to worry about.  Overall, nuclear weapons take up a lot less cultural real estate than they did during the height of the cold war, but they're very much still around and just as capable of wreaking widespread havoc.  Climate change was well underway during that period, and already recognized as a hazard, but not nearly as prominent in the public consciousness as it is today.

It's tempting to believe in an inverse relationship between the volume of headlines and the actual threat: If they're making a big deal out of it, it's probably nothing to worry about.  But that's an empirical question to be answered by measurement.  It's not a given.  Without actually taking measurements, the safest assumption is the two are unrelated, not inversely related.  That is, how breathless the headlines are is no indication one way or another.

My own guess, again without actually measuring, is that there's some correlation.  People study threats and publish their findings.  By and large, and over time, there is significant signal in the noise.  If a range of people working in various disciplines say that something is cause for concern, then it most likely is.  Some part of this discussion finds its way into the popular consciousness, with various shorthands and outright distortions, but if you take the time to read past the headlines and go back to original sources you can get a reasonable picture, one that will bear at least some resemblance to the headlines.

Going back to original sources and getting the unruly details may not be as satisfying as a nice, punchy one-sentence summary, but I'd say it's worth the effort nonetheless.

(*) A similar but distinct notion is the "Dark forest" hypothesis: They're out there, but they're staying quiet so no one else kills them -- and we had best follow suit.  That's fodder for another post, though I think at least some of this post applies.