Intermittent Conjecture: statistics


Friday, January 12, 2024

On knowing a lot about something and something about a lot of things

The physicist Richard Feynman told a story about being on a panel of experts from a variety of academic fields. The full details are in one of the Surely you're joking books I read many years ago. I'm paraphrasing from memory here because lazy. The gist is that the panel was asked to look at someone's paper that pulled together ideas from a variety of fields and was generating a lot of buzz. Just the sort of thing you'd want an interdisciplinary panel of experts to look at.

All the experts on the panel had a similar reaction: Overall, it looks very interesting, but the stuff in my area needs quite a bit of work -- this bit is a little bit off, they're mis-applying these terms and these parts are just wrong. But there are some really interesting ideas and this is definitely worth further attention.

In Feynman's telling, at least, he was the one to offer a different take: If every expert is saying the part they know about is bad, that says it's just bad all the way through. It doesn't really matter what an expert thinks of the area outside their expertise.

Relying on people's subjective impressions is risky. What we need here is some way to objectively determine the value of a paper that crosses areas of knowledge. Here's one way to do it: Have everyone rate the paper in each area on a scale of 0 to 100 and then pull together the numbers.

Let's say we have five people on the panel, specializing in music theory, physics, Thai cuisine, medieval literature and athletics, and someone has written a paper pulling together ideas from these fields into an exciting new synthesis. Their ratings might be:

	Music	Physics	Thai food	Medi. lit	Athletics	Overall
Music theorist	25	75	80	65	85	66
Physicist	70	15	80	60	60	57
Thai chef	65	85	5	70	70	59
Medievalist	90	70	80	25	85	70
Athlete	85	90	95	90	30	78
Overall	67	67	68	62	66	66

Overall, the panel rates the paper 66 out of 100. We don't have enough context here to know whether 66 is a good score or a mediocre score, but it certainly doesn't look horrible. The highest score is in Thai cuisine, and the highest score there was from the athletics expert, so maybe the author has discovered some interesting contribution to Thai food by way of athletics.

But hang on a minute. The highest overall score is in Thai cuisine, but the lowest rating in that category from any expert is the 5 from the Thai chef. Let's ask each of the experts how much they know about their fields and those outside their home turf:

	Music	Physics	Thai food	Medi. lit	Athletics
Music theorist	95	5	15	10	5
Physicist	20	100	10	5	5
Thai chef	5	10	100	10	15
Medievalist	10	5	10	95	10
Athlete	10	15	5	10	95

Everyone feels confident in their own field, as you might expect, and they don't feel particularly confident outside their own field, which also makes sense. There's also quite a bit more variation outside the home fields, which makes a certain amount of sense as well. Maybe the physicist happens to have taken a couple of courses in music theory. Maybe the athlete has only had Thai food once. You can expect someone to have studied extensively in their field, but who knows what they've done outside it.

We should take this into account when looking at the ratings. A Thai chef saying that the paper is weak in Thai cuisine means more than an athlete saying it's great. If we take a weighted average by multiplying each rating by the panelist's confidence, adding those up and dividing by the total weight (that is, the total of the confidence numbers), we get a considerably different picture:

	Music	Physics	Thai food	Medi. lit	Athletics	Overall
Weighted result	42	33	27	38	42	36

Overall, the paper rates 36 out of 100 rather than 66. Its weakest area is Thai cuisine, and even its strongest area, athletics, is well below the previous score of 66.

This seems much more plausible. The person who knows Thai food best rated it low, and now we're counting that ten times more heavily than the physicist's rating and twenty times more heavily than the judge who said they knew least about it.
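For anyone who wants to check the arithmetic, here is a minimal sketch of both calculations. The numbers are copied from the two tables above; the function names are just ones I made up for the illustration. The overall weighted figure comes out at about 36, and the Thai food column at about 27.

```python
# Ratings and self-reported confidence, copied from the tables above.
# Each list follows the column order in FIELDS.
FIELDS = ["Music", "Physics", "Thai food", "Medi. lit", "Athletics"]

RATINGS = {
    "Music theorist": [25, 75, 80, 65, 85],
    "Physicist":      [70, 15, 80, 60, 60],
    "Thai chef":      [65, 85, 5, 70, 70],
    "Medievalist":    [90, 70, 80, 25, 85],
    "Athlete":        [85, 90, 95, 90, 30],
}

CONFIDENCE = {
    "Music theorist": [95, 5, 15, 10, 5],
    "Physicist":      [20, 100, 10, 5, 5],
    "Thai chef":      [5, 10, 100, 10, 15],
    "Medievalist":    [10, 5, 10, 95, 10],
    "Athlete":        [10, 15, 5, 10, 95],
}


def plain_average(col):
    """Unweighted mean of every panelist's rating for one field."""
    values = [RATINGS[p][col] for p in RATINGS]
    return sum(values) / len(values)


def weighted_average(col):
    """Mean rating for one field, weighted by each panelist's confidence."""
    total = sum(RATINGS[p][col] * CONFIDENCE[p][col] for p in RATINGS)
    weight = sum(CONFIDENCE[p][col] for p in RATINGS)
    return total / weight


def overall_weighted():
    """Confidence-weighted mean over every rating in the table."""
    cols = range(len(FIELDS))
    total = sum(RATINGS[p][c] * CONFIDENCE[p][c] for p in RATINGS for c in cols)
    weight = sum(CONFIDENCE[p][c] for p in RATINGS for c in cols)
    return total / weight


for col, field in enumerate(FIELDS):
    print(f"{field:10s} plain {plain_average(col):5.1f}"
          f"  weighted {weighted_average(col):5.1f}")
print(f"Overall weighted: {overall_weighted():.1f}")
```

Note that the weights come from a separate table, filled in without reference to the ratings, which is the whole point of the exercise.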

I think there are a few lessons to be drawn here. First, it's important to take context into account. The medievalist's rating means a lot if it's about Medieval literature and not much if it's about physics, unless they also happen to have a background there. Second, just putting numbers on something doesn't make it any more or less rigorous. The 66 rating and the 36 rating are both numbers, but one means a lot more than the other.

Third, when it comes specifically to averages, a weighted average can be a useful tool for expressing how much any particular data point should count for. Just be sure to assign the weights independently from the numbers you're weighting. Asking the panelists ahead of time how much they know about each field makes sense. Looking at rating numbers and then deciding how much to weight them is a classic example of data fiddling.

Finally, it's worth keeping in mind that people often give the benefit of the doubt to something that sounds plausible when they don't have anything better to go on. As I understand it, this was the case in Feynman's example. In that case, giving the paper to a panel of experts from different fields gave the author much more room to hide than if they'd, say, submitted a shortened version of the paper for each field.

The answer is not necessarily to actively distrust anything from outside one's own expertise, but it's important not to automatically trust something you don't know about just because it seems reasonable. The better evaluation isn't "I don't believe it" but "I really can't say".

I'll leave it up to the reader how any of this might apply to, say, generative AI, LLMs and chatbots.

Wednesday, October 27, 2021

Mortality by the numbers

The following post talks about life expectancy, which inevitably means talking about people dying, and mostly-inevitably doing it in a fairly clinical way. If that's not a topic you want to get into right now, I get it, and I hope the next post (whatever it is) will be more appealing.

Maybe I just need to fix my news feed, but in the past few days I've run across at least two articles stating that for most of human existence people only lived 25 years or so.

Well ... no.

It is true that life expectancy at birth has taken a large jump in recent decades. It's also true that estimates of life expectancy from prehistory up to about 1900 tend to be in the range of 20-35 years, and that estimates for modern-day hunter-gatherer societies are in the same range. As I understand it, that's not a complete coincidence since estimates for prehistoric societies are generally not based on archeological evidence, which is thin for all but the best-studied cases, or written records, which by definition don't exist. Rather, they're based on the assumption that ancient people were most similar to modern hunter-gatherers, so there you go.

None of this means that no one used to live past 25 or 30, though. The life expectancy of a group is not the age by which everyone will have died. That's the maximum lifespan. Now that life expectancies are in the 70s and 80s, it's probably easier to confuse life expectancy with maximum lifespan, and from there conclude that life expectancy of 25 means people didn't live past 25, but that's not how it works. For example, in the US, based on 2018 data, the average life expectancy was 78.7 years, but about half the population could expect to still be alive at age 83, and obviously there are lots of people in the US older than 78.7 years. The story is similar for any real-world calculation of life expectancy.

A life expectancy of 25 years means that if you looked at everyone in the group you're studying, say, everyone born in a certain place in a given year, then counted up the total number of years everyone lived and divided that by the number of people in your group, you'd get 25 years. For example, if your group includes ten people, three of them die as infants and the rest live 10, 15, 30, 35, 40, 50 and 70 years, that's 250 person-years. Dividing that by ten people gives 25 years.

No matter what particular numbers you use, the only way the life expectancy can equal the maximum lifespan is if everybody lives to exactly that age. If some people in a particular group died younger than the life expectancy, that means that someone else lived longer.

Sadly, the example above is likely a plausible distribution for most times and places. Current thinking is that for most of human existence, infant mortality has been much higher than it is now. If you survived your first year, you had a good chance of making it to age 15, and if you made it that far, you had a good chance of living at least into your forties and probably your fifties. In the made-up sample above, the people who made it past 15 lived to an average age of 45. However, there was also a tragically high chance that a newborn wouldn't survive that first year.
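The made-up group of ten is small enough to check by hand, but here it is as a short sketch anyway. I'm counting the three infant deaths as zero years, which is an assumption on my part since the example only says they died as infants.

```python
# Ages at death for the made-up group of ten: three infant deaths
# (counted here as 0 years) plus the seven ages from the example.
ages_at_death = [0, 0, 0, 10, 15, 30, 35, 40, 50, 70]

# Life expectancy at birth: total person-years divided by number of people.
life_expectancy_at_birth = sum(ages_at_death) / len(ages_at_death)

# Average age at death among only those who lived past 15.
survivors = [age for age in ages_at_death if age > 15]
mean_age_of_survivors = sum(survivors) / len(survivors)

print(life_expectancy_at_birth)   # 25.0
print(mean_age_of_survivors)      # 45.0
print(max(ages_at_death))         # 70, the longest life in the group
```

Same ten people, three quite different numbers, depending on which question you ask.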

Life expectancies in the 20s and 30s are mostly a matter of high infant mortality, and to a lesser extent high child mortality, not a matter of people dying in their mid 20s. For the same reason, the increase in life expectancy in the late 20th century was largely a matter of many more people surviving their first year and of more children surviving into adulthood (even then, the rise in life expectancy hasn't been universal).

In real environments where average life expectancy is 25, there will be many people considerably older, and a 24-year-old has a very good chance of making it to 25, and then to 26 and onward. The usual way of quantifying this is with age-specific mortality, which is the chance at any particular birthday that you won't make it to the next one (this is different from age-adjusted mortality, which accounts for age differences when comparing populations).

At any given age, you can use age-specific mortality rates to calculate how much longer a person can expect to live. By itself, "life expectancy" means "life expectancy at birth", but you can also calculate life expectancy at age 30, or 70 or whatever. From the US data above, a 70-year-old can expect to live to age 86 (85.8 if you want to be picky). A 70-year-old has a significantly higher chance of living to be 86 than someone just born, just because they've already lived to 70, whether or not infant mortality is low and whether the average life expectancy is in the 70s or 80s or in the 20s or 30s. They also have a 100% chance of living past 25.

Looking at it from another angle, anyone who makes it to their first birthday has a higher life expectancy than the life expectancy at birth, anyone who makes it to their second birthday has a higher life expectancy still, and so forth. Overall, the number of years you can expect to live beyond your current age goes down each year, because there's always a chance, even if it's small, that you won't live to see the next year. However, it goes down by less than a year each year, because that chance isn't 100%. Even as your expected number of years left decreases, your expected age of death increases, but more and more slowly as you age.
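Here is a rough sketch of how that calculation goes. The mortality schedule is entirely invented (a high first-year rate, a low childhood rate, then a rate that grows about 8% a year), so the particular outputs mean nothing. The point is the shape: expected years remaining never drops by more than one per year, so the expected age at death never goes down as you get older.

```python
def mortality(age):
    """Toy age-specific mortality: chance of dying before the next birthday.

    These numbers are invented for illustration, not taken from real data.
    """
    if age == 0:
        return 0.30                                 # high infant mortality
    if age < 15:
        return 0.01                                 # childhood
    return min(1.0, 0.002 * 1.08 ** (age - 15))     # grows about 8% a year


MAX_AGE = 120   # by this age the toy mortality has long since reached 100%


def remaining_years(age):
    """Expected number of further full years for someone who has reached `age`."""
    expected = 0.0
    # Work backwards from the oldest age. At each age you survive the year
    # with probability p, gaining that year plus whatever is expected after it.
    for a in range(MAX_AGE, age - 1, -1):
        p = 1.0 - mortality(a)
        expected = p * (1.0 + expected)
    return expected


for age in (0, 1, 20, 50, 80):
    left = remaining_years(age)
    print(f"age {age:2d}: {left:5.1f} years left, "
          f"expected age at death {age + left:5.1f}")
```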

Past a certain point in adulthood, age-specific mortality tends to increase exponentially. Since the chances of dying at, say, age 20 are pretty low, and the doubling period is pretty long, around 8-10 years, and the maximum for any probability is 100%, this doesn't produce the hockey-stick graph that's usually associated with exponential growth, but it's still exponential. Every year, your chance of dying is multiplied by a fairly constant factor of around 1.08 to 1.09, or 8-9% annual growth, compounded. Again from the US data, at age 20 you have about a 0.075% chance of dying that year. At age 87, it's about 10%. At age 98, it's about 30%.

This isn't a law of nature, but an empirical observation, and it doesn't seem to quite hold up at the high end. For example, CDC data for the US shows a pretty plausibly exponential increase up to age 99, where the table stops, but extrapolating, the chance of death would become greater than 100% somewhere around age 110, even though people in the US have lived longer than that.
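The doubling period and the extrapolation can both be checked with a couple of logarithms. This sketch uses only the three figures quoted above (0.075% at 20, about 10% at 87, about 30% at 98); the function names are mine, and the straight exponential extrapolation is exactly the assumption that the real data seems to break.

```python
import math


def doubling_time(annual_factor):
    """Years for a rate to double if it is multiplied by annual_factor yearly."""
    return math.log(2) / math.log(annual_factor)


def implied_annual_factor(rate_start, rate_end, years):
    """Constant yearly multiplier that takes rate_start to rate_end."""
    return (rate_end / rate_start) ** (1 / years)


def age_reaching_certainty(rate, age, annual_factor):
    """Age at which the extrapolated rate would reach 100%."""
    return age + math.log(1.0 / rate) / math.log(annual_factor)


print(doubling_time(1.08), doubling_time(1.09))      # roughly 9 and 8 years

# Growth factors implied by the figures quoted in the text.
factor_20_to_87 = implied_annual_factor(0.00075, 0.10, 87 - 20)
factor_87_to_98 = implied_annual_factor(0.10, 0.30, 98 - 87)
print(factor_20_to_87, factor_87_to_98)

# Carrying the late-life growth rate forward from age 98.
print(age_reaching_certainty(0.30, 98, factor_87_to_98))   # roughly 110
```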

It's been predicted that at some point, thanks to advances in medicine and other fields, life expectancy will start to increase by more than one year per year, and as a consequence anyone young enough when this starts to happen will live forever. Life expectancy doesn't work that way, either. There could be a lot of reasons for life expectancy in some population to go up by more than a year in any given year.

Again, the important measure is age-specific mortality. If the chances of living to see the next year increase just a bit for people from, say, 20 to 50, life expectancy could increase by a year or more, but that just means that more people are going to make it into old age. It doesn't mean that they'll live longer once they get there.

The key to extending the maximum lifespan is to increase the chances that an old person will live longer, not to increase the chances that someone will live to be old. If, somehow, anyone 100 or older, but only them, suddenly had a steady 99% chance of living to their next birthday, then half of all 100-year-olds could look forward to living to about 169, and their average age at death would be close to 200. This wouldn't have much effect on overall life expectancy, though, because there aren't that many 100-year-olds to begin with.
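With a constant yearly survival chance the arithmetic is short enough to sketch. The two helper functions below are my own names for standard results: the median comes from asking when the surviving fraction drops to one half, and the mean from summing a geometric series.

```python
import math


def median_extra_years(p):
    """Years until half of a group is gone, given yearly survival chance p."""
    return math.log(0.5) / math.log(p)


def mean_extra_full_years(p):
    """Expected number of further birthdays: p + p**2 + p**3 + ... = p / (1 - p)."""
    return p / (1.0 - p)


print(100 + median_extra_years(0.99))      # about 169
print(100 + mean_extra_full_years(0.99))   # about 199
```

The mean is well above the median because a lucky few in this scenario go on for a very long time.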

What are the actual numbers, once you get past, say, 100? It's hard to tell, because there aren't very many people that old. How many people live to a certain age depends not only on age-specific mortality, but on how many people are still around at what younger ages. This may seem too obvious to state, but it's easy to lose track of this if you're only looking at overall probabilities.

Currently there's no verified record of anyone living to 123 and only one person has been verified to live past 120. No man has been verified to live to 117, and only one has been verified to have lived to 116. Does that mean that no one could live to, say, 135? Not necessarily. Does it mean that women inherently live longer than men? Possibly, but again not necessarily. Inference from rare events is tricky, and people who do this for a living know a lot more about the subject than I do, but in any case we're looking at handfuls out of however many people have well-verified birth dates in the early 1900s.

Suppose, for the sake of illustration, that after age 100 you have a steady 50/50 chance of living each subsequent year. Of the people who live to 100, only 1/2 will live to 101, 1/4 to 102, then 1/8, 1/16 and so forth. Only 1 in 1024 will live to be 110 and only 1 in 1,048,576 -- call it one in a million -- will live to 120.

If there are only a few hundred thousand 100-year-olds to start with, the odds are against any of them living to 120, but they're not zero. At any given point, you have to look at the ages of the people who are actually alive, and (your best estimate of) their odds of living each additional year. If there are half a million 100-year-olds now and each year is a 50/50 proposition, there probably won't be any 120-year-olds in twenty years (the chance is a bit under 40%), but if there does happen to be a 119-year-old after 19 years, there's a 50% chance there will be a 120-year-old a year later. By the same reasoning, it's less likely that there were any 120-year-olds a thousand years ago, not only because age-specific mortality was very likely higher, but because there were simply fewer people around, so there were fewer 100-year-olds with a chance to turn 101, and so forth.
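The coin-flip scenario is easy to put into a few lines. This is only a sketch of the illustration above, with the 50/50 yearly survival as the stated assumption and each person treated as independent of the others. The chance that at least one person out of a starting group reaches the target age is one minus the chance that nobody does.

```python
def chance_of_reaching(start_age, target_age, yearly_survival=0.5):
    """Chance that one person at start_age lives to target_age."""
    return yearly_survival ** (target_age - start_age)


def chance_any_reach(n_people, start_age, target_age, yearly_survival=0.5):
    """Chance that at least one of n_people (independently) gets there."""
    p = chance_of_reaching(start_age, target_age, yearly_survival)
    return 1.0 - (1.0 - p) ** n_people


print(1 / chance_of_reaching(100, 110))   # 1024.0
print(1 / chance_of_reaching(100, 120))   # 1048576.0

# How the starting head count changes the picture.
for n in (100_000, 500_000, 1_000_000):
    print(n, round(chance_any_reach(n, 100, 120), 3))
```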

In real life, a 100-year-old has a much better than 50% chance of living to be 101, but we don't really know if age-specific mortality ever levels off. We know that it's less than 100% at age 121, because someone lived to be 122, but that just indicates that at some point there's no longer an exponential increase in age-specific mortality (else it would hit 100% before then, based on the growth curve at ages where we do have a lot of data). It doesn't mean that the mortality rate levels off. It might still be increasing to 100%, but slowly enough that it doesn't actually hit 100% until sometime after age 121.

It may well be that there's some sort of mechanism of human biology that prevents anyone from living past 122 or thereabouts, and some mechanism of female human biology in particular that sets the limit for women higher than for men. On the other hand, it may be that there aren't any 123-year-olds because so far only one person has made it to 122, and their luck ran out.

Similarly, there may not have been any 117-year-old men because not enough men made it to, say, 80, for there to be a good chance of any of them making it to 117. That in turn might be a matter of men being more likely to die younger, for example in the 20th-century wars that were fought primarily by men. I'm sure that professionals have studied this and could probably confirm or refute this idea. The main point is that after a certain point the numbers thin out and it becomes very tricky to sort out all the possible factors behind them.

On the other hand, even if it's luck of the draw that no one has lived to 123, there could still be an inherent limit, whether it's 124, 150 or 1,000, just that no one's been lucky enough to get there.

Along with the difference between life expectancy and lifespan, and the importance of age-specific mortality, it's important to keep in mind where the numbers come from in the first place. Life expectancy is calculated from age-specific mortality, and age-specific mortality is measured by looking at people of a given age who are currently alive. If you're 25 now, your age-specific mortality is based on the population of 25-year-olds from last year and what proportion of them survived to be 26. Barring unusual circumstances like a pandemic, that will be a pretty good estimate of your own chances for this year, but it's still based on a group you're not in, because you can only measure things that have happened in the past.

If you're 25 and you want to calculate how long you can expect to live, you'll need to look at the age-specific mortalities for age 25 on up. The higher the age you're looking at, the more out-of-date it will be when you reach that age. Current age-specific mortality for 30-year-olds is probably a good estimate of what yours will be at age 30, but current age-specific mortality at 70 might or might not be. There's a good chance that 45 years from now we'll be significantly better at making sure a 70-year-old lives to be 71.

Even if medical care doesn't change, a current 70-year-old is more likely to have smoked, or been exposed to high levels of carcinogens, or any of a number of other risk factors, than someone who's currently 25 will have been when they're 70. Diet and physical activity have also changed over time, not necessarily for the better or worse, and it's a good bet they will continue to change. There's no guarantee that our future 70-year-old's medical history will include fewer risk factors than a current 70-year-old's, but it will certainly be different.

For those and other reasons, the further into the future you go, the more uncertain the age-specific mortality becomes. On the other hand, it also becomes less of a factor. Right now, at least, it won't matter to most people whether age-specific mortality at 99 is half what it is now, because, unless mortality in old age drops by quite a bit, most people alive today are unlikely to live to be 99.

Saturday, July 27, 2019

Do neural networks have a point of view?

As someone once said, figures don't lie, but liars do figure.

In other words, just because something's supported by cold numbers doesn't mean it's true. It's always good to ask where the numbers came from. By the same token, though, you shouldn't distrust anything with numbers behind it, just because numbers can be misused. The breakdown is more or less:

If you hear "up" or "down" or "a lot" or anything that implies numbers, but there aren't any numbers behind it, you really don't know if it's true or not, or whether it's significant.
If you hear "up X%" or "down Y%" or -- apparently a popular choice -- "up a whopping Z%" and you don't know where the numbers came from, you still don't really know if it's true or not. Even if they are correct, you don't know whether they're significant.
If you hear "up X%, according to so-and-so", then the numbers are as good as so-and-so's methodology. If you hear "down Y%, vs. Z% for last quarter", you at least have a basis for comparison, assuming you otherwise trust the numbers.
In all, it's a bit of a pain to figure all this out. Even trained scientists get it wrong more than we might think (I don't have numbers on this and I'm not saying it happens a lot, but it's not zero).
No one has time to do all the checking for more than a small subset of things we might be interested in, so to a large extent we have to trust other people to be careful. This largely comes down to reputation, and there are a number of cognitive biases in the way of evaluating that objectively.
But at least we can try to ignore blatantly bad data, and try to cross-check independent sources (and check that they're actually independent), and come up with a rough, provisional picture of what's really going on. If you do this continually over time the story should be pretty consistent, and then you can worry about confirmation bias.
(Also, don't put much stock in "record high" numbers or "up (a whopping) 12 places in the rankings", but that's a different post).

I'm not saying we're in some sort of epistemological nightmare, where no one has any idea what's true and what's not, just that objectivity is more a goal to aim towards rather than something we can generally expect to achieve.

So what does any of this amateur philosophizing have to do with neural networks?

Computers have long been associated with objectivity. The strawman idea that "it came from a computer" is the same as "it's objectively true" probably never really had any great support, but a different form, I think, has quite a bit of currency, even to the point of becoming an implicit assumption. Namely, that computers evaluate objectively.

"Garbage in, garbage out," goes the old saying, meaning a computed result is only as good as the input it's given. If you say the high temperature in Buenos Aires was 150 degrees Celsius yesterday and -190 Celsius today, a computer can duly tell you the average high was -20 Celsius and the overall high was 150 Celsius, but that doesn't mean that Buenos Aires has been having, shall we say, unusual weather lately. It just means that you gave garbage data to a perfectly good program.

The implication is that if you give a program good data, it will give you a good result. That's certainly true for something simple, like calculating averages and extremes. It's less certain when you have some sort of complicated, non-linear model with a bunch of inputs, some of which affect the output more than others. This is why modeling weather takes a lot of work. There are potential issues with the math behind the model (does it converge under reasonable conditions?), the realization of that model on a computer (are we properly accounting for rounding error?), and the particular settings of the parameters (how well does it predict weather that we already know happened?). There are plenty of other factors. This is just scratching the surface.

A neural network is exactly a complicated, non-linear model with a bunch of inputs, but without the special attention paid to the particulars. There is some general assurance that the tensor calculations that relate the input to the output are implemented accurately, but the real validation comes from treating the whole thing as a black box and seeing what outputs it produces from test inputs. There are well-established techniques for ensuring this is done carefully, for example using different datasets for training the network and for testing how well the network really performs, but at the end of the day the network is only as good as the data it was given.

This is similar to "Garbage in, Garbage out," but with a slightly different wrinkle. A neural net trained on perfectly accurate data and given perfectly accurate input can still produce bad results, if the context of the training data is too different from that of the input it was asked to evaluate.

If I'm developing a neural network for assessing home values, and I train and test it on real estate in the San Francisco Bay area, it's not necessarily going to do well evaluating prices in Toronto or Albuquerque. It might, because it might do a good job of taking values of surrounding properties into account and adjusting for some areas being more expensive than others, but there's no guarantee. Even if there is some sort of adjustment going on, it might be thrown off by any number of factors, whether housing density, the local range of variation among homes or whatever else.
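To make the point concrete without dragging in an actual neural network, here is a toy sketch in which a plain straight-line fit stands in for the model. Everything in it is invented: the two regions, the prices, the square footages and the noise. The model is trained and tested on "region A" homes, using separate training and test sets, and then asked to price "region B" homes that follow a different pattern.

```python
import random


def make_homes(n, base, per_sqft, rng):
    """Synthetic (square_feet, price) pairs. Every number here is invented."""
    homes = []
    for _ in range(n):
        sqft = rng.uniform(800, 3000)
        price = base + per_sqft * sqft + rng.gauss(0, 20_000)
        homes.append((sqft, price))
    return homes


def fit_line(homes):
    """Ordinary least squares for price = intercept + slope * sqft."""
    n = len(homes)
    mean_x = sum(x for x, _ in homes) / n
    mean_y = sum(y for _, y in homes) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in homes)
    sxx = sum((x - mean_x) ** 2 for x, _ in homes)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def mean_abs_error(model, homes):
    """Average size of the pricing miss, in the same units as price."""
    intercept, slope = model
    return sum(abs(intercept + slope * x - y) for x, y in homes) / len(homes)


rng = random.Random(0)
region_a = make_homes(400, base=300_000, per_sqft=600, rng=rng)
region_b = make_homes(200, base=100_000, per_sqft=200, rng=rng)

# Train on one part of region A, test on a held-out part of the same region.
train, test = region_a[:300], region_a[300:]
model = fit_line(train)

err_same_region = mean_abs_error(model, test)
err_other_region = mean_abs_error(model, region_b)
print(round(err_same_region), round(err_other_region))
```

The held-out test says the model is fine, and it is, for region A. Nothing in that test can tell you how it will do somewhere it has never seen.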

The network, in effect, has a point of view based on what we might as well call its experience. This is a very human, subjective way to put it, but I think it's entirely appropriate here. Neural networks are specifically aimed at simulating the way actual brains work, and one feature of actual brains is that their point of view depends to a significant degree on the experience they've had. To the extent that neural networks successfully mimic this, their evaluations are, in a meaningful way, subjective.

There have been some widely-reported examples of neural networks making egregiously bad evaluations, and this is more or less why. It's not (to my knowledge) typically because the developers are acting in bad faith, but because they failed to assemble a suitably broad set of data for training and testing. This gave the net, in effect, a biased point of view.

This same sort of mistake can and does occur in ordinary research with no neural networks involved. A favorite example of mine is drawing conclusions about exoplanets based on the ones we've detected so far. These skew heavily toward large, fast-moving planets, because for various reasons those are much easier to detect. A neural network trained on currently known exoplanets would have the same skew built in (unless the developers were very careful, and quite likely even then), but you don't need a neural network to fall prey to this sort of sampling bias. From my limited sample, authors of papers at least try to take it into account, authors of magazine articles less so and headline writers hardly at all.

Sunday, July 14, 2019

How strong are computer chess players, and how far are they from perfect chess?

Playing a perfect game of chess appears to be an intractable problem. Certainly the total number of possible positions is far too large for a direct tabulation of the best move for each possible position to be feasible. There are far more possible chess positions than subatomic particles in the observable universe.

To be sure, almost all of these positions are extremely unlikely to appear in any reasonable game of chess, much less a perfectly played one, so you should be able to ignore them. All you would really need to know in order to play perfectly is what the best first move is, what the best second move is for every reply, and so forth. Since most possible replies are not good moves, this might thin out enough that it would be possible to store everything in a large database and/or write general rules that will cover large numbers of possible positions. In other words, it might (or might not) be feasible to write down a perfect strategy if we could find it. But we're nowhere close to finding it.

Nonetheless, there has been a lot of progress in the decades since Deep Blue beat Kasparov. It's now quite clear that computers can play better than the best humans, to the point that it's a bit hard to say exactly how much better computers are. There are rating numbers that imply that, say, Stockfish would beat Magnus Carlsen X% of the time, but they probably shouldn't be taken as anything more than an estimate. We can say that X is probably somewhere in the 90s, but that's about it.

Chess rating systems are generally derived from the Elo system (named after Arpad Elo), which tries to quantify playing strength as a single number based on players' records against each other. Two equally-rated players should have roughly equal numbers of wins and losses against each other, plus however many draws. As the rating difference increases, the stronger player should win more and more often.

Ratings are recalculated in light of actual results. If two equally-rated players draw, nothing will change, but if a low-rated player draws against a higher-rated one, the higher-rated player will lose points and the lower-rated player will gain points. Likewise, the winner of a game will gain points and the loser will lose points, but you get more points for beating a higher-rated player and you lose more for losing to a lower-rated player.
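The textbook form of the Elo update fits in a few lines. This is a minimal sketch rather than any particular federation's rules. The 400-point scale is the conventional one, but the K factor of 20 is just a value I picked for illustration, and real systems vary it.

```python
def expected_score(rating, opponent_rating):
    """Expected score (win = 1, draw = 0.5, loss = 0) under the Elo curve."""
    return 1.0 / (1.0 + 10 ** ((opponent_rating - rating) / 400))


def updated_rating(rating, opponent_rating, score, k=20):
    """New rating after one game. k=20 is an assumed, illustrative K factor."""
    return rating + k * (score - expected_score(rating, opponent_rating))


# Equal ratings, drawn game: nothing changes.
print(updated_rating(1500, 1500, 0.5))

# A 1500 draws with a 1700: the lower-rated player gains, the higher loses.
print(updated_rating(1500, 1700, 0.5), updated_rating(1700, 1500, 0.5))

# Beating a stronger player is worth more than beating a weaker one.
print(updated_rating(1500, 1700, 1.0), updated_rating(1500, 1300, 1.0))
```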

Over time, this will tend to give a good picture of who's better, and how much better. If the parameters of the rating formula are tuned well, the difference in rating points will give a pretty good prediction of winning percentages. It's interesting in itself that reducing chess skill to a single number on a linear scale works as well as it does -- see this post for more than you probably want to read about that. The point here, though, is that to get useful ratings you need a pool of players all playing against each other.

You don't need a full round-robin of thousands of players, of course, but the results you do have need to be reasonably interconnected. If you're an average club player, you probably won't be playing many games against the world's top ten, but the strongest players in your club may well have played people who've played them, and in any case there will be multiple reasonably short paths connecting you to them.

To make this work, everyone, even at the top, needs to have players of comparable rating level to play against. This is important because rating differences are more meaningful between relatively evenly-matched players. If player A beats player B twelve times, draws five times and loses three times out of twenty games, then player A is probably about 200 points stronger than player B. If it's nineteen wins and one draw, we know player A is several hundred points stronger, but we can't say much more than that with confidence. If it's 20-0, the formula says that A is 800 points stronger, but it could really be any number from 800 on up.
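Going the other way, from a match result to an implied rating gap, just means inverting the Elo curve. The sketch below uses the plain logistic formula, which is my assumption about what "the formula" means here. Under that formula a clean sweep has no finite answer at all, which is presumably why rating tables cap the gap at a conventional figure instead.

```python
import math


def rating_gap(wins, draws, losses):
    """Rating difference implied by a match score, using the plain Elo curve."""
    games = wins + draws + losses
    score = (wins + 0.5 * draws) / games
    if score in (0.0, 1.0):
        # A perfect (or zero) score fits any gap, however large.
        return math.copysign(math.inf, score - 0.5)
    return 400 * math.log10(score / (1 - score))


print(rating_gap(5, 10, 5))    # an even score implies no gap
print(rating_gap(19, 1, 0))    # several hundred points
print(rating_gap(20, 0, 0))    # inf
```

The closer the score gets to a sweep, the faster the implied gap grows, so one extra draw at the lopsided end moves the estimate far more than it would in an even match.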

If you're a club player, playing a 20-game match against Magnus Carlsen will only tell you something about your rating if you manage to draw at least one game. For most of us, that's not particularly likely, and for more or less the same reason, that match isn't likely to happen at all. In real life, people only have so much time, and professional players are only going to play seriously against other professionals.

The same holds true for humans and computers. People, including grandmasters, do play computers a lot, but generally not under controlled match conditions. Since even the best human player is likely to be on the wrong end of a 19-1-0 blowout with a top computer player in a tournament setting, there's little reason to pay a top human player to take time out of their schedule for an exhausting and demoralizing match that won't tell us much more than we already know.

Probably a better approach would be to dial back the parameters on a few well-known engines, with a mix of AB and NN, to produce a set of standard players at, say, 200 rating-point intervals -- 2800, 3000, 3200 ... . It would probably be much easier to get human players to play against the "near human-level" players, knowing they might be able to win bragging rights over their friends. Some of the top human players should even be able to beat the 2800-rated players. The human-level computer players could thus be calibrated against actual humans, and those could be calibrated against the stronger computer players.

Calibrating the top computer players against the standard computer players would take a lot of games, but that's not a problem for computers. In the last computer tournament I watched, there were hundreds of games, with dozens between each pair in the round-robin phase, and more among the finalists. Outside of tournaments, a group called CCRL plays computer players against each other under standard conditions specifically to rate them.

If only because of the sheer number of human players, we can be reasonably confident that human players are accurately rated with respect to each other, and because of the sheer number of games played between computers, we can be similarly sure that their ratings reflect their relative strengths. If computer tournaments included a few human-calibrated players, then we would have a better idea just how strong the strongest players were, compared to humans.

As it is, the two sets of ratings may well be drifting apart. In the last computer tournament I watched, the bottom player -- which, to be sure, lost quite heavily -- had a rating in the 2900s. Carlsen is currently in the high 2800s. In theory, Carlsen would probably lose to that computer player, but would probably manage a few draws and quite possibly a win or two. In practice, that match is probably not going to happen.

While it would be interesting to get more detail on ratings across humans and computers, it doesn't really change the story of "computers can beat humans easily" and by itself it doesn't shed much light on the limits of how well chess can be played. An Elo-style rating system doesn't have a built-in maximum rating. There is, in theory, a best-possible player, but we don't know how strong that player is or, put another way, we don't know how close the current best engines are to playing perfect chess or where they would rate with respect to a perfect player.

It is interesting, though, that stronger players, both human and computer, tend to draw more against each other than weaker players. Amateur-level games are often decisive because one player or the other blunders (or, more accurately, blunders more than the other player does). Higher-level human players blunder less, and chess engines don't blunder at all, or at least the notion of "blunder" for an engine is more on the order of "didn't realize that having an enemy pawn that far advanced on the kingside could lead to a devastating attack". The finals of the last computer tournament I watched consisted almost entirely of draws.

While it's entirely possible that perfectly-played chess is a win for white (or, who knows, a win for black), and the top engines just aren't finding the right moves, it seems more likely to me that perfectly played chess is a draw and engines are basically getting better at not losing, while being able to ruthlessly exploit imperfect play when it does happen.

If this is the case then it's quite possible that ratings will top out at a point where decisive games become exceedingly rare. A plausible match result might be 1-0 with 999 draws. If rules required a certain margin of victory, that might essentially be a matter of flipping coins until there are that many more heads than tails or tails than heads. The "law of large numbers" doesn't forbid this, it just says that you'll need a lot of coin flips to get there.
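Here is a small simulation of that coin-flipping picture, as a sketch only. It assumes each decisive game is a fair coin flip and ignores draws entirely; the function names and the fixed seed are mine.

```python
import random

def flips_until_margin(margin, rng):
    """Flip a fair coin until heads and tails differ by `margin`; return the flip count."""
    lead = 0
    flips = 0
    while abs(lead) < margin:
        lead += 1 if rng.random() < 0.5 else -1
        flips += 1
    return flips

def average_flips(margin, trials=2000, seed=0):
    """Average number of flips over many simulated matches."""
    rng = random.Random(seed)
    return sum(flips_until_margin(margin, rng) for _ in range(trials)) / trials

for margin in (2, 5, 10):
    print(margin, average_flips(margin))
```

The averages land near the square of the margin, which is the standard result for a fair random walk: a ten-game margin takes about a hundred decisive games on average, and each decisive game might cost hundreds of draws.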

We could well get to a point where a match between top players ends in a score of 1439-1429 with 50,726 draws or something like that. At that point writing chess engines becomes similar to cranking out more and more digits of pi -- interesting to a small group of people, and useful in stress-testing systems and tools, but no longer of general interest, even among geeks.

The players would probably still not be playing perfect chess, but it seems plausible that they would nearly hold their own against perfect play. As draws become more common, ratings should start to bunch together as a result. If player A's record against player B is 12-0 with 123,765 draws, their ratings will be essentially identical, even if A is playing perfectly and B is only almost-perfect.

Working backward from this, if top ratings are in fact starting to bunch together, you could extrapolate toward a theoretical maximum rating that they're approaching. A player's actual rating is then an estimate of how often they would lose, as opposed to drawing, against perfect play. I wouldn't be surprised if someone's already done a regression based on a scenario like this.

Saturday, July 22, 2017

Yep. Tron.

It was winter when I started writing this, but writing posts about physics is hard, at least if you're not a physicist. This one was particularly hard because I had to re-learn what I thought I knew about the topic, and then realize that I'd never really understood it as well as I'd thought, then try to learn it correctly, then realize that I also needed to re-learn some of the prerequisites, which led to a whole other post ... but just for the sake of illustration, let's pretend it's still winter.

If you live near a modest-sized pond or lake, you might (depending on the weather) see it freeze over at night and thaw during the day. Thermodynamically this can be described in terms of energy (specifically heat) and entropy. At night, the water is giving off heat into the surrounding environment and losing entropy (while its temperature stays right at freezing). The surrounding environment is taking on heat and gaining entropy. The surroundings gain at least as much entropy as the pond loses, and ultimately the Earth will radiate just that bit more heat into space. When you do all the accounting, the entropy of the universe increases by just a tiny bit, relatively speaking.

During the day, the process reverses. The water takes on heat and gains entropy (while its temperature still stays right at freezing). The surroundings give off heat, which ultimately came from the sun, and lose entropy. The water gains at least as much entropy as the surroundings lose*, and again the entropy of the universe goes up by just that little, tiny bit, relatively speaking.

So what is this entropy of which we speak? Originally entropy was defined in terms of heat and temperature. One of the major achievements of modern physics was to reformulate entropy in a more powerful and elegant form, revealing deep and interesting connections, thereby leading to both enlightenment and confusion. The connections were deep enough that Claude Shannon, in his founding work on information theory, defined a similar concept with the same name, leading to even more enlightenment and confusion.

The original thermodynamic definition relies on the distinction between heat and temperature. Temperature, at least in the situations we'll be discussing here, is a measure of how energetic individual particles -- typically atoms or molecules -- are on average. Heat is a form of energy, independent of how many particles are involved.

The air in an oven heated to 500K (that is, 500 Kelvin, about 227 degrees Celsius or 440 degrees Fahrenheit) and a pot full of oil at 500K are, of course, at the same temperature, but you can safely put your hand in the oven for a bit. The oil, not so much. Why? Mainly because there's a lot more heat in the oil than in the air. By definition the molecules in the oven air are just as energetic, on average, as the molecules in the oil, but there are a lot more molecules of oil, and therefore a lot more energy, which is to say heat.
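For a sense of scale, here is a back-of-the-envelope sketch comparing a litre of oven air with a litre of oil as each cools to roughly skin temperature. The densities and specific heats are rough textbook-style values I'm assuming, not figures from this post, and the function name is mine.

```python
# Rough, assumed material properties (not from the post):
AIR_DENSITY_500K = 0.7      # kg per cubic metre at 500 K and normal pressure
AIR_SPECIFIC_HEAT = 1000.0  # J/(kg K)
OIL_DENSITY = 900.0         # kg per cubic metre
OIL_SPECIFIC_HEAT = 2000.0  # J/(kg K)

def heat_released(density, specific_heat, volume_m3, temp_drop_k):
    """Heat given up (in joules) when the material cools by temp_drop_k kelvin."""
    return density * volume_m3 * specific_heat * temp_drop_k

LITRE = 1e-3        # cubic metres
DROP = 500 - 310    # cooling from 500 K to roughly skin temperature

air_heat = heat_released(AIR_DENSITY_500K, AIR_SPECIFIC_HEAT, LITRE, DROP)
oil_heat = heat_released(OIL_DENSITY, OIL_SPECIFIC_HEAT, LITRE, DROP)
print(f"air: {air_heat:.0f} J, oil: {oil_heat:.0f} J, ratio: {oil_heat / air_heat:.0f}")
```

With these assumed values the oil has a couple of thousand times more heat to give up than the same volume of air, at exactly the same temperature.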

At least, that's the quick explanation for purposes of illustration. Going into the real details doesn't change the basic point: heat is different from temperature and changing the temperature of something requires transferring energy (heat) to or from it. As in the case of the pond freezing and melting, there are also cases where you can transfer heat to or from something without changing its temperature. This will be important in what follows.

Entropy was originally defined as part of understanding the Carnot cycle, which describes the ideal heat-driven engine (the efficiency of a real engine is usually given as a percentage of what the Carnot cycle would produce, not as a percentage of the energy it uses). Among the principal results in classical thermodynamics is that the Carnot cycle is as good as you can get, even in principle, and that not even it can ever be perfectly efficient.

At this point it might be helpful to read that earlier post on energy, if you haven't already. Particularly relevant parts here are that the state of the working fluid in a heat engine, such as the steam in a steam engine, can be described with two parameters, or, equivalently, as a point in a two-dimensional diagram, and that the cycle an engine goes through can be described by a path in that two-dimensional space.

Also keep in mind the ideal gas law: In an ideal gas, the temperature of a given amount of gas is proportional to pressure times volume. Here and in the rest of this post, "gas" means "a substance without a fixed shape or volume" and not what people call "gasoline" or "petrol".

If you've ever noticed a bicycle pump heat up as you pump up a tire, that's (more or less) why. You're compressing air, that is, decreasing its volume, so (unless the pump is able to spill heat with perfect efficiency, which it isn't) the temperature has to go up. For the same reason the air coming out of a can of compressed air is dangerously cold. The air is expanding rapidly so the temperature drops sharply.
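The "more or less" is there because pressure changes along with volume, so the ideal gas law alone doesn't pin down the final temperature. As a sketch, assume no heat escapes at all (the adiabatic case) and take 1.4 as the ratio of specific heats for air, a common textbook value that I'm assuming here. The function name and the compression ratio are also just for illustration.

```python
GAMMA_AIR = 1.4  # assumed ratio of specific heats for air

def adiabatic_temperature(t_start_k, v_start, v_end, gamma=GAMMA_AIR):
    """Temperature (kelvin) after a no-heat-exchange volume change of an ideal gas."""
    return t_start_k * (v_start / v_end) ** (gamma - 1)

# Pump stroke: squeeze room-temperature air to a third of its volume.
print(round(adiabatic_temperature(293.0, 3.0, 1.0)))
# Spray can: let room-temperature air expand to three times its volume.
print(round(adiabatic_temperature(293.0, 1.0, 3.0)))
```

The first number is well above room temperature and the second is far below freezing. Real pumps and cans leak heat to their surroundings, so the actual swings are smaller.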

In the Carnot cycle you first supply heat to a gas (the "working fluid", for example steam in a steam engine) while maintaining a perfectly constant temperature by expanding the container it's in. You're heating that gas, in the sense of supplying heat, but not in the sense of raising its temperature. Again, heat and temperature are two different things.

To continue the Carnot cycle, let the container keep expanding, but now in such a way that it neither gains nor loses heat (in technical terms, adiabatically). In these first two steps, you're getting work out of the engine (for example, by connecting a rod to the moving part of a piston and attaching the other part of that rod to a wheel). The gas is losing energy since it's doing work on the piston, and it's also expanding, so the temperature and pressure are both dropping, but no heat is leaving the container in the adiabatic step.

Work is force times distance, and force in this case is pressure times the area of the surface that's moving. Since the pressure, and therefore the force, is dropping during the second step you'll need to use calculus to figure out the exact amount of work, but people know how to do that.
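For the curious, here is roughly what that calculus amounts to, as a sketch for an ideal gas expanding without heat exchange. Force times distance becomes pressure times a small change in volume, summed over the whole stroke. The starting pressure, the volumes and the value 1.4 for air are all assumed example values.

```python
def adiabatic_pressure(p_start, v_start, volume, gamma=1.4):
    """Pressure along an adiabatic path of an ideal gas: P * V**gamma stays constant."""
    return p_start * (v_start / volume) ** gamma

def work_numeric(p_start, v_start, v_end, steps=100_000, gamma=1.4):
    """Add up pressure times small volume changes (midpoint rule)."""
    dv = (v_end - v_start) / steps
    total = 0.0
    for i in range(steps):
        v_mid = v_start + (i + 0.5) * dv
        total += adiabatic_pressure(p_start, v_start, v_mid, gamma) * dv
    return total

def work_closed_form(p_start, v_start, v_end, gamma=1.4):
    """The same sum done exactly with calculus."""
    p_end = adiabatic_pressure(p_start, v_start, v_end, gamma)
    return (p_start * v_start - p_end * v_end) / (gamma - 1)

# Assumed example: 200 kPa expanding from 1 litre to 2 litres.
print(work_numeric(200e3, 1e-3, 2e-3), work_closed_form(200e3, 1e-3, 2e-3))
```

Both come out at about 121 joules, noticeably less than the 200 joules you would get if the pressure stayed at its starting value for the whole stroke.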

The last two steps of the cycle reverse the first two. In step three you compress the gas, for example by changing the direction the piston is moving, while keeping the temperature the same. This means the gas is cooling in the sense of giving off heat, but not in the sense of dropping in temperature. Finally, in step four, compress the gas further, without letting it give off heat. This raises the temperature. The piston is doing work on the gas and the volume is decreasing. In a perfect Carnot cycle the gas ends up in the same state -- same pressure, temperature and volume -- as it began and you can start it all over.

As mentioned in the previous post, you end up putting more heat in at the start than you end up getting back in the third step, and you end up getting more work out in the first two steps than you put in in the last two (because the pressure is higher in the first two steps). Heat gets converted to work (or if you run the whole thing backwards, you end up with a refrigerator).

If you plot the Carnot cycle on a diagram of pressure versus volume, or the other two combinations of pressure, volume and temperature, you get a shape with at least two curved sides, and it's hard to tell whether you could do better. Carnot proved that this cycle is the best you can do, in terms of how much work you can get out of a given amount of heat, by choosing two parameters that make the cycle into a rectangle. One is temperature -- steps one and three maintain a constant temperature.

The other needs to make the other two steps straight lines. To make this work out, the second quantity has to remain constant while the temperature is changing, and change when temperature is constant. The solution is to define a quantity -- call it entropy -- that changes, when temperature is constant, by the amount of heat transferred, divided by that temperature (ΔS = ΔQ/T -- the deltas (Δ) say that we're relating changes in heat and entropy, not absolute quantities; Q stands for heat and S stands for entropy, because reasons). When there's no heat transferred, entropy doesn't change. In step one, temperature is constant and entropy increases. In step two, temperature decreases while entropy remains constant, and so forth.

To be clear, entropy and temperature can, in general, both change at the same time. For example, if you heat a gas at constant volume, then pressure, temperature and entropy all go up. The Carnot cycle is a special case where only one changes at a time.

Knowing the definition of entropy, you can convert, say, a pressure/volume diagram to a temperature/entropy diagram and back. In real systems, the temperature/entropy version won't show absolutely straight vertical and horizontal lines -- that is, there will be at least some places where both change at the same time. The Carnot cycle is exactly the case where the lines are perfectly horizontal and vertical.

This definition of entropy in terms of heat and temperature says nothing at all about what's going on in the gas, but it's enough, along with some math I won't go into here (but which depends on the cycle being a rectangle), to prove Carnot's result: The portion of heat wasted in a Carnot cycle is the ratio of the cold temperature to the hot temperature (on an absolute temperature scale). You can only have zero loss -- 100% efficiency -- if the cold temperature is absolute zero. Which it won't be.
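The rectangle argument is short enough to sketch. Heat moves only on the two constant-temperature sides, and on each side the heat is temperature times the change in entropy. The entropy change is the same size on both sides because the cycle returns to its starting point. The temperatures and entropy change below are arbitrary example values, and the function name is mine.

```python
def carnot_cycle(t_hot, t_cold, delta_s):
    """Heat and work for one trip around the rectangle (temperatures in kelvin)."""
    heat_in = t_hot * delta_s      # step one: heat = temperature * entropy change
    heat_out = t_cold * delta_s    # step three: same entropy change, lower temperature
    work = heat_in - heat_out      # what's left over, the area of the rectangle
    return heat_in, heat_out, work

heat_in, heat_out, work = carnot_cycle(t_hot=500.0, t_cold=300.0, delta_s=2.0)
print(heat_out / heat_in)   # wasted fraction, equal to t_cold / t_hot
print(work / heat_in)       # efficiency, equal to 1 - t_cold / t_hot
```

The entropy change cancels out of both ratios, which is why only the two temperatures matter.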

Any cycle that deviates from a perfect rectangle will be less efficient yet. In real life this is inevitable. You can come pretty close on all the steps, but not perfectly close. In real life you don't have an ideal gas, you can't magically switch from being able to put heat into the gas to perfectly insulating it, you won't be able to transfer all the heat from your heat source to the gas, you won't be able to capture all the heat from the third step of the cycle to reuse in the first step of the next cycle, some of the energy of the moving piston will be lost to friction (that is, dissipated into the surroundings as heat) and so on.

The problem-solving that goes into minimizing inefficiencies in real engines is why engineering came to be called engineering and why the hallmark of engineering is getting usefulness out of imperfection.

There are other cases where heat is transferred at a constant temperature, and we can define entropy in the same way as for a gas. For example, temperature doesn't change during a phase change such as melting or freezing. As our pond melts and freezes, the temperature stays right at freezing until the pond completely freezes, at which point it can get cooler, or melts entirely, at which point it can get warmer.

If all you know is that some water is at the freezing point, you can't say how much heat it will take to raise the temperature above freezing without knowing how much of it is frozen and how much is liquid. The concept of entropy is perfectly valid here -- it relates directly to how much of the pond is liquid -- and we can define "entropy of fusion" to account for phase transitions.
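As a sketch of how that works out for water, take the latent heat of fusion to be about 334 kilojoules per kilogram, an approximate textbook value I'm assuming rather than something from this post. The function names and the pond size are made up for illustration.

```python
LATENT_HEAT_FUSION = 334_000.0   # J per kg of ice melted (approximate, assumed)
MELTING_POINT = 273.15           # K

def melting_entropy(mass_kg):
    """Entropy gained (J/K) when mass_kg of ice melts at the melting point."""
    heat = mass_kg * LATENT_HEAT_FUSION
    return heat / MELTING_POINT   # change in entropy = heat / temperature

def heat_to_finish_melting(total_mass_kg, frozen_fraction):
    """Heat (J) still needed before the temperature can rise above freezing."""
    return total_mass_kg * frozen_fraction * LATENT_HEAT_FUSION

print(round(melting_entropy(1.0)))            # J/K per kilogram of ice
print(heat_to_finish_melting(1000.0, 0.25))   # a tonne of pond, one quarter frozen
```

The second function is the point of the paragraph above: the thermometer reads the same whether a quarter or three quarters of the pond is frozen, but the heat needed to get past freezing differs by a factor of three.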

There are plenty of other cases that don't look quite so much like the ideal gas case but still involve changes of entropy. Mixing two substances increases overall entropy. Entropy is a determining factor in whether a chemical reaction will go forward or backward and in ice melting when you throw salt on it.

Before I go any further about thermodynamic entropy, let me throw in that Claude Shannon's definition of entropy in information theory is, informally, a measure of the number of distinct messages that could have been transmitted in a particular situation. On the other blog, for example, I've ranted about bits of entropy for passwords. This is exactly a measure of how many possible passwords there are in a given scheme for picking passwords.
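For passwords the count is easy to do, at least in the idealized case where every character or word is picked independently and uniformly at random. The two schemes below are hypothetical examples, not recommendations.

```python
import math

def password_bits(choices_per_position, positions):
    """Bits of entropy when every position is picked independently and uniformly."""
    return positions * math.log2(choices_per_position)

print(password_bits(26, 8))     # eight random lowercase letters
print(password_bits(2048, 4))   # four words from a hypothetical 2048-word list
```

Four random words from the list carry 44 bits, more than the roughly 38 bits in eight random lowercase letters, because what counts is the number of possible passwords and not how long or odd any one of them looks.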

What in the world does this have to do with transferring heat at a constant temperature? Good question.

Just as the concept of energy underwent several shifts in understanding on the way to its current formulation, so did entropy. The first major shift came with the development of statistical mechanics. Here "mechanics" refers to the behavior of physical objects, and "statistical" means you've got enough of them that you're only concerned about their overall behavior.

Statistical mechanics models an ideal gas as a collection of particles bouncing around in a container. You can think of this as a bunch of tiny balls bouncing around in a box, but there's a key difference from what you might expect from that image. In an ideal gas, all the collisions are perfectly elastic, meaning that the energy of motion (called kinetic energy) remains the same before and after. In a real box full of balls, the kinetic energy of the balls gets converted to heat as the balls bump into each other and push each other's molecules around, and sooner or later the balls stop bouncing.

But the whole point of the statistical view of thermodynamics is that heat is just the kinetic energy of the particles the system is made up of. When actual bouncing balls lose energy to heat, that means that the kinetic energy of the large-scale motion of the balls themselves is getting converted into kinetic energy of the small-scale motion of the molecules the balls are made of, and of the air in the box, and of the walls of the box, and eventually the surroundings. That is, the large scale motion we can see is getting converted into a lot of small-scale motion that we can't, which we call heat.

When two particles, say two oxygen molecules, bounce off each other, the kinetic energy of the moving particles just gets converted into kinetic energy of differently-moving particles, and that's it. In the original formulation of statistical mechanics, there's simply no other place for that energy to go, no smaller-scale moving parts to transfer energy to (assuming there's no chemical reaction between the two -- if you prefer, put pure helium in the box).

When a particle bounces off the wall of the container, it imparts a small impulse -- an instantaneous force -- to the walls. When a whole lot of particles continually bounce off the walls of a container, those instantaneous forces add up to (for all practical purposes) a continuous force, that is, pressure.

Temperature is the average kinetic energy of the particles and volume is, well, volume. That gives us our basic parameters of temperature, pressure and volume.

But what is entropy, in this view? In statistical mechanics, we're concerned about the large-scale (macroscopic) state of the system, but there are many different small-scale (microscopic) states that could give the same macroscopic picture.

Once you crank through all the math, it turns out that entropy is a measure of how many different microscopic states, which we can't measure, are consistent with the macroscopic state, which we can measure. In fuller detail, entropy is actually proportional to the logarithm of that number -- the number of digits, more or less -- both because the raw numbers are ridiculously big, and because that way the entropy of two separate systems is the sum of the entropy of the individual systems.

The actual formula is S = k ln(W), where k is Boltzmann's constant and W is the total number of possible microstates, assuming they're all equally probable. There's a slightly bigger formula if they're not. Note that, unlike the original thermodynamic definition, this formula deals in absolute quantities, not changes.
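Here is a minimal sketch of why the logarithm makes entropies add. The microstate counts are invented, and the function takes ln(W) as input because W itself is far too large to store.

```python
import math

BOLTZMANN = 1.380649e-23   # J/K, the SI value of k

def boltzmann_entropy(log_microstates):
    """S = k ln(W), given ln(W) directly."""
    return BOLTZMANN * log_microstates

# Two hypothetical systems with W1 and W2 equally likely microstates.
ln_w1 = 1e22 * math.log(2)    # W1 = 2 ** (1e22)
ln_w2 = 3e22 * math.log(2)    # W2 = 2 ** (3e22)

# Side by side, any state of one can pair with any state of the other,
# so W = W1 * W2 and ln(W) = ln(W1) + ln(W2).
combined = boltzmann_entropy(ln_w1 + ln_w2)
print(combined, boltzmann_entropy(ln_w1) + boltzmann_entropy(ln_w2))
```

The counts multiply but the entropies add, which is what you want from a quantity that is supposed to behave like the classical one.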

When ice melts, entropy increases. Water molecules in ice are confined to fixed positions in a crystal. We may not know the exact energy of each individual molecule, but we at least know more or less where it is, and we know that if the energy of such a molecule is too high, it will leave the crystal (if this happens on a large scale, the crystal melts). Once it does, we know much less about its location or energy.

Even without a phase change, the same sort of reasoning applies. As temperature -- the average energy of each particle -- increases, the range of energies each particle can have increases. How to translate this continuous range of energies into a number we can count is a bit of a puzzle, but we can handwave around that for now.

Entropy is often called a measure of disorder, but more accurately it's a measure of uncertainty (as theoretical physicist Sabine Hossenfelder puts it: "a measure for unresolved microscopic details"), that is, how much we don't know. That's why Shannon used the same term in information theory. The entropy of a message measures how much we don't know about it just from knowing its size (and a couple of other macroscopic parameters). Shannon entropy is also logarithmic, for the same reasons that thermodynamic entropy is.

The formula for Shannon entropy in the case that all possible messages are equally probable is H = k ln(M), where M is the number of messages. I put k there to account for the logarithm usually being base 2 and because it emphasizes the similarity to the other definition. Again, there's a slightly bigger formula if the various messages aren't all equally probable, and it too looks an awful lot like the corresponding formula for thermodynamic entropy.
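The slightly bigger formula is the sum of -p log(p) over all the possible messages. A quick sketch, using base-2 logarithms so the answer comes out in bits (which amounts to taking k to be 1/ln 2); the two example distributions are made up.

```python
import math

def shannon_entropy_bits(probabilities):
    """H = -sum(p * log2(p)), skipping messages that can't occur."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

uniform = [1 / 8] * 8             # eight equally likely messages
skewed = [0.9] + [0.1 / 7] * 7    # one message far likelier than the rest

print(shannon_entropy_bits(uniform))   # equals log2(8)
print(shannon_entropy_bits(skewed))    # fewer bits: less uncertainty
```

With equal probabilities the sum collapses to the logarithm of the number of messages. Once one message dominates, there is less we don't know, and the entropy drops.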

The original formulation of statistical mechanics assumed that physics at the microscopic scale followed Newton's laws of motion. One indication that statistical mechanics was on to something is that when quantum mechanics completely reformulated what physics looks like at the microscopic scale, the statistical formulation not only held up, but became more accurate with the new information available.

In our current understanding, when two oxygen molecules bounce off each other, their electron shells interact (there's more going on, but let's start there), and eventually their energy gets redistributed into a new configuration. This can mean the molecules traveling off in new paths, but it could also mean that some of the kinetic energy gets transferred to the electrons themselves, or some of the electrons' energy gets converted into kinetic energy.

Macroscopically this all looks the same as the old model, if you have huge numbers of molecules, but in the quantum formulation we have a more precise picture of entropy. This makes a difference in extreme situations such as extremely cold crystals. Since energy is quantized, there is a finite (though mind-bendingly huge) number of possible quantum states a typical system can have, and we can stop handwaving about how to handle ranges of possible energy. This all works whether you have a gas, a liquid, an ordinary solid or some weird Bose-Einstein condensate. Entropy measures that number of possible quantum states.

Thermodynamic entropy and information theoretic entropy are measuring basically the same thing, namely the number of specific possibilities consistent with what we know in general. In fact, the modern definition of thermodynamic entropy specifically starts with a raw number of possible states and includes a constant factor to convert from the raw number to the units (energy over temperature) of classical thermodynamics.

This makes the two notions of entropy look even more alike -- they're both based on a count of possibilities, but with different scaling factors. Below I'll even talk, loosely, of "bits worth of thermodynamic entropy" meaning the number of bits in the binary number for the number of possible quantum states.
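Converting between the two scalings is a one-liner: divide the thermodynamic entropy by k ln 2. As a sketch, the example uses the entropy gained by one gram of melting ice, which I'm taking to be about 1.22 joules per kelvin (an assumed latent heat of about 334 joules per gram, divided by 273 kelvin).

```python
import math

BOLTZMANN = 1.380649e-23   # J/K

def entropy_in_bits(entropy_joules_per_kelvin):
    """Convert S = k ln(W) into log2(W), the number of bits needed to write down W."""
    return entropy_joules_per_kelvin / (BOLTZMANN * math.log(2))

# One gram of ice melting: roughly 1.22 J/K (assumed figure, see above).
print(f"{entropy_in_bits(1.22):.2e} bits")
```

That is on the order of 10^23 bits for a single gram of ice, which gives some feel for the size of the numbers in what follows.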

Nonetheless, they're not at all the same thing in practice.

Consider a molecule of DNA. There are dozens of atoms, and hundreds of subatomic particles, in a base pair. I really don't know how many possible states a phosphorus atom (say) could be in under typical conditions, but I'm going to guess that there are thousands of bits worth of entropy in a base pair at room temperature. Even if each individual particle can only be in one of two possible states, you've still got hundreds of bits.

From an information-theoretic point of view, there are four possible states for a base pair, which is two bits, and because the genetic code actually includes a fair bit of redundancy in the form of different ways of coding the same amino acid and so forth, it's actually more like 10/6 of a bit, even without taking into account other sources of redundancy.

But there is a lot of redundancy in your genome, as far as we can tell, in the form of duplicated genes and stretches of DNA that might or might not do anything. All in all, there is about a gigabyte worth of base pairs in a human genome, but the actual gene-coding information can compress down to a few megabytes. The thermodynamic entropy of the molecule that encodes those megabytes is much, much, larger. If each base pair represents about a thousand bits worth of thermodynamic entropy under typical conditions, then the whole strand is into the hundreds of gigabytes.
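The arithmetic behind those figures, as a sketch. The three billion base pairs is a round number I'm assuming for a human genome, and the thousand bits per base pair is the guess from above, not a measurement.

```python
BASE_PAIRS = 3_000_000_000       # assumed round figure for a human genome
INFO_BITS_PER_PAIR = 2           # four possible bases
THERMO_BITS_PER_PAIR = 1_000     # the guess made above, not a measurement

def bits_to_bytes(bits):
    return bits / 8

info_bytes = bits_to_bytes(BASE_PAIRS * INFO_BITS_PER_PAIR)
thermo_bytes = bits_to_bytes(BASE_PAIRS * THERMO_BITS_PER_PAIR)

print(info_bytes / 1e9, "GB of raw sequence")
print(thermo_bytes / 1e9, "GB worth of thermodynamic entropy")
```

That gives three quarters of a gigabyte for the raw sequence against several hundred gigabytes for the molecule carrying it, before any compression of the sequence.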

I keep saying "under typical conditions" because thermodynamic entropy, being thermodynamic, depends on temperature. If you have a fever, your body, including your DNA molecules in particular, has higher entropy than if you're sitting in an ice bath. The information theoretic entropy, on the other hand, doesn't change.

But all this is dwarfed by another factor. You have billions of cells in your body (and trillions of bacterial cells that don't have your DNA, but never mind that). From a thermodynamic standpoint, each of those cells -- its DNA, its RNA, its proteins, lipids, water and so forth -- contributes to the overall entropy of your body. A billion identical strands of DNA at a given temperature have the same information content as a single strand but a billion times the thermodynamic entropy.

If you want to compare bits to bits, the Shannon entropy of your DNA is inconsequential compared to the thermodynamic entropy of your body. Even the change in the thermodynamic entropy of your body as you breathe is enormously bigger than the Shannon entropy of your DNA.

I mention all this because from time to time you'll see statements about genetics and the second law of thermodynamics. The second law, which is very well established, states that the entropy of a closed system cannot decrease over time. One implication of it is that heat doesn't flow from cold to hot, which is a key assumption in Carnot's proof.

Sometimes the second law is taken to mean that genomes can't get "more complex" over time, since that would violate the second law. The usual response to this is that living cells aren't closed systems and therefore the second law doesn't apply. That's perfectly valid. However, I think a better answer is that this confuses two forms of entropy -- thermodynamic entropy and Shannon entropy -- which are just plain different. In other words, thermodynamic entropy and the second law don't work that way.

From an information point of view, the entropy of a genome is just how many bits it encodes once you compress out any redundancy. Longer genomes typically have more entropy. From a thermodynamic point of view, at a given temperature, more of the same substance has higher entropy than less as well, but we're measuring different quantities.

A live elephant has much, much higher entropy than a live mouse, and likewise for a live human versus a live mouse. As it happens, a mouse genome is roughly the same size as a human genome, even though there's a huge difference in thermodynamic entropy between a live human and a live mouse. The mouse genome is slightly smaller than ours, but not a lot. There's no reason it couldn't be larger, and certainly no thermodynamic reason. Neither the mouse nor human genome is particularly large. Several organisms have genomes dozens of times larger, at least in terms of raw base pairs.

From a thermodynamic point of view, it hardly matters what exact content a DNA molecule has. There are some minor differences in thermodynamic behavior among the particular base pairs, and in some contexts it makes a slight difference what order they're arranged in, but overall the gene-copying machinery works the same whether the DNA is encoding a human digestive protein or nothing at all. Differences in gene content are dwarfed by the thermodynamic entropy change of turning one strand of DNA and a supply of loose nucleotides into two strands, that in turn is dwarfed by everything else going on in the cell, and that in turn is dwarfed by the jump from one cell to billions.

For what it's worth, content makes even less thermodynamic difference in other forms of storage. A RAM chip full of random numbers has essentially the same thermodynamic entropy, at a given temperature, as one containing all zeroes or all ones, even though those have drastically different Shannon entropies. The thermodynamic entropy changes involved in writing a single bit to memory are going to equate to a lot more than one bit.
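The Shannon side of that comparison is easy to see with an off-the-shelf compressor. This is only a sketch: compressed size is a rough upper bound on information content, and the "random" bytes come from Python's seeded pseudo-random generator, which is random enough for zlib but not truly random.

```python
import random
import zlib

SIZE = 100_000
rng = random.Random(0)   # fixed seed so the run is repeatable

all_zeroes = bytes(SIZE)
random_fill = bytes(rng.getrandbits(8) for _ in range(SIZE))

# Compressed size stands in, roughly, for how much information is there.
zero_size = len(zlib.compress(all_zeroes, 9))
random_size = len(zlib.compress(random_fill, 9))
print(zero_size, random_size)
```

The zeroes shrink to a tiny fraction of their original size and the random bytes don't shrink at all, yet a chip holding either one would be, thermodynamically, much the same chip.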

Again, this is all assuming it's valid to compare the two forms of entropy at all, based on their both being measures of uncertainty about what exact state a system is in, and again, the two are not actually comparable, even though they're similar in form. Comparing the two is like trying to compare a football score to a basketball score on the basis that they're both counting the number of times the teams involved have scored goals.

There's a lot more to talk about here, for example the relation between symmetry and disorder (more disorder means more symmetry, which was not what I thought until I sat down to think about it), and the relationship between entropy and time (for example, as experimental physicist Richard Muller points out, local entropy decreases all the time without time appearing to flow backward), but for now I think I've hit the main points:

The second law of thermodynamics is just that -- a law of thermodynamics.
Thermodynamic entropy as currently defined and information-theoretic (Shannon) entropy are two distinct concepts, even though they're very similar in form and derivation.
The two are defined in different contexts and behave entirely differently, despite what we might think from them having the same name.
Back at the first point, the second law of thermodynamics says almost nothing about Shannon entropy, even though you can, if you like, use the same terminology in counting quantum states.
All this has even less to do with genetics.

* Strictly speaking, you need to take the Sun into account. The Sun is gaining entropy over time, at a much, much higher rate than our little pond and its surroundings, and it's only an insignificantly tiny part of the universe. But even if you had a closed system, of a pond and surroundings that were sometimes warm and sometimes cold, for whatever reason, the result would be the same: The entropy of a closed system increases over time.

Friday, May 26, 2017

The value of the thing ...

... is what it will bring.

I've now seen several headlines along the lines of "NASA to explore $10,000 quadrillion metal asteroid"

What does this even mean?

Two things, really:

NASA is planning a mission to the nickel-iron asteroid 16 Psyche, which is true
That asteroid contains $10 quintillion worth of metal, which is, um ...

I mean, on the one hand it's a simple calculation: Psyche contains X tons of nickel at $Y/ton, and likewise for iron. Total value: $10 quintillion or, for whatever reason, $10,000 quadrillion.

Except that's about 100,000 times the world's GDP, so maybe we're missing something?
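As a rough sanity check on that comparison, here is the arithmetic in Python. It assumes the short-scale meanings of quadrillion and quintillion. The world GDP figure is not a sourced number, just what the "about 100,000 times" ratio implies, so treat it as a ballpark.

```python
# Headline value of the metal in 16 Psyche, written both ways (short scale).
QUADRILLION = 10**15
QUINTILLION = 10**18

value_in_quintillions = 10 * QUINTILLION
value_in_quadrillions = 10_000 * QUADRILLION

# Both headlines name the same number.
same_number = value_in_quintillions == value_in_quadrillions

# World GDP implied by the "about 100,000 times" ratio (ballpark only).
gdp_ratio = 100_000
implied_world_gdp = value_in_quintillions // gdp_ratio

print(same_number)                  # True
print(f"${implied_world_gdp:,}")    # $100,000,000,000,000
```

In other words, the comparison works out if world GDP is somewhere around $100 trillion.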

Suppose we could magically bring all the nickel and iron in Psyche to Earth. That's a ball about 200km across, so we'd have to be a bit careful, but say we break it down into a few million 1km heaps distributed strategically around the world. How much is that really worth?
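The "few million heaps" figure is easy to check. This is a minimal sketch, assuming Psyche is a perfect sphere and that a "1km heap" holds about one cubic kilometer. Both are simplifications for the sake of the estimate.

```python
import math

diameter_km = 200            # "a ball about 200km across"
radius_km = diameter_km / 2

# Volume of a sphere: (4/3) * pi * r^3
volume_km3 = (4 / 3) * math.pi * radius_km**3

# Assumption: each heap holds roughly one cubic kilometer of metal.
heap_volume_km3 = 1.0
heaps = volume_km3 / heap_volume_km3

print(f"{heaps:,.0f} heaps")
```

That comes out to a bit over four million heaps, so "a few million" is the right order of magnitude.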

You might think "Yay, free iron and nickel!" but that's not quite right. Even scrap iron, which has already been refined and packaged into usable pieces, costs something to buy, something to transport and something more to put to use, unless it happens to be in just the form you need. More realistically, it would mean no more iron mining, which is great unless you happen to be in the iron/nickel mining business. That's not nothing -- world iron production looks to be around $300 billion and nickel maybe more like $20 billion. But it's not a trillion dollars, much less a quadrillion or quintillion.

Or look at it another way: We've got a rock out in space that's worth as much as the entire world economy would produce in 100,000 years at current rates. The total budget of NASA is around $20 billion, with ESA, JAXA and the Russian space agency accounting for a few billion more. Surely it would be worth it to throw the world's entire space budget into mining that rock.

Except, the question isn't whether there's a bunch of valuable metal to be mined. The question is whether it's worth mining. It currently costs about $20,000 per kilogram to get a payload to low earth orbit. It's anybody's guess what it would cost to actually mine a given amount of metal in the asteroid belt and bring it back to Earth safely -- though if you're transporting a hunk of metal I suppose you just have to make sure that it doesn't hit anything on the way in. But bulk nickel from Earth runs more like $10/kg and iron is cheaper yet, so ... maybe not.
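To put the two per-kilogram figures side by side, here is a tiny calculation. It is only a sense of scale: the launch figure is the cost of getting a payload up to low earth orbit, used here as a stand-in because the real cost of mining in the asteroid belt and returning the metal is unknown.

```python
launch_cost_per_kg = 20_000    # USD/kg to low earth orbit (figure quoted above)
nickel_price_per_kg = 10       # USD/kg for bulk nickel on Earth (figure quoted above)

# How many times more it costs to move a kilogram than the kilogram is worth.
cost_to_price_ratio = launch_cost_per_kg / nickel_price_per_kg

print(cost_to_price_ratio)     # 2000.0
```

Transport at anything like today's launch prices would cost on the order of 2,000 times what the nickel sells for, before counting any mining at all.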

I don't really want to pick on NASA for trying to drum up a little interest in its latest mission -- though it's probably worth mentioning that the past couple of decades of unmanned missions by NASA and the other space agencies have been spectacularly successful in exploring the solar system and in an ideal world that would speak for itself. If there's a point here, it's that it's a good idea not to take numbers, especially eye-catching dollar amounts, at face value without asking what they actually mean.

Tuesday, January 29, 2013

We're #1. So what?

On the radio today I heard that a certain statistic was at its highest (or lowest) level in seventeen months. Certainly sounds impressive, but what does it mean? Without having followed the history of the statistic, I'd have no way of knowing.

For example, if it's 100 now, and it was 99 seventeen months ago and 98 for the other months (including last month), it may not mean much at all. On the other hand, if the sequence had been more like 99, 82, 64, 57, 43, 51, 46 ... 54, 47, 100, that jump from 47 to 100 might be very significant, particularly if the original fall from 99 to the 40s and 50s had been significant.

Suppose I'm part of a community of gamers in which each gamer has a numerical rating. Last month I had the 1523rd-highest rating. This month I'm 1209th. I've just rocketed 314 places up the rankings. Pretty awesome, huh?

Well, maybe. Suppose there are 704 people with a rating of 98, 314 people with a rating of 99 and 1208 people with higher ratings. The top rating is 106. Last month my rating was 98, so I was one of the 704 tied for 1523rd - 2226th. This month, by virtue of a one-point improvement, I'm now one of the proud 314 tied for 1209th - 1522nd. Last month I was good, though not quite as good as the best. This month I got a little closer to the top. Maybe not so impressive.

On the other hand, suppose there are three million or so players. Most of them have fairly unremarkable ratings, but once you get to the top ranks the scores start to increase dramatically. The 1523rd-best rating is 12,096, the 1209th is 451,903 and the top player has an unbelievable 75,419,223. I've made really amazing strides in the last month, but I'm still far, very far, from the top.
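The two scenarios can be put side by side with a few lines of Python. This is just a sketch using the made-up numbers from the two paragraphs above. The function name and the labels "tight field" and "long tail" are mine.

```python
def summarize(old_rank, new_rank, old_rating, new_rating, top_rating):
    """Compare a jump in rank with what happened to the underlying rating."""
    return {
        "places_gained": old_rank - new_rank,
        "rating_growth": new_rating / old_rating,
        "share_of_top_rating": new_rating / top_rating,
    }

# Same move from 1523rd to 1209th, two very different rating stories.
tight_field = summarize(1523, 1209, 98, 99, 106)
long_tail = summarize(1523, 1209, 12_096, 451_903, 75_419_223)

for name, s in (("tight field", tight_field), ("long tail", long_tail)):
    print(f"{name}: +{s['places_gained']} places, "
          f"rating x{s['rating_growth']:.2f}, "
          f"{s['share_of_top_rating']:.1%} of the top rating")
```

Both cases report the same 314-place jump. In the first the rating grew by about one percent and sits within a few points of the top. In the second it grew roughly 37-fold and is still well under one percent of the top rating.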

Ok, that's a lot of made-up numbers for just four paragraphs. What's going on here?

First, any measurement is meaningless without context. I originally said "a statistic" instead of "measurement", but the whole point of statistics, that is, pulling (abstracting) concise metrics out of a pile of data, is to provide context. If I tell you that the mass of a sample is 153 grams, that doesn't tell you much, but if I also tell you that the average (mean) mass of past samples is 75 grams and the standard deviation is 8 grams, you know you're dealing with an extremely rare high-mass sample. Or the scale is broken, or you're actually measuring a completely different kind of sample, or something else significant is going on. The mean and standard deviation statistics provide context for knowing what you're dealing with.
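To make "extremely rare" concrete, the usual move is to express the sample as a number of standard deviations from the mean (a z-score). Here is a minimal sketch using the figures above. The tail probability additionally assumes the masses are roughly normally distributed, which the example doesn't actually say.

```python
import math

sample_mass = 153.0   # grams
mean_mass = 75.0      # mean of past samples, grams
std_dev = 8.0         # standard deviation of past samples, grams

# How many standard deviations above the mean is this sample?
z_score = (sample_mass - mean_mass) / std_dev
print(z_score)        # 9.75

# Assumption: masses are approximately normal. Under that assumption, this is
# the chance of seeing a sample at least this heavy.
upper_tail = 0.5 * math.erfc(z_score / math.sqrt(2))
print(upper_tail)
```

Nearly ten standard deviations out is so improbable under a normal model that a broken scale or a different kind of sample is the more believable explanation.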

Simply saying "highest in seventeen months" or "jumped 314 places in the rankings" doesn't provide any meaningful context. Either or both of those could be highly significant, or nothing in particular.

Second, citing rankings like highest, 1209th and so forth implies that something noteworthy about a ranking is also noteworthy about the underlying measurement that's being ranked. But this is misleading. Depending on how the rating is distributed, a large change in rating could mean a small change in ranking, or a large one, and likewise for "highest in N time periods." Technically, ranking can be highly non-linear.
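Here is a small sketch of that non-linearity. Both fields of ratings are hypothetical toy data, invented purely for illustration, and rank is the usual "competition" rank, one plus the number of strictly higher ratings.

```python
def rank(rating, field):
    """Standard competition rank: 1 + the number of strictly higher ratings."""
    return 1 + sum(other > rating for other in field)

# Hypothetical field with big ties just below the top.
clustered = [106] + [99] * 300 + [98] * 700

# Hypothetical field where ratings are spread far apart.
spread_out = [100, 90, 80, 70, 60, 50]

# The same one-point improvement in each field.
clustered_gain = rank(98, clustered) - rank(99, clustered)
spread_gain = rank(60, spread_out) - rank(61, spread_out)

print(clustered_gain)   # 300
print(spread_gain)      # 0
```

One point is worth 300 places in the clustered field and none at all in the spread-out one, which is why a change in rank says little without knowing how the ratings are distributed.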

Rankings are not entirely useless. For example, there have been many more record high temperatures than record low temperatures in recent decades. Given that short-term temperature fluctuations over more than a few days are fairly random (or at least, chaotic), this strongly suggests that temperatures overall are rising. More sophisticated measurements bear this out, but the simple comparison of record highs versus record lows quickly suggests a trend in the climate as a whole. Even then, though, it's the careful measurement of the temperatures themselves that tells what's really going on. Looking at record highs and lows just points us in a useful direction.

In general, when someone cites a ranking or a record extreme, it's good to ask what's going on with the quantity being ranked.