Tuesday, January 29, 2013

We're #1. So what?

On the radio today I heard that a certain statistic was at its highest (or lowest) level in seventeen months.  Certainly sounds impressive, but what does it mean?  Without having followed the history of the statistic, I'd have no way of knowing.

For example, if it's 100 now, and it was 99 seventeen months ago and 98 for the other months (including last month), it may not mean much at all.  On the other hand, if the sequence had been more like 99, 82, 64, 57, 43, 51, 46 ... 54, 47, 100, that jump from 47 to 100 might be very significant, particularly if the original fall from 99 to the 40s and 50s had been significant.
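To make the contrast concrete, here is a rough Python sketch that summarizes both made-up series.  The `summarize` helper and its field names are my own invention for illustration, and the second series uses only the values listed above (the months hidden behind the "..." are simply left out), so treat the output as a sketch rather than a full analysis.

```python
def summarize(series):
    """Describe the latest reading relative to the readings before it."""
    *history, latest = series
    return {
        "latest": latest,
        "is_highest": latest > max(history),
        "jump_from_previous": latest - history[-1],
        "spread_of_history": max(history) - min(history),
    }

# 99 seventeen months ago, 98 for the sixteen months in between, 100 now.
flat = [99] + [98] * 16 + [100]

# Only the values the text lists; the months elided by "..." are omitted.
volatile_listed = [99, 82, 64, 57, 43, 51, 46, 54, 47, 100]

for name, series in [("flat", flat), ("volatile", volatile_listed)]:
    print(name, summarize(series))
```

Both series end on a "highest in seventeen months" reading, but the jump from the previous month is 2 in the first case and 53 in the second.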


Suppose I'm part of a community of gamers in which each gamer has a numerical rating.  Last month I had the 1523rd-highest rating.  This month I'm 1209th.  I've just rocketed 314 places up the rankings. Pretty awesome, huh?

Well, maybe.  Suppose there are 704 people with a rating of 98, 314 people with a rating of 99 and 1208 people with higher ratings.  The top rating is 106.  Last month my rating was 98, so I was one of the 704 tied for 1523rd - 2226th.  This month, by virtue of a one-point improvement, I'm now one of the proud 314 tied for 1209th - 1522nd.  Last month I was good, though not quite as good as the best.  This month I got a little closer to the top.  Maybe not so impressive.

On the other hand, suppose there are three million or so players.  Most of them have fairly unremarkable ratings, but once you get to the top ranks the scores start to increase dramatically.  The 1523rd-best rating is 12,096, the 1209th is 451,903 and the top player has an unbelievable 75,419,223.  I've made really amazing strides in the last month, but I'm still far, very far, from the top.
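The two scenarios can be put side by side with a few lines of Python.  This is only a sketch: the `compare` function and the names `clustered` and `long_tail` are hypothetical, and the numbers are the made-up ones from the two paragraphs above.

```python
def compare(old_rank, new_rank, old_rating, new_rating, top_rating):
    """Contrast a move in the rankings with the move in the underlying rating."""
    return {
        "places_gained": old_rank - new_rank,
        "rating_ratio": new_rating / old_rating,   # how much the rating itself grew
        "share_of_top": new_rating / top_rating,   # how close the new rating is to the best
    }

# Scenario 1: ratings bunched together (98 -> 99, top rating 106).
clustered = compare(1523, 1209, 98, 99, 106)

# Scenario 2: ratings that climb steeply near the top.
long_tail = compare(1523, 1209, 12_096, 451_903, 75_419_223)

for name, result in [("clustered", clustered), ("long tail", long_tail)]:
    print(name, result)
```

The rank change is 314 places both times.  In the first scenario the rating grows by about one percent and sits above 90% of the top rating; in the second it grows more than 37-fold and is still under one percent of the top.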


Ok, that's a lot of made-up numbers for just four paragraphs.  What's going on here?

First, any measurement is meaningless without context.  I originally said "a statistic" instead of "measurement", but the whole point of statistics, that is, pulling (abstracting) concise metrics out of a pile of data, is to provide context.  If you tell me that the mass of a sample is 153 grams, that doesn't tell me much, but if you also tell me that the average (mean) mass of past samples is 75 grams and the standard deviation is 8 grams, I know I'm dealing with an extremely rare high-mass sample.  Or my scale is broken, or I'm actually measuring a completely different kind of sample, or something else significant is going on.  The mean and standard deviation statistics provide context for knowing what I'm dealing with.
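For the sample-mass example, the context boils down to how many standard deviations the measurement sits from the mean.  Here is a minimal sketch; the variable names are mine, and the tail probability at the end assumes the masses are normally distributed, which the example itself never claims.

```python
import math

sample_mass = 153.0   # grams
mean_mass = 75.0      # grams, mean of past samples
std_dev = 8.0         # grams, standard deviation of past samples

# Distance from the mean, measured in standard deviations.
z_score = (sample_mass - mean_mass) / std_dev
print(z_score)  # 9.75

# ASSUMPTION: normally distributed masses.  Under that assumption, this is
# the chance of seeing a sample at least this heavy.
upper_tail = 0.5 * math.erfc(z_score / math.sqrt(2))
print(upper_tail)
```

Nearly ten standard deviations above the mean is far enough out that a broken scale or a different kind of sample is a more plausible explanation than chance.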

Simply saying "highest in seventeen months" or "jumped 314 places in the rankings" doesn't provide any meaningful context.  Either or both of those could be highly significant, or nothing in particular.

Second, citing rankings like highest, 1209th and so forth implies that something noteworthy about a ranking is also noteworthy about the underlying measurement that's being ranked.  But this is misleading.  Depending on how the rating is distributed, a large change in ranking could reflect a small change in rating, or a large one, and likewise for "highest in N time periods."  Technically, the relationship between a rating and its ranking can be highly non-linear.

Rankings are not entirely useless.  For example, there have been many more record high temperatures than record low temperatures in recent decades.  Given that short-term temperature fluctuations over more than a few days are fairly random (or at least, chaotic), this strongly suggests that temperatures overall are rising.  More sophisticated measurements bear this out, but the simple comparison of record highs versus record lows quickly suggests a trend in the climate as a whole.  Even then, though, it's the careful measurement of the temperatures themselves that tells us what's really going on.  Looking at record highs and lows just points us in a useful direction.

In general, when someone cites a ranking or a record extreme, it's good to ask what's going on with the quantity being ranked.