Tuesday, January 29, 2013

We're #1. So what?

On the radio today I heard that a certain statistic was at its highest (or lowest) level in seventeen months.  That certainly sounds impressive, but what does it mean?  Without having followed the history of the statistic, I'd have no way of knowing.

For example, if it's 100 now, and it was 99 seventeen months ago and 98 for the other months (including last month), it may not mean much at all.  On the other hand, if the sequence had been more like 99, 82, 64, 57, 43, 51, 46 ... 54, 47, 100, that jump from 47 to 100 might be very significant, particularly if the original fall from 99 into the 40s and 50s had itself meant something.
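
To make that concrete, here's a quick sketch (the middle values of the second series are hypothetical stand-ins for the "..." above): both made-up series end at a seventeen-month high of 100, but the month-over-month moves are completely different.

    # Two series, both ending at a 17-month high of 100.
    flat  = [99] + [98] * 15 + [100]
    jumpy = [99, 82, 64, 57, 43, 51, 46, 50, 48, 52, 45, 55, 49, 53, 54, 47, 100]

    for series in (flat, jumpy):
        print(series[-1] > max(series[:-1]),  # True for both: "highest in 17 months"
              series[-1] - series[-2])        # +2 for flat, +53 for jumpy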


Suppose I'm part of a community of gamers in which each gamer has a numerical rating.  Last month I had the 1523rd-highest rating.  This month I'm 1209th.  I've just rocketed 314 places up the rankings. Pretty awesome, huh?

Well, maybe.  Suppose there are 704 people with a rating of 98, 314 people with a rating of 99 and 1208 people with higher ratings.  The top rating is 106.  Last month my rating was 98, so I was one of the 704 tied for 1523rd - 2226th.  This month, by virtue of a one-point improvement, I'm now one of the proud 314 tied for 1209th - 1522nd.  Last month I was good, though not quite as good as the best.  This month I got a little closer to the top.  Maybe not so impressive.
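
Here's a minimal sketch of that arithmetic, using standard competition ranking (ties share a rank; I've given all 1208 higher-rated players the top rating of 106 purely for simplicity):

    # Standard competition ranking: your rank is 1 plus the number of
    # strictly higher ratings; everyone tied with you shares that rank.
    def rank_of(rating, ratings):
        return 1 + sum(r > rating for r in ratings)

    # The population described above: 1208 players rated above 99,
    # 314 tied at 99, 704 tied at 98.
    ratings = [106] * 1208 + [99] * 314 + [98] * 704

    print(rank_of(98, ratings))  # 1523 -- last month
    print(rank_of(99, ratings))  # 1209 -- this month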

On the other hand, suppose there are three million or so players.  Most of them have fairly unremarkable ratings, but once you get to the top ranks the scores start to increase dramatically.  The 1523rd-best rating is 12,096, the 1209th-best is 451,903 and the top player has an unbelievable 75,419,223.  I've made really amazing strides in the last month, but I'm still far, very far, from the top.
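
The exact numbers depend on the shape of the tail, but any heavy-tailed distribution behaves this way near the top.  Here's a toy version (the lognormal and its parameters are my choice, purely for illustration): the 1209th and 1523rd ratings sit close together, while the top rating dwarfs them both.

    import random

    random.seed(1)

    # Three million players with a heavy-tailed (lognormal) rating distribution.
    ratings = sorted((random.lognormvariate(3, 2) for _ in range(3_000_000)),
                     reverse=True)

    print(ratings[0])     # the top player, far above...
    print(ratings[1208])  # ...the 1209th-best rating, which is close to...
    print(ratings[1522])  # ...the 1523rd-best rating.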


Ok, that's a lot of made-up numbers for just four paragraphs.  What's going on here?

First, any measurement is meaningless without context.  I originally said "a statistic" instead of "measurement", but the whole point of statistics, that is, pulling (abstracting) concise metrics out of a pile of data, is to provide context.  If I say that the mass of a sample is 153 grams, that doesn't tell me much, but if you tell me that the average (mean) mass of past samples is 75 grams and the standard deviation is 8 grams, I know I'm dealing with an extremely rare high-mass sample.  Or my scale is broken, or I'm actually measuring a completely different kind of sample, or something else significant is going on.  The mean and standard deviation statistics provide context for knowing what I'm dealing with.
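
The judgment "extremely rare" is just the z-score talking; here's the one-line computation behind it:

    # How many standard deviations is the sample above the mean of past samples?
    mean, sd = 75.0, 8.0   # grams, from the hypothetical history above
    sample = 153.0
    print((sample - mean) / sd)  # 9.75 -- nearly ten standard deviations out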

Simply saying "highest in seventeen months" or "jumped 314 places in the rankings" doesn't provide any meaningful context.  Either or both of those could be highly significant, or nothing in particular.

Second, citing rankings like highest, 1209th and so forth implies that something noteworthy about a ranking is also noteworthy about the underlying measurement that's being ranked.  But this can be misleading.  Depending on how the rating is distributed, a large change in rating could mean a small change in ranking, or a large one, and likewise for "highest in N time periods."  In other words, rank can be a highly non-linear function of the underlying measurement.

Rankings are not entirely useless.  For example, there have been many more record high temperatures than record low temperatures in recent decades.  Given that temperature fluctuations over more than a few days are fairly random (or at least chaotic), this strongly suggests that temperatures overall are rising.  More sophisticated measurements bear this out, but the simple comparison of record highs versus record lows quickly suggests a trend in the climate as a whole.  Even then, though, it's the careful measurement of the temperatures themselves that tells us what's really going on.  Looking at record highs and lows just points us in a useful direction.

In general, when someone cites a ranking or a record extreme, it's good to ask what's going on with the quantity being ranked.

2 comments:

  1. If the mean of whatever we are measuring is staying fairly constant (if the climate is not changing, if basketball players are not getting taller, if golf balls are not getting more elastic) then records should be getting rarer. The first time a golf ball was ever hit, it was a record drive. The second one had an even chance of breaking the record, assuming all the conditions remained the same. The 4000th drive had a much worse chance (one in 4000, in fact), and so on. (The sketch at the end of this comment makes the arithmetic concrete.)

    Athletic records fall fairly frequently. Even without chemicals, people have been getting taller and stronger. Training regimens and tennis racquets are better. Athletic events are better paid, and thus attract and motivate more and more competitors.

    One expects this to approach a limit, though.

    If record temperatures (especially highs) do not become less frequent, very simple statistical tests (e.g., a chi-square test) can be used to show that this is not what would be expected from random spikes.

    For what it's worth, Sunday's high in my hometown beat the record for the day by 10 degrees F. It beat the record for the month, though I don't know by how much.
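
    Here's a quick sketch of that arithmetic (assuming, as above, that conditions never change): the nth attempt is equally likely to be the largest of the first n, so its chance of setting a record is 1/n, and the expected number of records grows only logarithmically.

        from math import log

        # P(nth attempt sets a record) = 1/n; expected records in n attempts
        # is the harmonic sum 1 + 1/2 + ... + 1/n, roughly ln(n).
        for n in (2, 10, 4000):
            expected = sum(1 / k for k in range(1, n + 1))
            print(n, 1 / n, round(expected, 1), round(log(n), 1))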

  2. That's a good point about records as a whole becoming less common over time, ceteris paribus, and one to keep in mind the next time someone claims that we're no longer living in an age of giants because no one breaks records any more.

    There's another interesting phenomenon that happens in the face of competition, say between pitchers and batters: the standard deviation shrinks over time. Early on, no one knows what they're doing, and spectacularly good or bad performances are more common. Over time, everyone becomes more skilled, equipment gets better and performances become more consistent. (The sketch at the end of this comment shows how this alone can drag down league-leading averages.)

    Miguel Cabrera hit .330 for the Tigers in 2012, best in the AL. Ty Cobb hit .409 for the Tigers in 1912, best in the AL. That tells us nothing, though, about how Cobb would have hit today's pitching or Cabrera would have hit in 1912.

    There were at least 20 .400 hitters up until 1941 (counting each season separately for players with more than one .400 season). Famously, there have been none since. However, there were also several seasons in that golden era where a league-leading hitter hit worse than Cabrera's .330, including Cobb's .324 in 1908.

    Stephen Jay Gould used essentially this example to illustrate the decline in diversity of body plans over the course of evolution. Early on, lots of different things would work. Over time, though, most of the dodgier plans got et (and some workable plans probably died out due to bad luck). Today, only a few of the early designs are still around.

    For what it's worth, Ted Williams himself hit .328 to lead the AL in 1958. Williams in 1958 was clearly not playing at the same level as Williams in his prime in 1941, but neither was anyone else.
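
    Here's a toy version of that shrinking-spread argument (the numbers are illustrative, not historical): hold the league mean fixed, keep the best hitter the same number of standard deviations above it, and the league-leading average falls as the spread tightens.

        # League mean stays at .260; the best hitter sits about 3.5 standard
        # deviations above it. As the spread shrinks, the league-leading
        # average falls even though nobody is getting worse.
        z_best, mean_avg = 3.5, 0.260
        for sd in (0.045, 0.030, 0.020):
            print(f"{mean_avg + z_best * sd:.3f}")  # about .418, .365, .330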
