Intermittent Conjecture: math

Showing posts with label math. Show all posts

Saturday, April 5, 2025

ML-friendly problems and unexpected consequences

This started out as "one more point before I go" in the previous post, but it grew enough while I was getting that one ready to publish that it seemed like it should have its own post.

Where machine learning systems like LLMs do unexpectedly well, like in mimicking our use of language, it might not be because they've developed unanticipated special abilities. Maybe ML being good at generating convincing text says as much about the problem of generating convincing text as it does about the ML doing it.

The current generation of chatbots makes it pretty clear that producing language that's hard to distinguish from what a person would produce isn't actually that hard a problem, if you have a general pattern-matcher (and a lot of training text and computing power). In that case, the hard part, that people have spent decades trying to perfect and staggering amounts of compute power implementing, is the general pattern-matcher itself.

We tend to look at ML systems as problem solvers, and fair enough, but we can also look at current ML technology as a problem classifier. That is, you can sort problems according to whether ML is good at them. From that point of view, producing convincing text, recognizing faces, spotting tumors in radiological images, producing realistic (though still somewhat funny-looking) images and videos, spotting supernovas in astronomical images, predicting how proteins will fold, along with many other problems, are all examples of pattern-matching that a general ML-driven pattern-matcher can solve as well as, or even better than, our own naturally evolved neural networks can.

Not knowing a better term, I'll call these ML-friendly problems. In the previous post, I argued that understanding the structure of natural languages is a separate problem from understanding what meaning natural language is conveying. Pretty clearly, understanding the structure of natural languages is an ML-friendly problem. If you buy that understanding meaning is a distinct problem, I would argue that we don't know one way or another whether it's ML-friendly, partly, I would further argue, because we don't know nearly as much about what that problem involves.

From about 150 years ago into the early 20th century, logicians made a series of discoveries about what we call reasoning and developed formal systems to describe it. This came out of a school of thought, dating back to Leibniz (and as usual, much farther and wider if you look for it), holding that if we could capture rules describing how reasoning worked, we could use those rules to remove all uncertainty from any kind of thought.

Leibniz envisioned a world where, "when there are disputes among persons, we can simply say: Let us calculate, without further ado, to see who is right". That grand vision failed, of course, both because, as Gödel and others discovered, formal logic has inescapable limitations, but also because formal reasoning captures only a small portion of what our minds actually do and how we reason about the world.

Nonetheless, it succeeded in a different sense. The work of early 20th-century logicians was essential to the development of computing in the mid-20th century. For example, LISP -- for my money one of the two most influential programming languages ever, along with ALGOL -- was based directly Church's lambda calculus. I run across and/or use Java lambda expressions on a near-daily basis. For another example, Turing's paper on the halting problem used the same proof technique of diagonalization that Gödel borrowed from Cantor to prove incompleteness, and not by accident.

Current ML technology captures another, probably larger, chunk of what naturally-evolved minds do. Just as formal logic broke open a set of problems in mathematics, ML has broken open a set of problems in computing. Just as formal logic didn't solve quite as wide a range of problems as people thought it might, ML might not solve quite the range of problems people today think it might, but just as formal logic also led to significant advances in other ways, so might ML.

Thursday, March 27, 2025

Losing my marbles over entropy

In a previous post on Entropy, I offered a garbled notion of "statistical symmetry." I'm currently reading Carlo Rovelli's The Order of Time, and chapter two laid out the idea that I was grasping at concisely, clearly and -- because Rovelli is an actual physicist -- correctly.

What follows is a fairly long and rambling discussion of the same toy system as the previous post, of five marbles in a square box with 25 compartments. It does eventually circle back to the idea of symmetry, but it's really more of a brain dump of me trying to make sure I've got the concepts right. If that sounds interesting, feel free to dive in. Otherwise, you may want to skip this one.

In the earlier post, I described a box split into 25 little compartments with marbles in five of the compartments. If you start with, say, all the marbles on one row (originally I said on one diagonal, but that just made things a bit messier) and give the box a good shake, the odds that the marbles all end up in the same row that they started in are low, about one in 50,000 for this small example. So far, so good.

But this is really true for any starting configuration -- if there are twenty-five compartments in a five-by-five grid, numbered from left to right then top to bottom, and the marbles start out in, say, compartments 2, 7, 8, 20 and 24, the odds that they'll still be in those compartments after you shake the box are exactly the same, about one in 50,000.

On the one hand, it seems like going from five marbles in a row to five marbles in whatever random positions they end up in is making the box more disordered. On the other hand, if you just look at the positions of the individual marbles, you've gone from a set of five numbers from 1 to 25 ... to a set of numbers from 1 to 25, possibly the one you started with. Nothing special has happened.

This is why the technical definition of entropy doesn't mention "disorder". The actual definition of entropy is in terms of microstates and macrostates. A microstate is a particular configuration of the individual components of a system, in this case, the positions of the marbles in the compartments. A macrostate is a collection of microstates that we consider to be equivalent in some sense.

Let's say there are two macrostates: Let's call any microstate with all five marbles in the same row lined-up, and any other microstate scattered. In all there are 53,130 microstates (25 choose 5). Of those, five have all the marbles in a row (one for each row), and the other 53,125 don't. That is, there are five microstates in the lined-up microstate and 53,125 in the scattered microstate.

The entropy of a macrostate is related to the number of microstates consistent with that macrostate (for more context, see the earlier post on entropy, which I put a lot more care into). Specifically, it is the logarithm of the number of such states, multiplied by a factor called the Boltzmann constant to make the units come out right and to scale the numbers down, because in real systems the numbers are ridiculously large (though not as large as some of these numbers), and even their logarithms are quite large. Boltzman's constant is 1.380649×10⁻²³ Joules per Kelvin.

The natural logarithm of 5 is about 1.6 and the natural logarithm of 53,125 is about 10.9. Multiplying by Boltzmann's constant doesn't change their relative size: The scattered macrostate has about 6.8 times the entropy of the lined-up macrostate.

If you start with the marbles in the low-entropy lined-up macrostate and give the box a good shake, 10,625 times out of 10,626 you'll end up in the higher-entropy scattered macrostate. Five marbles in 25 compartments is a tiny system, considering that there are somewhere around 10,800,000,000,000,000,000,000,000 molecules in a milliliter of water. In any real system, except cases like very low-temperature systems with handfuls of particles, the differences in entropy are large enough that "10,625 times out of 10,626" turns into "always" for all intents and purposes.

This distinction between microstates and macrostates gives a rigorous basis for the intuition that going from lined-up marbles to scattered-wherever marbles is a significant change, while going from one particular scattered state to another isn't.

In both cases, the marbles are going from one microstate to another, possibly but very rarely the one they started in. In the first case, the marbles go from one macrostate to another. In the second, they don't. Macrostate changes are, by definition, the ones we consider significant, in this case, between lined-up and scattered. Because of how we've defined the macrostates, the first change is significant and the second isn't.

Let's slice this a bit more finely and consider a scenario where only part of a system can change at any given time. Suppose you don't shake up the box entirely. Instead, you randomly choose one of the marbles, take it out and put it back in a random position, including, possibly, the one it came from. In that case, the chance of going from lined-up to scattered is 20 in 21, since (regardless of which marble you chose) out of the 21 positions the marble can end up in, only one, its original position, has the marbles all lined up.

What about the other way around? Of the 53,120 microstates in the scattered macrostate, only 500 have four of the five marbles in one row. For any microstate, there are 105 different ways to take one marble out and replace it: Five marbles times 21 empty places to put it, including the place it came from.

For the 500 microstates with four marbles in a row, only one of those 105 possibilities will result in all five marbles in a row: Remove the lone marble that's not in a row and put it in the only empty place in the row of four. For the other 52,615 microstates in the scattered macrostate, there's no way at all to end up with five marbles lined up by moving only one marble.

So there are 500 cases where the scattered macrostate becomes lined-up, 500*104 cases where it might but doesn't, and 52,615*105 cases where it couldn't possibly. In all, that means that the odds are 11,153.15 to one against scattered becoming lined-up by removing and replacing one marble randomly.

Suppose that the marbles are lined up at some starting time, and every time the clock ticks, one marble gets removed and replaced randomly. After one clock tick, there is a 104 in 105 chance that the marbles will be in the high-entropy scattered state. How about after two ticks? How about if we let the clock run indefinitely -- what portion of the time will the system spend in the lined-up macrostate?

The there are tools to answer questions like this, particularly Markov chains and stochastic matrices (that's the same Markov Chain that can generate random text that resembles an input text). I'll spare you the details, but the answer requires defining a few more macrostates, one for each way to represent the number five as the sum of whole numbers: [5], [4, 1], [3, 2], [3, 1, 1], [2, 2, 1], [2, 1, 1, 1] and [1, 1, 1, 1, 1].

The macrostate [5] comprises all microstates with five marbles in one row, the macrostate [4, 1] comprises all microstates with four marbles in one row and one in another row, the macrostate [2, 2, 1] comprises all microstates with two marbles in one row, two marbles in another row and one marble in a third one, and so forth.

Here's a summary

Macrostate	Microstates	Entropy
[5]	5	1.6
[4,1]	500	6.2
[3,2]	2,000	7.6
[3,1,1]	7,500	8.9
[2,2,1]	15,000	9.6
[2,1,1,1]	25,000	10.1
[1,1,1,1,1]	3,125	8.0

The Entropy column is the natural logarithm of the Microstates column, without multiplying by Boltzmann's constant. Again, this is just to give a basis for comparison. For example [2,1,1,1] is the highest-entropy state, and [2,2,1] has four times the entropy of [5].

It's straightforward, but tedious, to count the number of ways one macrostate can transition to another. For example, of the 105 transitions for [3,2], 4 end up in [4,1], 26 end up back in [3,2] (not always by putting the removed marble back where it was), 30 end up in [3, 1, 1] and 45 end up in [2, 2, 1]. Putting all this into a matrix and taking the matrix to the 10th power (enough to see where this is converging) gives

Macrostate	% time	% microstates
[5]	.0094	.0094
[4,1]	.94	.94
[3,2]	3.8	3.8
[3,1,1]	14	14
[2,2,1]	28	28
[2,1,1,1]	47	47
[1,1,1,1,1]	5.9	5.9

The second column is the result of the tedious matrix calculations. The third column is just the size of the macrostate as the portion of the total number of microstates. For example, there are 500 microstates in [4,1], which is 0.94% of the total, which is also the portion of the time that the matrix calculation says system will spend in [4, 1]. Technically, this means the system is ergodic, which means I didn't have to bother with the matrix and counting all the different transitions.

Even in this toy example, the system will spend very little of its time in the low-entropy lined-up state [5], and if it ever does end up there, it won't stay there for long.

Given some basic assumptions, a system that evolves over time, transitioning from microstate to microstate, will spend the same amount of time in any given microstate (as usual, that's not quite right technically), which means that the time spent in each macrostate is proportional to its size. Higher-entropy states are larger than lower-entropy states, and because entropy is a logarithm, they're actually a lot larger.

For example, the odds of an entropy decrease of one millionth of a Joule per Kelvin are about one in e^(10¹⁷). That's a number with somewhere around 40 quadrillion digits. To a mathematician, the odds still aren't zero, but to anyone else they would be.

For all but the tiniest, coldest systems, the chance of entropy decreasing even by a measurable amount are not just small, but incomprehensibly small. The only systems where the number of microstates isn't incomprehensibly huge are are small collections of particles near absolute zero.

I'm pretty sure I've read about experiments where such a system can go from a higher-entropy state to a very slightly lower-entropy state and vice versa, though I haven't had any luck tracking them down. Even if no one's ever done it, such a system wouldn't violate any laws of thermodynamics, because the laws of thermodynamics are statistical (and there's also the question of definition over whether such a system is in equilibrium).

So you're saying ... there's a chance? Yes, but actually no, in any but the tiniest, coldest systems. Any decrease in entropy that could actually occur in the real world and persist long enough to be measured would be in the vicinity of 10⁻²³ Joules per Kelvin, which is much, much too small to be measured except under very special circumstances.

For example, if you have 1.43 grams of pure oxygen in a one-liter container at standard temperature and pressure, it's very unlikely that you know any of the variables involved -- the mass of the oxygen, its purity, the size of the container, the temperature or the pressure, to even one part in a billion. Detecting changes 100,000,000,000,000 times smaller than that is not going to happen.

But none of that is what got me started on this post. What got me started was that the earlier post tried to define some sort of notion of "statistical symmetry", which isn't really a thing, and what got me started on that was my coming to understand that higher-entropy states are more symmetrical. That in turn was jarring because entropy is usually taken as a synonym for disorder, and symmetry is usually taken as a synonym for order.

Part of the resolution of that paradox is that entropy is a measure of uncertainty, not disorder. The earlier post got that right, but evidently that hasn't stopped me from hammering on the point for dozens more paragraphs and a couple of tables in this one, using a slightly different marbles-in-compartments example.

The other part is that more symmetry doesn't really mean more order, at least not in the way that we usually think about it.

From a mathematical point of view, a symmetry of an object is something you can do to it that doesn't change some aspect of the object that you're interested in. For example, if something has mirror symmetry, that means that it looks the same in the mirror as it does ordinarily.

It matters where you put the mirror. The letter W looks the same if you put a mirror vertically down the middle of it -- it has one axis of symmetry. The letter X looks the same if you put the mirror vertically in the middle, but it also looks the same if you put it horizontally in the middle -- it has two axes of symmetry.

Another way to say this is that if you could draw a vertical line through the middle of the W and rotate the W out of the page around that line, and kept going for 180 degrees until the W was back in the page, but flipped over, it would still look the same. If you chose some other line, it would look different (even if you picked a different vertical line, it would end up in a different place). That is, if you do something to the W -- rotate it around the vertical line through the middle -- it ends up looking the same. The aspect you care about here is how the W looks.

To put it somewhat more rigorously: if f is the particular mapping that takes each point to its mirror image across the axis, then f takes the set of points in the W to the exact same set of points. Any point on the axis maps to itself, and any point off the axis maps to its mirror image, which is also part of the W. The map f is defined for every point on the plane and it moves all of them except for the axis. The aspect we care about, which f doesn't change, is whether a particular point is in the W.

If you look at all the things you can do to an object without changing the aspect you care about, you have a mathematical group. For a W, there are two things you can do: leave it alone and flip it over. For an X, you have four options: leave it alone, flip it around the vertical axis, flip it around the horizontal axis, or do both. Leaving an object alone is called the identity transformation, and it's always considered a symmetry, because math. An asymmetrical object has only that symmetry (it's symmetry group is trivial).

In normal speech, saying something is symmetrical usually means it has the same symmetry group as a W -- half of it is a mirror image of the other half. Technically, it has bilateral symmetry. In some sense, though, an X is more symmetrical, since its symmetry group is larger, and a hexagon, which has 12 elements in its symmetry group, is more symmetrical yet.

A figure with 19 sides, each of which is the same lopsided squiggle, would have a symmetry group of 19 (rotate by 1/19 of a full circle, 2/19 ... 18/19, and also don't rotate at all). That would make it more symmetrical than a hexagon, and quite a bit more symmetrical than a W, but if you asked people which was most symmetrical, they would probably put the 19-sided squigglegon last of the three.

Our visual system is mostly trained to recognize bilateral symmetry. Except for special situations like reflections in a pond, pretty much everything in nature with bilateral symmetry is an animal, which is pretty useful information when it comes to eating and not being eaten. We also recognize rotational symmetry, which includes flowers and some sea creatures, also useful information, but even these also tend to have bilateral symmetry as well.

It would make sense, then, that in day to day life, "more symmetrical" generally means "closer to bilateral symmetry". If a house has an equal number of windows at the same level on either side of the front door, we think of it as symmetrical, even though the windows may not be exactly the same, the door itself probably has a doorknob on one side or the other and so forth, so it's not quite exactly symmetrical. We'd still say it's pretty symmetrical, even though from a mathematical point of view it either has bilateral symmetry or it doesn't (and in the real world, nothing we can see is perfectly symmetrical).

That should go some way toward explaining why, along with so many other things, symmetry doesn't necessarily mean the same thing in its mathematical sense as it does ordinarily. The mathematical definition includes things that we don't necessarily think of as symmetry.

Continuing with shapes and their symmetries, you can think of each shape as a macrostate. You can associate a microstate with each mapping (technically, in this case, any rigid transformation of the plane) that leaves the shape unchanged. The macrostate W has two microstates: one for the identity transformation, which leaves the plane unchanged, and one for the mirror transformation around the W's axis.

The X macrostate has four microstates, one for the identity, one for the flip around the vertical axis, one for the flip around the horizontal axis, and one for flipping around one axis and then the other (in this case, it doesn't matter what order you do it in). The X macrostate has a larger symmetry group, which is the same as saying it has more entropy.

In this context, a symmetry is something you can do to the microstate without changing the macrostate. A larger symmetry group -- more symmetry -- means more microstates for the same macrostate, which means more entropy, and vice-versa. They're two ways of looking at the same thing.

In the case of the marbles in a box, a symmetry is any way of switching the positions of the marbles, including not switching them around at all. Technically, this is a permutation group.

For any given microstate, some of the possible permutations just switch the marbles around in their places (for example, switching the first two marbles in a lined-up row), and some of them will move marbles to different compartments. For a microstate of the lined-up macrostate [5], there are many fewer permutations that leave the marbles in the same macrostate (all in one row, though not necessarily the same row) than there are for [2, 1, 1, 1]. Even though five marbles in a row looks more symmetrical, since it happens to have bilateral visual symmetry, it's actually a much less symmetrical macrostate than [2, 1, 1, 1], even though most of its microstates will just look like a jumble.

In the real world, distributing marbles in boxes is really distributing energy among particles, generally a very large number of them. Real particles can be in many different states, many more than the marble/no marble states in the toy example, and different states can have the same energy, which makes the math a bit more complicated. Switching marbles around is really exchanging energy among particles, and there are all sorts of intricacies about how that happens.

Nonetheless, the same basic principles hold: Entropy is a measure of the number of microstates for a given macrostate, and a system in equilibrium will evolve toward the highest-entropy macrostate available, and stay there, simply because the probability of anything else happening is essentially zero.

And also yeah, symmetry doesn't necessarily mean what you think it might.

Monday, September 14, 2020

How real are real numbers?

There is always one more counting number.

That is, no matter how high you count, you can always count one higher. Or at least in principle. In practice you'll eventually get tired and give up. If you build a machine to do the counting for you, eventually the machine will break down or it will run out of capacity to say what number it's currently on. And so forth. Nevertheless, there is nothing inherent in the idea of "counting number" to stop you from counting higher.

In a brief sentence, which after untold work by mathematicians over the centuries we now have several ways to state completely rigorously, we've described something that can exceed the capacity of the entire observable universe as measured in the smallest units we believe to be measurable. The counting numbers (more formally, the natural numbers) are infinite, but they can be defined not only by finite means, but fairly concisely.

There are levels of infinity beyond the natural numbers. Infinitely many, in fact. Again, there are several ways to define these larger infinities, but one way to define the most prominent of them, based on the real numbers, involves the concept of continuity or, more precisely, completeness in the sense that the real numbers contain any number that you can get arbitrarily close to.

For example, you can list fractions that get arbitrarily close to the square root of two: 1.4 (14/10) is fairly close, 1.41 (141/100) is even closer, 1.414 (1414/1000) is closer still, and if I asked for a fraction within one one-millionth, or trillionth, or within 1/googol, that is, one divided by ten to the hundredth power, no problem. Any number of libraries you can download off the web can do that for you.

Nonetheless, the square root of two is not itself the ratio of two natural numbers, that is, it is not a rational number (more or less what most people would call a fraction, but with a little more math in the definition). The earliest widely-recognized recorded proof of this goes back to the Pythagoreans. It's not clear exactly who else also figured it out when, but the idea is certainly ancient. No matter how closely you approach the square root of two with fractions, you'll never find a fraction whose square is exactly two.

OK, but why shouldn't the square root of two be a number? If you draw a right triangle with legs one meter long, the hypotenuse certainly has some length, and by the Pythagorean theorem, that length squared is two. Surely that length is a number?

Over time, there were some attempts to sweep the matter under the rug by asserting that, no, only rational numbers are really numbers and there just isn't a number that squares to two. That triangle? Dunno, maybe its legs weren't exactly one meter long, or it's not quite a right triangle?

This is not necessarily as misguided as it might sound. In real life, there is always uncertainty, and we only know the angles and the lengths of the sides approximately. We can slice fractions as finely as we like, so is it really so bad to say that all numbers are rational, and therefore you can't ever actually construct a right triangle with both legs exactly the same length, even if you can get as close as you like?

Be that as it may, modern mathematics takes the view that there are more numbers than just the rationals and that if you can get arbitrarily close to some quantity, well, that's a number too. Modern mathematics also says there's a number that squares to negative one, which has its own interesting consequences, but that's for some imaginary other post (yep, sorry, couldn't help myself).

The result of adding all these numbers-you-can-get-arbitrarily-close-to to the original rational numbers (every rational number is already arbitrarily close to itself) is called the real numbers. It turns out that (math-speak for "I'm not going to tell you why", but see the post on counting for an outline) in defining the real numbers you bring in not only infinitely many more numbers, but so infinitely many more numbers that the original rational numbers "form a set of measure zero", meaning that the chances of any particular real number being rational are zero (as usual, the actual machinery that allows you to apply probabilities here is a bit more involved).

To recap, we started with the infinitely many rational numbers -- countably infinite since it turns out that you can match them up one-for-one with the natural numbers* -- and now we have an uncountably infinite set of numbers, infinitely too big to match up with the naturals.

But again we did this with a finite amount of machinery. We started with the rule "There is always one more counting number", snuck in some rules about fractions and division, and then added "if you can get arbitrarily close to something with rational numbers, then that something is a number, too". More concisely, limits always exist (with a few stipulations, since this is math).

One might ask at this point how real any of this is. In the real world we can only measure uncertainly, and as a result we can generally get by with only a small portion of even the rational numbers, say just those with a hundred decimal digits or fewer, and for most purposes probably those with just a few digits (a while ago I discussed just how tiny a set like this is). By definition anything we, or all of the civilizations in the observable universe, can do is literally as nothing compared to infinity, so are we really dealing with an infinity of numbers, or just a finite set of rules for talking about them?

One possible reply comes from the world of quantum mechanics, a bit ironic since the whole point of quantum mechanics is that the world, or at least important aspects of it, is quantized, meaning that a given system can only take on a specific set of discrete states (though, to be fair, there are generally a countable infinity of such states, most of them vanishingly unlikely). An atom is made of a discrete set of particles, each with an electric charge that's either 1, 0 or -1 times the charge of the electron, the particles of an atom can only have a discrete set of energies, and so forth (not everything is necessarily quantized, but that's a discussion well beyond my depth).

All of this stems from the Schrödinger Equation. The discrete nature of quantum systems comes from there only being a discrete set of solutions to that equation for a particular set of boundary conditions. This is actually a fairly common phenomenon. It's the same reason that you can only get a certain set of tones by blowing over the opening of a bottle (at least in theory).

The equation itself is a partial differential equation defined over the complex numbers, which have the same completeness property as the real numbers (in fact, a complex number can be expressed as a pair of real numbers). This is not an incidental feature, but a fundamental part of the definition in at least two ways: Differential equations, including the Schrödinger equation, are defined in terms of limits, and this only works for numbers like the reals or the complex numbers where the limits in question are guaranteed to exist. Also, it includes π, which is not just irrational, but transcendental, which more or less means it can only be defined as a limit of an infinite sequence.

In other words, the discrete world of quantum mechanics, our best attempt so far at describing the behavior of the world under most conditions, depends critically on the kind of continuous mathematics in which infinities, both countable and uncountable, are a fundamental part of the landscape. If you can't describe the real world without such infinities, then they must, in some sense, be real.

Of course, it's not actually that simple.

When I said "differential equations are defined in terms of limits", I should have said "differential equations can be defined in terms of limits." One facet of modern mathematics is the tendency to find multiple ways of expressing the same concept. There are, for example, several different but equivalent ways of expressing the completeness of the real numbers, and several different ways of defining differential equations.

One common technique in modern mathematics (a technique is a trick you use more than once) is to start with one way of defining a concept, find some interesting properties, and then switch perspective and say that those interesting properties are the actual definition.

For example, if you start with the usual definition of the natural numbers: zero and an "add one" operation to give you the next number, you can define addition in terms of adding one repeatedly -- adding three is the same as adding one three times, because three is the result of adding one to zero three times. You can then prove that addition gives the same result no matter what order you add numbers in (the commutative property). You can also prove that adding two numbers and then adding a third one is the same as adding the first number to the sum of the other two (the associative property).

Then you can turn around and say "Addition is an operation that's commutative and associative, with a special number 0 such that adding 0 to a number always gives you that number back." Suddenly you have a more powerful definition of addition that can apply not just to natural numbers, but to the reals, the complex numbers, the finite set of numbers on a clock face, rotations of a two-dimensional object, orderings of a (finite or infinite) list of items and all sorts of other things. The original objects that were used to define addition -- the natural numbers 0, 1, 2 ... -- are no longer needed. The new definition works for them, too, of course, but they're no longer essential to the definition.

You can do the same thing with a system like quantum mechanics. Instead of saying that the behavior of particles is defined by the Schrödinger equation, you can say that quantum particles behave according to such-and-such rules, which are compatible with the Schrödinger equation the same way the more abstract definition of addition in terms of properties is compatible with the natural numbers.

This has been done, or at least attempted, in a few different ways (of course). The catch is these more abstract systems depend on the notion of a Hilbert Space, which has even more and hairier infinities in it than the real numbers as described above.

How did we get from "there is always one more number" to "more and hairier infinities"?

The question that got us here was "Are we really dealing with an infinity of numbers, or just a finite set of rules for talking about them?" In some sense, it has to be the latter -- as finite beings, we can only deal with a finite set of rules and try to figure out their consequences. But that doesn't tell us anything one way or another about what the world is "really" like.

So then the question becomes something more like "Is the behavior of the real world best described by rules that imply things like infinities and limits?" The best guess right now is "yes", but maybe the jury is still out. Maybe we can define a more abstract version of quantum physics that doesn't require infinities, in the same way that defining addition doesn't require defining the natural numbers. Then the question is whether that version is in some way "better" than the usual definition.

It's also possible that, as well-tested as quantum field theory is, there's some discrepancy between it and the real world that's best explained by assuming that the world isn't continuous and therefore the equations to describe it should be based on a discrete number system. I haven't the foggiest idea how that could happen, but I don't see any fundamental logical reason to rule it out.

For now, however, it looks like the world is best described by differential equations like the Schrödinger equation, which is built on the complex numbers, which in turn are derived from the reals, with all their limits and infinities. The (provisional) verdict then: the real numbers are real.

* One crude way to see that the rational numbers are countable is to note that there are no more rational numbers than there are pairs of numerator and denominator, each a natural number. If you can count the pairs of natural numbers, you can count the rational numbers, by leaving out the pairs that have zero as the denominator and the pairs that aren't in lowest terms. There will still be infinitely many rational numbers, even though you're leaving out an infinite number of (numerator, denominator) pairs, which is just a fun fact of dealing in infinities. One way to count the pairs of natural numbers is to put them in a grid and count along the diagonals: (0,0), (1,0), (0,1), (2,0), (1,1), (0, 2), (3,0), (2,1), (1,2), (0,3) ... This gets every pair exactly once.

All of this is ignoring negative rational numbers like -5/42 or whatever, but if you like you can weave all those into the list by inserting a pair with a negative numerator after any pair with a non-zero numerator: (0,0), (1,0), (-1,0) (0,1), (2,0), (-2, 0), (1,1), (-1,1) (0, 2), (3,0), (-3, 0) (2,1), (-2, 1), (1,2), (-1,2) (0,3) ... Putting it all together, leaving out the zero denominators and not-in-lowest-terms, you get (0,1), (1,1), (-1, 1),(2,1),(-2,1),(1,2),(-1,2),(3,1),(-3,1),(1,3),(-1,3) ...

Another, much more interesting way of counting the rational numbers is via the Farey Sequence.

Tuesday, April 29, 2014

This is not your uncle Benoit's Mandelbrot set

I've just thrown away a previous draft of this post, in which I tried to make some overarching point about the interaction between technology and art, and tie that in to the current generation of 3D fractal art. That didn't work out so well, so maybe I'll just give a few impressions. All in my personal opinion, of course. I don't pretend to be an art connoisseur.

Early fractal art worked mainly as eye candy. A typical Mandelzoom was an assault of wild shapes and saturated colors, looking vaguely organic, or perhaps suggesting a paisley print, or maybe something you might see when you rubbed your eyes. Striking images. Trippy, beautiful, even.

As with any algorithmic art, there is a vital human element that gives them the claim to be called "art" -- whether good, or bad, whether your cup of tea or not. Someone took the time to search the infinity of the Mandelbrot set for an image to their liking, to frame and color it, and to present it. While not strictly fractal art, Electric sheep is an interesting example of this human-algorithm interplay, on two levels. On the one hand, someone (Scott Draves) came up with the idea and got it going. On the other hand, the actual curation is done collectively by thousands of people on the web. Very cool. I've got it as live wallpaper on my phone as I write.

But ... it's all sort of ... detached. It's cool, without a doubt, but it's also cool in the sense of "aloof". I get the impression of a mind at work, but an emotionless, alien mind. Maybe I'm biased by knowing too much of the mechanics behind it all. I'm sure others have warmer impressions, or more easily see familiar objects like trees or flowers, but to me, even when the images suggest some sort of life form, they look like something other. Which, I'm sure, is exactly the point to a lot of folks.

In 2009, after many not-quite-successful experiments by many people, Daniel White hit upon the Mandelbulb, a way to generalize the process behind the Mandelbrot set to three dimensions in a way that people generally agreed was as cool as the original 2D set. In 2010, Tom Lowe came up with a different way of generalizing to higher dimensions, called the Mandelbox, which was also deemed worthy. Strictly speaking, both the Mandelbulb and Mandelbox are actually families of shapes with infinitely many possible parameter settings, some cooler than others, of which the artist is free to pick the coolest, but they're generally referred to as single entities.

Being truly three-dimensional (in that they're embedded in three-dimensional space), these sets offer possibilities well beyond those of two-dimensional fractals. While some 2D fractals may suggest depth, these have it. A video rendering is truly a trip through an alien space. Technically, any fractal (or at least any non-linear fractal, where the structure doesn't simply repeat as you zoom in) has infinitely elaborate detail, but these 3D fractals somehow seem to have more infinitely elaborate detail.

They can also seem more organic, particularly if you play with the formula a bit. Note the images at the bottom. They look like they grew, rather than having been produced by a purely mechanical process. Even the more mechanical-looking images seem more made than found. In a normal Mandelzoom, it feels like a computer (or math itself) provided the image. Many of the 3D structures give the strong impression someone made them and the algorithmic process managed to stumble upon them -- see the intricacy of the design, the attention to detail, the exotic esthetic -- or even that they somehow made themselves.

If anything, the feeling of alien-ness is even stronger than for a 2D mathscape. Even the most organic-looking forms, though they may seem like they ought to be part of some coral reef or fungal growth, seem about as foreign as they could look and still seem familiar. No puppy dogs or Old Master still lifes here.

The artist has a fair bit of latitude in choosing the color palate, location, scale, viewing angle, lighting and so forth, but the underlying shape is still the underlying shape. Still, there's only so much you can do with a landscape so fundamentally bizarre, either to make it more familiar or more otherworldly [many 3D Mandelpictures have some sort of height map or similar manipulation that can impose a chosen shape on the underlying contours of the fractal. Old-fashioned texture mapping and other rendering techniques can be brought to bear as well, along with compositing and other image manipulation techniques. The artistic effect is to make the fractal more a technique than an end in itself, which seems overall like a good development -- D.H.]

What is it about these forms, both 2D and 3D, that resonates so strangely? What does it say about our brains that we should see these images the way we do?

It's not shocking that fractal forms should remind us of living things or other natural objects. Fractal geometry was invented in part to describe the wide variety of natural shapes that didn't fit into the regular categories of classical geometry.

The name "fractal" itself comes from the notion of a fractional dimension, which is ubiquitous in nature once you look for it. There are several definitions of dimensionality that can take fractional values, but the one usually used with fractals is the Hausdorff dimension. By that standard, the coastline of Great Britain, for example, has been measured to have a dimension of 1.25, the coastline of Norway 1.52. Galaxy clustering comes in around 2, cauliflower around 2.3, and the surface of the human brain around 2.8. Interestingly, the Mandelbrot set itself has a Hausdorff dimension of exactly 2 -- as does its boundary.

As odd as a fractional dimension may sound, it makes a certain kind of sense. A coastline certainly isn't 2-dimensional, but neither is it quite a straight line or smooth curve. Because we see these kinds of shapes all the time -- trees, clouds, veins in leaves, mountains, piles of pebbles, coral, broccoli ... it's not a great leap to think we would respond to something artificial with the same general property. And on the other hand, because they're not exactly like anything we naturally encounter, it's natural to think of them as alien.

In one sense that feeling of some of the 3D forms being organisms or artifacts themselves is a property of the sets themselves, but equally so is it in the eye (or the perceptual machinery behind it) of the beholder and of the artist selecting and rendering the scene. It would be interesting to show a bunch of people random images from the Mandlethingies and ask "Does that look natural, artificial, or other?"

Then follow up with "If this looks artificial to you, who do you think made it?"

And why?

Thursday, October 24, 2013

Arising by chance

Suppose you had a billion dice. How many times would you expect to roll them before you got all sixes? That would be six to the billionth power, or about ten to the 780 millionth, that is, a one with 780 million zeroes after it. As big numbers go, that's bigger than astronomical, but still something you could print out, if only in tiny digits on a very big sheet of paper. It's smaller than the monstrously big numbers I've discussed previously. Archimedes' system could have handled it (see this post on big numbers for more details on all that).

"Bigger than astronomical" means that there's essentially no chance that anyone will ever see a billion dice randomly come up all sixes, even if, say, we set every person alive to rolling a die over and over again, and on through the generations, even if we somehow colonized the galaxy with hordes of dice-rolling humans.

Now suppose that instead of rolling all the dice repeatedly, we just re-roll the ones that didn't come up sixes. In that case, a bit more than 100 rolls will do. Why? With the first roll, about a sixth of the dice -- around 167 million, will come up sixes. On the second roll, around a sixth of the 833 million or so remaining, or about 139 million, will come up sixes, leaving about 694 million. Since we're rolling random dice here, these numbers won't be exact, but because we're rolling a whole bunch of dice, they'll be pretty close, percentage-wise. With each roll there are about 5/6 as many dice left to roll as with the roll before.

At some point, you can no longer assume that close to 1/6 of the dice will come up sixes, but after 100 rolls you should be down to about a dozen, and it won't take too long to get the rest.

One more game before I explain what I'm up to: Same billion dice, but this time, after an initial roll, you pick one die at random and roll it if it's not a six. How many times do you have to do this pick-and-roll (sorry) before you have a complete set of sixes?

At the beginning, you have about 833 million non-sixes and it will take about seven tries before you change one of them to a six. As more and more dice get changed to sixes, it gets harder and harder to find one that isn't already there. The last die will take about 6 billion tries -- you'll need to roll it about six times, but you'll only get a chance to one in a billion tries. All told, according to Wolfram Alpha's handy sum calculator, it will take about 20 billion tries before you get all your sixes. That's not something you could do in an afternoon. If you could do one try every second, it would take somewhat more than 600 years. Not really feasible, but not unimaginable.

If we want to talk about something arising by a random process, it matters, and it matters a lot, what kind of random process we're talking about. In a purely random process, where everything is re-done from scratch at every step, most interesting results will be completely, beyond-astronomically unlikely. But a process can proceed randomly and still produce a highly-ordered result with very high probability, as long as there is some sort of state preserved from one step to the next.

For example, when sugar crystalizes out of sugar water to make rock candy, it is for all practical purposes completely random which sugar molecule sticks to which part of the growing crystal at any given point. And yet, the crystal will grow, and grow in a highly, though not completely, predictable fashion, all without violating any laws of thermodynamics.

The end result will be something that would be completely implausible if sugar molecules behaved completely randomly, but they don't. They behave essentially randomly when drifting around in a solution, but not when near a regular surface of other sugar molecules that's already there. With each molecule added to the crystal, it's that much easier for the next one to find a place to attach (until enough sugar has crystalized out that the system reaches equilibrium).

Put another way, there is no single such thing as a random process. There are infinitely many varieties of random process, some with more or less non-random state than others. It's not meaningful to ask whether something could have arisen at random without specifying what kind of random processes we're talking about.

Saturday, April 6, 2013

Big Numbers

Warning: This post is fairly long and contains a lot more mathematical notation and jargon than I generally like to include, even when the topic is mathematical. In this particular case it seems pretty near unavoidable, so I'm hoping that the extra work to the reader is worth it.

A wise man once said of numbers "There are too many of them" (or words to that effect). To which I'd add "and most of them are too big."

In our minds, we have some concept of numbers beyond merely seeing two apples and two oranges and knowing there is one apple for each orange and vice versa. If I showed you a pile of coins, say, and asked if there were a dozen or so, a hundred or so, or a thousand or so in the pile, you could probably make a reasonable guess as to which was the case, even without knowing the exact number. Along with a concept of numbers, we have a rough concept of their size. Which makes sense.

Mathematicians have made the concept of number rigorous in a variety of ways. I've described some of them, particularly the natural numbers, in a previous post on counting. The natural numbers are 0, 1, 2 and so forth on and on forever. They answer the question "How many (of some kind of discrete object)?", which requires a whole number that may be zero but can't be negative. They generally can't answer "How much (of some substance)?", which need not come out to a whole number of whatever unit you're using, or "What was my score for the first nine holes?" which will be a whole number of strokes but might be negative.

In some sense, the naturals the are the simplest kinds of numbers. All other numbers can be defined in terms of them. They certainly seem friendly and familiar at first blush. I aim here to show that the natural numbers we're comfortable dealing with -- 0, 1, 2, 42, 1024, 14 trillion or whatever -- are an insignificant part of the mathematical picture. Almost all numbers are far, far to big for us to comprehend in any meaningful way.

Big numbers for puny humans

Let's start with human-sized big numbers. A thousand might seem like a big number, but if you think about it, it's pretty small, even in puny human terms:

You can count to 1000 in a few minutes
Crowds of a thousand are commonplace
You've probably met more than a thousand people in your life
Communities of a thousand or more are commonplace
If you can read this, you're almost certainly more than a thousand days old
This post has over a thousand words

and so forth.

A million is big, but still not all that big. It is, though, probably the biggest round number that most of us can relate to directly, even if it may take a little effort.

If you spray a reasonably large picture window with water, there are likely a million or so droplets visible on it.
You can see see an individual pixel on a megapixel display.
Crowds of a million people or more have gathered on many occasions.
Many people, though certainly not most, will make $1 million in their lifetimes.
$1 million in $100 bills doesn't take a lot of space. Even in dollar bills, it will fit in one room of a house.
Many people have practiced some basic skill a million times. For example, a typical professional basketball player has almost certainly made a million baskets (counting practice shots); I once met a baseball scout who quite plausibly claimed to have driven a million miles; I've probably written millions of words, all told.

A billion is probably not readily comprehensible, but examples are not rare

Human population is a few billion. Two countries have populations over a billion.
A billion dollars is about ten dollars per US household.
RAM is currently measured in gigabytes.

A trillion is probably the biggest with well-known commonplace examples

Large economies are measured in trillions of dollars.
You can buy terabyte disks at the store.
You have trillions of cells in your body (and even more bacterial cells).

It's perfectly possible to distinguish a million from billion from trillion, but it requires conscious computation ... oh, a gigabyte of disk will hold a thousand photos of a megabyte each.

A trillion, probably even a billion, is beyond the human scale. We can speak of trillions of cells or bacteria in the human body, but these are abstractions. No one has actually counted them, whereas a moderate team of people could count millions of objects (as happens during elections, for example).

Astronomical numbers

The physical universe goes well beyond this scale. There's a reason we talk of "astronomical" numbers.

The observable universe (using "comoving coordinates") is around 1,000,000,000,000,000,000,000,000,000 meters in diameter (one octillion, in US terms as I'll use here, or one thousand quadrillion in "long scale" terms). When dealing with numbers this big, we generally just count the digits, and say, for example, 10²⁷.
There are 6×10²³ (Avogadro's number) atoms in a gram of hydrogen (assuming it's purely the light isotope, which it won't be).
There are somewhere around 10⁵⁷ atoms in the sun
The smallest distance that can possibly be measured, even in principle, according to quantum theory, is called the Planck length (after Max Planck). In practice, no one has come even remotely close to directly measuring anything at that small a scale. The volume of the observable universe in Planck volumes (cubic Planck lengths), that is, the biggest volume we know of measured in the smallest measurable volume units, is on the order of 10¹⁸⁴.

Consider that last number. Ten is a nice, familiar number. One hundred eighty-four is not intimidating. So 10¹⁸⁴can't be that big a deal, right? Well, let's try to put it in terms we can understand. I've claimed a million is about as big a number as we can really grasp, but maybe we can build up from there. I have well over a thousand digital pictures, and on average there are a few million pixels in each, so it's not too hard to imagine what a billion pixels would look like. You can find pictures of a million people gathering, so imagine that each one of them has a similar image collection. That's a quadrillion pixels -- a thousand million million. Not bad.

With a little more thought, one could probably put together images for, say, a million trillion or a billion billion, but even my image for a billion pixels is stretching. I can plausibly say that if I'm looking at a computer monitor I can imagine that each of the pixels is its own unit (it's probably not coincidence that the human eye has megapixel resolution, more or less). If I'm imagining a thousand photos, though, I'm not imagining each of their pixels as a separate unit. The unit is photos, not pixels. At best there's the implication that I could mentally switch scales and think of pixels, at which point the photos fade into abstraction.

If I'm imagining a crowd of a million people with a thousand megapixel photos each, it's harder to argue that I'm really keeping the whole image in my head. The technique of imagining collections of collections of collections gets less and less useful as we approach the limits of human short-term memory, typically seven or so items. Maybe, maybe, a person could claim to comprehend a million million million million million million million things. Maybe.

But 10¹⁸⁴ is ten thousand million million million million million million million million million million million million million million million million million million million million million million million million million million million million million million.

Now we're ready to talk about some big numbers.

Archimedes' Sand Reckoner

In The Sand Reckoner, Archimedes set out to do the same kind of measurement as above, of the biggest known volume in the smallest units. He wanted to measure the number of sand grains it would take to fill the universe as he understood it. This was a universe with the Sun (not the Earth) at the center, and the fixed stars in a sphere far enough out that they did not seem to move as the Earth went around the Sun. In modern terms, it was about a light-year in radius, not too shabby coming from someone from a world with no telescopes or motorized transportation.

The number system in use at the time could count up to myriad, or 10,000. Archimedes called the numbers from 1 to myriad the first numbers and called myriad the unit of the first numbers. The second numbers were myriad, two myriads, three myriads and so on up to a myriad myriads (one hundred million), which he called the unit of the second numbers. Likewise, the unit of the third numbers was that many myriads (one trillion), and so on up to the unit of the myriadth numbers.

This is a decent-sized number. It's a one with 800 million zeroes after it, much bigger than the size of the modern universe in Planck volumes. But Archimedes went further yet. He called the numbers up to this number the first period, and the number itself the unit of the first period. From that, of course, you can define the second period, and so on. Archimedes went on to the myriad-myriadth period, ending with 10^8×10¹⁶, or a one with 80 quadrillion zeroes after it. That's a huge number, but you could still print it if you had, say, enough paper to cover the surface of the Earth (and a printer to match).

As it turns out, this was overkill. Archimedes calculated the volume of his universe in sand grains to be 10⁶³, a number so small I can write it right here:

1,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000

Archimedes was trying to make a point. The grains of sand are not beyond reckoning, even if no one person could possibly count them. They are not infinite in number.

Building big numbers

To build his number system, Archimedes used a simple but powerful approach:

Start with some set of numbers
Define a rule for making bigger numbers
Apply that rule repeatedly to get a larger set of numbers
Repeat, using the new, larger set as a starting point

A slightly different way of looking at this is

Start with some set of numbers
Define a rule for making bigger numbers
Define a new rule, namely applying the first rule repeatedly, to get a second rule
Define a third rule from the second rule, and so on, repeatedly

For example, start with the numbers from 1 to N, and the simple rule of adding one

Starting with N and adding 1 N times gives you N + N, so the new rule is "add N"
Starting with N and adding N N times gives you N x N, so the new rule is "multiply by N"
Starting with N and multiplying by N N times gives you N^N, so the new rule is "take to the Nth power"

There are various names and notations for what happens next, repeatedly taking to the Nth power, but it will get you very big numbers from small numbers very quickly. You can actually do this two different ways:

First take N^N, then take that to the N, and so forth. If N is 3, you get (3³)³, which is 27³, or 19683
Take N^N, and then take N to that power, and so forth. If N is 3, then you get 3^3³, which is 3²⁷, which is 7625597484987

Since the second way gets bigger numbers faster, let's do it that way.

4^(4⁴) is

13,407,807,929,942,597,099,574,024,998,205,846,127,479,365,820,592,393,377,723,561,443,721,764,030,073,546,976,801,874,298,166,903,427,690,031,858,186,486,050,853,753,882,811,946,569,946,433,649,006,084,096

That's big, but 4^{4^4⁴} is 4 times itself that many times. The number of digits in the result is itself a number with 154 digits. This is already much, much, much bigger than Archimedes' big number. Much too big to write down, even if we used the universe for paper and subatomic particles for ink. And this is just from applying the third rule on the list to a small number.

Let's not even talk about 5^{5^{5^5⁵}}, or trying to do the same thing with myriad.

[In an earlier version, I'd taken the first option. It still ended up with big numbers, but not as dramatically. (4⁴)⁴ is 256⁴, or 4,294,967,296, rather than the monster above, and 4,294,967,296⁴ is 340,282,366,920,938,463,463,374,607,431,768,211,456. This is astronomically large, but not too-big-to-fit-in-the-known-universe large. Even (((5⁵)⁵)⁵)⁵ has only 437 digits --D.H. Mar 2022]

But of course, you can repeat the rule building process itself as much as you like. What if I start with 4^{4^4⁴} -- let's call it Fred -- and apply the Fredth rule to it. Call that number Barney. Now apply the Barneyth rule to Barney. And so on. Just as you can always add one to a number to get a bigger one, you can repeat any hairy big-number-building process some hairy big number of times to get an unimaginably hairier big-number-building process. Or rather, you can define a process for defining processes and so forth. You could never, ever come close to actually writing out that process.

(If all this seems similar to the process for building ordinal numbers that I described in the counting post, that's because it is similar.)

Ackermann's function

Ackermann's function, generally denoted A(m,n), boils this whole assembly down to three very simple rules which can generate mind-bogglingly big numbers much more quickly than what I've described so far. Since we're in full-on math mode in this post, here's the usual definition (other variants are also used, but this one is as monstrous as any of them):

A(m, n) =

n + 1, if m = 0
A(m - 1, 1), if m > 0 and n = 0
A(m - 1, A(m, n - 1)) otherwise

We're just adding and subtracting one. How bad can it be?

Ackermann's function normally takes two numbers and produces a number from them, but you can easily define A'(n) = A(n,n). A'(4) is 2^{2^{2⁶⁵⁵³⁶}} - 3, which is vastly bigger than any number I've mentioned so far in this post except for Barney. Howard Friedman (more from him below) has this to say about A (I've modified the notation slightly to match what's in this post):

I submit that A'(4) is a ridiculously large number, but it is not an incomprehensibly large number. One can imagine a tower of 2’s [that is, a tower of exponents] of a large height, where that height is 65,536, and 65,536 is not ridiculously large.

However, if we go much further, then a profound level of incomprehensibility emerges. The definitions are not incomprehensible, but the largeness is incomprehensible. These higher levels of largeness blur, where one is unable to sense one level of largeness from another.

For instance, A(4, 5) is an exponential tower of 2’s of height A'(4). It seems safe to assert that, say, A'(5) = A(5, 5) is incomprehensibly large. We propose this number as a sort of benchmark.

In other words, A'(4) is, as I've argued, far, far too big to comprehend, calculate fully or even write down, but at least we can more or less understand in principle how it could be constructed. The recipe for constructing A'(5) contains so many levels of repetition that we can't even really understand how to construct it, much less the final result.

Combinatorial explosions

Everything I've mentioned so far is constructive. That is, each number is defined by stating exactly how to construct it from smaller numbers by some set of operations, ultimately by starting with zero and adding one repeatedly. It's also possible to specify numbers non-constructively, that is, without saying exactly how one might construct them.

The field of combinatorics is particularly notorious for defining huge numbers. Combinatorics deals with questions such as enumerating objects with some particular set of properties. A simple example would be "How many ways can two six-sided dice show seven spots?" For bonus points, list them exactly, or at least describe their general form. In this case, it's easy. There are six, assuming you can tell the two dice apart: {(1,6), (2,5), (3,4), (4,3), (5,2), (6,1)}.

A slightly different kind of combinatorial question is "How large can a collection of objects subject to some particular constraint get before it has to have some property?" For example, how many natural numbers less than 5 can you put in a set before there must be two numbers in the set that differ by 3? So {0, 1} is fine but {1, 4} isn't since 4 - 1 = 3. The answer in this case is 3: {0, 1, 2}, has no members that differ by three, but any larger set must. In this case, we can easily check because there are only five sets of four natural numbers less than five. For a larger version, say the same question but for the numbers less than 1000, there's no way to check all the combinations and you'll need a real proof.

Friedman gives a somewhat more involved example that produces large numbers astonishingly quickly. Consider strings of letters from a given alphabet. A string of length n has what I'll call Friedman's property if, for at least one i and j up to n/2, with i < j, the portion of the string from positions i to 2i appears in the portion from j to 2j. In other words, cut the string into overlapping portions:

The first and second letters (two total)
The second through fourth letters (three total)
The third through sixth letters (four total) ...
... and so forth, starting one letter later and going one letter longer each time

Friedman's property says at least one of those is contained in at least one of the later ones, and the question is, how long does a string have to be before you know it must have this property (for whatever reason, Friedman actually phrases the question as how long can a string be and not have this property, but this way seems clearer and is more in line with Ramsey Theory in general).

If the alphabet in question is just one letter, say a, then it's a simple problem:

Start with aa. That's the one and only string with two letters, using our alphabet, and there is only one portion to look at (the whole string).
Add another a to get aaa. There's still only the one portion, so we're still good.
Add another a to get aaaa. Now there are two portions to look at 1-2 (aa) and 2-4 (aaa). The first one is contained in the second, so we're done. Any four-letter string (that is, the one and only four-letter string), using one letter, must have Friedman's property.

If the alphabet is just the two letters a and b, then

ababba, for example, has the property, because the portion from position 1 to 2 (ab) appears in the portion from 2 to 4 (bab)
(taking an example with three letters) aabcabbca also has the property, because abc from positions 2 to 4 appears in abbca from positions 5 to 9. The letters in the first string don't have to be consecutive in the second one, but they do have to all appear in order.
abbbaaaaaaa on the other hand, does not have Friedman's property: ab doesn't appear in bbb, bbaa, baaaa or aaaaaa, and more generally, none of those portions appears in any of the ones after it.
If you add either an a or a b to the end of abbbaaaaaaa, though, the result has to have Friedman's property. If you add an a then aaaaaa (positions 5-10) appears in aaaaaaa (positions 6-12). If you add a b, then ab (positions 1-2) appears in aaaaaab (positions 6-12).
With some more fiddling (at the worst, trying out all 4096 12-letter strings), you can determine that any string, using two letters, 12 or more letters long has Friedman's property.

What if we use three letters, say a, b and c? You need a somewhat longer string before you can no longer avoid Friedman's property. How long? Friedman finds a lower bound (the real answer may be bigger), of A(7198, 158386). Recall that we've already established that A(5,5) is deeply incomprehensible. Adding one to either parameter of A just kicks things up another unfathomable notch, and here we're doing so hundreds of thousands of times.

For four letters, Friedman gives a lower bound obtained by applying A' over and over again, starting with A'(1) = 3. How many times? A'(187196) times.

Friedman goes on to define a similar construction on trees (groups of objects with parent/child/sibling relationships, more or less like family trees but without fun stuff like double cousins). Again, things start small but then go completely haywire. If n is the number of letters you can use in labeling the objects in a tree, and Tree(n) is the largest sequence of trees you can have (subject to a couple of restrictions) before there must be at least one tree that's contained in a later one, then

Tree(1) = 1
Tree(2) = 3
Tree(3) is much, much larger than the result we got above for strings using three letters. It's much, much larger than the result above for four letters. It's so large that you have to dive deeply into the foundations of mathematics to be able even to describe it. Maybe you can. I can't.

And, of course, once you've got an out-of-control function like Tree, you can use it as grist for the constructive mill. What's A'(Tree(Tree(Barney))) taken to its own power Fred times? Can't tell you, but it's certainly big, and we can keep this tomfoolery up all day.

(How do we know that there is a limit on how big a set of strings or trees can get before it has to have Friedman's property? Friedman gives a proof. It's a pretty standard proof, but I didn't understand it and didn't take the time to dive into and figure it out.)

It's all just so ... big

As Friedman says, such gigantic numbers (whether defined constructively or non-constructively) are not at all like the numbers we're familiar with. I can easily tell you that 493 has a remainder of one when divided by three (4+9+3 = 16, 1+6 = 7, 7 divided by three has one left over). I can't easily tell you whether Fred has a remainder of one or two. I know there will be a remainder, since at no point in constructing it did we multiply by three, but to figure out which remainder we'd have to carefully examine the definition of Fred and track the remainders all the way through.

Likewise I know that 17 is a prime number and 39 isn't. With a little effort I can figure out that 14,351 isn't prime (it's 113x127), but is A'(4) - Fred a prime number? No idea. Probably not, as the odds of any number that size being prime are very small (though not quite zero), but I have no idea how to go about proving it.

For that matter, it's not always immediately obvious whether one huge number is larger than another. If they were constructed by different means -- say, one uses Ackermann's function and another came out stacking up exponents some particular way and repeatedly applying three different big-making functions in turn, then there may be no feasible way to compare them. It may even be possible to prove that the number of steps needed to make the comparison would itself be huge.

We're used to simplifying numbers when we do calculations with them. If I tell you a room is four meters by three meters with a one meter by two meter closet cut out of it, you can easily tell me the area of the room is ten square meters. If I did a similar calculation with four arbitrary huge numbers, quite likely all anyone can say is the answer is ab - cd, if a, b, c and d are the numbers in question.

Nonetheless, such numbers are just as mathematically real as any others. If you make a claim about "all natural numbers", you're not just saying it's true for numbers you can easily manipulate. You're saying it's true for really big ones as well, and keep in mind that for every huge number we can describe there are vastly many more of similar size that we have no hope of ever describing. Fermat's last theorem doesn't just say there are no cases where a³ + b³ = c³. It says that there are no cases where a^Barney + b^Barney = c^Barney, and so forth for every ridiculous number I've mentioned, and all the others as well.

It's also important to point out that there is nothing particularly special about most of these numbers, unless, like Tree(3), they are the solution to some specific problem. The number of numbers we know to be special for one reason or another is limited by our human capacity to designate things as special, which runs out long before we get to the astronomical realm, to say nothing of the territory Friedman explores.

If a number is too big to relate to our everyday experience, or to compare with other numbers, or to comprehend how it might be constructed, about the only thing we can really say about it is that it's a number, and it's big.

And that's all we can say about almost all numbers.

Tuesday, January 29, 2013

We're #1. So what?

On the radio today I heard that a certain statistic was at its highest (or lowest) level in seventeen months. Certainly sounds impressive, but what does it mean? Without having followed the history of the statistic, I'd have know way of knowing.

For example, if it's 100 now, and it was 99 seventeen months ago and 98 for the other months (including last month), it may not mean much at all. On the other hand, if the sequence had been more like 99, 82, 64, 57, 43, 51, 46 ... 54, 47, 100, that jump from 47 to 100 might be very significant, particularly if the original fall from 99 to the 40s and 50s had been significant.

Suppose I'm part of a community of gamers in which each gamer has a numerical rating. Last month I had the 1523rd-highest rating. This month I'm 1209th. I've just rocketed 314 places up the rankings. Pretty awesome, huh?

Well, maybe. Suppose there are 704 people with a rating of 98, 313 people with a rating of 99 and 1208 people with higher ratings. The top rating is 106. Last month my rating was 98, so I was one of the 704 tied for 1523rd - 2226th. This month, by virtue of a one-point improvement, I'm now one of the proud 313 tied for 1209th - 1522nd. Last month I was good, though not quite as good as the best. This month I got a little closer to the top. Maybe not so impressive.

On the other hand, suppose there are three million or so players. Most of them have fairly unremarkable ratings, but once you get to the top ranks the scores start to increase dramatically. The 1523rd best ranking is 12,096, the 1209th is 451,903 and the top player has an unbelievable 75,419,223. I've made really amazing strides in the last month, but I'm still far, very far, from the top.

Ok, that's a lot of made-up numbers for just four paragraphs. What's going on here?

First, any measurement is meaningless without context. I originally said "a statistic" instead of "measurement", but the whole point of statistics, that is, pulling (abstracting) concise metrics out of a pile of data, is to provide context. If I say that the mass of a sample is 153 grams, that doesn't tell me much, but if you tell me that the average (mean) mass of past samples is 75 grams and the standard deviation is 8 grams, I know I'm dealing with an extremely rare high-mass sample. Or my scale is broken, or I'm actually measuring a completely different kind of sample, or something else significant is going on. The mean and standard deviation statistics provide context for knowing what I'm dealing with.

Simply saying "highest in seventeen months" or "jumped 314 places in the rankings" doesn't provide any meaningful context. Either or both of those could be highly significant, or nothing in particular.

Second, citing rankings like highest, 1209th and so forth implies that something noteworthy about a ranking is also noteworthy about the underlying measurement that's being ranked. But this is misleading. Depending on how the rating is distributed, a large change in rating could mean a small change in ranking, or a large one, and likewise for "highest in N time periods." Technically, ranking can be highly non-linear.

Rankings are not entirely useless. For example, there have been many more record high temperatures than record low temperatures in recent decades. Given that short term temperature fluctuations over more than a few days are fairly random (or at least, chaotic), this strongly suggests that temperatures overall are rising. More sophisticated measurements bear this out, but the simple comparison of record highs versus record lows quickly suggests a trend in the climate as a whole. Even then, though, it's the careful measurement of the temperatures themselves that tells what's really going on. Looking at record highs and lows just points us in a useful direction.

In general, when someone cites a ranking or a record extreme, it's good to ask what's going on with the quantity being ranked.

Wednesday, December 12, 2012

Adventures in hyperspace

The hypercube. A geekly rite of passage, at least for geeks of a certain age. The tesseract. The four-dimensional cube. Because what could be cooler than three dimensions? Four dimensions! Cue impassioned discussion over whether time is "the" fourth dimension, or such.

Cool concept, but can you visualize one, really "see" a hypercube in your mind's eye? We can get hints, at least. It's possible to draw or build a hypercube unfolded into ordinary three-dimensional space, just as you can unfold a regular cube flat into two-dimensional space. Dali famously depicted such an unfolded 4-cube. You can also depict the three-dimensional "shadow" of a 4-cube, and even -- using time as an extra dimension -- animate how that shadow would change as the 4-cube rotated in 4-space (images courtesy of Wikipedia, of course).

That's all well and good, but visualizing shadows is not the same as visualizing the real thing. For example, imagine an L-shape of three equal-sized plain old 3D cubes. Now another L. Lay one of them flat and rotate the other so that it makes an upside-down L with one cube on the bottom and the other two arranged horizontally on the layer above it. Fit the lower cube of that piece into the empty space inside the L of the first piece, so that the first piece is also fitting into the empty space of that piece.

What shape have you made? Depending on how natural such mental manipulation is for you and how clear my description was, you may be able to answer "A double-wide L" or something similar. Even if such things make your head hurt, you probably had little trouble at least imagining the two individual pieces.

Now do the analogous thing with 4-cubes. What would the analogue of an L-shape even be in 4-space? How many pieces would we need? Two? Three? Four? Very few people, I expect, could answer a four-dimensional version of the question above, or even coherently describe the process of fitting the pieces together.

Our brains are not abstract computing devices. They are adapted to navigating a three-dimensional world which we perceive mainly (but not exclusively) by processing a two-dimensional visual projection of it. Dealing with a four-dimensional structure is not a simple matter of allocating more mental space to accommodate the extra information. It's a painstaking process of working through equations and trying to make sense of their results.

That's not to say we're totally incapable of comprehending 4-space. We can reason about it to a certain extent. People have even developed four-dimensional, and even up to seven-dimensional (!) Rubik's Cubes using computer graphics. It's not clear if anyone has ever solved a 7-cube, but a 3x3x3x3x3 cube definitely has been solved.

Even so, it's pretty clear that the solvers are not mentally rotating cube faces in four or five dimensions, but dealing with a (two-dimensional representation of) a three-dimensional collection of objects that move in prescribed, if complicated, ways.

From a mathematical point of view, on the other hand, dealing in four or five or more dimensions is just a matter of adding another variable. Instead of (x,y) coordinates or (x,y,z), you have (w,x,y,z) or (v,w,x,y,z) coordinates and so forth. Familiar formulas generally apply, with appropriate modifications. For example, the distance between two points in 5-space is given by

d² = v² + w² + x² + y² + z²

if v, w, etc. are the distances in each of the dimensions. This is just the result of applying the pythagorean theorem repeatedly.

Abstractly, we can go much, much further. There are 10-dimnsional spaces, million-dimensional spaces, and so on for any finite number. There are infinite-dimensional spaces. There are uncountably infinite-dimensional spaces (I took a stab at explaining countability in this post).

Whatever intuition we may have in dealing with 3- or 4-space can break down completely when there are many dimensions. For example, if you imagine a 3-dimensional landscape of hills and valleys, and a hiker who tries to get to the highest point by always going uphill when there is a chance to and never going downhill, it's easy to imagine that hiker stuck on the top of a small hill, unwilling to go back down, never reaching the high point. If the number of dimensions is large, though, there will almost certainly be a path the hiker could take from any given point to the high point (glossing over what "high" would mean). Finding it, of course, is another matter.

You can't even depend on things to follow a consistent trend as dimensions increase, as we can in the case of a path being more and more likely to exist as the number of dimensions increases. A famous example is the problem of finding a differentiable structure on a sphere.

Since we can meaningfully define distance in any finite number of dimensions, it's easy to define a sphere as all points a given distance from a given center point (it's also possible to do likewise in infinite dimensions). If you really want to know what a differentiable structure is, have fun finding out. Suffice it to say that the concepts involved are not too hard to visualize in two or three dimensions. Indeed, the whole field they belong to has a lot to do with making intuitive concepts like "smooth" and "connected" mathematically rigorous. Even without knowing any of the theory (I've forgotten what little I knew years ago), it's not hard to see something odd is going on if I tell you there is:

exactly one way to define a differentiable structure on a 1-sphere (what most of us would call a circle)
likewise on a 2-sphere (what most of us would just call a sphere)
and the 3-sphere (what some would call the 3-dimensional surface of a hypersphere)
and the 5-sphere (never mind)
and the 6-sphere

Oh ... did I leave out the 4-sphere? Surely there can only be one way for that one too, right?

Actually no one knows. There is at least one. There may be more. There may even be an infinite number (countable or uncountable).

Fine. Never mind that. What happens after 6 dimensions?

there are 28 ways on a 7-sphere
2 on an 8-sphere
8 on a 9-sphere
6 on a 10-sphere
992 on an 11-sphere
exactly one on a 12-sphere
then 3, 2, 16256, 2, 16, 16, 523264, and 24 as we go up to 20 dimensions

See the pattern? Neither do I, nor does anyone else as far as I know. [The pattern of small, small, small, big-and-(generally)-getting-bigger continues at least up to 64 dimensions, but the calculations become exceedingly hairy and even the three-dimensional case required solving one of the great unsolved problems in mathematics (the Poincaré conjecture). See here for more pointers, but be prepared to quickly be hip-deep in differential topology and such.] In the similar question of differential structures on topological manifolds, there is essentially only one answer for any number of dimensions except four. There are uncountably many differential structures on a four-dimensional manifold. So much for geometric intuition.

It's worth pondering to what extent we can really understand results like these. Certainly not in the same way that we understand how simple machines work, or that if you try to put five marbles in four jars, at least one jar will have more than one marble in it.

Statements like "there are 992 differentiable structures on an 11-sphere" are purely formal statements, saying essentially that if you start with a given set of definitions and assumptions, there are 992 ways to solve a particular problem. The proofs of such statements may use various structures that we can visualize, but that's not the same as being able to visualize an 11-dimensional differentiable structure. Even if we happen to be able to apply this result to something in our physical world, we're really just mechanically applying what the theorems say should happen in the real world. Doing so doesn't give us a concrete understanding of an eleven-dimensional differentiable structure.

That, we're just not cut out to do. In fact, we most likely don't even visualize three complete dimensions. We're fairly finely tuned to judging how big things are, how far away they are and what's behind what (including things we can't see at the moment) and what's moving in what direction how fast, but we don't generally visualize things like the color of surfaces we can't see. A truly three dimensional mental model would include that, but ours don't. Small wonder a hypercube is a mind-boggling structure, to say nothing of some of the oddities listed above.

Tuesday, May 15, 2012

Counting

(Intermittent, indeed)

Counting. What could be simpler? Well ...

1. Origins and speculation

Counting is older than humanity. Other primates can count, birds can count and elephants can count. There are even claims that some insects can count. How would we know? Suppose an animal sees three tasty treats put behind a screen, and then two taken away from behind it. An animal that makes a beeline for the remaining treat, but acts indifferent if all three have been taken away instead, can plausibly be said to have some form of counting.

There's clear survival value in being able to reckon in this way. If I'm being chased by four wolves and I only see three at the moment, I should be wary. If I see four run off looking interested in something else, I can probably go back to whatever it was I was doing.

How would a counting ability work? Totally speculating here, I could imagine having some supply of "markers" in short term memory, which could be mentally placed in some collection of notional locations. Even a small number of markers and locations would be useful. If I have three markers, a "visible" location and a "hidden" location, I can track the tasty treats in the first example. As each treat is hidden, the marker-tracking system puts a marker in the hidden location. As each is taken out, it goes to the visible location, or is freed up for future use if the if the treat is taken away in real life. If there's still a marker in the hidden location, I can expect a treat.

This is all handwaving over what a marker or a location might be, or how to make and dissolve the associations between markers and treats, and between the "hidden" mental location and the area behind the experimenter's screen. Nonetheless, there are testable hypotheses to be made here. For example, does the subject still respond appropriately if there are two separate screened areas, or seven, or twenty? Suppose there are two kinds of objects: tasty treats and something useless. Does the subject behave differently if there is one treat left instead of one useless object? How many objects can the subject keep track of?

The results would put limits on how many markers, or locations, or kinds of markers there might be. They wouldn't rule out other explanations, like some sort of purely symbolic approach, but an artful experimenter ought to be able to come up with scenarios that would make one or the other explanation more or less plausible. I'm certain that people have done just that, but more carefully and thoroughly than what I've described.

If you're thinking this doesn't quite seem to match up with what pops into your head when you think "counting", you'd be right. Counting as we normally think of it, "one, two, three, four, ... thirty-nine, forty, forty-one ... five thousand nine hundred and thirty-seven, five thousand nine hundred and thirty-eight ..." is a different beast. A markers-and-locations scheme is limited by the number of markers. Counting with a number system requires a small set of symbols and a small set of rules. With those, you can count until you get tired, so long as your number system tells you how to add one more. Your supply of numbers is still finite -- no one can count from a million random digits to the number after that -- but it's orders of magnitude larger.

This more familiar notion of number is probably much rarer in the animal world. It's much more closely akin to language, though it's at least logically possible to have a number system without having a fully-developed language. It's also possible, and I'd think highly likely, that we and any other animals that can count digitally also have a markers-and-locations style capability. A toddler who hasn't yet mastered counting to a hundred can still track objects behind screens.

2. The mathematicians arrive

Number theory is one of the oldest branches of mathematics. Pythagoras studied numbers extensively, trying to uncover the mystic harmonies of the universe. Archimedes showed how to construct ridiculously large numbers, large enough to, say, number the grains of sand on a beach, or even how many it would take to fill the universe as he knew it. Pascal, Fermat, Euler, Gauss ... all of these and many more had interesting things to say about numbers, but for my money it was Georg Cantor who got deepest into the question of "what is number"?

Cantor helped formalize the notion of a set -- a collection of objects about which we only care if a given item is in it or not. Cantor postulated that two sets have the same number of items if, and only if, the items in them can be matched up one-for-one. For example, if I have the set {Groucho, Harpo, Chico} and the set {Julius, Arthur, Leonard}, I can match Groucho with Julius, Harpo with Arthur and Chico with Leonard and know that there are the same number of items in each set. I can also do the matching five other ways, but the result is always the same. Either the two sets match up one-to-one or they don't.

Working from this basis, a number is a collection (formally, the equivalence class) of all sets that can be matched up with each other. For example, the number three would include {Groucho, Harpo, Chico}, {Julius, Arthur, Leonard}, {1, 2, 3}, {42, 17, 23}, {uno, due, tre}, {zebra, desk, Louis XIV} and any other set that can be matched up one-to-one with any (and thus, all) of those.

The numbers that can be described this way include zero (described by the empty set, {}), one, two, and so on ad infinitum. If you can (at least in principle) list all the objects in a set, or in other words, if the set is finite, then there is a natural number representing all sets of that size. These numbers do not include negative numbers, fractions, irrational numbers, imaginary numbers, quaternions or other such. Those need more machinery, but they can all be related to the basic set of natural numbers {0, 1, 2, 3 ...}.

One of Cantor's great discoveries, and one of the most important discoveries in all of mathematics, is that there are numbers, in this sense, beyond the natural numbers that denote the size of finite sets. For example, how many natural numbers are there? There are more than, say, three, because 0, 1, 2 and 3 are all natural numbers. There are more than 497. There are more than a billion and more than googolplex. No matter matter how high you count, you can, in principle, always count one more.

If this sounds quite a bit like the second kind of number system above, where you can always count one more, that's because it is. In a bit we'll see a mathematical description even more closely aligned with it.

If you have any finite set, you can always add another object that's not already in it. Put yet another way, there is no finite set that you can match up one-to-one with the set of all natural numbers. The set of natural numbers is infinite.

But Cantor's definition of a number doesn't say numbers have to be finite, so that's fine. There was quite a bit of discussion about this, but at the end of the day Cantor's rules for reasoning about numbers work just as well for the infinite. In fact they present a far richer picture than just the single ∞ that we often take to represent "infinity" (this symbol does have perfectly rigorous math behind it, but it has more to do with the mathematical study called analysis than with the set theory we're talking about here).

The collection of all sets that match up one-to-one with the natural numbers is itself a number, which Cantor called aleph-null (ℵ₀). Are there actually other sets that can be matched up one-to-one with the naturals? Yes. For example, the set of points in an infinite square grid. Start at any particular point and make a spiral outward. Match the starting point up with zero, the next point with one, and so forth. There is a natural number for every point in the spiral, and any point you care to pick on the grid, the spiral will eventually reach.

A bit surprisingly, the natural numbers can also be matched up one-to-one with what might appear to be smaller sets of natural numbers. For example, there is an even number for every natural number, and vice versa. You can match {0, 1, 2, 3, ...} up with {0, 2, 4, 6 ...} by matching every natural number up with twice that number: 0 with 0, 1 with 2, 2 with 4, 3 with 6 and so on. There is an even number for every natural and vice versa. This isn't a paradox or contradiction. Infinite sets just behave a little differently.

How many sets of natural numbers are there? Could you match them up with the natural numbers themselves? Cantor's (second) proof that you can't, the diagonal argument, is a masterpiece, one of those things that sounds like magic or trickery when you first hear it (the first proof is pretty slick as well). If you can grasp it, and you very likely can, you can probably grasp a large chunk of modern mathematics. It goes like this:

Try to give each set a natural number. If there are as many as there are natural numbers, then you can do just that. For example, here's one partial attempt:

{} (the empty set)
{1, 2, 3, 4}
{0, 1, 2, 3, 4, 5, ...} (all of the natural numbers)
{0, 2, 4, 6, ...} (the even numbers)
...

Suppose I come up with a scheme for listing sets of natural numbers so that I can tell you the set for any natural number you like. Zero gets the empty set. One, two and three get the sets listed above. 42 gets the set of all primes. Googolplex plus five gets the Fibonacci numbers ... whatever. Could such a scheme get them all?

Look at each row in turn, and form a new set thus: If the row number isn't in the set on its own row, add that number to the set you're building, otherwise don't. In the case above, the result starts off {0, 3, ...}: zero is not in the empty set, so add it. One is in the set {1, 2, 3, 4}, so skip it. Two is in the set of all natural numbers, so skip it as well. Three is not in the set of even numbers, so add it.

We can continue down the line. 42 isn't prime, so it's in the set. Googolplex plus five is (almost certainly) not a Fibonacci number, so it's (almost certainly) in the set, and so forth. Is the set we just made already on the list somewhere? No. It's not in row 0, because it contains a number not in the set on that row. It's not in row 1, because 1 is in that set but not in ours. And so forth.

No matter how you try to number the sets of natural numbers with the natural numbers themselves, there will always be at least one you missed -- a massive understatement, but one is all it takes -- so it can't be done.

Since you can't match the natural numbers up with the sets of natural numbers, there must be an infinite number besides aleph-null, namely the number of sets of natural numbers, that is, all sets for which you can match up each member with a set of natural numbers and vice versa. This includes, among other things, all the numbers between 0 and 1, if you include infinite expansions like 0.11111... = 1/9 or pi - 3 = .14159265... (see the bottom of this section for a rough proof).

Call this number aleph-one. Aleph-one is a bigger infinity than aleph null. Cantor conjectured, but could not prove, that there are no infinities between aleph-null and aleph-one as defined here, or in other words that aleph-one really should be called aleph-one and not aleph-something-else. We now know that you can pick either way and get meaningful, though perhaps strange, results.

In any case, one of the most important facts about a set is often whether it contains

a finite number of elements
exactly as many elements as there are natural numbers
more

In the first two cases, the set is countable (in the second case, countably infinite). Otherwise it is uncountable. Some interesting results depend on this distinction. For example, even though there are infinitely many possible computer programs, and infinitely many numbers that a computer program might produce as output if it could run forever, including irrational numbers like the square root of two and numbers like pi that aren't the solution to any algebraic equation, there are still only countably many. But there are uncountably many numbers between zero and one, which is so many more that if you pick a real number at random (leaving aside exactly what that means), the odds are exactly zero that there's a computer program that could calculate it, even if it had forever to do it.

The basic approach behind Cantor's diagonal argument can also be used to prove a number of other significant results that don't directly involve countability. These include Gödel's incompleteness theorem, which says there are some true statements that can't be proven true, and Church and Turing's result that there is no computer program that can solve the halting problem of deciding whether a given computer program will always eventually stop running.

Cantor was further able to show that there were infinitely many of these infinite cardinal numbers. There is an aleph-two bigger than aleph-one, and so forth. For any aleph you can always find one bigger, by applying the same diagonal proof that found aleph-one in the first place.

Hmm .... "always one bigger" ... that sounds familiar.

Matching things up in order to count them makes sense, but this idea of a number as a collection of sets seems somehow distant from the simple notion of counting one, two three. Suppose we dispense with sets and start with that?

Start with something called "zero". This being math, it doesn't matter particularly what it is. It's really just a symbol. Let's use 0. Likewise, take something to represent "one more than". One common symbol is S, for "successor". So we have

0
One more than zero, or S0
One more than one more than zero, or SS0
and so forth

To translate this to something you're more used to, just count the S symbols, and to represent a particular number (again, whole, non-negative number, including zero), just stick that many S symbols before the 0 symbol. Numbers considered this way are called ordinals, as opposed to the cardinals described above.

You can tell that one number is bigger than the other if you can make the first number by tacking some number of S symbols in front of the symbol for the second number. Since SSSSS0 is three S symbols tacked onto SS0, we know that SSSSS0 is bigger than SS0, that is, five is bigger than two, as one would hope..

How many of these are there? One for every natural number, clearly (just count the number of S symbols).

It would be interesting, at least to a mathematician, to capture some notion of infinity here. If there are infinite cardinals, why not infinite ordinals? Infinity would be a number that you can never reach by counting from zero, so let's try to pin that down. The numbers we've described so far are finite, so let's just postulate that there's a number, call it ω, that's bigger than any of them. Hey, it's math. You can do that sort of thing. You basically just add a rule that says "for any number like those we've described so far, ω is bigger". Then you hope the new rule doesn't lead to any contradictions.

We'd like this ω to act like any other number, apart from being bigger than any of the naturals, so there should be a number one more than it. That is, Sω is a perfectly good number in this scheme. And so are SSω, SSSω, and so forth. We could also call those 1+ω, 2+ω, 3+ω and so forth. Put another way, we can add any natural number to ω and get a bigger ordinal. Counting continues past infinity almost as though infinity never happened. We can continue this counting as far as we like, and we can say whether one such number is bigger than another just as we did before.

So suppose there's a number bigger than ω with any number of S symbols. We may as well call that ω + ω, or 2 times ω, or just 2ω. Likewise, there is a 3ω bigger than 1+2ω, 2 + 2ω or 2ω plus any natural, and a 4ω, and a 5ω, and so on. And a number bigger than any of those? Why, that would be ω times ω, or ω².

And likewise there is an ω³and an ω⁴and so forth (along with numbers like 17 + 42ω + ω⁴) and working from that you can define ω^ω. By now you can probably tell that we can keep this up forever. You end up with something like the infinite cardinals, but more finely sliced. Ah, math.

[To see that aleph one is the number of real numbers between 0 and 1: Every such number can be represented by a binary expansion: 0.1 = 1/2, 0.01 = 1/4, 0.011 = 1/4 + 1/8 = 3/8, 0.01010101... = 1/4 + 1/16 + ... = 1/3, etc. The corresponding set for a binary expansion has 0 in it if the first bit after the binary point is 1, 1 in it if the second is 1, and so forth. For finite expansions like 0.01, the bits are all zero past a certain point, but that's fine. It just means that the corresponding set only has finitely many numbers in it. Likewise, there is a binary expansion for each set -- put a zero in the nth place when n is in the set (and only then). QED. You do have to be a bit careful because, for example, 0.01 = 0.00111111..., and in general every finite binary expansion can be changed to an infinite one by replacing the last 1 with 01111..., but there are only countably many finite expansions so this doesn't really matter]

3. There were bound to be lawyers involved at some point

Suppose the limit for grand theft is $5,000. I steal an old car worth $5,000. Clearly I've committed one count of grand theft. What if I steal a new car worth $15,000? I've still committed one count. What if I steal three old cars worth $5,000? Keeping in mind that I'm not a lawyer, that ought to be three counts of grand theft. For example, if I steal one car, serve my time, steal another, serve my time again and steal another, at which point I'm going to be in for a good long time in most states, I've committed three crimes.

What if I snag a $4,000 trailer with three $5,000 cars on it? Is that one grand theft of $19,000 or one petty theft and three grand thefts? That would depend on whether we're counting acts of theft (one) or objects stolen (one petty, three grand), and that in turn depends on the particular law. Not being a lawyer, I'm not sure which way this tends to go, but I'd guess one count of grand theft.

This sort of question comes up all the time in copyright cases. Was that ten acts of pirating one song or one act of pirating ten songs? What about pirating the same song ten times? Still not a lawyer, but I believe the law in this case treats each copyrighted work as a separate count, so pirating ten songs would be ten counts, while pirating one song ten times would be one count (but maybe a more severe one than pirating it once).

The question here is not how to count, as in how to count the even numbers versus the prime numbers, but what to count.

4. How many rivers run through it?

Pittsburgh lies at the confluence of the Allegheny and Monongahela rivers, which meet to form the Ohio river. Thus the name Three Rivers Stadium (and several other Three Rivers designations).

In 1858, William Larimer staked a claim on a bluff overlooking the confluence of Cherry Creek and the South Platte (which had long been a seasonal camp for the Arapaho and Cheyenne), and not too long afterward named the place after the governor of Kansas Territory, James W. Denver (who had by that time already resigned his post, unbeknownst to Larimer).

How many rivers run through Pittsburgh? How many run through Denver? One could reasonably say that the South Platte River runs through Denver, joined by Cherry Creek, but going by the names, no river runs through Pittsburgh. Rather, two rivers end at the Point and another begins.

A person traveling by boat from Oil City, Pennsylvania (where Oil Creek flows into the Allegheny), through Pittsburgh to Cairo, Illinois (where the Ohio meets the Mississippi) could be forgiven, though, for thinking that there was a river running through Pittsburgh, particularly since the Allegheny is larger than the Monongahela which joins it. A hydrologist would likely agree, and go so far as to say the same river flows through New Orleans to the Gulf of Mexico, the Ohio being larger than the Mississippi at Cairo.

The hydrological definition, that when two streams meet the larger of the two is considered to continue and the smaller to end at the confluence, gives a nice, consistent definition, assuming that the size of a stream is well-defined and no two are exactly equal. Since this definition can be at odds with local names, as at Pittsburgh and Cairo, some might be tempted to say the local names are "wrong", but that may not cut much ice with the locals.

How many rivers run through Los Angeles? That would depend on whether you call the Los Angeles River a river. For most of the year along most of its course it's a trickle of water in the middle of a wide concrete culvert. Near Dodger Stadium it's joined by Arroyo Seco (Spanish for "dry stream") which, as the name implies, lacks even the trickle of water much of the time. Nonetheless, when the water does flow it flows in significant volume along an identifiable course, so we call it a river.

The Colorado River used to flow into the Gulf of California by way of a lush delta, but for most of the past few decades it has run dry well before then. Can we say a river runs through that region still? At this point, maybe not. If it ran dry some years but flowed for at least part of most years, probably, but I doubt there's any clear dividing line between mostly dry river and ex-river, at least outside of the technical literature.

Finally, how many rivers run through the Mississippi Delta, that is, the alluvial plain between the Mississippi and Yazoo rivers in western Mississippi, as opposed to the Mississippi River Delta in southern Louisiana? It's kind of hard to tell. Streams cut this way and that, split, rejoin, sometimes appear to go nowhere. Is this a lot of small streams criss-crossing a plain, or a few larger streams with a lot of unusually large islands? Again, at some point it comes down to arbitrary definitions.

5. What counts?
At least for practical purposes, the world is continuous. And yet, for various reasons, we comprehend it as being made up mostly of discrete objects. Counting is fundamentally discrete. Digital, in fact. You can't count, or at least not very far, without dividing the world into objects.

Not every creature appears to do this. Microbes get by on pure, unconscious reaction. This part of the cell wall is in contact with some nutritious chemical, so those parts of the cell wall contract and the cell moves towards supper.

Only with a mind capable of parsing the world into objects can we begin to talk about counting as a useful activity. Once there are objects, a markers-and-locations facility is not so remote. It doesn't have to arise, but it's useful enough and plausibly a small enough step that it's not surprising that it does arise. The full-blown symbolic version of counting using a few basic names and combining rules, is a whole different matter.

What counts? In the markers-and-locations sense, quite a few things count, including us. More will surely be discovered. In the symbols-and-rules view, it may be just us.