Saturday, April 30, 2016

Yours for the special introductory price of ...

The student housing I lived in as an undergrad was modeled after, among other things, European monasteries, which is a nice way of saying the rooms were rather on the small and plain side ... which in turn is a nice way of saying you wouldn't want to live in them unless you really wanted to live in the student houses. Which most people did.

You may not be surprised to hear that people cared a lot about which room they ended up in.  There was a whole ritual at the beginning of the school year to decide in what order people got to pick rooms, based on one's class, status as a house officer and possibly other factors I no longer remember.  Ordering within groups of equal rank, for example most if not all the freshmen, was decided by drawing cards, and so the whole boisterous mess was called "room draw".

Why did people care so much which of several possible broom closets they ended up in?  Well, you wanted to be in the same cluster of rooms with people you liked, you might not want a room over the courtyard where the garbage trucks came at uncivilized hours of the morning, you might feel better at the end of a hall, or in the middle, you might want to be near the showers or not so near ...

The reasons were limitless and probably largely subconscious.  Sometimes it didn't matter.  Sometimes the people picking ahead of you had different priorities and you ended up with something close to your ideal, but not always.  Some rooms were almost universally considered good -- and were almost always occupied by seniors -- or not so good -- "Welcome to the house, frosh.  Here's your room!"

It was not unknown for people to run for house offices in which they had little interest simply for the room pick.  Does anyone really want to be house secretary?



Did you know you can buy land on Mars?  It's not clear under what legal authority one can own land on Mars, but that hasn't stopped people from selling it.  Apparently copyright law is involved, and that ought to tell you about all you need to know.  One site quotes $30/acre.  The surface area of Mars is about 36 billion acres, so there's quite a bit of "upside potential".  I'm guessing that the site in question has sold considerably less than the full allotment, but they have sold some.

OK, let's assume that you really can own a claim on a piece of Mars.  What are you getting?

There's always the small chance that someone will set up an outpost on land you own, in which case you can say "Hey, buddy, I own that" and they can say "OK, nice to know, but we're actually on Mars and we care about your claim, why?"

What you're really buying is bragging rights.  You can say to your friends "Hey look, I own a piece of Mars.  Says so right here."  I can easily see someone wanting to pay $30 for that, especially if someone is keeping track of all the claims and you can honestly say "This particular area is mine and nobody else's (at least according to this particular keeper-tracker-of-er)".

Likewise, you can buy your name on a star.  The International Astronomical Union, being the body in charge of giving stars boring catalog numbers, won't be particularly impressed, but that won't stop people from accepting your money.  I sincerely hope that nobody is pretending that this is anything more than buying the right to have a name of your choice associated with a star of your choice in somebody's list, but again I could understand someone paying a small fee for the privilege because why not?



On the other blog I wrote about the fascinating case of Cow Clicker, a Facebook game reduced to its very essence, in which people eventually ended up paying for the right to click on a space where a virtual cow had once stood.  During the period when there were actual (virtual) cows to click on, people willingly paid for custom cows, including a $100 "Bling Cow".  Basically you were announcing to your friends "I bought a Bling Cow", plus you had the pleasure of clicking on a cow with gold accessories.



But is this business of buying virtual cattle or pieces of the sky really abstract or arbitrary enough?  We can do better.  For the low introductory price of ... leaving a message in the comments section, I will give you your very own number.  That's right.  Just leave a short, innocuous piece of text you want immortalized, and I will send you the output of

GIBBERISH=<some randomness I will keep to myself>
MESSAGE=<your message>
# paste the gibberish and message together, hash, and print the result in decimal
echo "print 0x$(echo "${GIBBERISH}${MESSAGE}" | sha256sum | cut -c 1-64)" | python

This lovely shell incantation pastes my gibberish together with your message, uses the SHA256 cryptographic hash algorithm to boil this down to 64 digits of hexadecimal and then uses some hackery to convert that to decimal.
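For the curious, the same recipe can be written directly in Python.  This is just a sketch: the GIBBERISH value below is a made-up stand-in, not my actual secret.

```python
import hashlib

# Hypothetical stand-in -- the real gibberish stays secret.
GIBBERISH = "correct horse battery staple"
MESSAGE = "squeamish ossifrage"

# echo appends a trailing newline, so include one to match the shell version.
digest = hashlib.sha256((GIBBERISH + MESSAGE + "\n").encode()).hexdigest()
number = int(digest, 16)  # 64 hex digits -> one big decimal number
print(number)
```

Swap in your own message (and your own gibberish) and you get your own number.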

So long as the SHA256 algorithm remains secure, the odds that someone could find another message that gave the same number as your message are astronomically low.  For that matter, so are the odds that anyone in human history has ever seen the number for your message before, and so are the odds that anyone who doesn't know what gibberish I chose could tell you what number would result from any given message.  Yes, the recipe for mapping messages to numbers is mine ... all mine ... BWAHAHAHA!

Sorry, let it go to my head for a moment.

To get the ball rolling, I claim "squeamish ossifrage" as my message, and I am therefore proud to announce that my personal number is

71550457262820168189328333318549162861663280827014445611156956018117368386158

Let the games begin.

Saturday, April 9, 2016

Primitives

Non sunt multiplicanda entia sine necessitate.

This is one of several formulations of Occam's razor, though Wikipedia informs us that William of Ockham didn't come up with that particular one.  Whatever its origins, Occam's razor comes up again and again in what we like to call "rational inquiry".  In modern science, for example, it's generally expressed along the lines of "Prefer the simplest explanation that fits the known facts".


If you see a broken glass on the floor, it's possible that someone took the glass into a neighboring county, painstakingly broke it into shards, sent the shards overseas by mail, and then had a friend bring them back on a plane and carefully place them on the floor in a plausible arrangement, but most likely the glass just fell and broke.  Only if someone showed you, say, video of the whole wild goose chase might you begin to consider the more complex scenario.

This preference for simple explanations is a major driving force in science.  On the one hand, it motivates a search for simpler explanations of known facts, for example Kepler's idea that planets move around the Sun in ellipses, rather than following circular orbits with epicyclets as Copernicus had held.  On the other hand, new facts can upset the apple cart and lead to a simple explanation giving way to a more complicated revision, for example the discoveries about the behavior of particles and light that eventually led to quantum theory.

But let's go back to the Latin up at the top.  Literally, it means "Entities are not to be multiplied without necessity," and, untangling that a bit, "Don't use more things than you have to", or, to paraphrase Strunk, "Omit needless things".  In the scientific world, the things in question are assumptions, but the same principle applies elsewhere.



The mathematical idea of Boolean algebra underlies much of the computer science that ultimately powers the machinery bringing you this post.  In fact, many programming languages have a data type called "boolean" or something similar.

In the usual Boolean algebra, a value is always either True or False.  You can combine boolean values with several operators, particularly AND, OR and NOT, just as you can combine numbers with operations like multiplication, addition and negation.  These boolean operators mean about what you might think they mean, as we can describe them with truth tables very similar to ordinary multiplication or addition tables:

AND   | True  | False
True  | True  | False
False | False | False

OR    | True  | False
True  | True  | True
False | True  | False

A     | True  | False
NOT A | False | True

In other words, A AND B is true exactly when both A and B are true, A OR B is true whenever at least one of the two is true, and NOT A is true exactly when A is false.  Again, about what you might expect.

You can do a lot with just these simple parts.  You can prove things like "A AND NOT A" is always False (something and its opposite can't both be true) and "A OR NOT A" is always True (the "law of the excluded middle": either A or its opposite is true, that is, A is either true or false).
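Both of these little laws are easy to verify mechanically; a quick Python check over both possible values of A:

```python
# Exhaustively check "A AND NOT A is always False" and "A OR NOT A is always True".
for A in (True, False):
    assert (A and not A) == False   # a thing and its opposite can't both be true
    assert (A or not A) == True     # law of the excluded middle
print("both laws hold")
```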

You can break any truth table for any number of variables down into AND, OR and NOT.  For example, if you prefer to say that "or" means "one or the other, but not both", you can define a truth table for "exclusive or" (XOR):

XOR   | True  | False
True  | False | True
False | True  | False

If you look at where the True entries are, you can read off what that means in terms of AND, OR and NOT: There's a True where A is True and B is False, that is, A AND NOT B, and one where B is True and A is False, that is, B AND NOT A.  XOR is true when one or the other of those cases holds.  They can't both hold at the same time, so it's safe to use ordinary OR to express this: A XOR B = (A AND NOT B) OR (B AND NOT A).  The same procedure works for any truth table.
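The derived definition is easy to check against the truth table.  In Python, XOR on booleans happens to be the `!=` operator, which gives us something to compare against:

```python
def xor(a, b):
    # XOR built from the primitives AND, OR and NOT
    return (a and not b) or (b and not a)

# Check all four combinations against the truth table (a != b for booleans).
for a in (True, False):
    for b in (True, False):
        assert xor(a, b) == (a != b)
```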

In a situation like this, where we're expressing one concept in terms of others that we take to be more basic, we call the basic concepts "primitive" and the ones built up from them "derived".  In this case, AND, OR and NOT are our primitives and we derive XOR (or any other boolean function we like) from them.

Now consider the boolean function NAND, which is true exactly when AND is false.  Its truth table looks like this:

NAND  | True  | False
True  | False | True
False | True  | True

This is just the table for AND with True entries changed to False and vice versa.  That is, A NAND B = NOT (A AND B).

What's A NAND A?  If A is True, then we get True NAND True, which is False.  If A is False, we get False NAND False, which is True.  That is, A NAND A = NOT A.  If we have NAND, we don't need NOT.  We could just as well use AND, OR and NAND instead of AND, OR and NOT.

Since NAND is just NOT AND, and we can use NAND to make NOT, we don't need AND, either.  A AND B = (A NAND B) NAND (A NAND B).  So we can get by with just NAND and OR.
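Both identities are one-liners to verify by enumeration:

```python
def nand(a, b):
    return not (a and b)

for a in (True, False):
    assert nand(a, a) == (not a)            # A NAND A = NOT A
    for b in (True, False):
        lhs = nand(nand(a, b), nand(a, b))
        assert lhs == (a and b)             # (A NAND B) NAND (A NAND B) = A AND B
```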

As it turns out, we never needed OR to begin with.  Quite some time ago, Augustus De Morgan pointed out that A OR B = NOT (NOT A AND NOT B), that is, A or B (or both) are true if both of them are not false, a rule which sometimes comes in handy in making "if" statements in code simpler (the other version of the rule, with the AND and OR switched, is also valid).  Using NAND, we can recast that as A OR B = (NOT A NAND NOT B), and we can get rid of the NOT, leaving A OR B = ((A NAND A) NAND (B NAND B)).
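Again, both forms check out by plugging in all four combinations:

```python
def nand(a, b):
    return not (a and b)

for a in (True, False):
    for b in (True, False):
        assert (a or b) == (not (not a and not b))       # De Morgan's rule
        assert (a or b) == nand(nand(a, a), nand(b, b))  # OR from NAND alone
```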

Summing up, we can build any boolean function at all out of AND, OR and NOT, and we can build all three of those out of NAND, so we can build any boolean function at all from NAND alone.

For example A XOR B = (A AND NOT B) OR (B AND NOT A).  We can use De Morgan's rule to change that to (NOT (NOT (A AND NOT B) AND NOT (B AND NOT A))), that is, (NOT (A AND NOT B)) NAND (NOT (B AND NOT A)), or more simply, (A NAND NOT B) NAND (B NAND NOT A).  We can then replace the NOTs to get (A NAND (B NAND B)) NAND (B NAND (A NAND A)).

Yes, it's ... that ... simple.  Feel free to plug in all four combinations of A and B to check.
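Or let Python do the plugging:

```python
def nand(a, b):
    return not (a and b)

def xor(a, b):
    # (A NAND (B NAND B)) NAND (B NAND (A NAND A)) -- XOR from NAND alone
    return nand(nand(a, nand(b, b)), nand(b, nand(a, a)))

# All four combinations match the XOR truth table (a != b for booleans).
for a in (True, False):
    for b in (True, False):
        assert xor(a, b) == (a != b)
```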

As silly as this may seem, it has real applications.  A particular kind of transistor lets current flow from its source terminal to its drain terminal when the voltage on a third terminal, called the gate, is high.  Put a high voltage on the source and let current flow either through the transistor or through an output for the whole thing.  Tie the gate of the transistor to an input.  If the voltage on the input is high, current will flow through the transistor and not to the output.  If the voltage on the input is low, current will flow to the output and not through the transistor.  That is, the output voltage will be high when the input voltage is not.  The whole thing is called a NOT gate (or inverter).

If you put two transistors in a row, current will only flow through both of them when the voltage on both of the inputs is high.  Otherwise, current flows through the output instead, so the output voltage will be high unless the voltage on both of the inputs is high.  The whole thing is called a NAND gate, and as we saw above, you can build any boolean function you like out of NANDs *.

In fact, we have it a bit easier here because we can build NOT A directly instead of as A NAND A, and for that matter we can build a three-input NAND  -- NOT (A AND B AND C) -- easily as well, but even if we couldn't, being able to build a NAND would be enough.



There are other cases where we can build up a whole system from a single primitive.  Notably, any computer program (technically, anything a Turing machine can compute) can be expressed in terms of a single instruction or, alternatively, a single "combinator".   This includes any boolean function, any numerical function, an HTML parser for web pages, whatever.  Of course, there's a difference between being able to express a computation in theory and being able to run it on your laptop or tablet.  We'll come back to that.
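The post doesn't name a particular single instruction, but one classic example is "subtract and branch if less than or equal to zero" (subleq).  Here's a minimal sketch of an interpreter, along with a tiny three-instruction program that adds two numbers; the memory layout is my own made-up example:

```python
def subleq(mem, pc=0):
    # Each instruction is three cells (a, b, c):
    #   mem[b] -= mem[a]; jump to c if the result is <= 0, else fall through.
    # A negative jump target halts the machine.
    while pc >= 0:
        a, b, c = mem[pc:pc + 3]
        mem[b] -= mem[a]
        pc = c if mem[b] <= 0 else pc + 3
    return mem

# Add mem[9] into mem[10], using mem[11] as a scratch cell that starts at 0.
prog = [9, 11, 3,    # Z -= A   (Z = -A)
        11, 10, 6,   # B -= Z   (B += A)
        11, 11, -1,  # Z -= Z   (Z = 0), then halt
        7, 35, 0]    # data: A = 7, B = 35, Z = 0
subleq(prog)
print(prog[10])  # -> 42
```

Not something you'd want to program in by hand, which is rather the point.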

Before we go on to what all of this might mean, it's worth noting that many significant areas of thought haven't been reduced to simple primitives.  For example, chemistry is built from around a hundred elements (there are currently 118 on the periodic table, but you can't do meaningful chemistry with all of them).  An atom of any element is composed of protons, neutrons and electrons in varying numbers.

The Standard Model recognizes electrons as elementary, that is, primitive, while protons and neutrons are composed of quarks.  In all, it holds that there are 17 particles that everything is composed of  -- six quarks, six leptons (including the electron), four gauge bosons and the Higgs.  So far, no one has found anything simpler that these might be built up of, but not for lack of trying.

In mathematics, you typically start with a handful of axioms -- statements you assume to be true without proof -- and build from there.  There has been extensive work in reducing this to a minimal foundation, but the current formulation of set theory together with model theory has several important basic pieces, not one single concept to rule them all.  And, in fact, there are several ways of describing both set theory and model theory, not one single definitive way.

In short, some things can be reduced to a single primitive, but most can't.  Even when you can reduce something to a single primitive, there are typically several ways to do it.  For boolean algebra, you can just as well use NOR as NAND.  In computing there are several universal operations with little to pick among them.
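NOR works just as well as NAND; a quick sketch of NOT, OR and AND built from NOR alone:

```python
def nor(a, b):
    return not (a or b)

for a in (True, False):
    assert nor(a, a) == (not a)                       # NOT from NOR
    for b in (True, False):
        assert (a or b) == nor(nor(a, b), nor(a, b))  # OR from NOR
        assert (a and b) == nor(nor(a, a), nor(b, b)) # AND from NOR
```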



In theory, there is no difference between theory and practice.  But, in practice, there is.  (attributed to Jan van de Snepscheut)


If you can reduce any boolean function to NAND, is Boolean algebra in some meaningful sense really just NAND?  Is computing really just the study of a single instruction or operator?  If we're trying to reduce the number of entities involved, following Occam, is it not better to study a single operation than many?

I think most working mathematicians and computer scientists would answer "No" to all of the above.  A general Boolean algebra is a set of objects and operations that follow certain rules.  We noted above that A AND A = A.  In set theory, the union of a set with itself is that set, and there are other examples.  We would like to capture that common behavior somehow, and we do it by defining rules that hold for anything that behaves in the way we're interested in, that is, axioms.  In the case of Boolean algebra there are five (chase the link if you're interested).

It just so happens that in the simple case of True, False and the operations on them, NAND and NOR can be used to build all the others.  That's nice, but not essential.  It's of interest in building circuits out of transistors, but even then there's no requirement to build everything from one type of gate if you don't have to.  As noted above, it takes fewer transistors to build a NOT directly, and that's how real circuits are built.

Even from the point of view of Occam's razor, it's not clear that reducing everything to NAND is a good idea.  Yes, you have only one operation to deal with, but you can define one truth table just as easily as any other.  If you want to use XOR, it's simpler to define the truth table for it than to define the truth table for NAND and then define XOR in terms of it.

In computing, if you have the machinery to rigorously define one instruction or operation, you can define as many as you like with the same machinery.  It may be interesting or even useful in some situations that you can define some operations in terms of others, but it doesn't make the system more useful.  In practice, I don't care how many instructions the processor has.  I care very much if there's an easy way to talk to the network or write to a file, things which are not even mentioned in theoretical discussions of computing (nor should they be in most situations).

So why even bother?  Is reducing a system to a single primitive just an interesting academic exercise?  Not necessarily.  If you're trying to prove properties about circuits or programming systems in general, it can be useful to divide and conquer.  First prove, once and for all, that any circuit or program can be reduced to a single primitive, then prove all sorts of useful properties about systems using that primitive.  Since you only have a single operator or whatever to deal with, your proofs will be shorter.  Since you've proved your operator is universal, they'll be just as powerful.  Essentially you've said that when it comes to general properties of a system, adding new operators or whatever doesn't do anything interesting.

You don't have to reduce everything all the way to a single primitive for this to be useful.  If you can only reduce a system to five primitives, doing proofs using those five is still easier than doing proofs on an equivalent system with twenty.


In general, there's a tension between keeping a system minimal and making it easy to use.  A minimal system is easier to build and it's easier to be confident that it works properly.  A larger system is easier to use, as long as there's not too much to take in.  Typically there's a sweet spot somewhere in the middle.

There are sixteen possible boolean operators on two variables, and you can build any of them up from NAND (or NOR), but usually we focus on three of them: AND, OR and NOT.  These are enough to build anything else in a way that's straightforward to understand.  They also correspond fairly closely to familiar concepts.   In some useful sense, they minimize the number of things you have to deal with in normal cases, and William of Ockham can rest easy.
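That there are sixteen, and that NAND reaches all of them, can be checked by brute force: represent each two-input function by its column of four outputs, then close the two projections A and B under pointwise NAND until nothing new appears.

```python
from itertools import product

INPUTS = list(product((False, True), repeat=2))  # all four (A, B) combinations
proj_a = tuple(a for a, b in INPUTS)             # the function "just return A"
proj_b = tuple(b for a, b in INPUTS)             # the function "just return B"

funcs = {proj_a, proj_b}
changed = True
while changed:
    changed = False
    for f, g in list(product(funcs, repeat=2)):
        h = tuple(not (x and y) for x, y in zip(f, g))  # f NAND g, pointwise
        if h not in funcs:
            funcs.add(h)
            changed = True

print(len(funcs))  # -> 16
```

The loop settles at exactly sixteen functions, which is also all of them, since a two-input function has four rows of output with two choices each.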

It doesn't only matter how many things you have.  It matters which things.




* As usual, there are a few more wrinkles to this.  You have to tie the output of the last (or only) transistor to ground for current to flow, you need a resistor next to the input, the output also needs to be tied to ground eventually, and so forth.

You may have noted that current is flowing through the two transistors in the NAND gate precisely when both inputs are high, that is, the current is flowing through them (and not to the output) when one gate voltage is high AND the other is.  You might think it would be simpler to build an AND than a NAND.  However,  the current will only flow if the drain of the last transistor is connected to a low voltage.   That voltage will be low regardless of what's happening on the gates.  To see a difference in voltages, we have to look at the voltage at the top, which will vary depending on whether current is flowing through the transistors (that's probably not too clear, but it's the best I can do).