Saturday, June 11, 2016

Doctors Fermi, Drake and Strangelove

By now it's well-accepted that there are large numbers of planets in the universe that could plausibly support life more or less as we know it.  From this, it follows that unless civilizations like ours are exceedingly rare on such planets, there must be a great number of them in the universe, if not now then at least over history.  A recent paper argues that "... as long as the probability that a habitable zone planet develops a technological species is larger than ∼10^−24 [that is, about one in a trillion trillion], humanity is not the only time technological intelligence has evolved".

I've argued elsewhere that numbers like that are beyond our ability to understand directly.  For practical purposes, we can call one in a trillion trillion "zero".  The paper is essentially concluding that, based on what we know now, there (practically) certainly have been other intelligent civilizations in the universe.

In evaluating a statement like that it's important to keep in mind the scales involved.  We're talking about the whole universe here, of which our galaxy is only a tiny part, and we're talking about the entire history of the universe, of which human history is only a tiny part.  The authors make a point of not addressing the question of how many such civilizations there might ever have been in our galaxy, much less close enough for communication with Earth to be practical.

They also make a point of not addressing how many such civilizations there might be right now (regardless of where they might be).  I want to get into the significance of that.


Questions of how many intelligent civilizations there might be generally center around the Drake Equation, which is probably best thought of as a framework for breaking down the problem.  The breakdown is that the number of civilizations we could communicate with must be the product of
  • Three factors representing the rate at which planets that might support life appear (we're assuming here, for better or worse, that life lives on planets)
  • Three factors representing what portion of those actually produce life that would put out a detectable signal
  • How long those civilizations actually put out a detectable signal (the 'L' factor, for 'lifetime').
We now have a pretty good handle on the first bullet point above.  On the other hand, we don't really know how likely it is that a planet that could support life actually develops life or how likely it is that such life actually puts out a detectable signal.  I've previously argued that, because of the distances involved, there's a big difference between "detectable" and "detectable by us" and that the last factor, how long there would be a detectable signal, could be very, very short on a cosmic scale.
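For concreteness, here's the usual form of the equation written out as a quick calculation.  The numbers plugged in below are placeholders of my own, not estimates from the paper or from anyone else; the point is just that the well-measured factors at the front get swamped by the guesses at the back.

# Drake equation: N = R* x fp x ne x fl x fi x fc x L
# All values below are illustrative placeholders, not serious estimates.
R_star = 1.5    # stars formed per year in the galaxy (reasonably well measured)
f_p    = 1.0    # fraction of stars with planets (now known to be close to 1)
n_e    = 0.2    # habitable planets per star with planets (increasingly constrained)
f_l    = 0.1    # fraction of those that develop life -- a guess
f_i    = 0.01   # fraction of those that develop intelligence -- a guess
f_c    = 0.1    # fraction of those that put out a detectable signal -- a guess
L      = 1000   # years such a signal lasts -- the 'L' factor, anyone's guess

N = R_star * f_p * n_e * f_l * f_i * f_c * L
print(N)        # detectable civilizations in the galaxy, given these guesses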

The paper I referenced sidesteps these questions by considering everything everywhere and over all time, regardless of whether we could hope to make contact or would even be around to try.  That's fine, but in doing so it shifts from the practical question of "Are we alone?", or Fermi's "Where is everyone?", to the more philosophical question of "Are we unique?".  That's an interesting question, but it somehow lacks the emotional resonance of the other two.



I grew up during the Cold War.  I remember the electricity of the Berlin Wall opening, and the profound feeling of disorientation that came with it.  All my life the East and West had been locked in a permanent stalemate with no sign of an end.  And then it ended.  Now what?

For the most part, life went on.  That's not to say that the transition was smooth, particularly if you had lived in the Soviet Union or its satellites.  My point is more that the "western" developed world, at least, went on more or less as it was.  McDonalds is still McDonalds, Hollywood still makes films, football (or soccer, if you prefer) is still the world's sport, the US still doesn't care greatly that it is, and so forth.  MTV is still the place to go for music videos ... oh, wait ...

Except for nukes.

The amount of nuclear weaponry developed during the Cold War is staggering.  The only two nuclear weapons that have actually been used militarily, the ones dropped on Hiroshima and Nagasaki, yielded under 150 TJ (or, if you prefer, around 35 kilotons).  We saw what that did.

Modern nuclear warheads are generally in the thousands of TJ, and tens of thousands of those have been made.  While you can't just multiply numbers and say "Ten thousand times as many bombs each yielding ten times as much means a hundred thousand times as many people killed," it was really no exaggeration, at all, to say that humanity now had the means to cause much, much more destruction than had ever been possible before.

This was a fact of life growing up in the cold war.  My high school newspaper once had a debate in the editorial columns about whether a nuclear war could be survived, at all, and if so whether you should even try.  The bidding started at "The US government would no longer exist" and from there it wasn't far to "Industrial civilization would collapse, bringing about a new Dark Age lasting centuries" or "All humans would die as nuclear winter wiped out agriculture and plunged temperatures by 20 degrees Celsius for decades".  It wasn't completely outlandish to speculate that multicellular life would be wiped out.

This colored our outlook on the world.

Today, not so much, which is interesting since there are still thousands of extremely powerful nuclear weapons in the world and it's not clear that they're as tightly controlled now as they had been.  Just why attitudes might have changed is for another discussion.  For now, let's just take it as a given that "nukes could kill us all" is not nearly as prominent a thought in the early 21st century as it was in the mid to late 20th.



That L factor of the Drake equation represents the amount of time during which an intelligent civilization puts out a detectable signal.  This could be a very short time, on cosmic scales, if only because unless you're actually trying to be detected, putting out radio or other signals that could be detected dozens or hundreds of light years away is a large waste of energy.

If you're streaming video over the internet, for example, no one has to broadcast a signal from a tower.  Even if radio signals are involved they are more likely beamed from one microwave station to another or otherwise narrowly focused.  An intelligent species could quite likely get along just fine for almost all of its existence without producing a detectable signal, if it so chose.

When the Drake equation was first developed, however, this wasn't the interpretation that people tended to use.

At the time, we had no idea whether there were many habitable planets out there, but we had made a few efforts to listen for signs of intelligent life around other stars (including Drake's own Project Ozma), without any clear success.  That suggested that the factors of the Drake equation must multiply out to a small number.

Since we knew even less then than we know now, most of the factors of the equation were little more than wild guesses.  But we did have at least one data point for an intelligent species (at least by our own definition of "intelligent"), and there was one ready explanation that fit with our understanding of that species and the lack of signs of other species like it: Intelligent species didn't last long.

There was ample reason to believe that.  Perhaps it was inevitable that, at least on the cosmic scale, it would not be long between a species developing technology that could have a major impact on its planet and that species destroying itself.  In 1961, when Frank Drake put forth his equation, it had been less than 20 years since the end of World War II and nuclear weapons testing was in full swing.  It was the most natural thing in the world to wonder if we would make it another 20 years.



Now that we've made it over fifty years since then, it may be more natural to assume that we'll still be here in another fifty, or thousand, or whatever, and either to assume that the L factor could be small for any number of non-lethal reasons or to neglect it altogether on the assumption that we'll be around and detectable forever.  What strikes me here is how much room, within the broad limit that our theories need to be consistent with the facts as we know them, there is for them to reflect who we are at the moment. Then as well as now.

Saturday, May 28, 2016

What is syntax and how much does it matter?

From a computing point of view, when we say "syntax" we're largely talking about "parse trees".  A parse tree breaks a text into components, which are in turn broken down into sub-components, and so forth down to basic pieces analogous to words and punctuation in natural languages.  The largest unit is the root of the tree, and the tree branches wherever you break a larger component into smaller.  As I've said before, this is just like breaking a sentence down into, say, a noun phrase and a verb phrase, breaking the noun phrase into a determiner and a noun, and so forth.

I've also noted that this isn't the only way to break sentences down.  In fact, if you search for sentence diagramming, you're more likely to turn up material on the Reed-Kellogg system, and there has been quite a bit of research on dependency grammars.  Both of these have tree structures implicit in them, but you could argue that pretty much any formal system does.  The more relevant point is that they don't emphasize constituency, that is, what's a part of what.  They're more interested in what modifies what, or more generally, what depends on what.

So, what is this syntax that we speak of?  I previously defined it as "the study of how words fit together in sentences".  Wikipedia has it as "the set of rules, principles, and processes that govern the structure of sentences in a given language, specifically word order", which seems pretty similar except for the emphasis on word order.  What sparked this post, however, was a statement in a Nat Geo blog post on killer whales (a.k.a. orcas, but that's a separate discussion), that
Language in the strict sense means syntax, which means that word placement determines meaning. So, “Put the blue pillow on the red pillow” means something different than, “Put the red pillow on the blue pillow.” Same exact words, different order. That’s language. Some dolphins and some apes have the ability to understand human syntax.
Killer whales are dolphins—the biggest ones. I am not aware of whether they understand human syntax. 
Initially I was struck by the simplicity of "word placement determines meaning" followed by a convincing example.  Later, I wondered how well that notion (or "specifically word order" in the Wikipedia definition) applies to languages with free word order.  Certainly order matters in narrating a story, or in figuring out which noun a pronoun might refer to, but in many languages ordering is more a matter of emphasis than meaning.

But now what strikes me about this passage is the emphasis on understanding.  This tends toward a more operational definition of syntax, for example, can you understand the difference between Put the blue pillow on the red pillow and Put the red pillow on the blue pillow.

Intuitively it seems like understanding the difference between Canis hominem mordet (dog bites man) and Canem homo mordet (man bites dog) would be much the same task, even though the word order is the same for both of those sentences.  So what are we really after here?

Fundamentally the problem to solve is communicating information reasonably efficiently and accurately.  I almost said "communicating concepts", but this depends on what concepts the parties involved can understand.  I may have a perfectly concise way to say "The square of the hypotenuse is equal to the sum of the squares of the legs", but that's not going to help much if my listener doesn't know what a hypotenuse is.

There's one other piece here, though.  Many species are capable of communicating a repertoire of messages, and even of learning new messages for that repertoire.  Vervets famously have different alarm calls for their main predators (leopards, eagles, pythons, and baboons).  They can also adjust to individuals that consistently make the wrong call, recognize their offspring by their calls and possibly invent new calls for new dangers.  Some dogs can be taught names for dozens of different objects which they can then retrieve by name.  Neither, however, seems to have language in the same way we do.

To establish that something unusual is going on, as with theories of mind, we need some sort of combinatorial explosion, that is, a situation in which a small number of basic pieces generate a very large number of possibilities.

For example, if we have a red pillow, a blue pillow, a red box and a blue box, and any one of them can be put to the left of, to the right of, in front of, behind, on top of or under any of the others, there are 72 different combinations (red pillow to the left of blue pillow, red pillow to the left of red box ... blue box under red box).  Since "red pillow to the left of blue pillow" is the same arrangement as "blue pillow to the right of red pillow", there are really only 36 possibilities, but 72 ways of expressing them.

The number of possibilities increases as the square of the number of objects.  If you double the number of objects, there are four times as many possibilities.  Similarly, if you add a new directional relation, say "to the left of and in front of", you've added as many possibilities as there are pairs of objects.  If you add a new kind of relation, say "close to" vs. "far from" (leaving aside whether you can place a red pillow far above a blue box), you've multiplied the total number of possibilities by a new factor.

For example, if you have ten objects, twelve directional relations and "touching", "close" and "far apart", you now have 1620 possibilities.  You haven't added much to the original setup, but there are now 45 times as many possibilities as before.  It's easy to see how you could make this much, much bigger just by adding more different kinds of distinctions.
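If you'd rather not do the counting by hand, a few lines of code reproduce both numbers (the object and relation names are just the ones from the example above):

from itertools import permutations

objects = ["red pillow", "blue pillow", "red box", "blue box"]
directions = ["left of", "right of", "in front of", "behind", "on top of", "under"]

# Every ordered pair of distinct objects with every direction: 4 * 3 * 6 = 72
# statements, describing 36 distinct arrangements (each can be phrased two ways).
statements = [(a, d, b) for a, b in permutations(objects, 2) for d in directions]
print(len(statements))                       # 72

# Scaling up: ten objects, twelve directions, three distance relations.
# Counting unordered pairs (45 of them), as in the text: 45 * 12 * 3 = 1620.
n_objects, n_directions, n_distances = 10, 12, 3
pairs = n_objects * (n_objects - 1) // 2
print(pairs * n_directions * n_distances)    # 1620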

Imagine an experiment where your subjects are (somehow) taught signs for the four objects and six directional relations, and then (somehow) required to communicate a particular arrangement.  Say subject 1 is shown an arrangement that subject 2 can't see, and if it can convince subject 2 to create the same arrangement they both get a reward.

If your test subjects can handle the original setup of 36 possibilities, it's possible that they learned the examples you gave by rote and guessed on ones they hadn't already seen.  You could control for that by making sure the two subjects are shown different examples, but if you want to run several trials and there are only 36 possibilities to choose from, it's hard to be sure that any significant syntax is involved.

On the other hand, suppose you have a pair of subjects that can handle the small setup, and then you add a new object.  After they see a couple of examples involving the new object they can handle unfamiliar setups involving it about as well as they can handle the originals.  You then give a few examples of a new relation (say, diagonal as above) and their performance doesn't suffer.  You then show a new kind of relation (say, distance as above) and they can still handle it.  You've now got a reasonably large space of arrangements to choose from and you can easily do repeated trials without repeating the exact arrangements.

At that point, I'd say you can infer that the communication system has some way not only of distinguishing "red pillow on top of blue pillow" from "blue pillow on top of red pillow", but distinguishing "A on top of B" from "B on top of A" in general.  I'll claim that at that point you can reasonably say there is syntax in some form, as well as some form of "abstract relation".


This is not the same as saying the experimental subjects have the same kind of language as we do.  You can solve the problem in the experiment with any way of selecting a directional relation, a first object, a second object and a distance relation.  That could be as simple as listing the four in order, as "red-pillow blue-pillow in-front-of close".

Handling new kinds of relations or constraints (e.g., put the pillow fuzzy side up) doesn't require much more syntax.  If the system can distinguish one relation/constraint from another, then something like "direction: red-pillow in-front-of blue-pillow, distance: close, orientation: red-pillow fuzzy-side-up" packs in a lot of information, and it's easy to see how you would extend it.


Where does that leave constructs that we tend to think are unique to human language, including dependent clauses like that I saw yesterday in The movie that I saw yesterday was two hours long?  I'm not sure how to set up an experiment like the previous one that could distinguish a language with dependent clauses from one without.  After all, I could just as well say I saw a movie yesterday.   That movie was two hours long.  This requires using that in its sense as a determiner to link the sentences together in a particular way.  This is still a pretty powerful construct, but it doesn't require tucking I saw the movie yesterday in as a direct modifier to movie.

From this point of view, the distinction between having dependent clauses and not having them is not particularly important.  This is in contrast to the computer science-y view that I've been most familiar with, where there is a huge distinction between recursive structures -- ones that can contain sub-parts of the same general form as the structure they're part of, such as sentences acting as clauses inside larger sentences -- and non-recursive structures, which can't.  One important distinction from that point of view is that there are in principle infinitely many possible structures -- sentences, say -- if recursion is allowed but only finitely many if it's not.

This is true in the mathematical world, but it's less important when considering real communication.  On the one hand, there are only finitely many sentences that are short enough for a real person to say or understand.  In practice, we rarely nest more than a few levels deep.  When we do, the result is often pretty hard to understand.

On the other hand, "finite" as a mathematical concept includes numbers large enough to be infinite for any practical purpose.  In fact, I've argued, almost all numbers are vastly too big for us to comprehend, let alone to occur in any natural context.  In practice, this means that even if you have only a handful of template sentences to fill in and you can't nest sentences within sentences, you can still end up with a gargantuan number of possible sentences -- and there's no reason you can't use more than one sentence in a conversation (technically, stringing sentences together can be expressed as recursion, but let's not get into that).



What if you can't do a controlled experiment to figure out how complex a communication system is?  What if all the data you have is from observations in the wild?  What if you're not sure what part of the noises, gestures or whatever else you observe are or aren't significant?  The task is certainly harder, then, but maybe not infeasible.  You're still looking for signs of combinatorial explosion, particularly the ability to deal with novel combinations of factors in a way that requires communication, that is, where
  • Different individuals have different information,
  • they need to share that information,
  • the exact information to share varies within a combinatorially large space of possibilities, and
  • the individuals involved are able to act usefully in ways they couldn't have without sharing the information.
The first two and the last are easy to find in any number of situations (arguably the second and last points are just different ways to say the same thing).  When a vervet sees a baboon, shrieks out the alarm call for baboons, and all the vervets skedaddle, you've met all but the third point.  From observation, it's reasonably clear that there isn't a combinatorially large space of possibilities.  There is a relatively small and constant set of calls.

Human communication clearly satisfies all four points.  Most of the sentences in this post, for example, are not only unique to the post but most likely unique to all human communication (I'm going to guess that the phrase "vervets skedaddle" is fairly rare in its own right -- it didn't turn up when I googled it with quotes, though that should soon change ...).  This is not something I consciously aimed for, just a natural consequence of writing reasonably long sentences in English (or any other natural language for that matter).

The interesting question is whether anyone else meets the third point.

Saturday, April 30, 2016

Yours for the special introductory price of ...

The student housing I lived in as an undergrad was modeled after, among other things, European monasteries, which is a nice way of saying the rooms were rather on the small and plain side ... which in turn is a nice way of saying you wouldn't want to live in them unless you really wanted to live in the student houses. Which most people did.

You may not be surprised to hear that people cared a lot about which room they ended up in.  There was a whole ritual at the beginning of the school year to decide in what order people got to pick rooms, based on one's class, status as a house officer and possibly other factors I no longer remember.  Ordering within groups of equal rank, for example most if not all the freshmen, was decided by drawing cards, and so the whole boisterous mess was called "room draw".

Why did people care so much which of several possible broom closets they ended up in?  Well, you wanted to be in the same cluster of rooms with people you liked, you might not want a room over the courtyard where the garbage trucks came at uncivilized hours of the morning, you might feel better at the end of a hall, or in the middle, you might want to be near the showers or not so near ...

The reasons were limitless and probably largely subconscious.  Sometimes it didn't matter.  Sometimes the people picking ahead of you had different priorities and you ended up with something close to your ideal, but not always.  Some rooms were almost universally considered good -- and were almost always occupied by seniors -- or not so good -- "Welcome to the house, frosh.  Here's your room!"

It was not unknown for people to run for house offices in which they had little interest simply for the room pick.  Does anyone really want to be house secretary?



Did you know you can buy land on Mars?  It's not clear under what legal authority one can own land on Mars, but that hasn't stopped people from selling it.  Apparently copyright law is involved, and that ought to tell you about all you need to know.  One site quotes $30/acre.  The surface area of Mars is about 36 billion acres, so there's quite a bit of "upside potential".  I'm guessing that the site in question has sold considerably less than the full allotment, but they have sold some.

OK, let's assume that you really can own a claim on a piece of Mars.  What are you getting?

There's always the small chance that someone will set up an outpost on land you own, in which case you can say "Hey, buddy, I own that" and they can say "OK, nice to know, but we're actually on Mars and we care about your claim, why?"

What you're really buying is bragging rights.  You can say to your friends "Hey look, I own a piece of Mars.  Says so right here."  I can easily see someone wanting to pay $30 for that, especially if someone is keeping track of all the claims and you can honestly say "This particular area is mine and nobody else's (at least according to this particular keeper-tracker-of-er)".

Likewise, you can buy your name on a star.  The International Astronomical Union, being the body in charge of giving stars boring catalog numbers, won't be particularly impressed, but that won't stop people from accepting your money.  I sincerely hope that nobody is pretending that this is anything more than buying the right to have a name of your choice associated with a star of your choice in somebody's list, but again I could understand someone paying a small fee for the privilege because why not?



On the other blog I wrote about the fascinating case of Cow Clicker, a Facebook game reduced to its very essence, in which people eventually ended up paying for the right to click on a space where a virtual cow had once stood.  During the period when there were actual (virtual) cows to click on, people willingly paid for custom cows, including a $100 "Bling Cow".  Basically you were announcing to your friends "I bought a Bling Cow", plus you had the pleasure of clicking on a cow with gold accessories.



But is this business of buying virtual cattle or pieces of the sky really abstract or arbitrary enough?  We can do better.  For the low introductory price of ... leaving a message in the comments section, I will give you your very own number.  That's right.  Just leave a short, innocuous piece of text you want immortalized, and I will send you the output of

GIBBERISH=<some randomness I will keep to myself>
MESSAGE=<your message>
echo "print 0x$(echo "${GIBBERISH}${MESSAGE}" | sha256sum | cut -c 1-64)" | python

This lovely shell incantation pastes my gibberish together with your message, uses the SHA256 cryptographic hash algorithm to boil this down to 64 digits of hexadecimal and then uses some hackery to convert that to decimal.
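If shell isn't your thing, here's what I take to be an equivalent in plain Python; the gibberish and message values shown are stand-ins, and the trailing newline is there because echo adds one.

import hashlib

GIBBERISH = "<some randomness I will keep to myself>"   # stand-in for the real secret
MESSAGE = "your message here"                            # stand-in for your message

# echo appends a newline, so append one here to match the shell version byte for byte.
digest = hashlib.sha256((GIBBERISH + MESSAGE + "\n").encode()).hexdigest()
print(int(digest, 16))   # the 64 hex digits, printed as a decimal number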

So long as the SHA256 algorithm remains secure, the odds that someone could find another message that gave the same number as your message are astronomically low.  For that matter, so are the odds that anyone in human history has ever seen the number for your message before, and so are the odds that anyone who doesn't know what gibberish I chose could tell you what number would result from any given message.  Yes, the recipe for mapping messages to numbers is mine ... all mine ... BWAHAHAHA!

Sorry, let it go to my head for a moment.

To get the ball rolling, I claim "squeamish ossifrage" as my message, and I am therefore proud to announce that my personal number is

71550457262820168189328333318549162861663280827014445611156956018117368386158

Let the games begin.

Saturday, April 9, 2016

Primitives

Non sunt multiplicanda entia sine necessitate.

This is one of several formulations of Occam's razor, though Wikipedia informs us that William of Ockham didn't come up with that particular one.  Whatever its origins, Occam's razor comes up again and again in what we like to call "rational inquiry".  In modern science, for example, it's generally expressed along the lines of "Prefer the simplest explanation that fits the known facts".


If you see a broken glass on the floor, it's possible that someone took the glass into a neighboring county, painstakingly broke it into shards, sent the shards overseas by mail, and then had a friend bring them back on a plane and carefully place them on the floor in a plausible arrangement, but most likely the glass just fell and broke.  Only if someone showed you, say, video of the whole wild goose chase might you begin to consider the more complex scenario.

This preference for simple explanations is a major driving force in science.  On the one hand, it motivates a search for simpler explanations of known facts, for example Kepler's idea that planets move around the Sun in ellipses, rather than following circular orbits with epicyclets as Copernicus had held.  On the other hand, new facts can upset the apple cart and lead to a simple explanation giving way to a more complicated revision, for example the discoveries about the behavior of particles and light that eventually led to quantum theory.

But let's go back to the Latin up at the top.  Literally, it means "Entities are not to be multiplied without necessity," and, untangling that a bit, "Don't use more things than you have to", or, to paraphrase Strunk, "Omit needless things".  In the scientific world, the things in question are assumptions, but the same principle applies elsewhere.



The mathematical idea of Boolean algebra underlies much of the computer science that ultimately powers the machinery bringing you this post.  In fact, many programming languages have a data type called "boolean" or something similar.

In the usual Boolean algebra, a value is always either True or False.  You can combine boolean values with several operators, particularly AND, OR and NOT, just as you can combine numbers with operations like multiplication, addition and negation.  These boolean operators mean about what you might think they mean, as we can describe them with truth tables very similar to ordinary multiplication or addition tables:

AND   | True  | False
True  | True  | False
False | False | False

OR    | True  | False
True  | True  | True
False | True  | False

A     | NOT A
True  | False
False | True

In other words, A AND B is true exactly when both A and B are true, A OR B is true whenever at least one of the two is true, and NOT A is true exactly when A is false.  Again, about what you might expect.

You can do a lot with just these simple parts.  You can prove things like "A AND NOT A" is always False (something and its opposite can't both be true) and "A OR NOT A" is always True (the "law of the excluded middle": either A or its opposite is true, that is, A is either true or false).

You can break any truth table for any number of variables down into AND, OR and NOT.  For example, if you prefer to say that "or" means "one or the other, but not both", you can define a truth table for "exclusive or" (XOR):

XOR   | True  | False
True  | False | True
False | True  | False

If you look at where the True entries are, you can read off what that means in terms of AND, OR and NOT: There's a True where A is True and B is False, that is, A AND NOT B, and one where B is True and A is False, that is, B AND NOT A.  XOR is true when one or the other of those cases holds.  They can't both hold at the same time, so it's safe to use ordinary OR to express this: A XOR B = (A AND NOT B) OR (B AND NOT A).  The same procedure works for any truth table.

In a situation like this, where we're expressing one concept in terms of others that we take to be more basic, we call the basic concepts "primitive" and the ones built up from them "derived".  In this case, AND, OR and NOT are our primitives and we derive XOR (or any other boolean function we like) from them.

Now consider the boolean function NAND, which is true exactly when AND is false.  Its truth table looks like this:

NAND  | True  | False
True  | False | True
False | True  | True

This is just the table for AND with True entries changed to False and vice versa.  That is, A NAND B = NOT (A AND B).

What's A NAND A?  If A is True, then we get True NAND True, which is False.  If A is False, we get False NAND False, which is True.  That is, A NAND A = NOT A.  If we have NAND, we don't need NOT.  We could just as well use AND, OR and NAND instead of AND, OR and NOT.

Since NAND is just NOT AND, and we can use NAND to make NOT, we don't need AND, either.  A AND B = (A NAND B) NAND (A NAND B).  So we can get by with just NAND and OR.

As it turns out, we never needed OR to begin with.  Quite some time ago, Augustus De Morgan pointed out that A OR B = NOT (NOT A AND NOT B), that is, A or B (or both) are true if both of them are not false, a rule which sometimes comes in handy in making "if" statements in code simpler (the other version of the rule, with the AND and OR switched, is also valid).  Using NAND, we can recast that as A OR B = (NOT A NAND NOT B), and we can get rid of the NOT, leaving A OR B = ((A NAND A) NAND (B NAND B)).

Summing up, we can build any boolean function at all out of AND, OR and NOT, and we can build all three of those out of NAND, so we can build any boolean function at all from NAND alone.

For example A XOR B = (A AND NOT B) OR (B AND NOT A).  We can use De Morgan's rules to change that to (NOT (NOT (A AND NOT B) AND NOT (B AND NOT A))), that is, (NOT (A AND NOT B)) NAND (NOT (B AND NOT A)), or more simply, (A NAND NOT B) NAND (B NAND NOT A).  We can then replace the NOTs to get (A NAND (B NAND B)) NAND (B NAND (A NAND A)).

Yes, it's ... that ... simple.  Feel free to plug in all four combinations of A and B to check.
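Or, if you'd rather let a machine do the plugging in, here's a quick throwaway check of all the NAND constructions above, using Python's booleans:

from itertools import product

def NAND(a, b):
    return not (a and b)

# Check each NAND-only construction against the operator it's supposed to equal,
# for all four combinations of A and B.
for a, b in product([True, False], repeat=2):
    assert NAND(a, a) == (not a)                                        # NOT
    assert NAND(NAND(a, b), NAND(a, b)) == (a and b)                    # AND
    assert NAND(NAND(a, a), NAND(b, b)) == (a or b)                     # OR
    assert NAND(NAND(a, NAND(b, b)), NAND(b, NAND(a, a))) == (a != b)   # XOR

print("All NAND constructions check out.")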

As silly as this may seem, it has real applications.  A particular kind of transistor lets current flow from its source terminal to its drain terminal when the voltage on a third terminal, called the gate, is high.  Put a high voltage on the source and let current flow either through the transistor or out through the output of the whole thing.  Tie the gate of the transistor to an input.  If the voltage on the input is high, current will flow through the transistor and not to the output.  If the voltage on the input is low, current will flow to the output and not through the transistor.  That is, the output voltage will be high when the input voltage is not.  The whole thing is called a NOT gate (or inverter).

If you put two transistors in a row, current will only flow through both of them when the voltage on both of the inputs is high.  Otherwise, current flows to the output instead and the output voltage is high.  In other words, the output is high unless both inputs are high.  The whole thing is called a NAND gate, and as we saw above, you can build any boolean function you like out of NANDs *.

In fact, we have it a bit easier here because we can build NOT A directly instead of as A NAND A, and for that matter we can build a three-input NAND  -- NOT (A AND B AND C) -- easily as well, but even if we couldn't, being able to build a NAND would be enough.



There are other cases where we can build up a whole system from a single primitive.  Notably, any computer program (technically, anything a Turing machine can compute) can be expressed in terms of a single instruction or, alternatively, a single "combinator".   This includes any boolean function, any numerical function, an HTML parser for web pages, whatever.  Of course, there's a difference between being able to express a computation in theory and being able to run it on your laptop or tablet.  We'll come back to that.
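One concrete example of such a single instruction is subleq ("subtract and branch if less than or equal to zero"), a staple of one-instruction computing.  Here's a toy interpreter and a three-instruction program that adds two numbers; the memory layout is my own arbitrary choice for the example.

def subleq(mem):
    """One-instruction machine: mem[b] -= mem[a]; jump to c if the result <= 0."""
    pc = 0
    while pc >= 0:                        # a negative jump target means halt
        a, b, c = mem[pc], mem[pc + 1], mem[pc + 2]
        mem[b] -= mem[a]
        pc = c if mem[b] <= 0 else pc + 3
    return mem

# Addresses 9, 10, 11 hold A, B and a scratch cell Z (initially 0).
# The program computes B = A + B:  Z -= A;  B -= Z;  Z -= Z and halt.
memory = [9, 11, 3,      # subleq A, Z      -> Z = -A
          11, 10, 6,     # subleq Z, B      -> B = B + A
          11, 11, -1,    # subleq Z, Z, -1  -> Z = 0, then halt
          7, 5, 0]       # A = 7, B = 5, Z = 0
print(subleq(memory)[10])   # 12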

Before we go on to what all of this might mean, it's worth noting that many significant areas of thought haven't been reduced to simple primitives.  For example, chemistry is built from around a hundred elements (there are currently 118 on the periodic table, but you can't do meaningful chemistry with all of them).  An atom of any element is composed of protons, neutrons and electrons in varying numbers.

The Standard Model recognizes electrons as elementary, that is, primitive, while protons and neutrons are composed of quarks.  In all, it holds that there are 17 particles that everything is composed of  -- six quarks, six leptons (including the electron), four gauge bosons and the Higgs.  So far, no one has found anything simpler that these might be built up of, but not for lack of trying.

In mathematics, you typically start with a handful of axioms -- statements you assume to be true without proof -- and build from there.  There has been extensive work in reducing this to a minimal foundation, but the current formulation of set theory together with model theory has several important basic pieces, not one single concept to rule them all.  And, in fact, there are several ways of describing both set theory and model theory, not one single definitive way.

In short, some things can be reduced to a single primitive, but most can't.  Even when you can reduce something to a single primitive, there are typically several ways to do it.  For boolean algebra, you can just as well use NOR as NAND.  In computing there are several universal operations with little to pick among them.



In theory, there is no difference between theory and practice. But, in practice, there is. (attributed to Jan van de Snepscheut)


If you can reduce any boolean function to NAND, is Boolean algebra in some meaningful sense really just NAND?  Is computing really just the study of a single instruction or operator?  If we're trying to reduce the number of entities involved, following Occam, is it not better to study a single operation than many?

I think most working mathematicians and computer scientists would answer "No" to all of the above.  A general Boolean algebra is a set of objects and operations that follow certain rules.  For example, A AND A = A (you can check that against the truth table above), and in set theory the union of a set with itself is that set; there are other examples.  We would like to capture that common behavior somehow, and we do it by defining rules that hold for anything that behaves in the way we're interested in, that is, axioms.  In the case of Boolean algebra there are five (chase the link if you're interested).

It just so happens that in the simple case of True, False and the operations on them, NAND and NOR can be used to build all the others.  That's nice, but not essential.  It's of interest in building circuits out of transistors, but even then there's no requirement to build everything from one type of gate if you don't have to.  As noted above, it takes fewer transistors to build a NOT directly, and that's how real circuits are built.

Even from the point of view of Occam's razor, it's not clear that reducing everything to NAND is a good idea.  Yes, you have only one operation to deal with, but you can define one truth table just as easily as any other.  If you want to use XOR, it's simpler to define the truth table for it than to define the truth table for NAND and then define XOR in terms of it.

In computing, if you have the machinery to rigorously define one instruction or operation, you can define as many as you like with the same machinery.  It may be interesting or even useful in some situations that you can define some operations in terms of others, but it doesn't make the system more useful.  In practice, I don't care how many instructions the processor has.  I care very much if there's an easy way to talk to the network or write to a file, things which are not even mentioned in theoretical discussions of computing (nor should they be in most situations).

So why even bother?  Is reducing a system to a single primitive just an interesting academic exercise?  Not necessarily.  If you're trying to prove properties about circuits or programming systems in general, it can be useful to divide and conquer.  First prove, once and for all, that any circuit or program can be reduced to a single primitive, then prove all sorts of useful properties about systems using that primitive.  Since you only have a single operator or whatever to deal with, your proofs will be shorter.  Since you've proved your operator is universal, they'll be just as powerful.  Essentially you've said that when it comes to general properties of a system, adding new operators or whatever doesn't do anything interesting.

You don't have to reduce everything all the way to a single primitive for this to be useful.  If you can only reduce a system to five primitives, doing proofs using those five is still easier than doing proofs on an equivalent system with twenty.


In general, there's a tension between keeping a system minimal and making it easy to use.  A minimal system is easier to build and it's easier to be confident that it works properly.  A larger system is easier to use, as long as there's not too much to take in.  Typically there's a sweet spot somewhere in the middle.

There are sixteen possible boolean operators on two variables, and you can build any of them up from NAND (or NOR), but usually we focus on three of them: AND, OR and NOT.  These are enough to build anything else in a way that's straightforward to understand.  They also correspond fairly closely to familiar concepts.   In some useful sense, they minimize the number of things you have to deal with in normal cases, and William of Ockham can rest easy.

It doesn't only matter how many things you have.  It matters which things.




* As usual, there are a few more wrinkles to this.  You have to tie the output of the last (or only) transistor to ground for current to flow, you need a resistor next to the input, the output also needs to be tied to ground eventually, and so forth.

You may have noted that current is flowing through the two transistors in the NAND gate precisely when both inputs are high, that is, the current is flowing through them (and not to the output) when one gate voltage is high AND the other is.  You might think it would be simpler to build an AND than a NAND.  However,  the current will only flow if the drain of the last transistor is connected to a low voltage.   That voltage will be low regardless of what's happening on the gates.  To see a difference in voltages, we have to look at the voltage at the top, which will vary depending on whether current is flowing through the transistors (that's probably not too clear, but it's the best I can do).

Sunday, February 14, 2016

Chaotic demonyms

I don't recall having run across the word demonym before Wikipedia, though I may well have run across it and forgotten it.  It's certainly one of those words that appears disproportionately often in Wikipedia (Note to self: What are some others?  Despite the title, this post on the other blog doesn't really offer any)

So what's a demonym?

A demonym (demos = people, nym = name) is a word for someone or something from a particular place.  At least in Wikipedia, a demonym is a noun (I met a Canadian the other day).  The adjectival form is generally the same (We stopped at the Canadian border), but not always (I met a Swede at the Swedish embassy).

Demonyms are a bit of a mess.  Why is it Canadian and not Canadan (or why isn't the country Canadia)?  Why is someone from Washington a Washingtonian but someone from London a Londoner?  Why is someone from Manchester a Mancunian?  From Monaco a Monegasque?  What about Michigander and Arkansawyer, not to mention Hoosier?

Language is full of cases like this that can't be fully described by hard and fast rules, but aren't completely random, either.  For example, let me digress a bit about English regular and irregular forms.

In English, most verbs can be completely described by rules (I rest, he/she/it rests, you rest ... I/he/she/it/you/... rested, I/he/she/it/you/... have rested, I/he/she/it/you/... will rest).  One English verb is highly irregular: I am, he/she/it is, you/we/they are, I/he/she/it was, you/we/they were, I/he/she/it/you/... have been, I/he/she/it/you/... will be).  This is an example of suppletion, where entirely different words fill in for different roles.  Another example is can in dialects that don't use forms like might can but instead say might be able to.

Quite a few English verbs, including many of the most common ones, are somewhat irregular.  They follow the rules in present tense (I sing, he/she/it sings ...) but use a slightly different form for the past tense (I sang) and possibly yet another for the past participle (I have sung).  These are mostly if not entirely holdovers from older Germanic ancestors of modern English, which used vowel changes to indicate tense (more on the ancestry of English here and here) [one exception is snuck, which used to be sneaked].

There are similar cases for nouns (one goose, two geese), but nouns have an extra wrinkle.  For some borrowed nouns we also borrow a plural form (one alumnus, two alumni).  In at least one case we try to borrow a plural but miss (one octopus, two octopi ... but that should probably be two octopodes, or just say two octopuses and be done with it).  When we borrow a plural we're generally not too fussy about the exact details of the source language's inflections.  We say of alumni and not alumnorum, for example.

None of this is unique to English.  Languages in general tend to have irregular forms due to ancestral holdovers, borrowing and such.

So where does that leave us with demonyms?

Just as with nouns and verbs, there are some regular cases.  I can't think of any place ending in -ia that doesn't have -ian for its demonym: Asian, Bavarian, Croatian, Dalmatian, ...  This is the "unmarked" case.  If we want to designate a place and its people in English, regardless of what the locals call the place and themselves, we tend to take some root particular to the place and use -ia for the place and -ian for the demonym.

On the other hand, there are plenty of cases of full-on suppletion: Bay Stater (Massachusetts), Hoosier (Indiana), Tar Heel (North Carolina), Scouser (Liverpool) ...

And then there's a lot of interesting territory in the middle.  I'd break it down into
  • Cases where we use some variation of -ian, but it's hard to predict which one: Canadian, Floridian ... but Manitoban, Kansan, Canberran ... and even European and Panamanian.  My guess is that most of these can be described by assuming a basic form, probably -ian, together with some rules that would change Iowaian to Iowan and Kansasian to Kansan.  That's not to say that we actually think that way, just that such rules may do a decent job of describing what's going on.  With, of course, some inevitable exceptions because ... language.
  • Cases where we use a different form instead of -ian and its relatives.
In the latter case there are still regularities.  Generally if the place is X-land, the demonym will be X-lander (Rhode Islander, Newfoundlander ...), notwithstanding the Irish, English, Poles, Marshallese and almost certainly some others.  There are also several cases, generally from western Asia, where we borrow the -i suffix: Gujarati, Kuwaiti, Israeli, Pakistani ...  Those appear to be generally regular, just using a different set of rules.

Even accounting for cases like "use -er with -land" and "OK, so we borrowed Gujarati", there is still a region of apparent chaos.  I can see why Wyomingian would be a bit of a mouthful, but why Wyomingite in particular and not, say, Wyominger?  Why Connecticutter and not Connecticutite?  For that matter, why Vermonter and not Vermontian?

Have a look at these lists in Wikipedia (here, here and here).  Maybe you can set out some nice, crisp rules for the various -ites, -ers (besides -landers), -eses (c.f. -esians) and so forth.  I couldn't, and my guess is that it can't be done in a way that doesn't end up looking highly ad hoc (or post hoc?).

I chose the word "chaos" above not just to suggest disorder, but to suggest a particular kind of disorder.  Mathematical chaos often occurs at the boundaries between otherwise regular regions.  Outside a small region, the Mandelbrot iteration always diverges.  Inside a fairly simple boundary, it always converges.  Only in between those two, where it's not clear which outcome will happen, do things get messy.  Finding roots of equations with Newtonian iteration behaves similarly.  In most cases you quickly find one of the roots but in the boundary regions you can't really tell where you end up.

I suspect something analogous is going on here.  Most places are inhabited by -ians, but where that doesn't work and you don't have a definite alternative, what you end up with is essentially luck of the draw.

Friday, September 18, 2015

Counting on Freeway Rick

First, a disclaimer:  There are obviously a number of hot-button issues relating to the story of former crack cocaine distributor Ricky "Freeway Rick" Ross.  This blog is not about such issues.  I'm not going to touch on them here and (not that I expect this to be an issue) I'll summarily delete comments if necessary.  Hey, it's my blog.  If you're wondering "What hot-button issues?", feel free to chase the Wikipedia link above.

With that out of the way ...

In a post on counting (see section 3), I talked about the problem of counting how many crimes a person had committed.  This might seem like a fairly academic question, but the case of Ricky Ross illustrates how it can be anything but.  Ross was originally sentenced to life without parole for buying 100 kilos of cocaine from an informant.  Since Ross had previously been convicted of two drug-related felonies, this put him afoul of California's "three strikes" law, triggering its mandatory sentencing.

Except ...

That sentencing was later overturned on the basis that Ross's previous two convictions -- in different states -- should have been counted as a single conviction as they related to a single conspiracy (or "continuous criminal spree").  That meant that the conviction for selling to the informant was only Ross's second strike, and the sentence was accordingly reduced to 20 years.  Ross was released in 2009, with time off for good behavior.

But there's more.

After his release, Ross became embroiled in a lawsuit against William "Rick Ross" Roberts, a prison guard turned gangsta rapper who had sold millions of records under the name "Rick Ross".  Freeway Rick lost the suit, but one of the legal issues that came up was whether each  of Roberts' "Rick Ross" albums should count as a separate claim, or whether the "single publication" rule applied.

Whatever else one might think about Rick Ross's legal history, it's clear from it that what might seem like a simple matter of counting can be tricky and serious business.

Sunday, September 13, 2015

More on the invisible oceans, and their roundness

Previously, in speculating about what it would take for an inhabitant of one of the several subsurface oceans thought to exist in our solar system to discover that their world was round, I said
Figuring out that the world is round would be a significant accomplishment.  The major cues the Greeks used -- ships sinking below the horizon, lunar eclipses, the position of the noontime sun at different latitudes -- would not be available.  The most obvious route left is to actually circumnavigate the world.  And figure out that you did it.
Fortunately, I was smart enough to leave myself some wiggle room: "I'm very reluctant to say 'such and such would be impossible because ...'"  After all, humanity is in a similar situation in trying to figure out the shape of our universe, though in our case circumnavigating doesn't seem to be even remotely close to an option.  Even so, we've had some apparent success.

On further reflection, there are at least two ways an intelligent species living in a world like Ganymede's subsurface ocean could figure out that the world is round.

First, and perhaps most likely, it's not necessary for a Ganymedean to circumnavigate the world, if something can.  In this case that something would be sound.  Under the right conditions, sound can travel thousands of kilometers underwater.  The circumference of Ganymede is around 16,500km, which may be too far.  Europa's is around 9,800km, which as I understand it is on the edge of what experiments in Earth's oceans have been able to detect (pretty impressive, that).

We've already postulated that sound would be one of the more important senses in such a world.  It doesn't seem impossible that someone would notice that especially loud noises tended to be followed a couple of hours later by similar noises from all directions.  This assumes that the oceans are unobstructed, but if they're not, you have landmarks to measure by, which would make the original "circumnavigate and tell that you did it" less of a challenge.

Second, if some sort of light-production and light-detection evolve, then one of the cues the Greeks used is indeed available, at least in theory: one can see things disappear over the horizon.  To actually make use of this one would need to be able to see far enough to tell that the object was disappearing due to curvature and not behind some obstacle, or simply because it was too far away to see.  The exact details depend on how smooth the inner surface is.

Humans noticed the effect with ships because a calm sea is quite flat, that is to say, quite close to perfectly round.  If the inner surface of the ocean is rough, one might have to float at a considerable distance from it, and thus wait for the object to recede a considerable distance, to be sure of the effect.  On the other hand, floating a considerable distance from the inner surface would be much easier than floating the same distance from the surface of the earth.  For that matter, it would also be possible to note what's visible and what's not at different distances from the inner surface.

A little back-of-the-envelope estimation suggests that one would have to be able to see objects kilometers or tens of kilometers away.  By way of comparison, Earth's oceans are quite dark at a depth of one kilometer, so this seems like a longshot.  Nor does it help that it's possible to hear long distances, since sound doesn't necessarily propagate in a straight line (neither does light, but that's a different can of worms).
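Here's the back-of-the-envelope version, using the usual horizon-distance approximation d ≈ √(2Rh) for a smooth sphere of radius R viewed from height h.  The radius below is a rough stand-in for a Ganymede-sized ocean and the heights are just illustrative:

from math import sqrt

R = 2.5e6                        # metres: rough radius assumed for a subsurface ocean
for h in (1, 10, 100, 1000):     # metres above the (assumed smooth) inner surface
    d = sqrt(2 * R * h)          # approximate distance to the horizon
    print(f"h = {h:>4} m  ->  horizon roughly {d / 1000:.0f} km away")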


As I originally disclaimed, it's not a good idea to rule something out as impossible just because you can't think of a way to do it.  The inhabitants of a subsurface ocean would have thousands, if not millions, of years to figure things out, even if they wouldn't have the advantage of already knowing approximately what their world looks like.

Wednesday, August 26, 2015

Grammars ... not so fast

In the previous post, I made the rather bold statement that the effort to describe natural languages with formal grammars had been "without great success".  I probably could have put that better.

I was referring at the time to the effort to define a "universal grammar" that would capture the essence of linguistic competence, that is, people's knowledge of language, independent of performance, that is, what people actually say.  A grammar, in that sense, is a set of rules that will produce all, and only, the sentences that speakers of a given language know how to produce.

In that sense, the search for grammars for natural languages has not had great success.


However, there is another line of inquiry that has had notable success, namely the effort to parse natural language, that is, to take something that someone actually said or wrote and give an account of how the words relate to each other that matches well with what actual people would say about the same sentence.

For example, in Colorless green ideas sleep furiously, we would say that colorless and green are describing ideas, ideas are what's sleeping, and furiously modifies sleep.  And so would any number of parsers that have been written over the years.

This is an important step towards actually understanding language, whatever that may mean, and it indicates that, despite what I said previously, there may be such a thing as syntax independent of semantics.  We can generally handle a sentence like Colorless green ideas sleep furiously the same as Big roan horses run quickly, without worrying about what it might mean for an idea to be both green and colorless or for it to sleep.


So what gives?  There are at least two major differences between the search for a universal grammar and the work on parsers.

First, the two efforts have significantly different goals.  It's not entirely clear just what a universal grammar might be, which was one of the main points of the previous post, not to mention many more knowledgeable critiques.  If universal grammar is a theory, what facts is it being asked to explain?

As far as I understand it, the idea is to explain why people say and comprehend what they do, by producing a grammar that enumerates what sentences they are and aren't competent to say, keeping in mind that people may make mistakes.

However, narrowing this down to "all and only" the "right" sentences for a given language has proved difficult, particularly since it's hard to say where the line between "competence" and "performance" lies.  If someone says or understands something that a proposed universal grammar doesn't generate, that is, something outside their supposed "competence", what does that mean?  If competence is our knowledge of language, then is the person somehow saying something they somehow don't know how to say?

The work on parsing has very well-defined goals.  Typically, algorithms are evaluated by running a large corpus of text through them and comparing the results to what people came up with.  As always, one can argue with the methodology, and in particular there is a danger of "teaching to the test", but at least it's very clear what a parser is supposed to do and whether a particular parser is doing it.  Research in parsing is not trying to characterize "knowledge", but rather to replicate particular behavior, in this case how people will mark up a given set of sentences.

Second, the two efforts tend to use different tools.  Much work on universal grammar has focused on phrase-structure grammars, transformational grammars and in general (to the small extent I understand the "minimalist program") on nested structures: A sentence consists of a verb part and noun parts, a verb part includes a main verb and modifying clauses, which may in turn consist of smaller parts, etc.

While there are certainly parsing approaches based on this work, including some fairly successful ones, several successful natural language parsers are based on dependency grammars, which focus on the relations among words rather than a nesting of sentence parts.  Where a phrase-structure grammar would say that colorless green ideas is a noun phrase and so forth, a dependency grammar would say that colorless and green depend on ideas (in the role of adjectives).  Dependency grammars can trace their roots back to some of the earliest known work in grammar hundreds of years ago, but for whatever reason they seemed to fall out of favor for much of the late 20th century.

Leaving aside some interesting questions about issues like "non-projective dependencies" (where the lines from the dependent words to the words they depend on cross when the sentence is laid out in its written or spoken order), it's easy to build parsers for dependency grammars using basically the same technique (shift-reduce parsing) as compilers for computer languages use.  These parsers tend to be quite fast, and about as accurate as parsers based on phrase-structure grammars.
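To give a flavor of how that works, here's a toy sketch of the "arc-standard" transition system behind many shift-reduce dependency parsers.  In a real parser the choice of transition comes from a trained classifier; here it's simply read off a hand-written gold tree, which is enough to show the mechanics.

    def parse(words, gold_heads):
        """Return a head position for each word (0 stands for the artificial root)."""
        n = len(words)
        heads = [None] * (n + 1)              # slot 0 is the root
        stack = [0]                           # the root starts on the stack
        buffer = list(range(1, n + 1))        # word positions, left to right

        def has_unattached_children(i):
            return any(gold_heads[j] == i and heads[j] is None for j in range(1, n + 1))

        while buffer or len(stack) > 1:
            if len(stack) >= 2:
                top, below = stack[-1], stack[-2]
                if below != 0 and gold_heads[below] == top:    # LEFT-ARC
                    heads[below] = top
                    stack.pop(-2)
                    continue
                if gold_heads[top] == below and not has_unattached_children(top):  # RIGHT-ARC
                    heads[top] = below
                    stack.pop()
                    continue
            if not buffer:
                raise ValueError("these transitions can't produce this tree")
            stack.append(buffer.pop(0))       # SHIFT
        return heads[1:]

    words = ["Colorless", "green", "ideas", "sleep", "furiously"]
    # Hand-written dependencies: colorless -> ideas, green -> ideas,
    # ideas -> sleep, sleep -> root, furiously -> sleep.
    gold = {1: 3, 2: 3, 3: 4, 4: 0, 5: 4}
    print(parse(words, gold))                 # [3, 3, 4, 0, 4]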


In short, there is a lot of interesting and successful work going on concerning the syntax of natural languages, just not in the direction (universal grammar and such) that I was referring to in the previous post.

Tuesday, August 18, 2015

What, if anything, is a grammar?

This is going to be one of the more technical posts.  As always, I hope it will be worth wading through the details.

There are hundreds, if not thousands, of computer languages.  You may have heard of ones like Java, C#, C, C++, ECMAScript (aka JavaScript, sort of), XML, HTML (and various other *ML, not to mention ML, which is totally different), Scheme, Python, Ruby, LISP, COBOL, FORTRAN etc., but there are many, many others.  They all have one thing in common: they all have precise grammars in a certain technical sense: A set of rules for generating all, and only, the legal programs in the given language.

This is true even if no one ever wrote the grammar down.  If you can actually use a language, there is an implementation that takes code and tells a computer to follow its instructions.  That implementation will accept some set of inputs and reject some other set of inputs, meaning it encodes the grammar of the language.

Fine print, because I can't help myself: I claim that if you don't have at least a formal grammar or an implementation, you don't really have a language.  You could define a language that, say, accepts all possible inputs, even if it doesn't do anything useful with them.  I'd go math major on you then and claim that the set of "other code" it rejects is empty, but a set nonetheless.  I'd also argue that if two different compilers are nominally for the same language but behave differently, not that that would ever happen, you have two variants of the same language, or technically, two different grammars.

Formal grammars are great in the world of programming languages.  They tell implementers what to implement.  They tell programmers what's legal and what's not, so you don't get into (as many) fights with the compiler.  Writing a formal grammar helps designers flush out issues that they might not have thought of.  There are dissertations written on programming language grammars.  There are even tools that can take a grammar and produce a "parser", which is code that will tell you if a particular piece of text is legal, and if it is, how it breaks down into the parts the grammar specifies.

Here's a classic example of a formal grammar:
  • expr ::= term | term '+' term
  • term ::= factor | factor '*' factor
  • factor ::= variable | number | '(' expr ')'
In English, this would be
  • an expression is either a term by itself or a term, a plus sign, and another term
  • a term is either a factor by itself or a factor, a multiplication sign ('*') and another factor
  • a factor is either a variable, a number or an expression in parentheses
Strictly speaking, you'd have to give precise definitions of variable and number, but you get the idea.

Let's look at the expression (x + 4)*(6*y + 3).  Applying the grammar, we get
  • the whole thing is a term
  • that term is the product of two factors, namely (x + 4) and (6*y + 3)
  • the first factor is an expression, namely x+4, in parentheses
  • that expression is the sum of two terms, namely x and 4
  • x is a factor by itself, namely the variable x
  • 4 is a factor by itself, namely the number 4
  • 6*y + 3 is the sum of two terms, namely 6*y and 3
  • and so forth
One thing worth noting is that, because terms are defined as the product of factors, 6*y + 3 is automatically interpreted the way it should be, with the multiplication happening first.  If the first two rules had been switched, it would have been interpreted as having the addition happening first.  This sort of thing makes grammars particularly useful for designing computer languages, as they give precise rules for resolving ambiguities.  For the same reason, it can be slightly tricky to make sure a grammar is defining what you think it's defining, but there are known tools and techniques for dealing with that, at least when it comes to computer languages.
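Grammars like this map almost directly onto code.  Here's a minimal sketch of a recursive-descent parser for the three rules above -- one function per rule, with a bare-bones tokenizer, and (like the grammar) handling only one '+' or '*' per level.  The nested-tuple output format is just for illustration.

    import re

    def tokenize(text):
        return re.findall(r"[A-Za-z_]\w*|\d+|[()+*]", text)

    def parse_expr(tokens, pos):
        # expr ::= term | term '+' term
        left, pos = parse_term(tokens, pos)
        if pos < len(tokens) and tokens[pos] == '+':
            right, pos = parse_term(tokens, pos + 1)
            return ('+', left, right), pos
        return left, pos

    def parse_term(tokens, pos):
        # term ::= factor | factor '*' factor
        left, pos = parse_factor(tokens, pos)
        if pos < len(tokens) and tokens[pos] == '*':
            right, pos = parse_factor(tokens, pos + 1)
            return ('*', left, right), pos
        return left, pos

    def parse_factor(tokens, pos):
        # factor ::= variable | number | '(' expr ')'
        if tokens[pos] == '(':
            inner, pos = parse_expr(tokens, pos + 1)
            assert tokens[pos] == ')', "expected ')'"
            return inner, pos + 1
        return tokens[pos], pos + 1

    tree, _ = parse_expr(tokenize("(x + 4)*(6*y + 3)"), 0)
    print(tree)   # ('*', ('+', 'x', '4'), ('+', ('*', '6', 'y'), '3'))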


The grammar I gave above is technically a particular kind of grammar, namely a context-free grammar, which is a particular kind of phrase structure grammar.  There are several other kinds.  One way to break them down is the Chomsky hierarchy, which defines a series of types of grammar, each more powerful than the last in the sense that it can recognize more kinds of legal input.  A context-free grammar is somewhere in the middle.  It can define most of the rules for real programming languages, but typically you'll need to add rules like "you have to say what kind of variable x is before you can use it" that won't fit in the grammar itself.

At the top of the Chomsky hierarchy is a kind of grammar that is as powerful as a Turing machine, meaning that if a language can't be specified by such a grammar, no computer can compute whether or not a given sentence belongs to it.  From a computing/AI point of view (and even from a theoretical point of view), it sure would be nice if real languages could be defined by grammars somewhere in the Chomsky hierarchy.  As a result, decades of research have been put into finding such grammars for natural languages.

Without great success.

At first glance, it seems like a context-free grammar, or something like it, would be great for the job.  The formal grammars in the Chomsky hierarchy are basically more rigorous versions of the grammars that were taught, literally, in grammar schools for centuries.  Here's a sketch of a simple formal grammar for English:
  • S ::= NP VP (a sentence is a noun phrase followed by a verb phrase)
  • NP ::= N | Adj NP (a noun phrase is a noun or an adjective followed by a noun phrase)
  • VP ::= V | VP Adv (a verb phrase is a verb or a verb phrase followed by an adverb)
Put this together with a list of nouns, verbs, adjectives and adverbs, and voila: English sentences.  For example, if ideas is a noun, colorless and green are adjectives, sleep is a verb and furiously is an adverb, then we can analyze Colorless green ideas sleep furiously with this grammar just like we could analyze (x + 4)*(6*y + 3) above. 
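If you'd rather not take my word for it, here's that grammar written out as a sketch for NLTK's chart parser (assuming NLTK is installed); it finds exactly one analysis, the one described above.

    import nltk

    grammar = nltk.CFG.fromstring("""
        S   -> NP VP
        NP  -> N | Adj NP
        VP  -> V | VP Adv
        N   -> 'ideas'
        Adj -> 'colorless' | 'green'
        V   -> 'sleep'
        Adv -> 'furiously'
    """)

    parser = nltk.ChartParser(grammar)
    for tree in parser.parse("colorless green ideas sleep furiously".split()):
        print(tree)
    # Roughly: (S (NP (Adj colorless) (NP (Adj green) (NP (N ideas))))
    #             (VP (VP (V sleep)) (Adv furiously)))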

Granted, this isn't a very good grammar as it stands.  It doesn't know how to say A colorless green idea sleeps furiously or I think that colorless green ideas sleep furiously or any of a number of other perfectly good sentences, but it's not hard to imagine adding rules to cover cases like these.

Likewise, there are already well-studied notions of subject-verb agreement, dependent clauses and such.  Translating them into formal rules doesn't seem like too big a step.  If we can just add all the rules, we should be able to build a grammar that will describe "all and only" the grammatical sentences in English, or whatever other language we're analyzing.

Except, what do we mean by "grammatical"?

In the world of formal grammar, "grammatical" has a clear meaning: If the rules will generate a sentence, it's grammatical.  Otherwise it's not.

When it comes to people actually using language, however, things aren't quite so clear.  What's OK and what's not depends on context, whom you're talking to, who's talking and all manner of other factors.  Since people use language to communicate in the real world, over noisy channels and often with some uncertainty over what the speaker is trying to say and particularly over how the listener will interpret it, it's not surprising that we allow a certain amount of leeway.

You may have been taught that constructs like ain't gonna and it don't are ungrammatical, and you may not say them yourself, but if someone says them to you, you'll still know what they mean.  If a foreign-born speaker says something that makes more sense in their language than yours, something like I will now wash myself the hands, you can still be pretty certain what they're trying to say.  Even if someone says something that seems like nonsense, say That's a really call three, but there's a lot of noise, you'll probably assume they meant something else, maybe That's a really tall tree, if they're pointing at a tall tree.

In short, there probably isn't any such thing as "grammatical" vs. "ungrammatical" in practical use, particularly once you get away from the notion that "grammatical" means "like I was taught in school" -- and as far as that goes, different schools teach different rules and there is plenty of room for interpretation in any particular set of rules.  To capture such uncertainty, linguistics recognizes a distinction between competence (one's knowledge of language) and performance (how people actually use language), though not all linguists recognize a sharp distinction between the two.

When there is variation among individuals and over time, science tends to take a statistical view.  It's often possible to say very precisely what's going on statistically even if it's impossible to tell what will happen in any particular instance.  When there is communication in the presence of noise and uncertainty, information theory can be a useful tool.  It doesn't seem outrageous that a grammar aimed at describing real language would take a statistical approach.  Whatever approach it takes, it should at least provide a general account of what its analyses mean in information-theoretic terms.
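To make "taking a statistical view" of sentences concrete, here's a toy sketch of a bigram model.  The counts are made up for illustration; a real model would estimate them from a large corpus.

    from collections import Counter

    bigram_counts = Counter({("my", "hair"): 40, ("hair", "is"): 35,
                             ("is", "on"): 20, ("on", "fire"): 15,
                             ("my", "what"): 2, ("what", "lovely"): 1})
    unigram_counts = Counter({"my": 100, "hair": 50, "is": 500,
                              "on": 300, "what": 80, "lovely": 5})
    VOCAB = 10_000   # assumed vocabulary size, used for add-one smoothing

    def sentence_probability(words):
        # add-one-smoothed product of P(word | previous word)
        p = 1.0
        for prev, cur in zip(words, words[1:]):
            p *= (bigram_counts[(prev, cur)] + 1) / (unigram_counts[prev] + VOCAB)
        return p

    print(sentence_probability("my hair is on fire".split()))
    print(sentence_probability("my what lovely weather".split()))
    # The model assigns the first sentence a higher probability than the second;
    # word pairs it has never seen get only the small smoothing floor.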

[I didn't elaborate on this in the original post, but as I understand it, Chomsky has strongly disputed both of these notions.  Peter Norvig quotes Chomsky as saying "But it must be recognized that the notion of probability of a sentence is an entirely useless one, under any known interpretation of this term."  This is certainly forceful, but not convincing, even after following the link to the original context of the quote, which contains such interesting assertions as "On empirical grounds, the probability of my producing some given sentence of English [...] is indistinguishable from the probability of my producing some given sentence of Japanese.  Introduction of the notion of "probability relative to a situation" changes nothing."  This must be missing something, somewhere.  If I'm a native English speaker and my hair is on fire, it's much more likely that I'm about to say "My hair is on fire!" than "閑けさや 岩にしみいる 蝉の声" (Bashō's haiku: "stillness -- the cicada's cry seeps into the rocks") or even "My, what lovely weather we're having".

The entire article by Norvig is well worth reading.  But I digress.]

Even if you buy into the idea of precise, hard-and-fast rules describing a language, there are several well-known reasons to think that there is more to the picture than what formal grammars describe.  For example, consider these two sentences:
  • The horse raced past the barn fell.
  • The letter sent to the president disappeared.
Formally, these sentences are essentially identical, but chances are you had more trouble understanding the first one, because raced can be either transitive, as in I raced the horse past the barn (in the sense of "I was riding the horse in a race"), or intransitive, as in The horse raced past the barn (the horse was moving quickly as it passed the barn -- there's also "I was in a race against the horse", but never mind).  On reading The horse raced, it's natural to think of the second sense, but the whole sentence only makes sense with the first sense of raced in the passive voice: the horse that was raced past the barn is the one that fell.

Sentences like The horse raced past the barn fell are called "garden path" sentences, as they "lead you down the garden path" to an incorrect analysis and you then have to backtrack to figure them out.

A purely formal grammar describes only the structure of sentences, not any mechanism for parsing them, that is, recovering the structure of the sentence from its words.  There's good experimental data, though, to suggest that we parse The horse raced past the barn fell differently from The letter sent to the president disappeared.

A purely formal grammar doesn't place any practical limits on the sentence that it generates.  From a formal point of view,  The man whom the woman whom the dog that the cat that walked by the fence that was painted blue scratched barked at waved to was tall is a perfectly good sentence, and would still be a perfectly good sentence if we replaced the fence that was painted blue with the fence that went around the yard that surrounded the house that was painted blue, giving The man whom the woman whom the dog that the cat that walked by the fence that went around the yard that surrounded the house that was painted blue scratched barked at waved to was tall.

You could indeed diagram such a sentence, breaking it down into its parts, but I doubt very many people would say they understood it if it were spoken to them at a normal rate.  If you asked "Was it the house or the fence that was painted blue?", you might not get an answer better than guessing (I'm sure such experiments have been done, but I don't have any data handy).  Again, the mechanism used for understanding the sentence places limits on what people will say in real life.

In fact, while most languages, if not all, allow for dependent clauses like that was painted blue, they're used relatively rarely.  Nested clauses like the previous example are even rarer.  I recall recently standing in a fast food restaurant looking at the menu, signs, ads, warnings and so forth and realizing that a large portion of what I saw wasn't even composed of complete sentences, much less complex sentences with dependent clauses: Free side of fries with milkshake ... no climbing outside of play structure ... now hiring.  Only when I looked at the fine-print signs on the drink machine and elsewhere did I see mostly "grammatical" text, and even that was in legalese and clearly meant to be easily ignored.

In short, there are a number of practical difficulties in applying the concepts of formal grammar to actual language.  In the broadest sense, there isn't any theoretical difficulty.  Formal grammars can describe anything computable, and there's no reason to believe that natural language processing is uncomputable, even if we haven't had great success with this or that particular approach.  The problems come in when you try to apply that theory to actual language use.


Traditionally, the study of linguistics has been split into several sub-disciplines, including
  • phonology, the study of how we use sounds in language, for example why we variously use the sounds -s, -z and -ez to represent plurals and similar endings in English (e.g., cats, beds and dishes)
  • morphology, the study of the forms of words, for example how "words" is composed of "word" and a plural marker "-s".
  • syntax, the study of how words fit together in sentences.  Formal grammars are a tool for analyzing syntax.
  • semantics, the study of how the meaning of a sentence can be derived from its syntax and morphology
  • pragmatics, the study of how language is actually used -- are we telling someone to do something? asking for information? announcing our presence? shouting curses at the universe?
It's not a given that actual language respects these boundaries.  Fundamentally, we use language pragmatically, sometimes to convey fine shades of meaning but sometimes in very basic ways.  We tolerate significant variation in syntax and morphology.  It's somewhat remarkable that we recognize a distinction between words and parts of words at all.

In fact, it's somewhat remarkable that we recognize discrete words.  On a sonogram of normal speech, it's not at all obvious where words start and end.  Clearly our speech-processing system knows something that our visual system -- which is itself capable of many remarkable things -- can't pick out of a sonogram.

Nonetheless, we can in fact recognize words and parts of words.  We know that walk, walks, walking and walked are all forms of the same word and that -s, -ing, and -ed are word-parts that we can use on other words.  If someone says to you I just merped, you can easily reply What do you mean, "merp"? What's merping?  Who merps?

This ability to break down flowing speech into words and words (spoken or written) into parts appears to be universal across languages, so it's reasonable to say "Morphology is a thing" and take it as a starting point.  Syntax, then, is the study of how we put morphological pieces together, and a grammar is a set of rules for describing that process.

The trickier question is where to stop.  Consider these two sentences:
  • I eat pasta with cheese
  • I eat pasta with a fork
Certainly these mean different things.  In the first sentence, with modifies pasta, as in Do they have pasta with cheese on the menu here? or It was the pasta with cheese that set the tone for the evening.  In the second sentence, with modifies eat, as in I know how to eat with a fork or Eating with a fork is a true sign of sophistication.  The only difference is what follows with.  The indefinite article a doesn't provide any real clue.  You can also say I eat pasta with a fine Chianti.

The real difference is semantic, and you don't know which case you have until the very last word of the sentence.  A fork is an instrument used for eating.  Cheese isn't.  If the job of syntax is to say what words relate to what, it would appear that it needs some knowledge of semantics to do its job.
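You can see the problem mechanically: hand a parser a grammar that allows both attachments and it will cheerfully return both.  A sketch using NLTK's chart parser (assuming NLTK is installed), with a deliberately tiny grammar:

    import nltk

    grammar = nltk.CFG.fromstring("""
        S   -> NP VP
        VP  -> V NP | VP PP
        NP  -> NP PP | Det N | N
        PP  -> P NP
        V   -> 'eat'
        P   -> 'with'
        Det -> 'a'
        N   -> 'I' | 'pasta' | 'cheese' | 'fork'
    """)

    parser = nltk.ChartParser(grammar)
    for tree in parser.parse("I eat pasta with a fork".split()):
        print(tree)
    # Two trees come back: one where "with a fork" attaches to the verb phrase,
    # and one where it attaches to "pasta".  Nothing in the syntax decides
    # between them.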

There are several possible ways around problems like this:
  • Introduce new categories of word to capture whatever semantic information is needed in understanding the syntax.  Along with being a noun, fork can also be an instrument or even an eating instrument.  We can then add syntactic rules to the effect that with <instrument> modifies verbs while with <non-instrument> modifies nouns.  Unfortunately, people don't seem to respect these categories very well.  If someone has just described a method of folding a slice of cheese and using it to scoop pasta, then cheese is an eating instrument.  If we're describing someone with bizarre eating habits, a fork could be just one more topping for one's pasta.
  • Pick one or the other option as the syntactic structure, and leave it up to semantics to do with it as it will.  In this case:
    • Say that, syntactically, with modifies pasta, or more generally the noun, but that this analysis may be modified by the semantic interpretation -- if with is associated with an instrument, it's really modifying the verb.
    • Say that, syntactically, with modifies eat, or more generally the verb, and interpret eat with cheese as meaning something like "the pasta is covered with cheese when you eat it".  This may not be as weird as it sounds.  What else do you eat with cheese? is a perfectly good question, though you can argue that in this case the object of eat is what, and with cheese modifies what.
  • Make syntax indeterminate.  Instead of saying that with ... modifies eat or pasta, say that syntactically it modifies "eat or pasta, depending on the semantic interpretation".  This provides a somewhat cleaner separation between syntax and semantics, since the semantics only has to decide what's being modified without knowing the exact structure of the sentence it was in.
  • Say that with modifies eat pasta as a unit, and leave it up to semantics to slice things more finely.  This is much the same as the previous option.  Technically it gives a definite syntactic interpretation, but only by punting on the seemingly syntactic question of what modifies what.
  • Say that the distinction between syntax and semantics is artificial, and in real life we use both kinds of structure -- how words are arranged and what they mean -- simultaneously to make sense of language.  And while we're at it, the distinction between semantics and pragmatics may turn out not to be that sharp either.
There are two reasons one might want to separate syntax from semantics (and likewise for the other distinctions):
  • They're fundamentally separate.  Perhaps one portion of the brain's speech circuitry handles syntax and another semantics.
  • On the other hand, separating the two may just be a convenient way of describing the world.  One set of phenomena is more easily described with tools like grammars, while another set needs some other sort of tools.  Perhaps they're all aspects of the same thing deep down, but this is the best explanation we have so far.
As much as we've learned about language, it's striking, though not necessarily surprising, that fundamental questions like "What is the proper role of syntax as opposed to semantics?" remain open.  However, it seems hard to answer the question "What, if anything, is a grammar?" without answering the deeper question.

[If you're finding yourself asking "But what about all the work being done in parsing natural languages?" or "But what about dependency grammars?", please see this followup]

Thursday, July 2, 2015

Do androids trip on electric acid?

Have a look at some of these images and take note of whatever adjectives come to mind.  If other people's responses are anything to go by, there's a good chance they include some or all of "surreal", "disturbing", "dreamlike", "nightmarish" or "trippy".  Particularly "trippy".

These aren't the first computer-generated images to inspire such descriptions.  Notably, fractal images have been described in psychedelic terms at least since the Mandelbrot set came to general attention, and the newer, three-dimensional varieties seem particularly evocative.  The neural-network generated images, however, are in a different league.  What's going on?

Real neural systems appear to be rife with feedback loops.  In experiments with in vitro neuron cultures -- nerve cells growing in dishes with electrodes attached here and there -- a system with no other input to deal with will amplify and filter whatever random noise there is (and there is always something) into a clear signal.  This would be a "signal" in the information theory sense of something that's highly unlikely to occur by chance, not a signal in the sense of something conveying a particular meaning.

This distinction between the two senses of "signal" is important.  Typically a signal in the information theory sense is also meaningful in some way.  That's more or less why they call it "information theory".  There are plenty of counterexamples, though.  For example:
  • tinnitus (ringing of the ears), where the auditory system fills in a frequency that the ear itself isn't able to produce
  • pareidolia, where one sees images of objects in random patterns, such as faces in clouds
  • the gambler's fallacy, where one expects a random process to remember what it has already done and compensate for it ("I've lost the last three hands.  I'm bound to get good cards now.")
and so forth.  The common thread is that part of the brain is expecting to perceive something -- a sound, a face, a balanced pattern of "good" and "bad" outcomes -- and selectively processes otherwise meaningless input to produce that perception.

In the generated images, a neural network is first trained to recognize a particular kind of image -- buildings, eyes, trees, whatever -- and the input image is adjusted bit by bit to strengthen the signal to the recognizer.  The code doing the adjustment knows nothing about what the recognizer expects.  It just tries something, and if the recognizer gives it a stronger signal as a result, it keeps the adjustment.  If you start with random noise, you end up with the kind of images you were looking for.  If you start with non-random input, you get a weird mashup of what you had and what you were looking for.
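Here's a minimal sketch of that loop, with a stand-in "recognizer" that responds to a bright cross shape instead of a trained network (this is not the actual code behind those images, just the keep-whatever-scores-higher idea):

    import numpy as np

    rng = np.random.default_rng(0)

    def recognizer(image):
        # Stand-in for a trained network: responds strongly when the center
        # row and center column are bright.  It knows nothing about how the
        # image is being adjusted, and vice versa.
        n = image.shape[0]
        return image[n // 2, :].sum() + image[:, n // 2].sum()

    image = rng.random((32, 32))                  # start from random noise
    score = recognizer(image)

    for _ in range(50_000):
        i, j = rng.integers(0, 32, size=2)
        candidate = image.copy()
        candidate[i, j] = np.clip(candidate[i, j] + rng.normal(0, 0.1), 0, 1)
        new_score = recognizer(candidate)
        if new_score > score:                     # keep the tweak only if the
            image, score = candidate, new_score   # recognizer responds more strongly

    # After enough iterations the center row and column have brightened:
    # a cross has emerged from the noise, because that's what the recognizer
    # "expects" to see.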

Our brains almost certainly have this sort of feedback loop built in.  Real input often provides noisy and ambiguous signals.  Is that a predator behind those bushes, or just a fallen branch?  Up to a point it's safer to provide a false positive ("predator" when it's really a branch) than a false negative ("branch", when it's really a predator), so if a predator-recognizer feeds "yeah, that might be a four-legged furry thing with teeth" back to the visual system in order to strengthen the signal, survival chances should be better than with a brain that doesn't do that.  A difference in survival chances is exactly what natural selection needs to do its work.

At some point, though, too many false positives mean wasting energy, and probably courting other dangers, by jumping at every shadow.  Where that point is will vary depending on all sorts of things.  In practice, there will be a sliding scale from "too complacent" to "too paranoid", with no preset "right" amount of caution.  Given that chemistry is a vital part of the nervous system's operation, it's not surprising that various chemicals could move such settings.  If the change is in a useful direction, we call such chemicals "medicine".  Otherwise we call them "drugs".

In other words -- and I'm no expert here -- it seems plausible that we call the images trippy because they are trippy, in the sense that the neural networks that produced them are hallucinating in a manner similar to an actual brain hallucinating.  Clearly, there's more going on than that, but this is an interesting result.


When testing software, it's important to look at more than just the "happy" path.  If you're testing code that divides numbers, you should see what it does when you ask it to divide by zero.  If you're testing code that handles addresses and phone numbers, you should see what it does when you give it something that's not a phone number.  Maybe you should feed it some random gibberish (noting exactly what that gibberish was, for future reference), and see what happens.
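A sketch of what that looks like in practice, using pytest (just an assumption about the test framework) and a hypothetical divide function:

    import pytest

    def divide(a, b):
        return a / b

    def test_happy_path():
        assert divide(6, 3) == 2

    def test_divide_by_zero():
        # the unhappy path: pin down exactly what happens instead of hoping
        with pytest.raises(ZeroDivisionError):
            divide(1, 0)

    def test_gibberish_input():
        # feed it something that isn't a number at all and note what it does
        with pytest.raises(TypeError):
            divide("fish", 3)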

Testing models of perception (or of anything else) seems similar.  It's nice if your neural network for recognizing trees can say that a picture of a tree is a picture of a tree.  It's good, maybe good enough for the task at hand, if it's also good at not calling telephone poles or corn stalks trees.  But if you're not just trying to recognize pictures, and you're actually trying to model how brains work in general, it's very interesting if your model shows the same kind of failure modes as an actual brain.  A neural network that can hallucinate convincingly might just be on to something.