Monday, September 14, 2020

How real are real numbers?

There is always one more counting number.  That is, no matter how high you count, you can always count one higher.  Or at least in principle.  In practice you'll eventually get tired and give up.  If you build a machine to do the counting for you, eventually the machine will break down or it will run out of capacity to say what number it's currently on.  And so forth.  Nevertheless there is nothing inherent in the idea of "counting number" to stop you from counting higher.

In a brief sentence, which after untold work by mathematicians over the centuries we now have several ways to state completely rigorously, we've described something that can exceed the capacity of the entire observable universe as measured in the smallest units we believe to be measurable.  The counting numbers (more formally, the natural numbers) are infinite, but they can be defined not only by finite means, but fairly concisely.

There are levels of infinity beyond the natural numbers.  Infinitely many, in fact.  Again, there are several ways to define these larger infinities, but one way to define the most prominent of them, based on the real numbers, involves the concept of continuity or, more precisely, completeness in the sense that the real numbers contain any number that you can get arbitrarily close to.

For example, you can list fractions that get arbitrarily close to the square root of two: 1.4 (14/10) is fairly close, 1.41 (141/100) is even closer, 1.414 (1414/1000) is closer still, and if I asked for a fraction within one one-millionth, or trillionth, or within 1/googol, that is, one divided by ten to the hundredth power, no problem.  Any number of libraries you can download off the web can do that for you.
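To make that concrete, here's a quick Python sketch (the function name and approach are mine, not anything standard) that produces a fraction within any requested number of decimal digits of the square root of two:

```python
from fractions import Fraction
from math import isqrt

def sqrt2_approx(digits):
    """A fraction within 1/10**digits of the square root of two:
    the largest n with n*n <= 2 * 10**(2*digits), over 10**digits."""
    scale = 10 ** digits
    n = isqrt(2 * scale * scale)  # exact integer square root
    return Fraction(n, scale)

print(sqrt2_approx(3))    # 1414/1000, printed in lowest terms: 707/500
print(sqrt2_approx(100))  # within 1/googol, and still instantaneous
```

Because Python's integers are exact and arbitrarily large, asking for a hundred digits (one part in a googol) is no harder than asking for three.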

Nonetheless, the square root of two is not itself the ratio of two natural numbers, that is, it is not a rational number or fraction.  The earliest recorded proof of this goes back to the Pythagoreans.  It's not clear exactly who figured this out, or when, but the idea is certainly ancient.  No matter how closely you approach the square root of two with fractions, you'll never find a fraction whose square is exactly two.

OK, but why shouldn't the square root of two be a number?  If you draw a right triangle with legs one meter long, the hypotenuse certainly has some length, and by the Pythagorean theorem, that length squared is two.  Surely that length is a number?

Over time, there were some attempts to sweep the matter under the rug by asserting that, no, only rational numbers are really numbers and there just isn't a number that squares to two.  That triangle? Dunno, maybe its legs weren't exactly one meter long, or it's not quite a right triangle?

This is not necessarily as misguided as it might sound.  In real life, there is always uncertainty, and we only know the angles and the lengths of the sides approximately.  We can slice fractions as finely as we like, so is it really so bad to say that all numbers are rational, and therefore you can't ever actually construct a right triangle with both legs exactly the same length, even if you can get as close as you like?

Be that as it may, modern mathematics takes the view that there are more numbers than just the rationals and that if you can get arbitrarily close to some quantity, well, that's a number too.  Modern mathematics also says there's a number that squares to negative one, which has its own interesting consequences, but that's for some imaginary other post (yep, sorry, couldn't help myself).  The result of adding all these numbers-you-can-get-arbitrarily-close-to to the original rational numbers (every rational number is already arbitrarily close to itself) is called the real numbers.

It turns out that (math-speak for "I'm not going to tell you why") in defining the real numbers you bring in not only infinitely many more numbers, but so infinitely many more numbers that the original rational numbers "form a set of measure zero", meaning that the chances of any particular real number being rational are zero (as usual, the actual machinery that allows you to apply probabilities here is a bit more involved).

To recap, we started with the infinitely many rational numbers -- countably infinite since it turns out that you can match them up one-for-one with the natural numbers -- and now we have an uncountably infinite set of numbers, infinitely too big to match up with the naturals.
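For the curious, the usual trick for matching the rationals up with the naturals is to walk the diagonals of a numerator/denominator grid, skipping duplicates.  A small Python sketch (my own toy code) of that enumeration, covering the positive rationals -- interleaving zero and the negatives is the same trick again:

```python
from fractions import Fraction
from itertools import islice

def rationals():
    """Yield every positive rational exactly once by walking the
    diagonals of the numerator/denominator grid (1/1; 1/2, 2/1;
    1/3, 2/2, 3/1; ...) and skipping unreduced duplicates like 2/2."""
    seen = set()
    s = 2  # numerator + denominator is constant along a diagonal
    while True:
        for num in range(1, s):
            q = Fraction(num, s - num)
            if q not in seen:
                seen.add(q)
                yield q
        s += 1

first10 = list(islice(rationals(), 10))
print(", ".join(str(q) for q in first10))
# 1, 1/2, 2, 1/3, 3, 1/4, 2/3, 3/2, 4, 1/5
```

Every rational shows up at some finite position in this list, which is exactly what "countable" means: each one gets matched with the natural number giving its position.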

But again we did this with a finite amount of machinery.  We started with the rule "There is always one more counting number", snuck in some rules about fractions and division, and then added "if you can get arbitrarily close to something with rational numbers, then that something is a number, too".  More concisely, limits always exist (with a few stipulations, since this is math).

One might ask at this point how real any of this is.  In the real world we can only measure uncertainly, and as a result we can generally get by with only a small portion of even the rational numbers, say just those with a thousand decimal digits or fewer, and for most purposes probably just those with a dozen or so digits (a while ago I discussed just how tiny a set like this is).  By definition anything we, or all of the civilizations in the observable universe, can do is literally as nothing compared to infinity, so are we really dealing with an infinity of numbers, or just a finite set of rules for talking about them?

One possible reply comes from the world of quantum mechanics, a bit ironic since the whole point of quantum mechanics is that the world, or at least important aspects of it, is quantized, meaning that a given system can only take on a specific set of discrete states (though, to be fair, there are generally a countable infinity of such states, most vanishingly unlikely).  An atom is made of a discrete set of particles, each with an electric charge that's either 1, 0 or -1 times the charge of the electron, the particles of an atom can only have a discrete set of energies, and so forth (not everything is necessarily quantized, but that's a discussion well beyond my depth).

All of this stems from the Schrödinger equation.  The discrete nature of quantum systems comes from there being only a discrete set of solutions to that equation for a particular set of boundary conditions.  This is actually a fairly common phenomenon.  It's the same reason that you can only get a certain set of tones by blowing over the opening of a bottle (at least in theory).
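For reference, here's a standard one-dimensional, time-independent form of the equation (there are more general forms, but this one shows the structure):

```latex
% Time-independent Schrödinger equation in one dimension; for a given
% potential V and boundary conditions, only a discrete set of energies
% E admit a solution \psi.
-\frac{\hbar^2}{2m} \frac{d^2 \psi}{dx^2} + V(x)\,\psi(x) = E\,\psi(x)
```

The second derivative is the part defined as a limit, and the boundary conditions are what restrict the allowed energies E to a discrete set.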

The equation itself is a partial differential equation defined over the complex numbers, which have the same completeness property as the real numbers (in fact, a complex number can be expressed as a pair of real numbers).  This is not an incidental feature, but a fundamental part of the definition in at least two ways: Differential equations, including the Schrödinger equation, are defined in terms of limits, and this only works for numbers like the reals or the complex numbers where the limits in question are guaranteed to exist.  Also, it includes π, which is not just irrational, but transcendental, which more or less means it can only be defined as a limit of an infinite sequence.

In other words, the discrete world of quantum mechanics, our best attempt so far at describing the behavior of the world under most conditions, depends critically on the kind of continuous mathematics in which infinities, both countable and uncountable, are a fundamental part of the landscape.  If you can't describe the real world without such infinities, then they must, in some sense, be real.

Of course, it's not actually that simple.

When I said "differential equations are defined in terms of limits", I should have said "differential equations can be defined in terms of limits."  One facet of modern mathematics is the tendency to find multiple ways of expressing the same concept.  There are, for example, several different but equivalent ways of expressing the completeness of the real numbers, and several different ways of defining differential equations.

One common technique (a technique is a trick you use more than once) is to start with one way of defining a concept, find some interesting properties, and then switch perspective and say that those interesting properties are the actual definition.

For example, if you start with the usual definition of the natural numbers: zero and an "add one" operation to give you the next number, you can define addition in terms of adding one repeatedly -- adding three is the same as adding one three times, because three is the result of adding one to zero three times.  You can then prove that addition gives the same result no matter what order you add numbers in (the commutative property).  You can also prove that adding two numbers and then adding a third one is the same as adding the first number to the sum of the other two (the associative property).

Then you can turn around and say "Addition is an operation that's commutative and associative, with a special number 0 such that adding 0 to a number always gives you that number back."  Suddenly you have a more powerful definition of addition that can apply not just to natural numbers, but the reals, the complex numbers, the finite set of numbers on a clock face, rotations of a two-dimensional object, orderings of a (finite or infinite) list of items and all sorts of other things.  The original objects used to define addition -- the natural numbers 0, 1, 2 ... -- are no longer needed.  The new definition works for them, too, of course, but they're no longer essential to the definition.
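If you like, you can watch the original construction play out in code.  Here's a toy Python sketch (entirely my own construction) of the natural numbers built from zero and an "add one" operation, with addition defined as repeated adding-one, plus a spot check of commutativity and associativity:

```python
# Toy Peano naturals: zero is the empty tuple, and "add one" wraps a
# number in another tuple, so 2 is (((),),).
ZERO = ()

def succ(n):          # the "add one" operation
    return (n,)

def add(a, b):        # adding b means adding one, b times
    return a if b == ZERO else succ(add(a, b[0]))

def from_int(k):
    return ZERO if k == 0 else succ(from_int(k - 1))

def to_int(n):
    return 0 if n == ZERO else 1 + to_int(n[0])

# Spot-check commutativity and associativity on small numbers.  (A spot
# check is not a proof, of course -- the real proofs are by induction.)
nums = [from_int(k) for k in range(5)]
for a in nums:
    for b in nums:
        assert add(a, b) == add(b, a)
        for c in nums:
            assert add(add(a, b), c) == add(a, add(b, c))

print(to_int(add(from_int(2), from_int(3))))  # 5
```

The abstract version then throws away `ZERO`, `succ` and the tuples entirely and keeps only the properties the assertions checked.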

You can do the same thing with a system like quantum mechanics.  Instead of saying that the behavior of particles is defined by the Schrödinger equation, you can say that quantum particles behave according to such-and-such rules, which are compatible with the Schrödinger equation the same way the more abstract definition of addition in terms of properties is compatible with the natural numbers.

This has been done, or at least attempted, in a few different ways (of course).  The catch is that these more abstract systems depend on the notion of a Hilbert space, which has even more and hairier infinities in it than the real numbers as described above.

How did we get from "there is always one more number" to "more and hairier infinities"?

The question that got us here was "Are we really dealing with an infinity of numbers, or just a finite set of rules for talking about them?"  In some sense, it has to be the latter -- as finite beings, we can only deal with a finite set of rules and try to figure out their consequences.  But that doesn't tell us anything one way or another about what the world is "really" like.

So then the question becomes something more like "Is the behavior of the real world best described by rules that imply things like infinities and limits?"  The best guess right now is "yes", but maybe the jury is still out.  Maybe we can define a more abstract version of quantum physics that doesn't require infinities in the same way that defining addition doesn't require defining the natural numbers.  Then the question is whether that version is in some way "better" than the usual definition.

It's also possible that, as well-tested as quantum field theory is, there's some discrepancy between it and the real world that's best explained by assuming that the world isn't continuous and therefore the real equations should be based on a discrete number system.  I haven't the foggiest idea how that could happen, but I don't see any fundamental logical reason to rule it out.

For now, however, it looks like the world is best described by differential equations like the Schrödinger equation, which is built on the complex numbers, which in turn are derived from the reals, with all their limits and infinities.  The verdict for now: the real numbers are real.

Sunday, September 13, 2020

Entropy and time's arrow

When contemplating the mysteries of time ... what is it, why is it how it is, why do we remember the past but not the future ... it's seldom long before the second law of thermodynamics comes up.

In technical terms, the second law of thermodynamics states that the entropy of a closed system increases over time.  I've previously discussed what entropy is and isn't.  The short version is that entropy is a measure of uncertainty about the internal details of a system.  This is often shorthanded as "disorder", and that's not totally wrong, but it probably leads to more confusion than understanding.  This may be in part because uncertainty and disorder are both related to the more technical concept of symmetry, which may not mean what you might expect.  At least, I found some of this surprising when I first went over it.

Consider an ice cube melting.  Is a puddle of water more disordered than an ice cube?  One would think.  In an ice cube, each atom is locked into a crystal matrix, each atom in its place.  An atom in the water is bouncing around, bumping into other atoms, held in place enough to keep from flying off into the air but otherwise free to move.

But which of the two is more symmetrical?  If your answer is "the ice cube", you're not alone.  That was my reflexive answer as well, and I expect that it would be for most people.  Actually, it's the water.  Why?  Symmetry is a measure of what you can do to something and still have it look the same.  The actual mathematical definition is, of course, a bit more technical, but that'll do for now.

An irregular lump of coal looks different if you turn it one way or another, so we call it asymmetrical.  A cube looks the same if you turn it 90 degrees in any of six directions, or 180 degrees in any of three directions, so we say it has "rotational symmetry" (and "reflective symmetry" as well).  A perfect sphere looks the same no matter which way you turn it, including, but not limited to, all the ways you can turn a cube and have the cube still look the same.  The sphere is more symmetrical than the cube, which is more symmetrical than the lump of coal.  So far so good.

A mass of water molecules bouncing around in a drop of water looks the same no matter which way you turn it.  It's symmetrical the same way a sphere is.  The crystal matrix of an ice cube only looks the same if you turn it in particular ways.  That is, liquid water is more symmetrical, at the microscopic level, than frozen water.  This is the same as saying we know less about the locations and motions of the individual molecules in liquid water than those in frozen water.  More uncertainty is the same as more entropy.

Geometrical symmetry is not the only thing going on here.  Ice at -100C has lower entropy than ice at -1C, because molecules in the colder ice have less kinetic energy and a narrower distribution of possible kinetic energies (loosely, they're not vibrating as quickly within the crystal matrix and there's less uncertainty about how quickly they're vibrating).  However, if you do see an increase in geometrical symmetry, you are also seeing an increase in uncertainty, which is to say entropy. The difference between cold ice and near-melting ice can also be expressed in terms of symmetry, but a more subtle kind of symmetry.  We'll get to that.

As with the previous post, I've spent more time on a sidebar than I meant to, so I'll try to get to the point by going off on another sidebar, but one more closely related to the real point.

Suppose you have a box with, say, 25 little bins in it arranged in a square grid.  There are five marbles in the box, one in each bin on the diagonal from upper left to lower right.  This arrangement has "180-degree rotational symmetry".  That is, you can rotate it 180 degrees and it will look the same.  If you rotate it 90 degrees, however, it will look clearly different.

Now put a lid on the box, give it a good shake and remove the lid.  The five marbles will have settled into some random assortment of bins (each bin can only hold one marble).  If you look closely, this random arrangement is very likely to be asymmetrical in the same way a lump of coal is: If you turn it 90 degrees, or 180, or reflect it in a mirror, the individual marbles will be in different positions than if you didn't rotate or reflect the box.

However, if you were to take a quick glimpse at the box from a distance, then have someone flip a coin and turn the box 90 degrees if the coin came up heads, then take another quick glimpse, you'd have trouble telling if the box had been turned or not.  You'd have no trouble with the marbles in their original arrangement on the diagonal.  In that sense, the random arrangement is more symmetrical than the original arrangement, just like the microscopic structure of liquid water is more symmetrical than that of ice.

[I went looking for some kind of textbook exposition along the lines of what follows but came up empty, so I'm not really sure where I got it from.  On the one hand, I think it's on solid ground in that there really is an invariant in here, so the math degree has no objections, though I did replace "statistically symmetrical" with "symmetrical" until I figure out what the right term, if any, actually is.

On the other hand, I'm not a physicist, or particularly close to being one so this may be complete gibberish from a physicist's point of view.  At the very least, any symmetries involved have more to do with things like phase spaces, and "marbles in bins" is something more like "particles in quantum states".]

The magic word to make this all rigorous is "statistical".  That is, if you have a big enough grid and enough marbles and you just measure large-scale statistical properties, and look at distributions of values rather than the actual values, then an arrangement of marbles is more symmetrical if these rough measures don't change when you rotate the box (or reflect it, or shuffle the rows or columns, or whatever -- for brevity I'll stick to "rotate" here).

For example, if you count the number of marbles on each diagonal line (wrapping around so that each line has five bins), then for the original all-on-one-diagonal arrangement, there will be a sharp peak: five marbles on the main diagonal, one on each of the diagonals that cross that main diagonal, and zero on the others.  Rotate the box, and that peak moves.  For a random arrangement, the counts will all be more or less the same, both before and after you rotate the box.  A random arrangement is more symmetrical, in this statistical sense.
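Here's a small Python sketch (my own toy code) of that diagonal-count statistic for the 5x5 box:

```python
import random

N = 5  # the box is an N x N grid holding N marbles

def diagonal_counts(marbles):
    """Marbles per wrap-around diagonal (one direction): marbles is a
    set of (row, col) pairs, and bin (r, c) lies on diagonal (c - r) % N."""
    counts = [0] * N
    for r, c in marbles:
        counts[(c - r) % N] += 1
    return counts

on_diagonal = {(i, i) for i in range(N)}
print(diagonal_counts(on_diagonal))  # [5, 0, 0, 0, 0] -- a sharp peak

random.seed(1)  # arbitrary seed, just for repeatability
bins = [(r, c) for r in range(N) for c in range(N)]
shaken = set(random.sample(bins, N))
print(diagonal_counts(shaken))       # typically much flatter
```

Rotating the box permutes which diagonal gets which count, so the sharp peak gives the original arrangement away, while a flat profile looks much the same before and after.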

The important thing here is that there are many more symmetrical arrangements than not.  For example, there are ten wrap-around diagonals in a 5x5 grid (five in each direction) so there are ten ways to put five marbles in that kind of arrangement.  There are 53,130 total ways to put 5 marbles in 25 bins, so there are approximately 5,000 times as many more-symmetrical, that is, higher-entropy, arrangements.  Granted, some of these are still fairly unevenly distributed, for example four marbles on one diagonal and one off it, but even taking that into account, there are many more arrangements that look more or less the same if you rotate the box than there are that look significantly different.
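The counting itself is easy to check with a couple of lines of Python, using exact binomial coefficients:

```python
from math import comb  # exact "n choose k" (Python 3.8+)

total = comb(25, 5)      # ways to place 5 marbles in 25 bins
peaked = 10              # 10 wrap-around diagonals, one arrangement each
print(total)             # 53130
print(total // peaked)   # 5313 -- roughly 5,000 to 1 against a peak

# The 50x50 scaled-up version: the number of arrangements of 50
# marbles in 2500 bins dwarfs a googol all by itself.
print(comb(2500, 50) > 10 ** 100)  # True
```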

This is a toy example.  If you scale up to, say, the number of molecules in a balloon at room temperature, "many more" becomes "practically all".  Even if the box has 2500 bins in a 50x50 grid, still ridiculously small compared to the trillions of trillions of molecules in a typical system like a balloon, or a vase, or a refrigerator or whatever, the odds that all of the marbles line up on a diagonal are less than one in a googol (that's ten to the hundredth power, not the search engine company).  You can imagine all the molecules in a balloon crowding into one particular region, but for practical purposes it's not going to happen, at least not by chance in a balloon at room temperature.

If you start with the box of marbles in a not-very-symmetrical state and shake it up, you'll almost certainly end up with a more symmetrical state, simply because there are many more ways for that to happen.  Even if you only change one part of the system, say by taking out one marble and putting it back in a random empty bin adjacent to its original position, there are still more cases than not in which the new arrangement is more symmetrical than the old one.

If you continue making more random changes, whether large or small, the state of the box will get more symmetrical over time.  Strictly speaking, this is not an absolute certainty, but for anything we encounter in daily life the numbers are so big that the chances of anything else happening are essentially zero.  This will continue until the system reaches its maximum entropy, at which point large or small random changes will (essentially certainly) leave the system in a state just as symmetrical as it was before.
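A quick simulation (again, my own toy code) shows this happening: start with the marbles on the diagonal, repeatedly move one marble to a random empty bin, and track the height of the tallest diagonal peak:

```python
import random

N = 5
random.seed(0)  # arbitrary seed, just for repeatability

def peak(marbles):
    """Height of the tallest wrap-around diagonal: N for the
    all-on-one-diagonal arrangement, lower for flatter ones."""
    counts = [0] * N
    for r, c in marbles:
        counts[(c - r) % N] += 1
    return max(counts)

marbles = {(i, i) for i in range(N)}  # the low-entropy starting state
history = [peak(marbles)]
for _ in range(200):
    src = random.choice(sorted(marbles))  # pick one marble...
    empty = [(r, c) for r in range(N) for c in range(N)
             if (r, c) not in marbles]    # ...and one empty bin
    marbles.remove(src)
    marbles.add(random.choice(empty))
    history.append(peak(marbles))

print(history[0])   # 5: the sharp peak we started with
print(history[-1])  # almost certainly smaller: the peak has melted away
```

Run it with different seeds: the peak falls within a few moves and essentially never climbs back to 5, which is the second law in miniature.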

That's the second law -- as a closed system evolves, its entropy will essentially never decrease, and if it starts in a state of less than maximum entropy, its entropy will essentially always increase until it reaches maximum entropy.

And now to the point.

The second law gives a rigorous way to tell that time is passing.  In a classic example, if you watch a film of a vase falling off a table and shattering on the floor, you can tell instantly if the film is running forward or backward: if you see the pieces of a shattered vase assembling themselves into an intact vase, which then rises up and lands neatly on the table, you know the film is running backwards.  Thus it is said that the second law of thermodynamics gives time its direction.

As compelling as that may seem, there are a couple of problems with this view.  I didn't come up with any of these, of course, but I do find them convincing:

  • The argument is only compelling for part of the film.  In the time between the vase leaving the table and it making contact with the floor, the film looks fine either way.  You either see a vase falling, or you see it rising, presumably having been launched by some mechanism.  Either one is perfectly plausible, while the vase assembling itself from its many pieces is totally implausible.  But the lack of any obvious cue like pottery shards improbably assembling themselves doesn't stop time from passing.
  • If your recording process captured enough data, beyond just the visual image of the vase, you could in principle detect that the entropy of the contents of the room increases slightly in one direction, but that doesn't actually help because entropy can increase locally without violating the second law.  For example, you can freeze water in a freezer or by leaving it out in the cold.  Its entropy decreases, but that's fine because entropy overall is still increasing, one way or another (for example, a refrigerator produces more entropy by dumping heat into the surrounding environment than it removes in cooling its contents).  If you watch a film of ice melting, there may not be any clear cues to tell you that you're not actually watching a film of ice freezing, running backward.  But time passes regardless of whether entropy is increasing or decreasing in the local environment.
  • Most importantly, though, in an example like a film running, we're only able to say "That film of a vase shattering is running backward" because we ourselves perceive time passing.  We can only say the film is running backward because it's running at all.  By "backward", we really mean "in the other direction from our perception of time".  Likewise, if we measure the entropy of a refrigerator and its contents, we can only say that entropy is increasing as time as we perceive it increases.
In other words, entropy increasing is a way that we can tell time is passing, but it's not the cause of time passing, any more than a mile marker on a road makes your car move.  In the example of the box of marbles, we can only say that the box went from a less symmetrical to more symmetrical state because we can say it was in one state before it was in the other.

If you printed a diagram of each arrangement of marbles on opposite sides of a piece of paper, you'd have two diagrams on a piece of paper.  You couldn't say one was before the other, or that time progressed from one to the other.  You can only say that if the state of the system undergoes random changes over time, then the system will get more symmetrical over time, and in particular the less symmetrical arrangement (almost certainly) won't happen after the more symmetrical one.  That is, entropy will increase.

You could even restate the second law as something like "As a system evolves over time, all state changes allowed by its current state are equally likely" and derive increasing entropy from that (strictly speaking you may have to distinguish identical-looking potential states in order to make "equally likely" work correctly -- the rigorous version of this is the ergodic hypothesis).  This in turn depends on the assumptions that systems have state, and that state changes over time.  Time is a fundamental assumption here, not a by-product.

In short, while you can use the second law to demonstrate that time is passing, you can't appeal to the second law to answer questions like "Why do we remember the past and not the future?"  It just doesn't apply.

Saturday, September 12, 2020

What part of consciousness is social?

I think a lot of questions about consciousness fall into one of two categories:

  • What is it, that is, what features does it have, what states of consciousness are there, what are reasonable tests of whether something is conscious or not (given that we can't directly experience any consciousness but our own)?
  • How does it happen, that is, what causes things (like us, for example) to have conscious experiences?
Reading that over, I'm not sure it really captures the distinction I want to make.  The first item deals in experiments people know how to do right now, and there has been quite a lot of exciting work on the first type of question, falling under rubrics like "cognitive science" and "neural correlates of consciousness".

I mean for the second item to represent "the hard problem of consciousness", the "Why does anyone experience anything at all?" kind of question.  It's not clear whether one can conduct experiments about questions like this at all and, as far as I know, no one has an answer that isn't ultimately circular.

For example, "We have consciousness because we have a soul" by itself doesn't answer "What is a soul?" and "How does it give us consciousness?" or clearly suggest an experiment that could confirm or refute it.  Instead, it states a defining property (typically among others): A soul is something which gives us consciousness.  The discussion doesn't necessarily end there, but if there's an answer to "How does consciousness happen?" in it, it's not in the mere assertion that souls give us consciousness.

Similarly, if we substitute more mechanistic terms like "quantum indeterminacy" or "chaos of non-linear systems" or whatever else for "soul" in "We have consciousness because ...", we haven't explained why that leads to the subjective experience of consciousness or provided a way to test the assertion.  We may well be able to demonstrate that some aspect or another of consciousness is associated with some structure -- some collection of neurons, one might expect -- where quantum indeterminacy or chaos plays a significant role, but that doesn't explain why that structure correlates with consciousness rather than being just another structure along with the gall bladder, earlobe or whatever else.

If we were able to pinpoint some complex of neural circuits that fire exactly when a person is conscious, or perhaps more realistically, in a particular state of waking consciousness, or consciousness of a particular experience, it would be tempting, then, to say "Aha! We've found the neural circuits that cause consciousness," but that's not really accurate, for a couple of reasons.

First, correlation doesn't imply cause, which is why we speak of neural correlates of consciousness, not causes.  Second, even if there's a good case that the neural pattern we locate really is a cause -- for example, maybe it can be demonstrated that if the pattern is disrupted the person loses consciousness, as opposed to the other way around -- we still don't know what is causing a person to have the subjective experience of consciousness.  We can talk with some confidence about patterns of neurons firing, or even of subjects reporting particular experiences, but we can't speak with confidence about people actually experiencing things.

If we didn't already know that subjective experiences existed (or, at least, I know my subjective experiences exist), there's nothing about the experiment that would tell us that they did, much less why.  All we know is that if neurons are firing in such-and-such a state, the subject reports conscious experiences.

Since we do experience consciousness, it's blindingly obvious to us that the subject must be as well, but again that just shifts the problem back a level: We're convinced that we have found something that causes the subject to experience what we experience, but that doesn't explain why we experience anything to begin with.  If we were all "philosophical zombies" that exhibited all the outward signs of consciousness without actually experiencing it, the experiment would run exactly the same -- except that no one would actually experience it happening.

That's more than I meant to say about the second bullet point.  I actually meant to explore the first one, so let's try that.

Suppose you're hanging out in your hammock on a pleasant afternoon (note to self: how did I let the summer go by without that?).  You hear the wind in the trees, maybe birds chirping or dogs barking or kids playing, or cars going by, or whatever.  You are alone with your own thoughts, but for a while even those die down and you're just ... being.  Are you conscious?  Unless you've actually drifted off to sleep, I think most people would answer yes.  If someone taps you on your shoulder or shouts your name, you'll probably respond, though you might be a bit slow to come back up to speed.  If it starts to rain, you'll feel it.  If something makes a loud noise and you manage to regain your meditative state, you're still liable to remember the noise.

On the other hand, it's something of a different state of consciousness than much of our usual existence.  There's nothing verbal going on.  There's no interaction with other people, none of the constant evaluation  (much of which we're generally not aware of) concerning what people might be thinking, or whether they heard or understood you, or whether you're understanding them, or what their motives might be, or their opinions of you or others around, or what they might be aware of or unaware of.  You're not having an inner conversation with yourself or that jerk who cut you off at the intersection, and there's little to no self consciousness, if you're only focusing on the sensory experience of the moment (indeed, this is a major reason people actively seek such a meditative state).

I've become more and more convinced over time that we often underestimate how conscious other beings are.  I don't subscribe to the sort of literal panpsychism that holds that a brick has a consciousness, that "It is something (to a brick) to be a brick".  I doubt this is a particularly widely held position anyway, so much as the anchor at one end of a spectrum between it and "nothing is actually conscious at all".  However, I am open to the idea that anything with a certain minimum complement of capabilities which can be measured fairly objectively, including particularly senses and memory, has some sort of consciousness, and, as a corollary, that there are many different kinds or components of consciousness that different things have at different times.

For example, a hawk circling over a field waiting for a mouse to pop out of its burrow likely has some sort of experience of doing this, and if it spots a mouse, it has some sort of awareness of there now being prey to pursue with the goal of eating it or, if there are no mice, an awareness of being hungry.  This wouldn't be awareness on a verbal, reflective level we experience when we notice we are hungry and tell someone about it, but something more akin to that "I'm relaxing in a hammock and things are just happening" kind of awareness.  I also wouldn't claim that this awareness is serving any particular purpose.  Rather, it's a side effect of having the sort of mental circuitry a hawk has and being embodied in a universe where time exists -- another mystery that may well be deeply connected to the hard problem of consciousness.

I think this is in some sense the simplest hypothesis, given that we have the same general kind of neural machinery as hawks and that we can experience things happening.  It still presupposes that there's some sort of structural difference between things with at least some subjective experiences and things with no such experiences at all, but that "something" becomes a fairly general and widely-shared capacity for sensing the world and retaining some memory of it rather than a specialized facility unique to us.  The difference between us and a hawk is not that we're conscious and hawks aren't, but that we have a different set of experiences from hawks.  For the most part this would be a larger set of experiences, but, if you buy the premise of hawks having experiences at all, there are almost certainly some that they have but we don't.

Which leads me back to the title of this post.

I suspect that if you polled a bunch of people about consciousness in other animals, you'd see more "yes" answers to "is a chimpanzee conscious" or "is a dog conscious" than to "is a hawk conscious" or "is a salmon conscious".  Some of this is probably due to our concept of intelligence in other animals.  Most people probably think that chimps and dogs are "smart animals", while hawks and salmon are "just regular animals".

However, I think our judgment of that is strongly colored by chimps and dogs being more social animals than hawks or fish (even fish that school are probably not social in the same way we are -- I'd go into why I think that, but this post is already running a bit long).  It doesn't take much observation of chimps and dogs interacting with their own species and with humans to conclude that they have some awareness of individual identities and social structure, the ability to persuade others to do what they want (or at least try), and other aspects of behavior that are geared specifically toward interaction with those around them.  Other animals do interact with each other, but social animals like chimps, dogs and humans normally do so on a daily basis as a central part of life.

This social orientation produces its own set of experiences beyond "things are happening in the physical world" experiences like hunger and an awareness that some potential food just popped out of a burrow.  I think it's this particular kind of experience that we tend to gravitate toward when we think of conscious experience.  More specifically, self-awareness is often held out as the hallmark of "true consciousness", and I think there's a good case that self-awareness is closely connected to the sort of "what is that one over there thinking and what do they want" calculation that comes of living as a social animal.

To some extent this is a matter of definition.  If you define consciousness as self-awareness, then it's probably relatively rare, even if several species are able to pass tests like the mirror test (Can the subject tell that the animal in the mirror is itself?).  However, if you define consciousness as the ability to have subjective experiences, then I think it's hard to argue that it's not widespread.  In that formulation, self-awareness is a particular kind of subjective experience limited to relatively few kinds of being, but only one kind of experience among many.

Tuesday, March 10, 2020

Memory makes you smarter

Another sidebar working up to talking about the hide-and-seek demo.

Few words express more exasperation than "I just told you that!", and -- fairly or not -- there are few things that can lower someone's opinion of another person's cognitive function faster than not remembering simple things.

Ironically for systems that can remember much more data much more permanently and accurately than we ever could, computers often seem to remember very little.  For example, I just tried a couple of online AI chatbots, including one that claimed to have passed a Turing test.  The conversations went something like this:
Me: How are you?
Bot: I'm good.
Me: That's great to hear.  My name is Fred.  My cousin went to the store the other day and bought some soup.
<a bit of typical AI bot chat, pattern-matching what I said and parroting it back, trying stock phrases etc.>
Me: By the way, I just forgot my own name.  What was it?
<some dodge, though one did note that it was a bit silly to forget one's own name>
Me: Do you remember what my cousin bought the other day?
<some other dodge with nothing to do with what I said>
The bots are not even trying to remember the conversation, even in the rudimentary sense of scanning back over the previous text.  They appear to have little to no memory of anything before the last thing the human typed.
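To make the contrast concrete, here's a toy sketch of that rudimentary sense of remembering: keep the transcript and scan back over it when asked.  Everything here, the `ChattyBot` class and its crude pattern matching, is invented purely for illustration; no real chatbot works this way.

```python
# A toy bot with memory in the rudimentary sense: it keeps the whole
# conversation and scans back over the previous text when asked.
class ChattyBot:
    def __init__(self):
        self.transcript = []   # everything the human has said so far

    def respond(self, line):
        self.transcript.append(line)
        if "forgot" in line.lower() and "name" in line.lower():
            # scan back over earlier lines for a statement of the name
            for old in reversed(self.transcript[:-1]):
                lowered = old.lower()
                if "my name is" in lowered:
                    name = lowered.split("my name is")[1].split()[0]
                    return ("You said your name was "
                            + name.strip(".!?").capitalize() + ".")
            return "You never told me your name."
        return "Interesting, tell me more."
```

A bot like this would pass the "what was my name?" test above with a dozen lines of bookkeeping, which is rather the point: the bots weren't even trying.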

Conversely, web pages suddenly got a lot smarter when sites started using cookies to remember state between visits and again when browsers started to be able to remember things you'd typed in previously.  There's absolutely nothing anyone would call AI going on, but it still makes the difference between "dumb computer" and "not so bad".

When I say "memory" here, I mean the memory of things that happen while the program is running.  Chess engines often incorporate "opening books" of positions that have occurred in previous games, so they can play the first few moves of a typical game without doing any calculation.  Neural networks go through a training phase (whether guided by humans or not).  One way or another, that training data is incorporated into the weightings that determine the network's behavior.

In some sense, those are both a form of memory -- they certainly consume storage on the underlying hardware -- but they're both baked in beforehand.  A chess engine in a tournament is not updating its opening book.  As I understand it, neural network-based chess engines don't update their weights while playing in a tournament, but can do so between rounds (but if you're winning handily, how much do you really want to learn from your opponents' play?).

Likewise, a face recognizer will have been trained on some particular set of faces and non-faces before being set loose on your photo collection.  For better or worse, its choices are not going to change until the next release (unless there's randomization involved).

Chess engines do use memory to their advantage in one way: they tend to remember a "cache" of positions they've already evaluated in determining previous moves.  If you play a response that the engine has already evaluated in detail, it will have a head start in calculating its next move.  This is standard in AB engines, at least (though it may be turned off during tournaments).  I'm not sure how much it applies for NN engines.   To the extent it does apply, I'd say this absolutely counts as "memory makes you smarter".
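A stripped-down sketch of that caching idea (not any real engine's code): a negamax search that remembers every position it evaluates, so a position reached again, by any move order, comes back instantly.  The `successors` and `score` functions are stand-ins for real move generation and evaluation.

```python
# A toy "transposition table": cache the evaluation of every position
# searched, so if the opponent plays into a line we've already analyzed,
# the engine gets a head start on its next move.
cache = {}

def evaluate(position, depth, successors, score):
    key = (position, depth)
    if key in cache:                     # already searched: free answer
        return cache[key]
    moves = successors(position)
    if depth == 0 or not moves:
        result = score(position)
    else:
        # negamax: our best result is the worst we can force on the opponent
        result = max(-evaluate(p, depth - 1, successors, score) for p in moves)
    cache[key] = result
    return result

# Demo on a trivial Nim-like game: take 1 or 2 stones, taking the last wins.
succ = lambda p: [q for q in (p - 1, p - 2) if q >= 0]
score = lambda p: -1 if p == 0 else 0    # to move with no stones = lost
evaluate(3, 10, succ, score)             # -1: 3 stones is a losing position
```

After that one call, the cache already holds the evaluations of every position in the subtree, which is exactly the head start described above.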

Overall, though, examples of what we would typically call memory seem to be fairly rare in AI applications.  Most current applications can be framed as processing a particular state of the world without reference to what happened before.  Recognizing a face is just recognizing a face.

Getting a robot moving on a slippery surface is similar, as I understand it.  You take a number of inputs regarding the position and velocity of the various members and whatever visual input you have, and from that calculate what signals to send to the actuators.  There's (probably?) a buffer remembering a small number of seconds worth of inputs, but beyond that, what's past is past (for that matter, there's some evidence that what we perceive as "the present" is basically a buffer of what happened in the past few seconds).

Translating speech to text works well enough a word or phrase at a time, even if remembering more context might (or might not) help with sorting out homonyms and such.   In any case, translators that I'm familiar with clearly aren't gathering context from previous sentences.  It's not even clear they can remember all of the current sentence.

One of the most interesting things about the hide-and-seek demo is that its agents are capable of some sort of more sophisticated memory.  In particular, they can be taught some notion of object permanence, usually defined as the ability to remember that objects exist even when you can't see them directly, as when something is moved behind an opaque barrier.  In purely behavioral terms, you might analyze it as the ability to change behavior in response to objects that aren't directly visible, and the hide-and-seek agents can definitely do that.  Exactly how they do that and what that might imply is what I'm really trying to get to here ...

Sunday, March 1, 2020

Intelligence and intelligence

I've been meaning for quite a while to come back to the hide-and-seek AI demo, but while mulling that over I realized something about a distinction I'd made in the first post.  I wanted to mention that brief(-ish-)ly in its own post, since it's not directly related to what I wanted to say about the demo itself.

In the original post, I distinguished between internal notions of intelligence, concerning what processes are behind the observed behavior, and external notions which focus on the behavior itself (note to self: find out what terms actual cogsci/AI researchers use -- or maybe structural and functional would be better?).

Internal definitions on the order of "Something is intelligent if it's capable of learning and dealing with abstract concepts" seem satisfying, even self-evident, until you try to pin down exactly what is meant by "learning" or "abstract concept".  External definitions are, by construction, more objective and measurable, but often seem to call things "intelligent" that we would prefer not to call intelligent at all, or call intelligent in a very limited sense.

The classic example would be chess (transcribing speech and recognizing faces would be others).  For quite a while humans could beat computers at chess, even though even early computers could calculate many more positions than a human, and the assumption was that humans had something -- abstract reasoning, planning, pattern recognition, whatever -- that computers did not have and might never have.  Therefore, humans would always win until computers could reason abstractly, plan, recognize patterns or whatever else it was that only humans could do. In other words, chess clearly required "real intelligence".

Then Deep Blue beat Kasparov through sheer calculation, playing a "positional" style that only humans were supposed to be able to play.  Clearly a machine could beat even the best human players at chess without having anything one could remotely call "learning" or "abstract concepts".  As a corollary, top-notch chess-playing is not a behavior that can be used to define the kind of intelligence we're really interested in.

This is true even with the advent of Alpha Zero and similar neural-network driven engines*. Even if we say, for the sake of the argument, that neural networks are intelligent like we are, the original point still holds.  Things that are clearly unintelligent can play top-notch chess, so "plays top-notch chess" does not imply "intelligent like we are".  If neural networks are intelligent like we are, it won't be because they can play chess, but for other reasons.

The hide-and-seek demo is exciting because on the one hand, it's entirely behavior based.  The agents are trained on the very simple criterion of whether any hiders are visible to the seekers.  On the other hand, though, the agents can develop capabilities, particularly object permanence, that have been recognized as hallmarks of intelligence since before there were computers (there's a longer discussion behind this, which is exactly what I want to get to in the next post on the topic).

In other words, we have a nice, objective external definition that matches up well with internal definitions.  Something that can
  • Start with only basic knowledge and capabilities (in this case some simple rules about movement and objects in the simulated environment)
  • Develop new behaviors in a competition against agents with the same capabilities
is pretty clearly intelligent in some meaningful sense, even if it doesn't seem as intelligent as us.

If we want to be more precise about "develop new behaviors", we could either single out particular behaviors, like fort building or ramp jumping, or just require that any new agent we're trying to test starts out by losing heavily to the best agents from this demo but learns to beat them, or at least play competitively.

This says nothing about what mechanisms such an agent is using, or how it learns.  This means we might some day run into a situation like chess where something beats the game without actually appearing intelligent in any other sense, maybe some future quantum computer that can simultaneously try out a huge variety of possible strategies.  Even then, we would learn something interesting.

For now, though, the hide-and-seek demo seems like a significant step forward, both in defining what intelligence might be and in producing it artificially.

* I've discussed Alpha Zero and chess engines in general at length elsewhere in this blog.  My current take is that the ability of neural networks to play moves that appear "creative" to us and to beat purely calculation based (AB) engines doesn't imply intelligence, and that the ability to learn the game from nothing, while impressive, doesn't imply anything like what we think of as human intelligence, even though it's been applied to a number of different abstract games.  That isn't a statement about neural networks in general, just about these particular networks being applied to the specific problem of chess and chess-like games.  There's a lot of interesting work yet to be done with neural networks in general.

Sunday, February 23, 2020

What good is half a language?

True Wit is Nature to advantage dress'd
What oft was thought, but ne'er so well express'd
-- Alexander Pope

How did humans come to have language?

There is, to put it mildly, a lot we don't know about this.  Apart from the traditional explanations from various cultures, which are interesting in their own right, fields including evolutionary biology, cognitive science and linguistics have had various things to say about the question, so why shouldn't random bloggers?

In what follows, please remember that the title of this blog is Intermittent Conjecture.  I'm not an expert in any of those three fields, though I've had an amateur interest in all three for years and years.  Real research requires careful gathering of evidence and checking of sources, detailed knowledge of the existing literature, extensive review and in general lots of time and effort.  I can confidently state that none of those went into this post, and anything in here should be weighed accordingly.  Also, I'm not claiming any original insight.  Most likely, all the points here have already been made, and better made, by someone else already.

With that said ...

In order to talk about how humans came to have language, the first question to address is what it means to have language at all.  Language is so pervasive in human existence that it's surprisingly hard to step back and come up with an objective definition that captures the important features of language and doesn't directly or indirectly amount to "It's that thing people do when they talk (or sign, or write, or ...) in order to communicate information."

We want to delimit, at least roughly, something that includes all the ways we use language, but excludes other activities, including things that we sometimes call "language", but that we somehow know aren't "really" language, say body language, the language of flowers or, ideally, even computer languages, which deliberately share a number of features with human natural languages.

Since language is often considered something unique to humans, or even something that makes us human, it might be tempting to actively try to exclude various ways that other animals communicate, but it seems better to me just to try to pin down what we mean by human language and let the chips fall where they may when it comes to other species.

For me, some of the interesting features of language are
  • It can communicate complex, arbitrary structures from one mind to another, however imperfectly.
  • It is robust in the face of noise and imperfection (think of shouting in a loud music venue or talking with someone struggling with a second language).
  • It tolerates ambiguity, meaning that (unlike in computer languages and other formal systems) ambiguity doesn't bring a conversation to a halt.  In some cases it's even a useful feature.
  • Any given language provides multiple ways to express the same basic facts, each with its own particular connotations and emphasis.
  • Different languages often express the same basic facts in very different ways.
  • Related to these, language is fluid across time and populations.  Usage changes over time and varies across populations.
  • It can be communicated by a variety of means, notably speech, signing and writing.
  • From an evolutionary point of view, it has survival value.
I'd call these functional properties, meaning that they relate mainly to what language does without saying anything concrete about how it does it.  Structurally (from here on I'll tend to focus on spoken/written language, with the understanding that it's not the whole story),
  • Language is linear.
That is, whatever the medium, words are produced and received one at a time, though there can be a number of "side channels" such as pitch and emphasis, facial expressions and hand gestures.
  • The mapping between a word and its meaning is largely arbitrary (though you can generally trace a pretty elaborate history involving similar words with similar meanings).
  • Vocabulary is extensible.
We can coin words for new concepts.  This is true only for certain kinds of words, but where it can happen it happens easily.
  • Meaning is also extensible.
We can apply existing words in new senses, and again this happens easily.
  • The forms used adjust to social conditions.
You speak differently with your peers after work than you would to your boss at work, or to your parents as a child, or to your prospective in-laws, and so forth.
  • The forms used adjust to particular needs of the conversation, for example which details you want to emphasize (or obscure).
  • Some concepts seem to be more tightly coupled to the structure of a particular language than others.
For example, when something happened or will happen in relation to when it is spoken of is generally part of the grammar, or marked by a small, closed set of words, or both.
  • On the other hand, there is wide variety in exactly how such things are expressed.
Different languages emphasize different distinctions.  For example, some languages don't specially mark singular/plural, or past/present, though of course they can still express that there was more than one of something or that something happened yesterday rather than today.  Different languages use different devices to convey basic information like when something happened or what belongs to whom.
  • Syntax, in the form of word order and inflection (changing the forms of words, as with changing dog to dogs or bark to barked or barking), collectively seem to matter in all languages, but the exact way in which they matter, and the degree to which each matters, seem to be unique to any given language.  Even closely related languages generally differ in the exact details.
There are plenty of other features that could each merit a separate post, such as honorifics (Mr. Hull) and diminutives (Davey), or how accent and vocabulary are such devastatingly effective in-group markers, or how metaphors work, or what determines when and how we choose to move words around to focus on a topic, or why some languages build up long words that equate to whole sentences of short words in other languages, or why in some languages directional words like to and of take on grammatical meaning, or why different languages break down verb tenses in different ways, or can use different words for numbers depending on what's being counted, and so on and so on ...

Many of these features of language have to do with the interplay between cognition -- how we think -- and language -- how we express thoughts.  The development of cognition must have been both a driver and a limiting factor in the development of language, but we are almost certainly still in the very early stages of understanding this relationship.

For example, languages generally seem to have a way of nesting one clause inside another, as in The fence that went around the house that was blue was red.  How would this arise?  In order to understand such a sentence, we need some way of setting aside The fence while we deal with that went around the house that was blue and then connecting was red with it in order to understand that the fence is red and the house is blue.  To a compugeek, this means something like a stack, a data structure for storing and retrieving things such that the last thing stored is the first thing retrieved.
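As a toy illustration of the stack idea, here's a sketch in which the clause boundaries are pre-marked with brackets, which conveniently assumes away the genuinely hard part (finding those boundaries in real speech):

```python
def resolve_clauses(tokens):
    """Pair each bracketed clause with the phrase it interrupted,
    innermost clause first (last set aside = first resolved)."""
    stack, current, facts = [], [], []
    for tok in tokens:
        if tok == "[":
            stack.append(current)      # set the unfinished phrase aside
            current = []
        elif tok == "]":
            interrupted = stack.pop()  # most recently set-aside phrase
            facts.append((" ".join(interrupted), " ".join(current)))
            current = interrupted      # pick up where we left off
        else:
            current.append(tok)
    facts.append((" ".join(current), ""))  # the main clause itself
    return facts

sentence = "the fence [ that went around the house [ that was blue ] ] was red"
for phrase, clause in resolve_clauses(sentence.split()):
    print(phrase, "<-", clause)
```

The innermost clause comes off the stack first, which is exactly the last-stored, first-retrieved behavior described above.  (Pairing each clause with the right noun inside the phrase it interrupted is left as the large exercise it really is.)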

Cognitively, handling such a sentence is like veering off a path on some side trip and returning to pick up where you left off, or setting aside a task to handle some interruption and then returning to the original task.  Neither of these abilities is anywhere near unique to humans, so they must be older than humanity, even though we are the only animals that we know of that seem to use them in communication.

These cognitive abilities are also completely separate from a large number of individual adaptations of our vocal apparatus, which do seem to be unique to us, notably fine control of breathing and of the position of the tongue and shape of the mouth.  While these adaptations are essential to our being able to speak as fluently as we do, they don't have anything to do with what kinds of sentences we can express, just how well we can do so using spoken words.  Sign languages get along perfectly well without them.

In other words, it's quite possible we were able to conceive of structures like "I saw that the lion that killed the wildebeest went around behind that hill over there" without being able to put them into words, and that ability only came along later.  There's certainly no shortage, even in modern humans, of things that are easy to think but hard to express (I'd give a few examples, but ...).  The question here, then, is not "How did we develop the ability to think in nested clauses?" but "How did we come to use the grammatical structures we now see in languages to communicate such thoughts?"

There's a lot to evolution, and it has to be right up there with quantum mechanics as far as scientific theories that are easy to oversimplify, draw unwarranted conclusions from or get outright wrong, so this next bit is even less precise than what I've already said.  For example, I'm completely skirting around major issues of population genetics -- how a gene spreads, or doesn't, in a population, whether it's useful or not.

Let's try to consider vocabulary in an evolutionary context.  I pick vocabulary to start with because it's clearly distinct from grammar.  Indeed one of the useful features of a grammar is that you can plug an arbitrary set of words into it.  Conversely, one requirement for developing language as we know it is the ability to learn and use a large and expandable vocabulary.  Without that, and regardless of the grammatical apparatus, we do not account for the way people actually use language.

Suppose some animal has the ability to make one kind of call when it spots a particular predator and a different call for another predator, in such a way that its conspecifics (animals of the same species) can understand and react appropriately.  That's two calls (three if you count not making any call) and it's easy to see how that could be useful in not getting eaten.  Again, this is far from unique to us (see here, and search for "vervets", for example).

Now suppose some particular animal is born with the ability to make a third call for some other hazard, say a large branch falling (this is more than a bit contrived, but bear with me).  A large branch falls, the animal cries out ... and no one does anything.  The ability to make new calls isn't particularly useful without the ability to understand new calls.  But suppose that nobody did anything because they didn't know what the new call meant, but they were able to connect "that oddball over there made a funny noise" with "a big branch fell".  The next time a big branch falls and our three-call-making friend cries out, everyone looks out and scatters to safety.  Progress.

I'm more than a bit skeptical that the ability to make three calls rather than two would arise by a lucky mutation, but I think there are still two valid points here:

First, the ability to comprehend probably runs ahead of the ability to express, and certainly new ways to express are much less likely to catch on if no one understands what they mean.  Moreover, comprehension is useful in and of itself.  Whether or not my species is able to make calls that signal specific scenarios, being able to understand other species' calls is very useful, as is the ability to match up new calls with their meanings from context and examples.

In other words, the ability to understand a large vocabulary is liable to develop even without the ability to express a large vocabulary.  For a real-life example, at least some domestic dogs can understand many more human words than they can (as far as anyone can tell) produce distinct barks and similar sounds, and certainly many more human words than they can produce themselves.

Second, this appears to be a very common pattern in evolution.  Abilities that are useful in one context (distinguishing the different calls of animals around you) become useful in other contexts (developing a system of specialized calls within your own species).  The general pattern is known as exaptation (or cooption, or formerly and more confusingly as pre-adaptation).

Let's suppose that the local population of some species can potentially understand, say, dozens of distinct calls (whether their own or those of other species), but its ability to produce distinct calls is limited.  If some individual comes along with the gift of being able to produce more distinct calls, then that will probably increase that individual's chances of surviving -- because its conspecifics will learn the new calls and so increase everyone's chance of survival -- and at least potentially its chances of reproducing, if only because there will be more potential mates around if fewer of them get eaten. 

If that particular individual fails to survive and reproduce, the conditions are still good for some other individual to come along with the ability to produce a bigger vocabulary, perhaps through some entirely different mechanism.  This is important, because if there is more than one way to develop an ability, there can potentially be more ways to inherit it once it is established (I'm pretty sure, but I don't know if an actual biologist would agree).

If the community as a whole develops the tendency to find larger vocabularies attractive, so much the better, though the math starts to get hairy at this point.  Sexual selection is a pretty good way of driving traits to extremes -- think peacocks and male walruses -- so it's quite plausible that a species that starts to develop larger and larger vocabularies of calls could take this quite far, past the point of immediate usefulness.  You then have a population with a large vocabulary ready for an environment where it makes more of a difference.

In short, even some ability to produce distinct calls for different situations is useful, and it's no surprise many animals have it.  The ability to produce a large and expandable variety of distinct calls for different situations also looks useful, but also seems harder to evolve, considering that it's fairly rare.  Taking this a step further, we appear to be unique in our ability to produce and distinguish thousands of distinct vocabulary items, though as always there's quite a bit we still don't know about communication in other species.

It's clear that other animals can distinguish, and in some cases produce, non-trivial vocabularies, even if it's not particularly common.  How do you get from there to our as-far-as-we-know-unique abilities?  I think the answer is "a piece at a time".

In order to find a (very hypothetical) evolutionary pathway from an extensible collection of specialized calls to what we call language today, we want to find a series of small steps that each add something useful to what's already there without requiring major restructuring.  Some of those, in no strict order except where logically necessary, might be:
  • The ability to refer to a class of things without reference to a particular instance
This is one aspect of what one might call "abstract concepts".  As such, it doesn't require any new linguistic machinery beyond the ability to make and distinguish a large set of calls (which I'll call words from here on out), but it does require a cognitive shift.  The speaker has to be able to think of, say, wolf as a class of things rather than a particular wolf trying to sneak up.  The listener has to realize that someone saying "wolf" may not be referring to a wolf currently sneaking up on them. Instead, if the speaker is pointing to a set of tracks it might mean "a wolf went here", or if pointing in a particular direction, maybe "wolves come from over there".

This may seem completely natural to us, but it's not clear who, if anyone besides us, can do this.  Lots of animals can distinguish different types of things, but being able to classify is different from being aware that classes exist.  An apple-sorting machine can sort big from small without understanding "big" or "small".  I say "it's not clear" because devising an experiment to tell if something does or doesn't understand some aspect of abstraction is difficult, in no small part because there's a lot of room for interpretation of the results.
  • The ability to designate a quality such as "big" or "red" without reference to any particular thing with that quality.
This is similar to the previous item, but for adjectives rather than nouns.  From a language standpoint it's important because it implies that you can mix and match qualities and things (adjectives and nouns).  A tree can be big, a wolf can be big and a wolf can be gray without needing a separate notion of "big tree", "big wolf" and "gray wolf".  An adjective is a predicate that applies to something rather than standing alone as a noun does.

As I understand it, the widely-recognized stages of language development in humans are babbling, single words, two-word sentences and "all hell breaks loose".  A brain that can handle nouns and predicates is ready for two-word sentences consisting of a predicate and something it applies to.  This is a very significant step in communication and it appears to be quite rare, but linguistically it's nearly trivial.  A grammar to describe it has one rule and no recursion (rules that refer, directly or indirectly, to themselves).

As a practical matter, producing a two-word sentence means signifying a predicate and an object that it applies to (called an argument).  Understanding it means understanding the predicate, understanding the argument and, crucially, understanding that the predicate applies to the argument.  If you can distinguish predicates from objects, order doesn't even matter.  "Big wolf!" is just as good as "Wolf big!" or even a panicked sequence of "Wolf wolf big wolf big big wolf!" (which, to be fair, would require recursion to describe in a phrase-structure grammar).

From a functional point of view, the limiting factor to communicating such concepts is not grammar but the ability to form and understand the concepts in the first place.

Where do we go from predicate/argument sentences to something resembling what we now call language?  Some possible next steps might be
  • Predicates with more than one argument.
The important part here is that you need a way to distinguish the arguments.  In wolf big, you know that big is the predicate and wolf is the argument and that's all you need, but in see rabbit wolf, where see is the predicate and rabbit and wolf are arguments, how do we tell if the wolf sees the rabbit or the rabbit sees the wolf?  There are two solutions, given that you're limited to putting words together in some particular order:

Either the order of words matters, so see rabbit wolf means one thing and see wolf rabbit means the other, or there's a way of marking words according to what role they play, so for example see wolf-at rabbit means the rabbit sees the wolf and see wolf rabbit-at means the wolf sees the rabbit.  There are lots of possible variations, and the two approaches can be combined.  Actual languages do both, in a wide variety of ways.
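The two strategies can be sketched as a pair of toy interpreters for three-word sentences.  Everything here is invented for illustration: the "-at" suffix follows the example above, and the convention that the seer comes before the seen in the order-based version is an arbitrary choice, since the text leaves it open.

```python
def parse_by_order(words):
    """Fixed word order: predicate first, then the seer, then the seen."""
    predicate, agent, patient = words
    return {"predicate": predicate, "sees": agent, "seen": patient}

def parse_by_marking(words):
    """Free word order: the '-at' suffix marks the thing seen."""
    predicate = words[0]
    rest = words[1:]
    patient = next(w[:-3] for w in rest if w.endswith("-at"))
    agent = next(w for w in rest if not w.endswith("-at"))
    return {"predicate": predicate, "sees": agent, "seen": patient}

# Order matters here: swapping wolf and rabbit swaps who sees whom.
print(parse_by_order(["see", "wolf", "rabbit"]))
# {'predicate': 'see', 'sees': 'wolf', 'seen': 'rabbit'}

# ...but not here: both orders mean "the wolf sees the rabbit".
print(parse_by_marking(["see", "wolf", "rabbit-at"]))
print(parse_by_marking(["see", "rabbit-at", "wolf"]))
```

Real languages mix these strategies much more freely, of course, but even this toy version shows why marking buys you freer word order.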

From a linguistic point of view, word order and inflection (ways of marking words) are the elements of syntax, which (roughly speaking) provides structure on top of a raw stream of words.  Languages apply syntax in a number of ways, allowing us to put together complex sentences such as this one, but you need the same basic tools even for simple three-word sentences.  Turning that around, if you can solve the problem of distinguishing the meaning of a predicate and two arguments, you have a significant portion of the machinery needed for more complex sentences.
  • Pronouns, that is, a way to designate a placeholder for something without saying exactly what that something is, and connect it with a specific meaning separately.
Cognitively, pronouns imply some form of memory beyond the scope of a simple sentence. Linguistically, their key property is that their meaning can be redefined on the fly.  A noun like wolf might refer to different specific wolves at different times, but it will always refer to some wolf.  A pronoun like it is much less restrained.  It could refer to any noun, depending on context.

Pronouns allow for more compact sentences, which is useful in itself since you don't have to repeat some long descriptive phrase every time you want to say something new about, say, the big red house across the street with the oak tree in the yard.  You can just say that house or just it if the context is clear enough.

More than this, though, by equating two things in separate sentences they allow linear sequences of words to describe non-linear structures, for example I see a wolf and it sees me.  By contrast, in I see a wolf and a wolf sees me it's not clear whether it's the same wolf and we don't necessarily have the circular structure of two things seeing each other.
  • The ability to stack up arbitrarily many predicates: big dog, big red dog, big red hairy dog, etc.
I left this for last because it leads into a bit of a rabbit hole concerning the role of nesting and recursion in language.  A common analysis of phrases like big red hairy dog uses a recursive set of rules like

a noun phrase can be a noun by itself, or
a noun phrase can be an adjective followed by a noun phrase

This is much simpler than a full definition of noun phrase, and it's not the only way to analyze noun phrases, but it shows the recursive pattern that's generally used in such an analysis.  The second definition of noun phrase refers to noun phrase recursively.  The noun phrase on the right-hand side will be smaller, since it has one less adjective, so there's no infinite regress.  The example big red hairy dog breaks down to big modifying red hairy dog, which breaks down to red modifying hairy dog, which breaks down to hairy modifying dog, and dog is a noun phrase by itself.  In all there are four noun phrases, one by the first rule and three by the second.
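The two rules transcribe almost directly into code.  This sketch assumes, purely for illustration, that we already know which words are adjectives; it returns every noun phrase in the break-down, so the four phrases of the example fall out of the recursion.

```python
# Words we'll treat as adjectives, for illustration only.
ADJECTIVES = {"big", "red", "hairy", "gray"}

def noun_phrases(words):
    """Return every noun phrase in the break-down of `words`.

    Rule 1: a noun by itself is a noun phrase.
    Rule 2: an adjective followed by a noun phrase is a noun phrase.
    """
    if len(words) == 1:                 # rule 1: just a noun
        return [words]
    if words[0] in ADJECTIVES:          # rule 2: adjective + smaller noun phrase
        return [words] + noun_phrases(words[1:])
    raise ValueError("not a noun phrase")

for np in noun_phrases(["big", "red", "hairy", "dog"]):
    print(" ".join(np))
# big red hairy dog
# red hairy dog
# hairy dog
# dog
```

Note that the recursive call always gets a shorter list, which is the code-level version of "no infinite regress".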

On the other hand, if you can conceive of a dog being big, red and hairy at the same time, you can just as well express this with two-word sentences and a pronoun:  dog big. it red. it hairy.  The same construction could even make sense without the pronouns: dog big. red. hairy.  Here a listener might naturally assume that red and hairy have to apply to something, and the last thing we were talking about was a dog, so the dog must be red and hairy as well as big.

This is not particularly different from someone saying I saw the movie about the duck.  Didn't like it, where the second sentence clearly means I didn't like it and you could even just say Didn't like and still be clearly understood, even if Didn't like by itself sounds a bit odd.

From a grammatical standpoint (at least for a constituency grammar) these all seem quite different.  In big red hairy dog, there's presumed to be a nested structure of noun phrases.  In dog big.  it red. it hairy you have three sentences with a simple noun-verb structure and in dog big. red. hairy. you have one two-word sentence and two fragments that aren't even sentences.

However, from the point of view of "I have some notion of predicates and arguments, and multiple predicates can apply to the same argument, now how do I put that in words?", they seem pretty similar.  In all three cases you say the argument and the predicates that apply to it and the listener understands that the predicates apply to the argument because that's what predicates do.

I started this post with the idea of exploring how language as we now know it could develop from simpler pieces such as those we can see in other animals.  The title is a nod to the question of What good is half an eye? regarding the evolution of complex eyes such as we see in several lineages, including our own and (in a different form) in cephalopods.  In that case, it turns out that there are several intermediate forms which provide an advantage even though they're not what we would call fully-formed eyes, and it's not hard to trace a plausible pathway from basic light-sensitive "eye spots" to what we and many other animals have.

The case of language seems similar.  I think the key points are
  • Cognition is crucial.  You can't express what you can't conceive of.
  • The ability to understand almost certainly runs ahead of the ability to express.
  • There are plausibly a number of intermediate stages between simple calls and complex language (again, I don't claim to have identified the actual steps precisely or completely).
  • Full grammar, in the sense of nested structures described by recursive rules, may not be a particularly crucial step.
  • A purely grammatical analysis may even obscure the picture, both by failing to make distinctions (as with the jump from "this wolf right there" to "wolf") and by drawing distinctions that aren't particularly relevant (as with the various forms of big red hairy dog).

Friday, January 10, 2020

Is the piano a percussion instrument?

Well, is the piano a percussion instrument?

This is one of those questions that can easily devolve into "Well technically" ... "Oh yeah, well actually" and so forth.  I'm not aware of an official designator of instrument categories, but more to the point I'm not interested in a right or wrong answer here.  I'm interested in why the question should be tricky in the first place.

The answer I learned from high school orchestra or thereabouts was "Yes, it's a percussion instrument, because the strings are hit by hammers."  The answer I personally find more convincing is "No, because it's a piano, duh."

OK, maybe that's not particularly convincing.  Maybe a better way to phrase it would be "No, it's a keyboard instrument.  Keyboard instruments are their own class, separate from strings, woodwinds, brass and percussion."  By this reasoning, the pipe organ is a keyboard instrument, not a wind instrument, the harpsichord is a keyboard instrument, not a string instrument, and a synthesizer is a keyboard instrument, assuming it has a keyboard (not all do).

The intuition behind this is that being played by way of a keyboard is more relevant than the exact method for producing the sounds.  Even though a marimba, xylophone, vibraphone or glockenspiel has an arrangement of things to hit that looks a lot like a keyboard, the fact that you're limited to mallets in two hands has a big effect on what you can play.  Likewise, a harpsichord and a guitar or banjo produce somewhat similar sounds, but fretting one or more of a few strings is different from pressing one or more of dozens of keys.

It's a lot easier to play a four-part fugue on a harpsichord than a marimba, and a seven-note chord is going to present real problems on a five-string banjo.  Different means of playing make different things easy and hard, and that affects what actually gets played.

At this point, I could put forth a thesis that how you play an instrument is more important in classifying it than how the sounds are ultimately produced and be done with it, but that's not what got me typing in the first place.  To be clear, I like the thesis.  It's easier to play a saxophone if you know how to play a clarinet, easier to play a viola or even a guitar if you can play violin, and so forth.  What got me thinking, though, was the idea of how any classification on the order of string/woodwind/brass/percussion or keyboard/bow/plectrum/mallet/etc. tends to break down on contact with real objects to classify.

For example, there are lots of ways to produce sound from a violin.  There are several different "ordinary" ways to bow, but you can also bounce the wooden part of the bow on the strings, or pluck the strings (with either hand).  Independently of what you do with the bow, you can put a mute on the bridge to get a kind of ethereal, spooky sound.  You can rest a finger lightly on the string to get a "harmonic" with a purer tone (and generally higher pitch) than if you pressed the string to the fingerboard.  Beyond all that, you can tap on the body of the violin with your finger, or the stick of the bow or the end of the bow.  You could even tap the violin on something else, or use its strings as a bow for another instrument.

Does tapping on a violin make it a percussion instrument?  I'd say it is when you're tapping on it, otherwise not.  But if you ask, "Is the violin a percussion instrument," I'd say "no" (or, if I'm feeling cagy, "not normally").

How about an electric guitar?  Obviously, it's a string instrument, except there's more to playing an electric guitar than picking and fretting the strings.  The effects and the amp make a big difference.  It's probably best to think of electric guitar plus amp and effects as both a string instrument and an electronic instrument, both in its construction and in how you play it.  The guitar, amp and effects together are one instrument -- that's certainly how guitarists tend to see it, and they can spend quite a bit of time telling you the details of their rigs.

There are plenty of other examples to pick from -- a morsing, a glass harp, a musical saw, a theremin ... if you had to pick, you could probably call a morsing or even a glass harp a percussion instrument -- I mean, if a piano is, why not?  A musical saw would be, um, a string instrument?  A theremin would be ... I don't know, let's say brass because there are metal parts?

But why pick?  Clearly the four sections of an orchestra work fine for the instruments they were originally intended to classify, and they provide useful information in that context.  If you're putting together an orchestra, you can expect a percussionist to handle the bass drum, snare drum and tympani but not a trumpet, oboe or cello.  If you're composing for orchestra, you should know that wind players need to breathe and that a string instrument can play more than one note at a time, but only within fairly strict limits.  In neither case do you really care that someone might consider a piano a percussion instrument.  For the purposes of hiring players and composing music, a piano is a keyboard instrument.

If your purpose is to classify instruments by common properties, there are much better systems.  Wikipedia likes the Hornbostel-Sachs classification, which takes into account what produces the sound, how the sound is produced, the general form of the instrument and other factors.  For my money, it does a pretty good job of putting similar instruments together while making meaningful distinctions among them.  For example (based on this 2011 revision of the classification):
  • violin 321.322-71 (Box lute sounded by a bow)
  • cello 321.322-71 (Same)
  • guitar 321.322-5 or -6 (Box lute sounded by bare fingers(5) or plectrum(6))
  • French horn 423.232.12 (Valved horn with narrow bore and long air column)
  • oboe 422.112-71 (Reedpipe with double reeds and conical bore, with keys)
  • bass drum: 211.212.12 (Individual double-skin cylindrical drums, both heads played)
  • piano 314.122-4-8 (Box zither sounded by hammers, with keyboard)
  • harpsichord  314.122-6-8 (Box zither sounded by plectrum, with keyboard)
  • morsing 121.2 (plucked idiophone with frame, using mouth cavity as resonator)
  • glass harp 133.2 (set of friction idiophones)
  • musical saw 151  (metal sheet played by friction)
  • theremin 531.1 (Analogue synthesizers and other electronic instruments with electronic valve/vacuum tube based devices generating and/or processing electric sound signals)
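One nice property of these numbers is that they're hierarchical: a shared prefix means a shared family, so "putting similar instruments together" becomes a simple prefix match.  Here's a small sketch using a few entries from the list above, transcribed as-is:

```python
# A few Hornbostel-Sachs numbers from the list above.
HS = {
    "violin":      "321.322-71",
    "cello":       "321.322-71",
    "guitar":      "321.322-5",
    "French horn": "423.232.12",
    "oboe":        "422.112-71",
    "piano":       "314.122-4-8",
    "harpsichord": "314.122-6-8",
}

def family(prefix):
    """All instruments whose classification starts with `prefix`."""
    return sorted(name for name, code in HS.items() if code.startswith(prefix))

print(family("321.322"))   # box lutes: ['cello', 'guitar', 'violin']
print(family("314.122"))   # box zithers: ['harpsichord', 'piano']
print(family("4"))         # aerophones: ['French horn', 'oboe']
```

The longer the shared prefix, the closer the relationship, which is exactly the sort of structure the orchestra's four sections don't give you.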
There's certainly room for discussion here.  Playing a cello is significantly different from playing a violin -- the notes are much farther apart on the longer strings, the cello is held vertical, making the bowing much different, and as a consequence of both, the bow is much bigger and held differently.  Clearly the analogue synthesizer section could stand to be a bit more detailed, and there's some latitude within these classifications (Wikipedia has a musical saw as 132.22, an idiophone with direct friction).

It's also interesting that a guitar is counted as a slightly different instrument depending on whether it's played with bare fingers or a plectrum, but that fits pretty well with common usage.  Fingerpicking and flatpicking require noticeably different skills and many guitarists specialize in one or the other.  The only sticking point is that a lot of fingerstyle guitarists use fingerpicks, at least when playing a steel-string acoustic ...

Nonetheless, I'd still say Hornbostel-Sachs does a decent job of classifying musical instruments.  Given the classification number, you have a pretty good idea of what form the instrument might take, who might be able to play it and, in many if not all cases, how it might sound.  There are even provisions for compound instruments like electric guitar plus effects, though I don't know how well-developed or effective those are.

The string/woodwind/brass/percussion system also provides a decent idea of form, sound and who might play, within the context of a classical orchestra, but if you're familiar with the classical orchestra you should already know what a French horn or oboe sounds like.

Which leads back to the underlying question of purpose.  Classification systems, by nature, are systems that we impose on the world for our own purposes.  A wide-ranging and detailed system like Hornbostel-Sachs is meant to be useful to people studying musical instruments in general, for example to compare instrumentation in folk music across the world's cultures.

There are a lot more local variations of the bass drum or box lute family than theremin variants -- or even musical saw variants -- so even if we knew nothing else we might have an objective reason to think that drums and box lutes are older, and we might use the number of varieties in particular places to guess where an instrument originated (places of origin, in general, tend to have more variants).  Or there might be an unexpected correlation between latitude and the prevalence of this or that kind of instrument, and so forth.  Having a detailed classification system based on objective properties allows researchers to explore questions like this in a reasonably rigorous way.

The classification of instruments in the orchestra is more useful in the day-to-day running of an orchestra ("string section will rehearse tomorrow, full orchestra on Wednesday") and in writing classical music.  Smaller ensembles, for example, tend to fall within a particular section (string quartet, brass quintet) or provide a cross-section in order to provide a variety of timbral possibilities (the Brandenburg concertos use a harpsichord and a string section with various combinations of brass and woodwinds -- strictly speaking the harpsichord can be replaced by other instruments when it's acting as a basso continuo).

Both systems are useful for their own purposes, neither covers every possible instrument completely and unambiguously (though Hornbostel-Sachs comes fairly close) and neither is inherently "correct".   As far as I can tell, this is all true of any interesting classification system, and probably most uninteresting ones as well.

No one seems to care much whether a pipe organ is really a wind instrument, or a harpsichord a string instrument.   I'm not sure why.  Both have been used in orchestral works together with the usual string/woodwind/brass/percussion sections.

Tuesday, October 29, 2019

More on context, tool use and such

In the previous post I claimed that (to paraphrase myself fairly loosely) whether we consider behaviors that technically look like "learning", "planning", "tool use" or such to really be those things has a lot to do with context.  A specially designed robot that can turn a door handle and open the door is different from something that sees a door handle slightly out of reach, sees a stick on the ground, bends the end of the stick so it can grab the door handle and proceeds to open the door by using the stick to turn the handle and then to poke the door open.  In both cases a tool is being used to open a door, but we have a much easier time calling the second case "tool use".  The robot door-opener is unlikely to exhibit tool use in the second case.

With that in mind, it's interesting that the team that produced the hide-and-seek AI demo is busily at work on using their engine to play a Massively Multiplayer Online video game.  They argue at length, and persuasively, that this is a much harder problem than chess or go.  While the classic board games may seem harder to the average person than mere video games, from a computing perspective MMOs are night-and-day harder in pretty much every dimension:
  • You need much more information to describe the state of the game at any particular point (the state space is much larger).  A chess or go position can be described in well under 100 bytes.  To describe everything that's going on at a given moment in an MMO takes more like 100,000 bytes (about 20,000 "mostly floating point" numbers)
  • There are many more choices at any given point (the action space is much larger).  A typical chess position has a few dozen possible moves.  A typical go position may have a couple hundred.  In a typical MMO, a player may have around a thousand possible actions at a particular point, out of a total repertoire of more than 10,000.
  • There are many more decisions to make, in this case running at 30 frames per second for around 45 minutes, or around 80,000 "ticks" in all.  The AI only observes every fourth tick, so it "only" has to deal with 20,000 decision points.  At any given point, an action might be trivial or might be very important strategically.  Chess games are typically a few dozen moves long.  A go game generally takes fewer than 200 (though the longest possible go game is considerably longer).  While some moves are more important than others in board games, each requires a similar amount and type of calculation.
  • Players have complete information about the state of a chess or go game.  In MMOs, players can only see a small part of the overall universe.  Figuring out what an unseen opponent is up to and otherwise making inferences from incomplete data is a key part of the game.
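The game-length numbers above are easy to check with a bit of arithmetic (the exact figures are back-of-the-envelope, since "45 minutes" is itself an approximation):

```python
frames_per_second = 30
minutes_per_game = 45
ticks = frames_per_second * 60 * minutes_per_game
print(ticks)               # 81000, i.e. "around 80,000"

observed_every = 4         # the AI acts on every fourth tick
decision_points = ticks // observed_every
print(decision_points)     # 20250, i.e. "around 20,000"
```

Even the reduced figure is two to three orders of magnitude more decisions than a few dozen chess moves or a couple hundred go moves.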
Considered as a context, an MMO is, more or less by design, much more like the kind of environment that we have to plan, learn and use tools in every day.  Chess and go, by contrast, are highly abstract, limited worlds.  As a consequence, it's much easier to say that something that looks like it's planning and using tools in an MMO really is planning and using tools in a meaningful sense.

It doesn't mean that the AI is doing so the same way we do, or at least may think we do, but that's for a different post.

Tool use, planning and AI

A recent story in MIT Technology Review carries the headline AI learned to use tools after nearly 500 million games of hide and seek, and the subhead OpenAI’s agents evolved to exhibit complex behaviors, suggesting a promising approach for developing more sophisticated artificial intelligence.  This article, along with several others, is based on a blog post on OpenAI's site.  While the article is a good summary of the blog post, the blog post is just as readable while going into somewhat more depth and technical detail.  Both the article and the blog post are well worth reading, but as always the original source should take precedence.

There is, as they say, quite a bit to unpack here, and before I'm done this may well turn into another Topic That Ate My Blog.  At the moment, I'm interested in two questions:
  • What does this work say about learning and intelligence in general?
  • To what extent or in what sense do terms like "tool use" and "planning" describe what's going on here?
My answers to both questions changed significantly between reading the summary article and reading the original blog post.

As always, lurking behind stories like this are questions of definition, in particular, what do we mean by "learning", "planning" and "tool use"?  There have been many, many attempts to pin these down, but I think for the most part definitions fall into two main categories, which I'll call internal and external here.  Each has its advantages and drawbacks.

By internal definition I mean an attempt to formalize the sort of "I know it when I do it" kind of feeling that a word like learning might trigger.  If I learn something, I had some level of knowledge before, even if that level was zero, and after learning I could rattle off a new fact or demonstrate a new skill.  I can say "today I learned that Madagascar is larger than Iceland" or "today I learned how to bake a soufflé".

If I talk about planning, I can say "here's my plan for world domination" (like I'd actually tell you about the robot army assembling itself at ... I've said too much) or "here's my plan for cleaning the house".  If I'm using a tool, I can say "I'm going to tighten up this drawer handle with a Philips screwdriver", and so forth.  The common thread here is a conscious understanding of something particular going on -- something learned, a plan, a tool used for a specific purpose.

This all probably seems like common sense, and I'd say it is.  Unfortunately, common sense is not that helpful when digging into the foundations of cognition, or, perhaps, of anything else interesting.  We don't currently know how to ask a non-human animal to explain its thinking.  Neither do we have a particularly good handle on how a trained neural network is arriving at the result it does.  There may well be something encoded in the networks that control the hiders and seekers in the simulation, which we could point at and call "intent", but my understanding is we don't currently have a well-developed method for finding such things (though there has been progress).

If we can't ask what an experimental subject is thinking, then we're left with externally visible behavior.  We define learning and such in terms of patterns of behavior.  For example, if we define success at a task by some numerical measure, say winning percentage at hide and seek, we can say that learning is happening when behavior changes and the winning percentage increases in a way that can't be attributed to chance (in the hide-and-seek simulation, the percentage would tilt one way or another as each side learned new strategy, but this doesn't change the basic argument).

This turns learning into a pure numerical optimization problem: find the weights on the neurons that produce the best winning percentage.  Neural-network training algorithms are literally doing just such an optimization.  Networks in the training phase are certainly learning, by definition, but certainly not in the sense that we learn by studying a text or going to a lecture.  I suspect that most machine learning researchers are fine with that, and might also argue that studying and lectures are not a large part of how we learn overall, just the part we're most conscious of as learning per se.
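To make the "learning as numerical optimization" definition concrete, here's a minimal sketch: random hill-climbing on a made-up score function standing in for winning percentage.  Nothing here is specific to the hide-and-seek work or to how neural networks are actually trained (which uses gradients, not random search); it just shows the shape of the definition -- behavior changes, the number goes up, so by the external definition, learning happened.

```python
import random

random.seed(0)

def winning_percentage(weights):
    # Stand-in for "play many games with these weights and measure
    # how often this side wins".  Peaks at weights (0.3, 0.7).
    return 1.0 - (weights[0] - 0.3) ** 2 - (weights[1] - 0.7) ** 2

weights = [0.0, 0.0]
best = winning_percentage(weights)
for _ in range(5000):
    candidate = [w + random.gauss(0, 0.05) for w in weights]
    score = winning_percentage(candidate)
    if score > best:                 # "learning" = the score went up
        weights, best = candidate, score

print(round(best, 3))    # close to the maximum of 1.0
```

By the external definition, this loop is learning.  Whether that feels like learning is exactly the tension described above.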

This tension between our common understanding of learning and the workings of things that can certainly appear to be learning goes right to why an external definition (more or less what we call an operational definition) can feel so unsatisfying.  Sure, the networks look like they're learning, but how do we know they're really learning?

The simplest answer to that is that we don't.  If we define learning as optimizing a numerical value, then pretty much anything that does that is learning.  If we define learning as "doing things that look to us like learning", then what matters is the task, not the mechanism.  Learning to play flawless tic-tac-toe might be explained away as "just optimizing a network" while learning to use a ramp to peer over the wall of a fort built by a group of hiders sure looks an awful lot like the kind of learning we do -- even though the underlying mechanism is essentially the same.

I think the same reasoning applies to tool use: Whether we call it tool use or not depends on how complex the behavior appears to be, not on the simple use of an object to perform a task.  I remember reading about primates using a stick to dig for termites as tool use and thinking "yeah, but not really".  But why not, exactly?  A fireplace poker is a tool.  A barge pole is a tool.  Why not a termite stick?  The only difference, really, is the context in which they are used.  Tending a fire or guiding a barge happen in the midst of several other tools and actions with them, however simple in the case of a fireplace and andirons.  It's probably this sense of the tool use being part of a larger, orchestrated context that makes our tool use seem different.  By that logic, tool use is really just a proxy for being able to understand larger, multi-part systems.

In my view this all reinforces the point that "planning", "tool use" and such are not binary concepts.  There's no one point at which something goes from "not using tools" to "using tools", or if there is, the dividing line has to be fairly arbitrary and therefore not particularly useful.  If "planning" and "tool use" are proxies for "behaving like us in contexts where we consider ourselves to be planning and using tools", then what matters is the behavior and the context.  In the case at hand, our hiders and seekers are behaving a lot like we would, and doing it in a context that we would certainly say requires planning and intelligence.

As far as internal and external definitions, it seems we're looking for contexts where our internal notions seem to apply well.  In such contexts we have much less trouble saying that behavior that fits an external definition of "tool use", "planning", "learning" or whatever is compatible with those notions.