Saturday, July 22, 2017

Yep. Tron.

It was winter when I started writing this, but writing posts about physics is hard, at least if you're not a physicist.  This one was particularly hard because I had to re-learn what I thought I knew about the topic, and then realize that I'd never really understood it as well as I'd thought, then try to learn it correctly, then realize that I also needed to re-learn some of the prerequisites, which led to a whole other post ... but just for the sake of illustration, let's pretend it's still winter.

If you live near a modest-sized pond or lake, you might (depending on the weather) see it freeze over at night and thaw during the day.  Thermodynamically this can be described in terms of energy (specifically heat) and entropy.  At night, the water is giving off heat into the surrounding environment and losing entropy (while its temperature stays right at freezing).  The surrounding environment is taking on heat and gaining entropy.  The surroundings gain at least as much entropy as the pond loses, and ultimately the Earth will radiate just that bit more heat into space.  When you do all the accounting, the entropy of the universe increases by just a tiny bit, relatively speaking.

During the day, the process reverses.  The water takes on heat and gains entropy (while its temperature still stays right at freezing).  The surroundings give off heat, which ultimately came from the sun, and lose entropy.  The water gains at least as much entropy as the surroundings lose, and again the entropy of the universe goes up by just that little, tiny bit, relatively speaking.

So what is this entropy of which we speak?  Originally entropy was defined in terms of heat and temperature.  One of the major achievements of modern physics was to reformulate entropy in a more powerful and elegant form, revealing deep and interesting connections, thereby leading to both enlightenment and confusion.  The connections were deep enough that Claude Shannon, in his founding work on information theory, defined a similar concept with the same name, leading to even more enlightenment and confusion.

The original thermodynamic definition relies on the distinction between heat and temperature.  Temperature, at least in the situations we'll be discussing here, is a measure of how energetic individual particles -- typically atoms or molecules -- are on average.  Heat is a form of energy, independent of how many particles are involved.

The air in an oven heated to 500K (that is, 500 Kelvin, about 227 degrees Celsius or 440 degrees Fahrenheit) and a pot full of oil at 500K are, of course, at the same temperature, but you can safely put your hand in the oven for a bit.  The oil, not so much.  Why?  Mainly because there's a lot more heat in the oil than in the air.  By definition the molecules in the oven air are just as energetic, on average, as the molecules in the oil, but there are a lot more molecules of oil, and therefore a lot more energy, which is to say heat.

At least, that's the quick explanation for purposes of illustration.  Going into the real details doesn't change the basic point: heat is different from temperature and changing the temperature of something requires transferring energy (heat) to or from it.  As in the case of the pond freezing and melting, there are also cases where you can transfer heat to or from something without changing its temperature.  This will be important in what follows.

Entropy was originally defined as part of understanding the Carnot cycle, which describes the ideal heat-driven engine (the efficiency of a real engine is usually given as a percentage of what the Carnot cycle would produce, not as a percentage of the energy it uses).  Among the principal results in classical thermodynamics is that the Carnot cycle is as good as you can get, even in principle, and that not even it can ever be perfectly efficient.

At this point it might be helpful to read that earlier post on energy, if you haven't already.  Particularly relevant parts here are that the state of the working fluid in a heat engine, such as the steam in a steam engine, can be described with two parameters, or, equivalently, as a point in a two-dimensional diagram, and that the cycle an engine goes through can be described by a path in that two-dimensional space.

Also keep in mind the ideal gas law: In an ideal gas, the temperature of a given amount of gas is proportional to pressure times volume.  If you've ever noticed a bicycle pump heat up as you pump up a tire, that's (more or less) why.  You're compressing air, that is, decreasing its volume, so (unless the pump is able to spill heat with perfect efficiency, which it isn't) the temperature has to go up.  For the same reason the air coming out of a can of compressed air is dangerously cold.  The air is expanding rapidly so the temperature drops sharply.
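To put rough numbers on the bicycle pump, here's a minimal Python sketch.  The function names are mine, and the adiabatic formula with gamma = 1.4 (a common textbook value for air) is an assumption I'm bringing in to cover the idealized case where no heat escapes at all.  A real pump leaks heat, so it won't get this hot.

```python
R = 8.314462618  # molar gas constant, J/(mol K)

def ideal_gas_temperature(pressure_pa, volume_m3, moles):
    """Ideal gas law solved for temperature: T = P V / (n R)."""
    return pressure_pa * volume_m3 / (moles * R)

def adiabatic_temperature(t_start_k, v_start, v_end, gamma=1.4):
    """Temperature after a compression or expansion with no heat exchanged.

    Assumes an ideal gas; gamma = 1.4 is an assumed value for air.
    Only the ratio of the volumes matters, so any volume unit will do.
    """
    return t_start_k * (v_start / v_end) ** (gamma - 1.0)

# One mole at atmospheric pressure in 22.4 liters sits near freezing.
print(ideal_gas_temperature(101_325, 0.0224, 1.0))

# Squeeze room-temperature air (293 K) to half its volume: roughly 387 K.
print(adiabatic_temperature(293.0, 2.0, 1.0))

# Let it expand to twice its volume instead and it cools sharply.
print(adiabatic_temperature(293.0, 1.0, 2.0))
```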

In the Carnot cycle you first supply heat to a gas (the "working fluid") while maintaining a perfectly constant temperature by expanding the container it's in.  You're heating the gas in the engine, in the sense of supplying heat, but not in the sense of raising its temperature.  Again, heat and temperature are two different things.

To continue the Carnot cycle, let the container keep expanding, but now in such a way that it neither gains nor loses heat (in technical terms, adiabatically).  In these first two steps, you're getting work out of the engine (for example, by connecting a rod to the moving part of a piston and attaching the other part of that rod to a wheel).  The gas is losing energy since it's doing work on the piston, and it's also expanding, so the temperature and pressure are both dropping, but no heat is leaving the container in the adiabatic step.

Work is force times distance, and force in this case is pressure times the area of the surface that's moving.  Since the pressure, and therefore the force, is dropping during the second step you'll need to use calculus to figure out the exact amount of work, but people know how to do that.

The last two steps of the cycle reverse the first two.  In step three you compress the gas, for example by changing the direction the piston is moving, while keeping the temperature the same.  This means the gas is cooling in the sense of giving off heat, but not in the sense of dropping in temperature.  Finally, in step four, compress the gas further, without letting it give off heat.  This raises the temperature.  The piston is doing work on the gas and the volume is decreasing.  In a perfect Carnot cycle the gas ends up in the same state -- same pressure, temperature and volume -- as it began and you can start it all over.

As mentioned in the previous post, you end up putting more heat in at the start than you end up getting back in the third step, and you end up getting more work out in the first two steps than you put in in the last two (because the pressure is higher in the first two steps).  Heat gets converted to work (or if you run the whole thing backwards, you end up with a refrigerator).

If you plot the Carnot cycle on a diagram of pressure versus volume, or the other two combinations of pressure, volume and temperature, you get a shape with at least two curved sides, and it's hard to tell whether you could do better.  Carnot proved that this cycle is the best you can do, in terms of how much work you can get out of a given amount of heat, by choosing two parameters that make the cycle into a rectangle.  One is temperature -- steps one and three maintain a constant temperature.

The other needs to make the other two steps straight lines.  To make this work out, the second quantity has to remain constant while the temperature is changing, and change when temperature is constant.  The solution is to define a quantity -- call it entropy -- that changes, when temperature is constant, by the amount of heat transferred, divided by that temperature (ΔS = ΔQ/T -- the deltas (Δ) say that we're relating changes in heat and entropy, not absolute quantities).  When there's no heat transferred, entropy doesn't change.  In step one, temperature is constant and entropy increases.  In step two, temperature decreases while entropy remains constant, and so forth.

To be clear, entropy and temperature can, in general, both change at the same time.  The Carnot cycle is a special case where only one changes at a time.  For example, if you heat a gas at constant volume, then pressure, temperature and entropy all go up.

Knowing the definition of entropy, you can convert, say, a pressure/volume diagram to a temperature/entropy diagram and back.  In most cases the temperature/entropy version won't show vertical and horizontal lines -- that is, in most cases both change at the same time.  The Carnot cycle is exactly the case where the lines are perfectly horizontal and vertical.

This definition of entropy in terms of heat and temperature says nothing at all about what's going on in the gas, but it's enough, along with some math I won't go into here (but which depends on the cycle being a rectangle), to prove Carnot's result: The portion of heat wasted in a Carnot cycle is the ratio of the cold temperature to the hot temperature (on an absolute temperature scale).  You can only have zero loss -- 100% efficiency -- if the cold temperature is absolute zero.  Which it won't be.
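Here's a small Python sketch of the bookkeeping for an idealized cycle, just to make the ratio concrete.  It leans on the rectangle picture: the entropy taken on in step one (heat divided by the hot temperature) is exactly the entropy handed back in step three, which fixes how much heat has to go out the cold side.  The function name and the sample numbers (1000 joules, 500K and 300K) are mine, purely for illustration.

```python
def carnot_cycle(heat_in_j, t_hot_k, t_cold_k):
    """Bookkeeping for one idealized Carnot cycle (temperatures in kelvins)."""
    entropy_moved = heat_in_j / t_hot_k     # step one: dS = dQ / T at the hot temperature
    heat_out_j = entropy_moved * t_cold_k   # step three: same dS, but at the cold temperature
    work_j = heat_in_j - heat_out_j         # whatever heat doesn't come back out became work
    return {
        "entropy_moved": entropy_moved,
        "heat_out": heat_out_j,
        "work": work_j,
        "wasted_fraction": heat_out_j / heat_in_j,
    }

result = carnot_cycle(1000.0, 500.0, 300.0)
print(result)
```

With those numbers the wasted fraction comes out to 300/500, and the only way to push it to zero is to set the cold temperature to zero.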

Any cycle that deviates from a perfect rectangle will be less efficient yet.  In real life this is inevitable.  You can come pretty close on all the steps, but not perfectly close.  In real life you don't have an ideal gas, you can't magically switch from being able to put heat into the gas to perfectly insulating it, you won't be able to transfer all the heat from your heat source to the gas, you won't be able to capture all the heat from the third step of the cycle to reuse in the first step of the next cycle, some of the energy of the moving piston will be lost to friction (that is, dissipated into the surroundings as heat) and so on.

The problem-solving that goes into minimizing inefficiencies in real engines is why engineering came to be called engineering and why the hallmark of engineering is getting usefulness out of imperfection.



There are other cases where heat is transferred at a constant temperature, and we can define entropy in the same way as for a gas.  For example, temperature doesn't change during a phase change such as melting or freezing.  As our pond melts and freezes, the temperature stays right at freezing until the pond completely freezes, at which point it can get cooler, or melts entirely, at which point it can get warmer.

If all you know is that some water is at the freezing point, you can't say how much heat it will take to raise the temperature above freezing without knowing how much of it is frozen and how much is liquid.  The concept of entropy is perfectly valid here -- it relates directly to how much of the pond is liquid -- and we can define "entropy of fusion" to account for phase transitions.
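As a rough illustration, here's the pond's nightly accounting in Python.  The latent heat of fusion of water (about 334 joules per gram) and the freezing point (273.15K) are standard figures I'm bringing in, and the -5 Celsius night air is a made-up example.  Treating the surroundings as sitting at one fixed temperature is a simplification.

```python
LATENT_HEAT_FUSION_J_PER_G = 334.0  # approximate value for water
FREEZING_POINT_K = 273.15

def entropy_change(heat_j, temperature_k):
    """dS = dQ / T for heat transferred at a constant temperature."""
    return heat_j / temperature_k

def freeze_accounting(mass_g, surroundings_k):
    """Entropy changes when mass_g of water freezes into colder surroundings.

    Returns (pond, surroundings, net), all in joules per kelvin.
    """
    heat_j = mass_g * LATENT_HEAT_FUSION_J_PER_G
    pond = -entropy_change(heat_j, FREEZING_POINT_K)      # pond gives off heat
    surroundings = entropy_change(heat_j, surroundings_k)  # surroundings take it on
    return pond, surroundings, pond + surroundings

# One gram of pond water freezing on a -5 Celsius (268.15 K) night.
pond, surroundings, net = freeze_accounting(1.0, 268.15)
print(pond, surroundings, net)
```

The pond's entropy drops by a bit over 1.2 joules per kelvin per gram, the surroundings gain slightly more because they're colder, and the net is a small positive number.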

There are plenty of other cases that don't look quite so much like the ideal gas case but still involve changes of entropy.  Mixing two substances increases overall entropy.  Entropy is a determining factor in whether a chemical reaction will go forward or backward and in ice melting when you throw salt on it.


Before I go any further about thermodynamic entropy, let me throw in that Claude Shannon's definition of entropy in information theory is, informally, a measure of the number of distinct messages that could have been transmitted in a particular situation.  On the other blog, for example, I've ranted about bits of entropy for passwords.  This is exactly a measure of how many possible passwords there are in a given scheme for picking passwords.
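For a concrete feel, here's a quick Python sketch of the password case.  It assumes every symbol is picked independently and uniformly at random, so the number of possible passwords is just the alphabet size raised to the length.  The two schemes shown are hypothetical examples of mine.

```python
import math

def password_entropy_bits(alphabet_size, length):
    """Bits of entropy for `length` symbols, each chosen uniformly from the alphabet.

    There are alphabet_size ** length possible passwords, and the entropy
    is the base-2 logarithm of that count.
    """
    return length * math.log2(alphabet_size)

# Eight random lowercase letters.
print(password_entropy_bits(26, 8))

# Four random words from a 2048-word list: 11 bits per word.
print(password_entropy_bits(2048, 4))
```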

What in the world does this have to do with transferring heat at a constant temperature?  Good question.

Just as the concept of energy underwent several shifts in understanding on the way to its current formulation, so did entropy.  The first major shift came with the development of statistical mechanics.  Here "mechanics" refers to the behavior of physical objects, and "statistical" means you've got enough of them that you're only concerned about their overall behavior.

Statistical mechanics models an ideal gas as a collection of particles bouncing around in a container.  You can think of this as a bunch of tiny balls bouncing around in a box, but there's a key difference from what you might expect from that image.  In an ideal gas, all the collisions are perfectly elastic, meaning that the energy of motion (called kinetic energy) remains the same before and after.  In a real box full of balls, the kinetic energy of the balls gets converted to heat and sooner or later the balls stop bouncing.

But the whole point of the statistical view of thermodynamics is that heat is just the kinetic energy of the particles the system is made up of.  When actual bouncing balls lose energy to heat, that means that the kinetic energy of the large-scale motion of the balls themselves is getting converted into kinetic energy of the small-scale motion of the particles the balls are made of, and of the air in the box, and of the walls of the box, and eventually the surroundings.  That is, the large scale motion we can see is getting converted into a lot of small-scale motion that we can't, which we call heat.

When two particles, say two oxygen molecules, bounce off each other, the kinetic energy of the moving particles just gets converted into kinetic energy of differently-moving particles, and that's it.  In the original formulation of statistical mechanics, there's simply no other place for that energy to go, no smaller-scale moving parts to transfer energy to (assuming there's no chemical reaction between the two -- if you prefer, put pure helium in the box).

When a particle bounces off the wall of the container, it imparts a small impulse -- an instantaneous force -- to the walls.  When a whole lot of particles continually bounce off the walls of a container, those instantaneous forces add up to a continuous force (for all practical purposes) -- that is, pressure.

Temperature is the average kinetic energy of the particles and volume is, well, volume.  That gives us our basic parameters of temperature, pressure and volume.

But what is entropy, in this view?  In statistical mechanics, we're concerned about the large-scale (macroscopic) state of the system, but there are many different small-scale (microscopic) states that could give the same macroscopic picture.

Once you crank through all the math, it turns out that entropy is a measure of how many different microscopic states, which we can't measure, are consistent with the macroscopic state, which we can measure.  In fuller detail, entropy is actually proportional to the logarithm of that number -- the number of digits, more or less -- both because the raw numbers are ridiculously big, and because that way the entropy of two separate systems is the sum of the entropy of the individual systems.

The actual formula is S = k ln(W), where k is Boltzmann's constant and W is the total number of possible microstates, assuming they're all equally probable.  There's a slightly bigger formula if they're not.  Note that, unlike the original thermodynamic definition, this formula deals in absolute quantities, not changes.
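Here's a tiny Python sketch of why the logarithm is there.  The microstate counts are made-up numbers (real ones are far bigger), but they show the additivity: two separate systems together have W1 times W2 possible states, and the log turns that product into a sum.

```python
import math

K_B = 1.380649e-23  # Boltzmann's constant, J/K

def boltzmann_entropy(microstates):
    """S = k ln(W), assuming all W microstates are equally probable."""
    return K_B * math.log(microstates)

# Two hypothetical systems with made-up microstate counts.
w_a = 10**30
w_b = 10**40

s_a = boltzmann_entropy(w_a)
s_b = boltzmann_entropy(w_b)
s_combined = boltzmann_entropy(w_a * w_b)  # every state of A pairs with every state of B

print(s_a + s_b, s_combined)  # the same, up to floating-point rounding
```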

When ice melts, entropy increases.  Water molecules in ice are confined to fixed positions in a crystal.  We may not know the exact energy of each individual molecule, but we at least know more or less where it is, and we know that if the energy of such a molecule is too high, it will leave the crystal (if this happens on a large scale, the crystal melts).  Once it does, we know much less about its location or energy.

Even without a phase change, the same sort of reasoning applies.  As temperature -- the average energy of each particle -- increases, the range of energies each particle can have increases.  How to translate this continuous range of energies into a number we can count is a bit of a puzzle, but we can handwave around that for now.

Entropy is often called a measure of disorder, but more accurately it's a measure of uncertainty (as theoretical physicist Sabine Hossenfelder puts it: "a measure for unresolved microscopic details"), that is, how much we don't know.  That's why Shannon used the same term in information theory.  The entropy of a message measures how much we don't know about it just from knowing its size (and a couple of other macroscopic parameters).  Shannon entropy is also logarithmic, for the same reasons that thermodynamic entropy is.

The formula for Shannon entropy in the case that all possible messages are equally probable is H = k ln(M), where M is the number of messages.  I put k there to account for the logarithm usually being base 2 and because it emphasizes the similarity to the other definition.  Again, there's a slightly bigger formula if the various messages aren't all equally probable, and it too looks an awful lot like the corresponding formula for thermodynamic entropy.
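The slightly bigger formula is the standard one, H = -sum(p log p) over all the possible messages.  Here's a minimal Python sketch of it in bits (base 2), with a couple of toy distributions of my own to show that it reduces to the log of the number of messages when they're all equally likely.

```python
import math

def shannon_entropy_bits(probabilities):
    """H = -sum(p * log2(p)), skipping zero-probability messages."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

# Eight equally likely messages: log2(8) = 3 bits.
print(shannon_entropy_bits([1 / 8] * 8))

# Three messages, one twice as likely as the others: 1.5 bits,
# less than the log2(3) you'd get if they were equally likely.
print(shannon_entropy_bits([0.5, 0.25, 0.25]))
```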

The original formulation of statistical mechanics assumed that physics at the microscopic scale followed Newton's laws of motion.  One indication that statistical mechanics was on to something is that when quantum mechanics completely reformulated what physics looks like at the microscopic scale, the statistical formulation not only held up, but became more accurate with the new information available.

In our current understanding, when two oxygen molecules bounce off each other, their electron shells interact (there's more going on, but let's start there), and eventually their energy gets redistributed into a new configuration.  This can mean the molecules traveling off in new paths, but it could also mean that some of the kinetic energy gets transferred to the electrons themselves, or some of the electrons' energy gets converted into kinetic energy.

Macroscopically this all looks the same as the old model, if you have huge numbers of molecules, but in the quantum formulation we have a more precise picture of entropy.  This makes a difference in extreme situations such as extremely cold crystals.  Since energy is quantized, there is a finite (though mind-bendingly huge) number of possible quantum states a typical system can have, and we can stop handwaving about how to handle ranges of possible energy.  This all works whether you have a gas, a liquid, an ordinary solid or some weird Bose-Einstein condensate.  Entropy measures that number of possible quantum states.

Thermodynamic entropy and information theoretic entropy are measuring basically the same thing, namely the number of specific possibilities consistent with what we know in general.  In fact, the modern definition of thermodynamic entropy specifically starts with a raw number of possible states and includes a constant factor to convert from the raw number to the units (energy over temperature) of classical thermodynamics.

This makes the two notions of entropy look even more alike -- they're both based on a count of possibilities, but with different scaling factors.  Below I'll even talk, loosely, of "bits worth of thermodynamic entropy" meaning the number of bits in the binary number for the number of possible quantum states.
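To make "bits worth" concrete: since S = k ln(W), the number of bits in W is log2(W) = S / (k ln 2).  Here's a rough Python sketch of the conversion.  The example of melting a gram of ice uses standard figures I'm bringing in (about 334 joules per gram at 273.15K), and the function name is mine.

```python
import math

K_B = 1.380649e-23  # Boltzmann's constant, J/K

def entropy_in_bits(entropy_j_per_k):
    """Convert thermodynamic entropy to a bit count: log2(W) = S / (k ln 2)."""
    return entropy_j_per_k / (K_B * math.log(2))

# Entropy gained by one gram of ice melting: heat / temperature.
one_gram_of_ice_melting = 334.0 / 273.15  # J/K

# Comes out on the order of 10**23 bits.
print(f"{entropy_in_bits(one_gram_of_ice_melting):.2e} bits")
```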

Nonetheless, they're not at all the same thing in practice.

Consider a molecule of DNA.  There are dozens of atoms, and hundreds of subatomic particles, in a base pair.  I really don't know how many possible states a phosphorus atom (say) could be in under typical conditions, but I'm going to guess that there are thousands of bits worth of entropy in a base pair at room temperature.  Even if each individual particle can only be in one of two possible states, you've still got hundreds of bits.

From an information-theoretic point of view, there are four possible states for a base pair, which is two bits, and because the genetic code actually includes a fair bit of redundancy in the form of different ways of coding the same amino acid and so forth, it's actually more like 10/6 of a bit, even without taking into account other sources of redundancy.

But there is a lot of redundancy in your genome, as far as we can tell, in the form of duplicated genes and stretches of DNA that might or might not do anything.  All in all, there is about a gigabyte worth of base pairs in a human genome, but the actual gene-coding information can compress down to a few megabytes.  The thermodynamic entropy of the molecule that encodes those megabytes is much, much, larger.  If each base pair represents about a thousand bits worth of thermodynamic entropy under typical conditions, then the whole strand is into the hundreds of gigabytes.
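The arithmetic behind those figures, as a quick Python sketch.  The three billion base pairs is a round number I'm assuming for the human genome, and the thousand bits per base pair is the guess from above, not a measurement.

```python
BASE_PAIRS = 3_000_000_000          # assumed round figure for a human genome
INFO_BITS_PER_BASE_PAIR = 2         # four possible base pairs = 2 bits
THERMO_BITS_PER_BASE_PAIR = 1000    # the rough guess, not a measured value

def bits_to_gigabytes(bits):
    """Convert a bit count to gigabytes (8 bits per byte, 10**9 bytes per GB)."""
    return bits / 8 / 1e9

raw_information_gb = bits_to_gigabytes(BASE_PAIRS * INFO_BITS_PER_BASE_PAIR)
thermodynamic_gb = bits_to_gigabytes(BASE_PAIRS * THERMO_BITS_PER_BASE_PAIR)

print(raw_information_gb)  # a bit under a gigabyte
print(thermodynamic_gb)    # hundreds of gigabytes
```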

I keep saying "under typical conditions" because thermodynamic entropy, being thermodynamic, depends on temperature.  If you have a fever, your body, including your DNA molecules in particular, has higher entropy than if you're sitting in an ice bath.  The information theoretic entropy, on the other hand, doesn't change.

But all this is dwarfed by another factor.  You have billions of cells in your body (and trillions of bacterial cells that don't have your DNA, but never mind that).  From a thermodynamic standpoint, each of those cells -- its DNA, its RNA, its proteins, lipids, water and so forth -- contributes to the overall entropy of your body.  A billion identical strands of DNA at a given temperature have the same information content as a single strand but a billion times the thermodynamic entropy.

If you want to compare bits to bits, the Shannon entropy of your DNA is inconsequential compared to the thermodynamic entropy of your body.  Even the change in the thermodynamic entropy of your body as you breathe is enormously bigger than the Shannon entropy of your DNA.

I mention all this because from time to time you'll see statements about genetics and the second law of thermodynamics.  The second law, which is very well established, states that the entropy of a closed system cannot decrease over time.  One implication of it is that heat doesn't flow from cold to hot, which is a key assumption in Carnot's proof.

Sometimes the second law is taken to mean that genomes can't get "more complex" over time, since that would violate the second law.  The usual response to this is that living cells aren't closed systems and therefore the second law doesn't apply.  That's perfectly valid.  However, I think a better answer is that this confuses two forms of entropy -- thermodynamic entropy and Shannon entropy -- which are just plain different.  In other words, entropy doesn't work that way.

From an information point of view, the entropy of a genome is just how many bits it encodes once you compress out any redundancy.  Longer genomes typically have more entropy.  From a thermodynamic point of view, at a given temperature, more of the same substance has higher entropy than less as well, but we're measuring different quantities.

A live elephant has much, much higher entropy than a live mouse, and likewise for a live human versus a live mouse.  As it happens, a mouse genome is roughly the same size as a human genome, even though there's a huge difference in thermodynamic entropy between a live human and a live mouse.  The mouse genome is slightly smaller than ours, but not a lot.  There's no reason it couldn't be larger, and certainly no thermodynamic reason.  Neither the mouse nor human genome is particularly large.  Several organisms have genomes dozens of times larger, at least in terms of raw base pairs.

From a thermodynamic point of view, it hardly matters what exact content a DNA molecule has.  There are some minor differences in thermodynamic behavior among the particular base pairs, and in some contexts it makes a slight difference what order they're arranged in, but overall the gene-copying machinery works the same whether the DNA is encoding a human digestive protein or nothing at all.  Differences in gene content are dwarfed by the thermodynamic entropy change of turning one strand of DNA into two, that in turn is dwarfed by everything else going on in the cell, and that in turn is dwarfed by the jump from one cell to billions.

For what it's worth, content makes even less thermodynamic difference in other forms of storage.  A RAM chip full of random numbers has essentially the same thermodynamic entropy, at a given temperature, as one containing all zeroes or all ones, even though those have drastically different Shannon entropies.  The thermodynamic entropy changes involved in writing a single bit to memory are going to equate to a lot more than one bit.

Again, this is all assuming it's valid to compare the two forms of entropy at all, based on their both being measures of uncertainty about what exact state a system is in, and again, the two are not actually comparable, even though they're similar in form.  Comparing the two is like trying to compare a football score to a basketball score on the basis that they're both counting the number of times the teams involved have scored goals.


There's a lot more to talk about here, for example the relation between symmetry and disorder (more disorder means more symmetry, which was not what I thought until I sat down to think about it), and the relationship between entropy and time (for example, as experimental physicist Richard Muller points out, local entropy decreases all the time without time appearing to flow backward), but for now I think I've hit the main points:
  • The second law of thermodynamics is just that -- a law of thermodynamics.
  • Thermodynamic entropy as currently defined and information-theoretic (Shannon) entropy are two distinct concepts, even though they're very similar in form and derivation.
  • The two are defined in different contexts and behave entirely differently, despite what we might think from them having the same name.
  • Back at the first point, the second law of thermodynamics says almost nothing about Shannon entropy, even though you can, if you like, use the same terminology in counting quantum states.
  • All this has even less to do with genetics.

Wednesday, July 19, 2017

The human perspective and its limits

A couple more points occurred to me after I hit "publish" on the previous post.  Both of them revolve around subjectivity versus objectivity, and to what extent we might be limited by our human perspective.


In trying to define whether a kind of behavior is simple or complex, I gave two different notions which I claimed were equivalent: how hard it is to describe and how hard it is to build something to copy it.

The first is, in a sense, subjective, because it involves our ability to describe and understand things.  Since we describe things using language, it's tied to what fits well with language.  The second is much more objective.  If I build a chess-playing robot, something with no knowledge of human language or of chess could figure out what it was doing, at least in principle.

One of the most fundamental results in computer science is that there are a number of very simple computing models (stack machines, lambda calculus, combinators, Turing machines, cellular automata, C++ templates ... OK, maybe not always so simple) which are "Turing complete".  That means that any of them can compute any "total recursive function".  This covers a wide range of problems, from adding numbers to playing chess to finding cute cat videos and beyond.

It doesn't matter which model you choose.  Any of them can be used to simulate any of the others.  Even a quantum computer is still computing the same kinds of functions [um ... not 100% sure about that ... should run that down some day --D.H.].  The fuss there is about the possibility that a quantum computer could compute certain difficult functions exponentially faster than a non-quantum computer.
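As a small taste of one model simulating another, here's a minimal Turing machine simulator in Python, along with a rule table for a machine that adds one to a binary number.  The simulator, the rule format and the state names are all my own sketch, not any standard library or notation.

```python
def run_turing_machine(rules, tape, state="start", blank="_", max_steps=10_000):
    """Run a one-tape Turing machine and return the final tape contents.

    rules maps (state, symbol) to (symbol_to_write, "L" or "R", next_state).
    The machine stops when it reaches the state "halt", or after max_steps
    as a safety net against rule tables that never halt.
    """
    cells = dict(enumerate(tape))  # sparse tape: position -> symbol
    head = 0
    for _ in range(max_steps):
        if state == "halt":
            break
        symbol = cells.get(head, blank)
        write, move, state = rules[(state, symbol)]
        cells[head] = write
        head += 1 if move == "R" else -1
    if not cells:
        return ""
    positions = range(min(cells), max(cells) + 1)
    return "".join(cells.get(i, blank) for i in positions).strip(blank)

# Add one to a binary number: run right to the end, then carry back left.
INCREMENT = {
    ("start", "0"): ("0", "R", "start"),
    ("start", "1"): ("1", "R", "start"),
    ("start", "_"): ("_", "L", "carry"),
    ("carry", "1"): ("0", "L", "carry"),
    ("carry", "0"): ("1", "L", "halt"),
    ("carry", "_"): ("1", "L", "halt"),
}

print(run_turing_machine(INCREMENT, "1011"))  # 11 + 1 = 12, which is 1100 in binary
print(run_turing_machine(INCREMENT, "111"))   # 7 + 1 = 8, which is 1000 in binary
```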

Defining a total recursive function for a problem basically means translating it into mathematical terms, in other words, describing it objectively.  Computability theory says that if you can do that, you can write a program to compute it, essentially building something to perform the task (generally you tell a general-purpose computer to execute the code you wrote, but if you really want to you can build a physical circuit to do what the computer would do).

So the two notions, describing a task clearly and producing something to perform it, are provably equivalent.  There are some technical issues with the notion of complexity here that I'm going to gloss over.  The whole P = NP thing revolves around whether being able to state a problem simply means being able to solve it simply, but when it comes to deciding whether recognizing faces is harder than walking, I'm going to claim we can leave that aside.

The catch here is that my notion of objectivity -- defining a computable function -- is ultimately based on mathematics, which in turn is based on our notion of what it means to prove something (the links between computing and theorem proving are interesting and deep, but we're already in deep enough as it is).  Proof, in turn, is -- at least historically -- based on how our minds work, and in particular how language works.  Which is what I called "subjective" at the top.

So, is our notion of how hard something is to do mechanically -- my ostensibly "objective" definition -- limited by our modes of reasoning, particularly verbal reasoning, or is verbal/mathematical reasoning a fundamentally powerful way of describing things that we happened to discover because we developed minds capable of apprehending it?  I'd tend to think the latter, but then maybe that's just a human bias.



Second, as to our tendency to think that particularly human things like language and house-building are special, that might not just be arrogance, even if we're not really as special as we'd like to think.  We have a theory of mind, and not just of human minds.  We attribute very human-like motivations to other animals, and I'd argue that in many, maybe most, cases we're right.  Moreover, we also attribute different levels of consciousness to different things (things includes machines, which we also anthropomorphize).

There's a big asymmetry there: we actually experience our own consciousness, and we assume other people share the same level of consciousness, at least under normal circumstances, and we have that confirmed as we communicate our consciousnesses to each other.  It's entirely natural, then, to see our own intelligence and consciousness, which we see from the inside in the case of ourselves and close up in the case of other people, as particularly richer and more vivid.  This is difficult to let go of when trying to study other kinds of mind, but it seems to me it's essential at least to try.

Monday, July 17, 2017

Is recognizing faces all that special?

I've seen some headlines recently saying that fish can be taught to recognize human faces.  It's not clear why these would be circulating now, since the original paper appeared in 2016, but it's supposed to be newsworthy because fish weren't thought to have the neural structures needed to recognize faces.  In particular, they lack a neocortex (particularly the fusiform gyrus), or anything clearly analogous to it, which is what humans and other primates use in recognizing faces.  Neither do the fish in question normally interact with humans, unlike, say, dogs, which might be expected to have developed an innate ability to recognize people.

The main thesis of the paper appears to be that there's nothing particularly special about recognizing faces.  As a compugeek, I'd say that the human brain is optimized for recognizing faces, but that doesn't mean that a more general approach can't work.  It makes sense that we'd have special machinery for faces.  Recognizing human faces is important to humans, though it's worth pointing out that there are plenty of people who don't seem to have this optimization (the technical term is prosopagnosia).

The authors of the paper also point out that recognizing faces is tricky:
"[F]aces share the same basic components and individuals must be discriminated based on subtle differences in features or spatial relationships."
To be sure that the fish are performing the same recognition task we do, though presumably through different means, the experimental setup uses the same skin tone in all the images and crops them to a uniform oval.  Frankly, I found it hard to pick out the differences in what was left, but my facial recognition seems to be weaker than average in real life as well.

This is interesting work and the methodology seems solid, but should we really be surprised?  Yes, recognizing faces is tricky, but so is picking out a potential predator or prey, particularly if it's trying not to be found.

The archerfish used in the experiments normally provide for themselves by spitting jets of water at flies and small animals, then collecting them when they fall.  This means seeing the prey through the distortion of the air/water boundary, contracting various muscles at just the right rate and time, and finding the fallen prey.  For bonus points, don't waste energy shooting down dead leaves and such.

Doing all that requires the type of neural computation that seems easy until you actually try to duplicate it.  Did I mention that archerfish have a range on the order of meters, a dozen or so times their body length? It's not clear why recognizing faces should be particularly hard by comparison.

Computer neural networks can recognize faces using far fewer neurons than a fish has (Wikipedia says an adult zebrafish has around 10 million).  Granted, the fish has other things it needs to do with those neurons, and you can't necessarily compare virtual neurons directly to real ones, but virtual neurons are pretty simple -- they basically add a bunch of numbers, each multiplied by a "weight", and fiddle the result slightly.  Real neurons do much the same thing with electrical signals, hence the name "neural network".
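To make "add a bunch of numbers, each multiplied by a weight, and fiddle the result slightly" concrete, here's a minimal sketch of one virtual neuron.  The choice of the logistic function as the "fiddle" and the particular inputs and weights are my assumptions for illustration; real networks use a variety of such functions and learn their weights from data.

```python
import math

def neuron(inputs, weights, bias=0.0):
    """Weighted sum of the inputs, then a squashing function (the 'fiddle')."""
    total = sum(x * w for x, w in zip(inputs, weights)) + bias
    # Logistic sigmoid: maps any real number to a value between 0 and 1.
    return 1.0 / (1.0 + math.exp(-total))

# Hypothetical inputs and weights, chosen only for illustration.
output = neuron([1.0, 0.5, -1.0], [0.8, -0.4, 0.3])
print(output)
```

A face recognizer is, very roughly, a great many of these wired together in layers, with the weights adjusted during training.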

It doesn't seem like recognizing shapes as complex as human faces should require a huge number of neurons.  The question, rather, is what kinds of brains are flexible enough to repurpose their shape recognition to an arbitrary task like figuring out which image of a face to spit at in order to get a tasty treat.

Again, is it surprising that a variety of different brains should have that kind of flexibility?  Being able to recognize new types of shape in the wild has pretty clear adaptive value, as does having flexible brain wiring in general.  Arguably the surprise would be finding an animal that relies strongly on its visual system that couldn't learn to recognize subtle differences in arbitrary shapes.

And yet, this kind of result does seem counterintuitive to many, and I'd include myself if I hadn't already seen similar results.  Intuitively we believe that some things take a more powerful kind of intelligence than others.  Playing chess or computing the derivative of a function is hard.  Walking is easy.

We also have a natural understanding of what kinds of intelligence are particularly human.  We naturally want to draw a clear line between our sort of intelligence and everyone else's.  Clearly those uniquely human abilities must require some higher form of intelligence.  Language with features like pronouns, tenses and subordinate clauses seems unique to us (though there's a lot we don't know about communication in other species), so it must be very high level.  Likewise for whatever we want to call the kind of planning and coordination needed to, say, build a house.

Recognizing each other's faces is a very human thing to do -- notwithstanding that several other kinds of animal seem perfectly capable of it -- so it must require some higher level of intelligence as well.

Now, to be clear, I'm quite sure that there is a constellation of features that, taken together, is unique and mostly universal to humanity, even if we share a number of particular features in that constellation with other species.  No one else we're aware of produces the kind of artifacts we do ... jelly donuts, jet skis, jackhammers, jugs, jujubes, jazz ...  or forms quite the same kind of social structures, or any of a number of other things.

However, that doesn't mean that these things are particularly complex or special.  We're also much less hairy than other primates, but near-hairlessness isn't a complex trait.  Our feet (and much of the rest of our bodies) are specialized for standing up, but that doesn't seem particularly different from specializing to swing through trees, or gallop, or hop like a kangaroo, or whatever else.

Our intuitions about what kind of intelligence is complex, or "of a higher order" are just not very reliable.  Playing chess is not particularly complicated.  It just requires bashing out lots and lots of different possible moves.  Calculating derivatives from a general formula is easy.  Walking, on the other hand, is fiendishly hard.  Language is ... interesting ... but many of the features of language, particularly stringing together combinations of distinct elements in sequence, are quite simple.
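As a rough illustration of how little machinery "calculating derivatives from a general formula" needs, here's a sketch that differentiates a polynomial.  Representing the polynomial as a list of coefficients, lowest power first, is just a convenient assumption for the example.

```python
def derivative(coeffs):
    """Differentiate c0 + c1*x + c2*x**2 + ..., given as [c0, c1, c2, ...].

    Applies the power rule term by term: d/dx (c * x**n) = n * c * x**(n - 1).
    """
    # Multiply each coefficient by its power, then drop the constant term's slot.
    return [n * c for n, c in enumerate(coeffs)][1:]

# 3 + 2x + 5x^3  ->  2 + 15x^2
print(derivative([3, 2, 0, 5]))
```

The whole rule fits in one line, which is a fair measure of how simple the task is to describe.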

What do I mean by "simple" here?  I mean one of two more or less equivalent things: How hard is it to describe accurately, and how hard is it to build something to perform the task.  In other words, how hard is it to objectively model something, in the sense that you'll get the same result no matter who or what is following the instructions.

This is not necessarily the same question as how complex a brain do you need in order to perform the task, but this is partly because brains have developed in response to particular features of their environment.  Playing chess or taking the derivative of a polynomial shouldn't take a lot of neurons in principle, but it's hard for us because we don't have any neurons hardwired for those tasks.  Instead we have to use the less-hardwired parts of our brains to pull together pieces that originally arose for different purposes.

Recognizing faces seems like something that requires a modest amount of machinery of the type that most visually-oriented animals should have available, and probably available in a form that can be adapted to the task, even if recognizing human faces isn't something the animal would normally have to do.  Cataloging what sorts of animals do it well seems interesting and ultimately useful in helping us understand our own brains, but we shouldn't be surprised if that catalog turns out to be fairly large.


Sunday, July 16, 2017

Discovering energy

If you get an electricity bill, you're aware that energy is something that can be quantified, consumed, bought and sold.   It's something real, even if you can't touch it or see it.  You probably have a reasonable idea of what energy is, even if it's not a precisely scientific definition, and an idea of some of the things you can do with energy: move things around, heat them or cool them, produce light, transmit information and so forth.

When something's as everyday-familiar as energy it's easy to forget that it wasn't always this way, but in fact the concept of energy as a measurable quantity is only a couple of centuries old, and the closely related concept of work is even newer.

Energy is now commonly defined as the ability to do work, and work as a given force acting over a given distance.  For example, lifting a (metric) ton of something one meter in the Earth's surface gravity requires exerting approximately 9800 Newtons of force over that distance, or approximately 9800 Newton-meters of work altogether.  A Joule of energy is the ability to do one Newton-meter of work, so lifting one ton one meter requires approximately 9800 Joules of energy, plus whatever is lost to inefficiency.
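The arithmetic in that example is just force times distance.  Here it is as a small sketch, assuming the rounded value of 9.8 meters per second squared for surface gravity (the function name is mine):

```python
G = 9.8  # approximate surface gravity in m/s^2 (the rounded value used above)

def lifting_work(mass_kg, height_m, g=G):
    """Work in Joules to lift a mass: force (m * g, in Newtons) times distance."""
    force_newtons = mass_kg * g
    return force_newtons * height_m

print(lifting_work(1000, 1))   # one metric ton raised one meter
```

Doubling either the mass or the height doubles the work, which is all "force acting over a distance" is saying.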

As always, there's quite a bit more going on if you start looking closely.  For one thing, the modern physical concept of energy is more subtle than the common definition, and for another energy "lost" to inefficiency is only "lost" in the sense that it's now in a form (heat) that can't directly do useful work.  I'll get into some, but by no means all, of that detail later in this post and probably in others as well.

I'm not going to try to give an exact history of thermodynamics or calorimetry here, but I do want to call out a few key developments in those fields.  My main aim is to trace the evolution of energy as a concept from a concrete, pragmatic working definition born out of the study of steam engines to the highly abstract concept that underpins the current theoretical understanding of the physical world.



The concept of energy as we know it dates to somewhere around the turn of the 19th century, that is, the late 1700s and early 1800s.   At that point practical steam engines had been around for several decades, though they only really took off when Watt's engine came along in 1781.  Around the same time a number of key experiments were done, heat was recognized as a form of energy and a full theory of heat, work and the relationship between the two was formulated.

What makes things hot?  This is one of those "why is the sky blue?" questions that quickly leads into deep questions that take decades to answer properly.  The short answer, of course, is "heat", but what exactly is that?  A perfectly natural answer, and one of the first to be formalized into something like what we would call a theory, is that heat is some sort of substance, albeit not one that we can see, or weigh, or any of a number of other things one might expect to do with a substance.

This straightforward answer makes sense at first blush.  If you set a cup of hot tea on a table, the tea will get cooler and the spot where it's sitting on the table will get warmer.  The air around the cup also gets warmer, though maybe not so obviously.  It's completely reasonable to say that heat is flowing from the hot teacup to its surroundings, and to this day "heat flow" is still an academic subject.

With a little more thought it seems reasonable to say that heat is somehow trapped in, say, a stick of wood, and that burning the wood releases that heat, or that the Sun is a vast reservoir of heat, some of which is flowing toward us, or any of a number of quite reasonable statements about heat considered as a substance.  This notional substance came to be known as caloric, from the Latin for heat.

As so often happens, though, this perfectly natural idea gets harder and harder to defend as you look more closely.  For example, if you carefully weigh a substance before and after burning it, as Lavoisier did in 1772, you'll find that it's actually heavier after burning.  If burning something releases the caloric in it, then does that mean that caloric has negative weight?  Or perhaps it's actually absorbing cold, and that's the real substance?

On the other hand, you can apparently create as much caloric as you want without changing the weight of anything.  In 1797 Benjamin Thompson, Count Rumford, immersed an unbored cannon in water, bored it with a specially dulled borer and observed that the water was boiling hot after about two and a half hours.  The metal bored from the cannon was not observably different from the remaining metal of the cannon, the total weight of the two together was the same as the original weight of the unbored cannon, and you could keep generating heat as long as you liked.  None of this could be easily explained in terms of heat as a substance.

Quite a while later, in the 1840s, James Joule made precise measurements of how much heat was generated by a falling weight powering a stirrer in a vat of water.  Joule determined that heating a pound of water one degree Fahrenheit requires 778.24 foot-pounds of work (e.g., letting a 778.24 pound weight fall one foot, or a 77.824 pound weight fall ten feet, etc.).  Ludwig Colding did similar research, and both Joule and Julius Robert von Mayer published the idea that heat and work can each be converted to the other.  This is not just significant theoretically.  Getting five digits of precision out of stirring water with a falling weight is pretty impressive in its own right.
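To get a feel for how good that number is, here's a rough sketch that converts the figure quoted above into modern units and compares it with what you'd compute today.  The conversion factors are standard; the heat capacity of water (about 4186 Joules per kilogram per degree) is an approximate textbook value that varies a little with temperature.

```python
FOOT_POUND_IN_JOULES = 1.3558179483   # one foot-pound of work, in Joules
POUND_IN_KG = 0.45359237              # one pound of mass, in kilograms
WATER_HEAT_CAPACITY = 4186.0          # J per kg per kelvin, approximate

# The quoted figure: work needed to warm one pound of water by one degree Fahrenheit.
joule_measured = 778.24 * FOOT_POUND_IN_JOULES

# The same quantity from the modern heat capacity; one Fahrenheit degree is 5/9 K.
modern_estimate = WATER_HEAT_CAPACITY * POUND_IN_KG * 5.0 / 9.0

print(round(joule_measured, 1), round(modern_estimate, 1))
```

Both come out at a little over a thousand Joules, and they agree to within a small fraction of a percent.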

At this point we're well into the development of thermodynamics, which Lord Kelvin eventually defined in 1854 as "the subject of the relation of heat to forces acting between contiguous parts of bodies, and the relation of heat to electrical agency."  This is a fairly broad definition, and the specific mention of electricity is interesting, but a significant portion of thermodynamics and its development as a discipline centers around the behavior of gasses, particularly steam.


In 1662, Robert Boyle published his finding that the volume of a gas, say, air in a piston, is inversely proportional to the pressure exerted on it.  It's not news, and wasn't at the time, that a gas takes up less space if you put it under pressure.  Not having a fixed volume is a defining property of a gas.  However, "inversely proportional" says more.   It says if you double the pressure on a gas, its volume shrinks by half, and so forth.  Another way to put this is that pressure multiplied by volume remains constant.

In the 1780s, Jacques Charles formulated (but didn't publish) the idea that the volume of a gas was proportional to its temperature.  In 1801 and 1802, John Dalton and Joseph Louis Gay-Lussac published experimental results showing the same effect.  There was one catch: you had to measure temperature on the right scale.  A gas at 100 degrees Fahrenheit doesn't have twice the volume of a gas at 50 degrees, nor does it if you measure in Celsius.

However, if you plot volume vs. temperature on either scale you get a straight line, and if you put the zero point of your temperature scale where that line would show zero volume -- absolute zero -- then the proportionality holds.  Absolute zero is quite cold, as one might expect.  It's around 273 degrees below zero Celsius (about 460 degrees below zero Fahrenheit).  It's also unobtainable, though recent experiments in condensed matter physics have come quite close.
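Finding that zero point is just a matter of extending a straight line.  Here's a minimal sketch, using made-up but idealized readings for a gas sample held at constant pressure (the numbers are chosen to behave like an ideal gas, not taken from any real experiment):

```python
def zero_volume_temperature(t1, v1, t2, v2):
    """Extend the straight line through (t1, v1) and (t2, v2) to where volume hits zero."""
    slope = (v2 - v1) / (t2 - t1)
    return t1 - v1 / slope

# Idealized (made-up) readings: a gas sample occupying 1.0000 liters at 0 C
# expands to 1.3661 liters at 100 C.
absolute_zero_c = zero_volume_temperature(0.0, 1.0000, 100.0, 1.3661)
absolute_zero_f = absolute_zero_c * 9.0 / 5.0 + 32.0
print(round(absolute_zero_c, 1), round(absolute_zero_f, 1))
```

The same line, read off in Fahrenheit instead of Celsius, crosses zero volume at the same physical temperature, which is why the choice of everyday scale doesn't matter.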

Put those together and you get the combined gas law: Pressure times volume is proportional to temperature.

In 1811 Amedeo Avogadro hypothesized that two samples of gas at the same temperature, pressure and volume contain the same number of molecules, even if they are samples of different gases.  This came to be known as Avogadro's Law.  The number of molecules in a typical system is quite large.  It is usually expressed in terms of Avogadro's Number, approximately six followed by twenty-three zeroes, one of the larger numbers that one sees in regular use in science.

Put that together with the combined gas law and you have the ideal gas law:
PV = nRT
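P is the pressure, V is the volume, n is the amount of gas (the number of molecules, counted in moles, that is, in multiples of Avogadro's Number), T is the temperature measured from absolute zero and R is the gas constant that makes the numbers and units come out right.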
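As a quick sanity check on the formula, here's a sketch that solves it for volume.  It assumes SI units throughout: pressure in Pascals, volume in cubic meters, temperature in Kelvin (degrees above absolute zero) and n in moles, with the corresponding value of R.

```python
R = 8.314462618   # gas constant in J/(mol*K)

def ideal_gas_volume(n_moles, temperature_k, pressure_pa):
    """Solve PV = nRT for V, in cubic meters."""
    return n_moles * R * temperature_k / pressure_pa

# One mole at 0 C (273.15 K) and one standard atmosphere (101325 Pa).
volume_m3 = ideal_gas_volume(1.0, 273.15, 101325.0)
print(round(volume_m3 * 1000, 2), "liters")
```

That comes out to a bit over 22 liters, the familiar textbook figure for a mole of ideal gas under those conditions.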

The really important abstraction here is state.  If you know the parameters in the ideal gas law -- the pressure, volume, temperature and how many gas particles there are, then you know its state.  This is all you need to know, and all you can know, about that gas as far as thermodynamics is concerned.  Since the number of gas particles doesn't change in a closed system like a steam engine (or at least an idealized one), you only need to know pressure, volume and temperature to know the state.

Since the ideal gas law relates those, you really only need to keep track of two of the three.  If you measure pressure, volume and temperature once to start with, and you observe that the volume remains constant while the pressure increases by 10%, you know that the temperature must be 10% higher than it was.  If the volume had increased by 20% but the pressure had dropped by 10%, the temperature must now be 8% higher (1.2 * 0.9 = 1.08).  And so forth.
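The bookkeeping in those two examples can be sketched in a couple of lines.  The one assumption to keep in mind is that the percentages apply to temperature measured from absolute zero, not to Celsius or Fahrenheit readings.

```python
def temperature_ratio(pressure_ratio, volume_ratio):
    """With the amount of gas fixed, T is proportional to P * V, so the ratios multiply."""
    return pressure_ratio * volume_ratio

print(temperature_ratio(1.10, 1.00))   # pressure up 10%, volume unchanged
print(temperature_ratio(0.90, 1.20))   # pressure down 10%, volume up 20%
```

The first case gives a temperature 10% higher and the second 8% higher, matching the figures above.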

You don't have to track pressure and volume particularly, or even two of {pressure, volume, temperature}.  There are other measures that will do just as well (we'll get to one of the important ones in a later post), but no matter how you define your measures you'll need two of them to account for the thermodynamic state of a gas and (as long as they aren't essentially the same measure in disguise), two will be enough.  Technically, there are two degrees of freedom.

This means that you can trace the thermodynamic changes in a gas on a two-dimensional diagram called a phase diagram (when the two measures are pressure and volume, the more usual name is a pressure-volume or PV diagram).  Let's pause for a second to take that in.  If you're studying a steam engine (or in general, a heat engine) that converts heat into work (or vice versa) you can reduce all the movements of all the machinery, all the heating and cooling, down to a path on a piece of paper.  That's a really significant simplification.


In theory, the steam in a steam engine (or generally the working fluid in a heat engine), will follow a cycle over and over, always returning to the same point in the phase diagram (that is, the same state).    In practice, the cycle won't repeat exactly, but it will still follow a path through the phase diagram that repeats the same basic pattern over and over, with minor variations.

The heat source heats the steam and the steam expands.  Expanding means exerting force against the walls of whatever container it's in, say the surface of a piston.  That is, it means doing work.  The steam is then cooled, removing heat from it, and the steam returns to its original pressure, volume and temperature.  At that point, from a thermodynamic point of view, that's all we know about the steam.  We can't know, just from taking measurements on the steam, how many times it's been heated and cooled, or anything else about its history or future.  All you know is its current thermodynamic state.

As the steam contracts back to its original volume, its surroundings are doing work on it.  The trick is to manipulate the pressure, temperature and volume in such a way that the pressure, and thus the force, is lower on the return stroke than the power stroke, and the steam does more work expanding than is done on it contracting.  Doing so, it turns out, will involve putting more heat into the heating than comes out in the cooling.  Heat goes in, work comes out.
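Here's a deliberately oversimplified sketch of that trick.  It assumes the pressure stays constant during each stroke, so the cycle is a rectangle when plotted as pressure against volume, and the numbers are hypothetical.  Real engines trace curvier paths, but the idea is the same: the net work is the area the path encloses.

```python
def net_work(p_power, p_return, v_small, v_large):
    """Net work in Joules for an idealized rectangular cycle on a pressure-volume plot.

    The gas expands from v_small to v_large at pressure p_power (doing work),
    then is pushed back to v_small at the lower pressure p_return (work done on it).
    """
    work_out = p_power * (v_large - v_small)   # done by the gas on the piston
    work_in = p_return * (v_large - v_small)   # done on the gas by its surroundings
    return work_out - work_in

# Hypothetical numbers: 300 kPa on the power stroke, 100 kPa on the return,
# volume swinging between 1 and 3 liters (0.001 and 0.003 cubic meters).
print(net_work(300_000, 100_000, 0.001, 0.003))
```

If the two pressures were equal the net work would be zero, which is why the whole game is keeping the return stroke at lower pressure.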


This leads us to one of the most important principles in science.  If you carefully measure what happens in real heat engines, and the ways you can trace through a path in a phase diagram, you find that you can convert heat to work, and work to heat, and that you will always lose some waste heat to the surroundings, but when you add it all up (in suitable units and paying careful attention to the signs of the quantities involved), the total amount of heat transferred and work done never changes.  If you put in 100 Joules' worth of heat, you won't get more than 100 Joules' worth of work out.  In fact, you'll get less.  The difference will be wasted heating the surroundings.

This is significant enough when it comes to heat engines, but that's only the beginning.  Heat isn't the only thing you can convert into work and vice versa.  You can use electricity to move things, and moving things to make electricity.  Chemical reactions can produce or absorb heat or produce electrical currents, or be driven by them.   You can spin up a rotating flywheel and then, say, connect it to a generator, or to a winch.

Many fascinating experiments were done, and the picture became clearer and clearer: Heat, electricity, the motion of an object, the potential for a chemical reaction, the stretch in a spring, the potential of a mass raised to a given height, among other quantities, can all be converted to each other, and if you measure carefully, you always find the total amount to be the same.
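One of the simplest conversions to check on paper is a raised mass turning its potential into motion as it falls.  This sketch assumes free fall from rest with no air resistance, and uses the standard formulas m*g*h for the energy of a raised mass and (1/2)*m*v^2 for the energy of motion; the mass and height are arbitrary.

```python
import math

G = 9.8  # approximate surface gravity, m/s^2

def potential_energy(mass_kg, height_m):
    """Energy in Joules stored by raising a mass to a given height."""
    return mass_kg * G * height_m

def kinetic_energy(mass_kg, speed_m_s):
    """Energy in Joules of a mass moving at a given speed."""
    return 0.5 * mass_kg * speed_m_s ** 2

mass, height = 2.0, 5.0
speed_at_bottom = math.sqrt(2 * G * height)   # free fall from rest, no air resistance

print(potential_energy(mass, height))           # energy stored at the top
print(kinetic_energy(mass, speed_at_bottom))    # energy of motion at the bottom
```

The two numbers agree, up to floating-point rounding.  In a real experiment some of the total would show up as heat instead, which is exactly what Joule's stirrer was measuring.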

This leads to the conclusion that all of these are different forms of the same thing -- energy -- and that this thing is conserved, that is, never created or destroyed, only converted to different forms.


As far-reaching and powerful as this concept is, there were two other important shifts to come.  One was to take conservation of energy not as an empirical result of measuring the behavior of steam engines and chemical reactions, but as a fundamental law of the universe itself, something that could be used to evaluate new theories that had no direct connection to thermodynamics.

If you have a theory of how galaxies form over millions of years, or how electrons behave in an atom, and it predicts that energy isn't conserved, you're probably not going to get far.  That doesn't mean that all the cool scientist kids will point and laugh (though a certain amount of that has been known to happen).  It means that sooner or later your theory will hit a snag you hadn't thought of and sooner or later the numbers won't match up with reality*.  When this happens over and over and over, people start talking about fundamental laws.


The second major shift in the understanding of energy came with the quantum theory, that great upsetter of scientific apple carts everywhere.  At a macroscopic scale, energy still behaves something like a substance, like the caloric originally used to explain heat transfer.  In Count Rumford's cannon-boring experiment, mechanical energy is being converted into heat energy.  Heat itself is not a substance, but one could imagine that energy is, just one that can change forms and lacks many of the qualities -- color, mass, shape, and so forth -- that one often associates with a substance.

In the quantum view, though, saying that energy is conserved doesn't assume some substance or pseudo-substance that's never created or destroyed.  Saying that energy is conserved is saying that the laws describing the universe are time-symmetric, meaning that they behave the same at all times.  This is a consequence of Noether's theorem, one of the deepest results in mathematical physics, which relates conservation in general to symmetries in the laws describing a system.  Time symmetry implies conservation of energy.  Directional symmetry -- the laws work the same no matter which way you point your x, y and z axes -- implies conservation of angular momentum.
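For the curious, here's a sketch of the simplest classical case, which is much less than the full theorem.  The symbols are my assumptions, not anything from above: a single coordinate q, its rate of change \dot{q}, and a function L(q, \dot{q}, t) (the Lagrangian) that encodes the laws of motion.  Define a quantity E and see how it changes over time:

```latex
E = \dot{q}\,\frac{\partial L}{\partial \dot{q}} - L,
\qquad
\frac{dE}{dt}
  = \dot{q}\left(\frac{d}{dt}\frac{\partial L}{\partial \dot{q}} - \frac{\partial L}{\partial q}\right)
    - \frac{\partial L}{\partial t}
  = -\frac{\partial L}{\partial t}
```

The term in parentheses is zero for any motion that obeys the laws (that's the Euler-Lagrange equation).  So if L doesn't depend explicitly on t, that is, if the laws are the same at all times, E never changes, and for ordinary mechanical systems E is the energy.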

Both of these are very abstract.  In the quantum world you can't really speak of a particle rotating on an axis, yet you can measure something that behaves like angular momentum, and which is conserved just like the angular momentum of spinning things is in the macroscopic world.  Just the same, energy in the quantum world has more to do with the rates at which the mathematical functions describing particles vary over space and time, but because of how the laws are structured it's conserved and, once you follow through all the implications, energy as we experience it on our scale is as well.

This is all a long way from electricity bills and the engines that drove the industrial revolution, but the connections are all there.  Putting them together is one of the great stories in human thought.

* I suppose I can't avoid at least mentioning virtual particles here.  From an informal description, of particles being created and destroyed spontaneously, it would seem that they violate conservation of energy (considering matter as a form of energy).  They don't, though.  Exactly why they don't is beyond my depth and touches on deeper questions of just how one should interpret quantum physics, but one informal way of putting it is that virtual particles are never around for long enough to be detectable.  Heisenberg uncertainty is often mentioned as well.