Thursday, May 31, 2018

Cookies, HTTPS and OpenID

I finally got around to looking at the various notices that have accumulated on the admin pages for this blog.  As a result:

  • This blog is supposed to display a notice regarding cookies if you access it from the EU.  I'm not sure that this notice is actually appearing when it should (I've sent feedback to try to clarify), but as far as I can tell blogspot is handling cookies for this blog just like any other.  I have not tried to explicitly change that behavior.
  • I've turned on "redirect to https".  This means that if you try to access this blog via http://, it will be automatically changed to https://.  This shouldn't make any difference.  On the one hand, https has been around for many years and all browsers I know of handle it just fine.  On the other hand, this is a public blog, so there's no sensitive private information here.  It might make a difference if you have to do some sort of login to leave comments, but I doubt it.
  • Blogger no longer supports OpenID.  I think this would only matter if I'd set up "trust these web sites" under the OpenID settings, but I didn't.
In other words, this should all be a whole lot of nothing, but I thought I'd let people know.

Wednesday, May 23, 2018

The stuff of dreams


... and then our hero woke up and it was all a dream ...

... has to rank among the most notorious pulled-out-of-thin-air deus ex machina twist endings in the book, along with "it was actually twins" and "it was actually the same person with multiple personalities".  As with all such tropes, there's nothing wrong with these plot twists per se.  The problem is generally the setup.

In a well-written "it was actually twins" twist, you have clues all along that there were actually two people -- maybe subtle shifts in behavior, or a detail of clothing that comes and goes, or the character showing up in an unexpected place that it seemed unlikely they'd be able to get to.  With a good setup, your reaction is "Oh, so that's why ..." and not "Wait ... what?  Seriously?"

The same goes for "it was all a dream".  In a good setup, there are clues that it was all a dream.  Maybe things start out ok, then something happens that doesn't quite make sense, then towards the end things get seriously weird, but more in a "wait, what's going on here, why did they do that?" kind of way, as opposed to a "wait, was that a flying elephant I just saw with a unicyclist on its back?" kind of way, though that can be made to work as well.

There's a skill to making things dreamlike, particularly if you're trying not to give the game away completely.  Dream logic doesn't just mean randomly bizarre things happening.  Dreams are bizarre in particular ways which are not particularly well understood, even though people have been talking about and interpreting dreams probably for as long as there have been talking and dreams.

A while ago I ran across a survey by Jennifer Windt and Thomas Metzinger that has quite a bit to say about dreams and the dream state, both ordinary dreams and "lucid" dreams where the rules are somewhat different.  They compare the three states of ordinary dreaming, lucid dreaming and waking consciousness to try to tease out what makes each one what it is with, I found, fair success.  I'm not going to go into a detailed analysis of that paper here, but I did want to acknowledge it, if only as a starting point.


First, though, some more mundane observations about dreams.  We tend to dream several times a night, in cycles lasting around 90 minutes.  We don't typically remember this, but a subject who is awakened while exhibiting signs of a dream state can generally recall dreaming while a subject awakened under other conditions doesn't.  The dream state is marked by particular patterns of electrical activity in the brain, near-complete relaxation of the skeletal muscles and, probably best-known, Rapid Eye Movement, or REM.  REM is not a foolproof marker, but the correlation is high.

Dreams in early sleep cycles tend to be closely related to things that happened during the waking day.  Subjects who studied a particular skill prior to going to sleep, for example, tended to have dreams about that skill.  I've personally had dreams after coding intensely that were a sort of rehash of how I'd been thinking about the code in question, not so much in the concrete sense of writing or reading particular pieces as more abstractly navigating data and control structures.

Later dreams -- those closer to when you wake up -- tend to be more emotional and less closely associated with recent memories.  Since these are more likely to be the ones you remember unless someone is waking you up as part of a sleep experiment, these are the kind of dreams we tend to think of as "dreamlike".  These are the "I was in this restaurant having dinner with such-and-such celebrity, except it didn't look like them, and I could hear my third-grade teacher yelling something, but everyone just ignored it and then a huge ocean wave came crashing in and we all had to swim for it, even though the restaurant was in the Swiss Alps" kind of dreams.

In my experience this kind of dream can often be linked back to relevant events, but in a sort of mashed-up, piecemeal, indirect way.  Maybe you heard a news story about a tidal wave yesterday and a couple of days ago some relative or old friend had mentioned something that happened to you in grade school.  Celebrities, by definition, are frequently in the news, and it was the Swiss Alps just because.  That doesn't really explain what the dream might mean, if indeed it meant anything, but it does shed some light on why those particular elements might have been present.

But why that particular assemblage of elements?  Why wasn't the third grade teacher your dinner companion?  Why did all the other diners ignore the teacher?  Why wasn't the restaurant on the beach? And so on.

My personal theory on such things is pretty unsatisfying: it just is.  Whatever part of the mind is throwing dream elements together is distinct from the parts of the mind concerned with cause and effect and pulling together coherent narratives.

To draw a very crude analogy, imagine memory as a warehouse.  From time to time things have to be shuffled around in a warehouse for various logistical reasons.  For example, if something that's been stored in the back for months now needs to be brought out, you may have to move other items around to get at it.  Those items were put there for their own reasons that may not have anything to do with the item that's being brought out.

Now suppose someone from management in a different part of the company -- say media relations -- comes in and starts observing what's going on.  A pallet of widgets gets moved from section 12D, next to the gadgets, to section 4B, next to the thingamajigs.  This goes on for a while and our curious manager may even start to notice patterns and make tentative notes on them.

Suppose upper-level management demands, for its own inscrutable reasons, a press release on the warehouse activity.  The media relations person writing the release is not able to contact the warehouse people to find out what's really going on and just has to go by the media relations manager's notes about widgets moving from next to the gadgets to next to the thingamajigs.  The resulting press release is going to try to tell a coherent story, but it's not going to make much sense.  It's almost certainly not going to say "We had to get the frobulator out of long-term storage for an upcoming project so we moved a bunch of stuff to get at it."

My guess is that something similar is going on in the brain with dreams.  In normal waking consciousness, the brain is receiving a stream of inputs from the outside world and putting them together into a coherent picture of what's going on.  There are glitches all the time for various reasons. The input we get is generally incomplete and ambiguous.  We can only pay attention to so much at a time.

In order to cope with this we constantly make unconscious assumptions based on expectations, and these vary from person to person since we all have different experiences.  The whole concept of consciousness is slippery and by no means completely understood, but for the purpose of this post consciousness (as opposed to any particular state of consciousness) means whatever weaves perception into a coherent picture of what's going on.

Despite all the difficulties in turning perception into a coherent reality, we still do pretty well.  Different people perceiving the same events can generally agree on at least the gist of what happened, so in turn we agree that there is such a thing as "objective reality" independent of the particular person observing it.  Things fall down.  The sun rises in the morning.  It rains sometimes.  People talk to each other, and so on.  Certainly there's a lot we disagree on, sometimes passionately with each person firmly believing the other just doesn't know the simple facts, but this doesn't mean there's no such thing as objective reality at all.



In the dream state, at least some of the apparatus that builds conscious experience is active, but it's almost completely isolated from the outside world (occasionally people will incorporate outside sounds or other sensory input into a dream, but this is the exception).  Instead it is being fed images from memories which, as in the warehouse analogy, are being processed according to however memory works, without regard to the outside world.  Presented with this, consciousness tries to build a narrative anyway, because that's what it does, but it's not going to make the same kind of sense as waking consciousness because it's not anchored to the objective, physical world.

If the early memory-processing is more concerned with organizing memories of recent events, early-cycle dreams will reflect this.  If later memory processing deals in larger-scale rearrangement and less recent, less clearly correlated memories, later-cycle dreams will reflect this.


As I understand it, Windt and Metzinger's analysis is broadly compatible with this description, but they bring in two other key concepts that are important to understanding the various states of consciousness: agency and phenomenal transparency.

Agency is just the power to act.  In waking consciousness we have a significant degree of agency.  In normal circumstances we can control what we do directly -- I choose to type words on a keyboard.  We can influence the actions of others to some extent, whether by force or persuasion.  We can move physical objects around, directly or indirectly.  If I push over the first domino in a chain, the others will fall.

In a normal dream the dreamer has no agency.  Things just happen.  Even things that the dreamer experiences as doing just happen.  You can recall "I was running through a field", but generally that's just a fact.  Even if your dream self decides to do something, as in "The water was rushing in so I started swimming", it's not the same as "I wanted to buy new curtains so I looked at a few online and then I picked these out".  Your dream self is either just doing things, or sometimes just doing things in a natural reaction to something that happened.

Even that much is a bit suspect.  It wouldn't be a surprise to hear "... a huge ocean wave came crashing in and then I was walking through this city, even though it was underwater".  In some fundamental way, in a dream you're not making things happen.  They just happen.

Likewise, one of the most basic forms of agency is directing one's attention, but in a dream you don't have any choice in that, either.  Instead, attention is purely salience based, meaning, more or less, that in a dream your attention is directed where it needs to be -- if that ocean wave bursts in you're paying attention to the water -- rather than where you want it to be.

Phenomenal transparency concerns knowing what state of consciousness you're in.  Saying that dreaming is phenomenally transparent is just a technical way of saying "when you're in a dream you don't know you're dreaming".  (So why coin such a technical term for such a simple thing?  For the usual reasons.  On the one hand, repeating that whole phrase every time you want to refer to the concept -- which will be a lot if you're writing a paper on dreaming -- is cumbersome at best.  It's really convenient to have a short two-word phrase for "the-quality-of-not-knowing-you're-dreaming-when-you're-dreaming".  On the other hand, defining a phrase and using it consistently makes it easier for different people to agree they're talking about the same thing.  But I digress.)

If someone is recalling a dream, they don't recall it as something that they dreamed.  They recall it as something that happened, and happened in a dream.  It "happened" just the same as something in waking consciousness "happened".  During the dream itself, it's completely real.  Only later, as we try to process the memory of a dream, do we understand it as a dream.  I've personally had a few fairly unsettling experiences of waking up still in a dreamlike state and feeling some holdover from the dream as absolutely real, before waking up completely and realizing ... it was all a dream (more on this below).  I expect this is something most people have had happen, and it's why the "it was all a dream" trope can work at all.

In some sense this seems related to agency.  When you say "I dreamed that ..." it doesn't mean that you consciously decided to have thus-and-such happen in your dream.  It means that you had a dream, and thus-and-such happened in it.

Except when it doesn't ...

Windt and Metzinger devote quite a bit of attention to lucid dreams. While the term lucid might suggest vividness and clarity, and this can happen, lucidity generally refers to being aware that one is dreaming (phenomenal transparency breaks down).  Often, but not always, the dreamer has a degree of control (agency) over the action of the dream.  In a famous experiment, lucid dreamers were asked to make a particular motion with their eyes, something like "when you realize you're in a dream, look slowly left and then right, then up, then left and right again", something that would be clearly different from normal REM.  Since the eyes can still move during a dream, even if the rest of the body is completely relaxed, experimenters were able to observe this and confirm that the dreamers were indeed aware and able to act.

Not everybody has lucid dreams, or at least not everyone is aware of having had them.  I'm not sure I've had any lucid dreams in the "extraordinarily clear and vivid" sense, but I've definitely had experiences drifting off to sleep and working through some problem or puzzle to solve, quite consciously, but blissfully unaware that I'm actually asleep and snoring.  I've also had experiences waking up where I was able to consciously replay what had just been happening in a dream and at least to some extent explore what might happen next.  I'm generally at least somewhat aware of my surroundings in such cases, at least intermittently, so it's not clear what to call dreaming and what to call remembering a dream.

In any case, I think this all fits in reasonably well with the idea of multiple parts of the brain doing different things, or not, none of them in complete control of the others.  Memory is doing whatever memory sorting it needs to do during sleep (it's clear that there's at least something essential going on during sleep, because going without sleep for extended periods is generally very bad for one's mental health).  Some level of consciousness's narrative building is active as well, doing its best to make sense of the memories being fed to it.  Some level of self awareness that "I'm here and I can do things" may or may not be active as well, depending on the dreamer and the particular circumstances.

This is nowhere near a formal theory of dreams.  Working those out is a full-time job.  I do think it's interesting, though, to try to categorize what does and doesn't happen in dream states and compare that to normal waking consciousness.  In particular, if you can have A happen without B happening and vice versa, then in some meaningful sense A and B are produced by different mechanisms.

If we draw up a little table of what can happen with or without what else...

Can there be ...           ... without consciousness?   ... without agency?   ... without phenomenal transparency?
Consciousness                         --                      yes [1]                yes [2]
(Conscious) Agency                    no                        --                     ?
Phenomenal transparency               no                      yes [3]                  --

[1] In ordinary dreams, but also, e.g., if paralyzed by fear
[2] In ordinary dreams
[3] In a lucid dream, if you're aware that you're dreaming but can't influence the dream

... it looks like things are pretty wide open.  I didn't mention it in the table, but agency doesn't require consciousness.  We do things all the time without knowing why, or even that, we're doing them.  However, conscious agency requires consciousness by definition.  So does phenomenal transparency -- it's consciousness of one's own state.

Other than that, everything's wide open except for one question mark: Can you have conscious agency without phenomenal transparency?  That is, can you consciously take an action without knowing whether you're awake or dreaming (or in some other mental state)?  This isn't clear from lucid dreaming, since lucid dreaming means you know you're dreaming.  It isn't clear from ordinary dreaming.  Ordinary dreams seem passive in nature.

In a related phenomenon, though, namely false awakening, the dreamer can, while actually remaining asleep, seem to awaken and start to do ordinary things.  In some cases, the dreamer becomes aware of the dream state, but in other cases the illusion of being awake lasts until the dreamer awakens for real.

All of this is just a long way of saying that our various faculties like consciousness, agency and awareness of one's state of consciousness seem to be mix and match.  The normal states are waking consciousness and ordinary dreaming, but anything in between seems possible.  In other words, while these faculties generally seem to operate either together (waking consciousness), or with only consciousness (ordinary dreaming), they're actually independent.  It's also worth noting that nothing in the table above distinguishes waking from dreaming.  The difference there would seem to be in whether we're processing the real world or memories of it.

This is an interesting piece of information, one which would have been considerably harder to come by if we didn't have the alternate window into consciousness provided by dreams.

Thursday, May 3, 2018

Getting off the ground

Not long after I published the previous post about the Drake Equation, a couple of headlines surfaced about a paper by Michael Hippke with the admirably straightforward title Spaceflight from Super-Earths is difficult.  The paper is actually a light rewrite of what was originally an April Fool's joke, but the analysis is real, even if the author originally considered the topic frivolous.

The term Super-Earth itself is fairly loosely defined.  For concreteness, Hippke chooses Kepler-20b, with a radius of about 1.87 R (Earth radii) and a mass of about 9.7 M (Earth masses).  Since gravity is proportional to mass and inversely proportional to the square of distance, the surface gravity of this planet would be about 2.8 g (Earth gravity).  This is assuming that the measured radius is actually the radius of the surface.  There's a good chance that Kepler-20b is actually a "Mini-Neptune" with an extensive atmosphere rather than a Super-Earth with a rocky surface, but let's assume the Earth-like scenario here.
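
To make that scaling concrete, here's the arithmetic in a couple of lines of Python (Earth-relative units, taking the measured radius as the surface):

    # Surface gravity scales as M / r^2, escape velocity as sqrt(M / r),
    # both relative to Earth = 1.
    mass, radius = 9.7, 1.87             # Kepler-20b, in Earth masses and radii
    print(mass / radius**2)              # ~2.8 -- surface gravity in g
    print((mass / radius)**0.5 * 11.2)   # ~25 -- escape velocity in km/s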

Hippke argues that it would be impractical for a civilization on such a planet to build rockets because the amount of fuel you need to reach escape velocity* increases exponentially in relation to that velocity.  This is exponential in the literal sense that doubling the velocity of a rocket means squaring the ratio of fuel to mass, not in the colloquial sense of "a lot".  Escape velocity in turn increases as the square root of the surface gravity.  For example, four times the surface gravity means twice the escape velocity, so square the ratio of fuel to dry mass.  Taking the square root doesn't make a lot of difference in the big picture.  The exponential part still dominates everything else.
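
Here's a minimal sketch of that arithmetic, using the Tsiolkovsky rocket equation with the 3430 m/s effective exhaust velocity from Hippke's calculation (a figure that comes up again below).  These are the idealized, structure-free numbers; the paper's higher figures account for the rocket's own hardware and a slightly higher escape velocity:

    import math

    # Tsiolkovsky: total mass / dry mass = exp(delta_v / v_exhaust), so
    # doubling delta_v squares the mass ratio -- the literal exponential.
    def mass_ratio(delta_v, v_exhaust=3430.0):
        return math.exp(delta_v / v_exhaust)

    print(round(mass_ratio(11_200)))   # ~26: Earth escape, ideal rocket
    print(round(mass_ratio(25_000)))   # ~1500: Kepler-20b escape, same ideal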

In short, a somewhat bigger planet doesn't mean somewhat more fuel to get to escape velocity.  It can mean a lot more.

On Earth, a chemical rocket which magically had a weightless engine, fuel tank etc. would need to have 26 times as much fuel as payload in order to reach Earth's escape velocity of about 11 km/s**.  In real life that ratio is more like 50 or even 83 since the engine and so forth actually do weigh something.

Escape velocity for Kepler-20b would be about 2.3 times Earth's escape velocity, or around 25 km/s.  Hippke calculates that for a typical chemical rocket, that 26:1 ideal mass ratio is more like 2700:1 and the more realistic ratio of 83:1 would correspond to something like 9000:1.  To send a 1-ton payload out of the planet's gravity well would take 9000 tons of fuel.  By contrast, the Saturn V -- the largest rocket actually put into service so far -- had a mass of around 3000 tons, not all of which was fuel.

All this is fine, and surely more than enough for something that started out thoroughly tongue-in-cheek.  So let's take it at face value and try to poke holes in it anyway.

First, the calculations are for a single-stage rocket, though the real-life rockets used for comparison purposes are multi-stage.  In a multi-stage rocket you use a rocket with plenty of thrust (the first stage) to boost another rocket (the second stage) through the atmosphere quickly and then jettison that first stage.  At that point you no longer have to worry about the mass of the first stage and you consequently get more acceleration out of your remaining fuel.  You don't have to stop there.  The Saturn V, for example, was a three-stage rocket.  Five-stage rockets have been successfully launched.

This doesn't just make a difference in that a multi-stage rocket allows you to get more acceleration out of the same mass ratio.  It also means that you don't have to use chemical rockets for all stages.  You could, for example, use an ion drive, which has a much higher effective velocity and therefore a much lower mass ratio, for the final stage and use chemical rockets to get it into orbit.  Ion drives produce very low thrust, far too little to launch from the ground, but they can do it for a very long time using very little fuel, eventually reaching much higher speeds than chemical rockets.  Once in orbit, a modestly-sized ion-driven vehicle could easily escape even Kepler-20b's gravity well.

In other words, getting to escape velocity in a single stage is a red herring.  You really just have to get a reasonable mass to orbital velocity, and you can use multiple stages if that helps.  At a given distance from the planet's center of mass, the orbital velocity is smaller than the escape velocity at the same distance by a factor of the square root of two.  In real life the orbit is -- of course -- further from the center of mass than the surface is.  If escape velocity at the surface is 25 km/s, a more reasonable orbital velocity would be 17 km/s, depending on how high up you have to go to get out of the atmosphere.  That would mean a mass ratio of more like 150 for an ideal rocket and 500 for a more realistic one.
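
Running the same rocket equation at the lower target speed shows where the ideal figure comes from:

    import math

    # 17 km/s to orbit instead of 25 km/s to escape, same 3430 m/s exhaust:
    print(round(math.exp(17_000 / 3_430)))   # ~140: one ideal stage to orbit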

That's still considerably more expensive than here on Earth, but not nearly as discouraging as the 9000 figure in the paper.  A 500 ton rocket could put a ton in orbit, and you wouldn't even need to do that to get out of the gravity well.  Japan's ion-driven Hayabusa craft had a mass of about half a ton.  It was able to get to an asteroid, grab a sample and bring it back to Earth -- a pretty impressive piece of engineering if you ask me.  Our counterparts on Kepler-20b could do that with something like a 250 ton rocket.

The rocket that launched Sputnik was 267 tons (the rocket that actually launched Hayabusa was around 140 tons, for a mass ratio of around 280).  Sputnik itself was only 84 kg, for a mass ratio of somewhat over 3000.  Small payloads generally mean higher mass ratios because it's not practical to shrink the launch system proportionately.

Leaving all that aside, you could also do multiple launches and assemble the final craft in orbit, if your robotics were good enough.  If you can launch half a ton with a reasonable-sized rocket, you can launch five tons with ten such rockets, and so forth.

Which brings up another point.  In the early stages of space exploration, before Kepler-20b puts its ion drive into orbit, they'll want to start small, using relatively big rockets to put relatively small things in orbit, and before that, to blast relatively small objects -- on Earth, that mainly meant weapons -- across large portions of the planet.

There doesn't seem to be any reason intelligent beings on Kepler-20b couldn't do that, assuming they're there.  Start with toy rockets, then weather rockets to explore the upper atmosphere, work up to ICBM-style systems, then orbit, then out of the gravity well, just as we did.  As far as I can tell, the difference on Kepler-20b would mainly be a matter of time, not a night-and-day difference between plausible and clearly impractical.  The benchmark of putting a ton or more directly on an escape trajectory doesn't seem particularly relevant to the question of whether or not this could happen, though, being concrete and understandable, it's still useful to think about.

There's another way to bring down the mass ratio: faster rocket fuel.  Hippke's calculations use an effective velocity of 3430 m/s, but hydrogen/oxygen delivers more like 4400.  That brings our ideal mass ratio down closer to 50 as opposed to 150.  As I understand it we only use hydrogen/oxygen in specific situations, due to various engineering considerations, but the tradeoffs will be different on Kepler-20b.  It might make sense to find ways to make the faster fuel work in more situations.
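
The same equation again, with the faster exhaust, for the to-orbit case above:

    import math

    print(round(math.exp(17_000 / 3_430)))   # ~140 with the slower fuel
    print(round(math.exp(17_000 / 4_400)))   # ~48 with hydrogen/oxygen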

Even if chemical rockets weren't a practical way of getting into orbit, there are plenty of other options, some more speculative than others, for doing so.  Space elevators ... mass drivers ... blast wave accelerators ... space fountains.  Some of these require materials we don't know how to make yet or other not-so-proven technologies, but to some extent this is all a matter of economics.  Rockets are easy and cheap enough for us, so we use rockets.

Finally, it's probably worth pointing out that escaping a planet's gravity well is necessary for sending an interstellar mission, but hardly sufficient.  Kepler-20 is 950 light-years away.  To get here from there in, say, less than 10,000 years, you'll need to be going about a tenth the speed of light, or 30,000 km/s.  If you can do that, getting into orbit or even to escape velocity doesn't seem like a major problem.  Conversely, the most likely reason not to receive a visit from Kepler-20b is that it's just too far, not that it's too hard to get off the ground.





* I suppose I should acknowledge that "velocity" here actually means "speed" since it's a magnitude with no particular direction.  But everyone says "velocity" anyway.

** In real life you also have to deal with gravity losses until you reach orbital velocity.  For example, for every second you spend going straight up against Earth's surface gravity, you lose 9.8 m/s.  For Kepler-20b, that's more like 28 m/s.  If your initial stages take 180 seconds (three minutes), that's an extra 5000 m/s or so, except it's not really that simple since you don't spend all your time going straight up, particularly if the goal is to reach orbit.  I'm handwaving that, though it's quite a bit to handwave, just to keep the comparison with the ideal mass ratio of 26.  Part of the reason real rockets, even with multiple stages, needed a higher mass ratio than just the change in speed would suggest was to deal with gravity loss.

Tuesday, April 17, 2018

Detectability and the Drake Equation

I've argued in posts on the Drake Equation (really more a framework for trying to work out the odds of finding extraterrestrial life) that the L factor, representing the amount of time for which an intelligent civilization is detectable on a planet, is both underappreciated and overestimated.  That is, it's not just important whether a planet can develop life -- which is where a lot of attention goes -- but just how long a planet with intelligent life is detectable as such.  If that time span is not very long, then there might well be intelligent civilizations out there that we don't know about, or have much hope of ever knowing about.
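
To see why L matters so much, here's a toy calculation; every number in it is made up purely to show the shape of the argument:

    # Expected number of currently-detectable civilizations, Drake-style:
    # (rate at which detectable civilizations appear within range) x L.
    appearance_rate = 0.001        # per year -- a made-up placeholder
    for L in (50, 10_000):         # short vs. long detectability window, in years
        print(L, appearance_rate * L)   # 0.05 vs. 10: L makes or breaks detection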

Radio transmissions are often used as a proxy for intelligent life.  Clearly, if we detect a radio signal coming from planet X with a structure we can't explain by natural means, we have to seriously consider the possibility that some intelligent life form sent the signal.  Artificial-looking radio signals strongly imply intelligent life, but lack of them doesn't imply lack of intelligent life.

Radio isn't the only way to go looking for intelligent life.  We're already able to get some idea of the atmospheres of exoplanets based on their effect on light from the parent star as they transit between us and that star, provided everything is in a favorable alignment.  That ability is liable to improve over time, to the point where we'll be able to detect whether a planet has a chemical composition that's likely to be produced by something like life as we know it.  That's pretty impressive, if you think about what it entails, and we're just getting started.  Astronomy has gotten really good at gleaning ridiculously faint signals from vast fields of noise, and while there are some fundamental limits to what we can gather, there's clearly a lot more we can do within those limits.

Likewise, if we can detect some signal related to a planet's surface, and we can observe the same planet from different angles (which is often possible since planets rotate) we can get some idea of any changes in the surface as seen from different angles.  Similar techniques were used to get a very rough map of Pluto's surface prior to the New Horizons mission.  It may also be possible to detect the polarization of light coming from a planet, and there are probably other sources of data.  Put together enough such hints and we may be able to measure whether a planet has anomalously dull or shiny or hot or cold regions or similar that might indicate ... something, maybe enough to say that there's probably a civilization something like ours on a given planet, and not just an odd configuration of protoplanetary dust.


So suppose that some twin Earth in the general vicinity -- say a few dozen or a few hundred light-years -- develops along similar lines to ours.  Suppose that some time after a species like ours arises it discovers radio, but not long after that it finds more efficient ways of communicating than blasting radio waves in all directions (including a tiny fraction headed towards us).  In my previous posts I argued that that's probably about it.  We won't be able to detect any signs of intelligent life (life, yes, civilization no) except for a tiny portion of the planet's existence, and so the odds are very low that we happen to be listening at the same time they're broadcasting.

But surely twin Earth will have cities and other large artifacts for longer than it has detectable radio emissions, and have them both before and after their radio era.  Our ability to detect such artifacts will only improve over time.  Suppose our techniques get to the point where we could detect the analog of a city of, say, 100,000 people by way of its structures and overall impact on the surrounding environment.  There are thousands of those on Earth now and, more to the point, there have been for quite a while.  Thousands of years, versus decades for radio transmission.  It's at least possible that there will continue to be cities for thousands of years more.  This is a couple of orders of magnitude longer than our detectable radio era might end up being, not something easy to write off.

Nonetheless, I don't know that it changes the picture much.  On the one hand, even this larger time window is still pretty small on a planetary time scale.  On the other hand, it's not at all clear to me that detecting a signature consistent with cities means that there are cities there.  I'd want to see a lot of work to rule out natural formations that we haven't thought of, and even then city-like collections of life don't necessarily mean intelligent civilization.  We are not the only life forms on Earth that can gather in numbers or have a significant environmental effect.

It's also entirely possible that we'd see the same effect with cities on Earth as with radio, just on a slower scale.  That is, we or our counterparts might have cities for a long time, but not have detectable cities for very long.  I'm not going to predict that humanity will necessarily lessen its overall impact on the environment over time, but it's possible.  If we become cleaner and (much the same thing) more efficient, we become harder to spot, and likewise for a hypothetical alien civilization.

Nonetheless, it seems dangerous to assume that whatever impact we do have, or an alien civilization has, will be undetectable from interstellar distances.  It will probably be detectable as an overall signature.  The question is what we could make of such a signature.  We'd probably be able to associate it with life, but what kind of life?

[ Re-reading an earlier post I see I already took this point into account, although in a more abstract way -- D.H.]

Thursday, December 7, 2017

Where should I file this, and do I care?

I used to love to browse the card catalog at the local library (so yep ... geek).  This wasn't just for the books, but for the way they were organized.  The local library, along with my middle and high school libraries, used the Dewey Decimal Classification (or "Dewey Decimal System" as I remember it actually being called).

This was, to my eyes, a beautiful way of sorting books.  The world was divided into ten categories, each given a range of a hundred numbers, from 000-099 for "information and general works" (now also including computer science) to 900-999 for history and geography.  Within those ranges, subjects were further divided by number.  Wikipedia gives a good example:
500 Natural sciences and mathematics
    510 Mathematics
        516 Geometry
            516.3 Analytic geometries
                516.37 Metric differential geometries
                    516.375 Finsler geometry
Finsler geometry is pretty specialized (a Finsler manifold is a differentiable manifold whose metric has particular properties -- I had to look that up).  Clearly you could keep adding digits as long as you like, slicing ever finer, though in practice there are never more than a few (maybe just three?) after the decimal point.
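
The hierarchy is baked into the notation itself: every prefix of a Dewey number names a broader category.  Here's a small Python sketch of the example above:

    # Keys are Dewey codes with the decimal point and trailing zeros dropped,
    # so each category's code is a prefix of its subcategories' codes.
    DEWEY = {
        "5": "Natural sciences and mathematics",    # 500
        "51": "Mathematics",                        # 510
        "516": "Geometry",
        "5163": "Analytic geometries",              # 516.3
        "51637": "Metric differential geometries",  # 516.37
        "516375": "Finsler geometry",
    }

    def path(code):
        digits = code.replace(".", "").rstrip("0")
        return [DEWEY[digits[:i]] for i in range(1, len(digits) + 1)
                if digits[:i] in DEWEY]

    print(path("516.375"))   # the whole chain, from general to specific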

With the Dewey classification in place, you could walk into libraries around the country, indeed around the world, and quickly figure out where, say, the books on gardening, medieval musical instruments or truck repair were located.  If you or the librarian found a book lying around, you could quickly put it back in its proper place on the shelves.  If you found a book you liked, you could find other books on related topics near it, whether on the shelves or in the card catalog (what's that, Grandpa?).

On top of that, the field of library science, in which the Dewey classification and others like it* play a central role, is one of the precursors of computer science as we know it.  This is true at several levels, from the very idea of organizing large amounts of information (and making it universally accessible and useful), to the idea of using an index that can easily be modified as new items are added.

There's one other very significant aspect of library classification systems like Dewey: hierarchy.

It's almost too obvious to mention, but in the Dewey Classification, and others like it, the world is organized into high-level categories (natural sciences and mathematics), which contain smaller, more specific categories (mathematics), and so on down to the bottom level (Finsler geometry).  There are lots and lots of systems like this -- federal/state/local government in the US and similar systems elsewhere; domain/kingdom/phylum/class/order/family/genus/species in taxonomy; supercluster/galaxy cluster/galaxy/star system/star in astronomy; top-level-domain/domain/.../host and so forth.

Strictly speaking, this sort of structure is a containment hierarchy, where higher levels contain lower levels.  There are other sorts of hierarchies, for example primary/secondary/tertiary colors.  However, containment hierarchies are the most prominent kind.  Even hierarchies such as rank generally have containment associated with them -- if a colonel reports to a general, then that general is ultimately in command of the colonel's units (and presumably others).  The term hierarchy itself comes from the Greek for "rule of a high priest".  One of the most notable examples, of course, is the hierarchy of the Catholic church.

Containment hierarchies organize the world into units that our minds seem pretty good at comprehending, which is probably why we're willing to overlook a major drawback: containment hierarchies almost always leak.

There are some possible exceptions.  One that comes to mind is the hierarchy of molecule/atom/subatomic particle/quark implied by the Standard Model.  Molecules are always composed of atoms and atoms of subatomic particles.  Of the subatomic particles in an atom, electrons (as far as we know) are elementary, having no simpler parts, while protons and neutrons are composed of quarks which (as far as we know) are also elementary.

Even here there are some wrinkles.  There are other elementary particles besides electrons and quarks that are not parts of atoms.  Electrons, protons and neutrons can all exist independently of atoms.  Some elements can exist without forming molecules.  Electrons in some types of molecule may not belong to particular atoms.  Even defining which atoms belong to which molecules can get tricky.

Perhaps a better example would be the classification of the types of elementary particles.  All (known) particles are unambiguously quarks, leptons, gauge bosons or scalar bosons.  Leptons and quarks are subdivided into generations, again with no room for ambiguity.  There are similar hierarchies in mathematics and other fields.

For most hierarchies, though, you have more than a bit of a mess.  Cities cross state lines, and while the different parts are administratively part of separate states, there will typically be citywide organizations, some with meaningful authority, that cross state lines.  Defining species and other taxonomic groups is notoriously contentious**.  One of the key points of Darwin's Origin is that you can't always find a satisfactory boundary -- even though much of Origin is devoted to explaining why we so often can.

In astronomy, the designations of supercluster, galaxy cluster, galaxy and star system can all become murky or even arbitrary when several are interacting -- is that one merged galaxy, or two galaxies in the process of merging?  The distinction between star and planet can be troublesome as well, so it may not always be clear whether you have a planet orbiting a star or two companion stars.

On the internet, the distinction in notation between nested domains and hosts is clear, but the same (physical or virtual) computer can have multiple identities, even in different domains, and multiple computers can share the same host identity.  On the internet, what matters is which packets you respond to (and no one knows you're a dog).

And, of course, organization charts, arguably the prototypical example of a containment hierarchy, are in real life more what you'd call guidelines.  Beyond "dotted-line reports" and such, most real work crosses team boundaries and if everyone waited for every decision to percolate up and down the chain of command appropriately, nothing would get done (I've seen this attempted.  It did not go well).


So why group things into hierarchies anyway?

Again, there's clearly something about our minds that finds them natural.  In the early days of PCs, some of the prominent players started out storing files in one "flat" space.  If a floppy disk typically only held a handful of files, or even a few dozen, there was no harm in just listing them all out.  It didn't take long, however, until that got unwieldy.  People wanted to group related files together and, just as importantly, keep unrelated files separate.  Before long, all the major players had ported the concept of a "directory" or "folder" from the earlier "mainframe" operating systems -- which had themselves gone through roughly the same evolution.

Since computer scientists love nothing more than recursion, folders themselves could contain folders, and so on as far as you liked.  Somehow it didn't seem to bother anyone that this couldn't possibly work in a physical folder in a physical file drawer.

This all brought a new problem -- how to put things into folders.  There are at least two varieties of this problem (hmm ... problem subdivided into varieties ...).

For various reasons, some files needed to appear in multiple folders in identical form.  This is a problem not only for space reasons, but because you'd really like a change in a common file to show up everywhere instead of having to make the same change in an unknown number of copies.   This led to the rediscovery of "shortcuts" and "symbolic links", again already part of older operating systems, which allowed you to show the same physical file under multiple folders at the same time.
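
For example, in Python (the file names are invented for illustration; on Windows, creating symbolic links may require extra privileges):

    import os

    os.makedirs("personal", exist_ok=True)
    os.makedirs("financial", exist_ok=True)
    with open("personal/statement.txt", "w") as f:
        f.write("one copy of the truth\n")

    # One physical file, visible in both folders; editing either path
    # edits the single underlying file.
    if not os.path.lexists("financial/statement.txt"):
        os.symlink(os.path.abspath("personal/statement.txt"),
                   "financial/statement.txt")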

When it comes to organizing human-readable information, there's a different problem -- it's not always clear what folder to put things in.  Does a personal financial document go in the "personal" folder or the "financial" folder?  This problem leads us right back to ontology (the study of, among other things, how to categorize things) and library science.  Library science has always had to deal with this problem as well.  Does a book on the history of mathematics go under history (900s) or mathematics (510s)?

There are always cases where you just have to decide one way or another, and then try to make the same arbitrary decision consistently in the future, hoping that some previously-unseen common thread will emerge that can then be codified into a rule.


The upshot, I think, is that hierarchies are a useful tool for organizing things for the convenience of human minds, not a property of the universe itself (except, arguably, in cases such as subatomic particles as discussed above).  As with any tool, there are costs and benefits to its use and it's best to weigh them before charging ahead.  Imposing a hierarchy that doesn't fit well isn't just wasted effort.   It can actively obscure what's really going on.

Interestingly enough, I now work for a company that takes an entirely different approach to organizing knowledge.  Don't worry about where something should be, or what grouping it should be in.  Just search for what's in it.

This has been remarkably successful.  It may be hard to remember, but for a while there was a brisk business in manually curating and categorizing information.  It's still done, of course, because it's still a useful exercise in some contexts, but it's no longer the primary way we find information on the web.  Now we just search.

OK, time to hit Publish.  Oh wait ... what labels should I put on this post?



* Dewey isn't the only game in town, just the one most widely used in US primary and secondary schools.  The local university library uses the Library of Congress Classification, which uses letters and numbers in a way that made my brain melt, not so much for looking more complex, I think, as for not being Dewey.

** My understanding is that the idea of a clade -- all (living) organisms descended from a given ancestor -- has come to be at least as important as the traditional taxonomic groupings, at least in some contexts, but I'm not a biologist.

Thursday, November 9, 2017

syl·lab·i·fi·ca·tion

[Author's note: When I started this, I thought it was going to touch on deep questions of language and cognition.  It ended up kinda meandering around some random bits of computer word-processing.  This happens sometimes.  I'm posting it anyway since, well, it's already written.  --D.H.]

Newspaper and magazine articles are traditionally typeset in narrow, justified columns. "Justified" here means that every line is the same width (unlike, say, with most blog posts).  If the words aren't big enough to fill out a line, the typesetter will widen the spaces to fill it out.  If the words are a bit too long, the typesetter might move the last word to the next line and then add space to what's left.

Originally, a typesetter was a person who physically inserted pieces of lead type into a form.  Later, it was a person operating a Linotype™ or similar machine to do the same thing.  These days it's mostly done by software.

Technically, laying out a paragraph to minimize the amount of extra space is not trivial, but certainly feasible, the kind of thing that would make a good undergraduate programming exercise.  Several algorithms are available.  They may not always produce results as nice as an experienced human typesetter's, but they do well enough for most purposes.
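
Python's standard library will happily do the greedy version of this -- ragged-right filling rather than full justification, but enough to see the shape of the problem:

    import textwrap

    text = ("If the words are a bit too long, the typesetter might move "
            "the last word to the next line and then add space to what's left.")
    print(textwrap.fill(text, width=30))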

One option for getting better line breaks and better-looking paragraphs is to hyphenate.  If your layout looks funny because you've got floccinaucinihilipilification in the middle of a line, you might try breaking it up as, say, floccinaucinihili-
pilification.  It will probably be easier to lay out those two pieces rather than trying to make room for one large one.

You can't just throw a hyphen in anywhere.  There's a strong tendency to read whatever comes before and after the hyphen as independent units, so you don't want to break at wee-
knights or pre-
aches.

In many languages, probably most, this isn't a big problem.  For example, Spanish has an official set of rules that gives a clear hyphenation for any word (actually there are several of these, depending on what country you're in).  It's hard for English, though, for the same reason that spelling is hard for English -- English spelling is historical, not phonetic, and has so far resisted attempts at standardisation (make that standardization) and fonetissizing.

So instead we have the usual suspects, particularly style guides produced by various academic and media organizations.  This leads to statements like this one from the Chicago Manual of Style:
Chicago favors a system of word division based on pronunciation and more or less demonstrated by the recommendations in Webster’s tenth.
The FAQ list that quote comes from has a few interesting cases, though I'm not sure that "How should I hyphenate Josephine Bellver's last name?" actually qualifies as a frequently asked question.  The one that interests me here concerns whether it should be "bio-logy" or "biol-ogy".  CMOS opts for "biol-ogy", going by pronunciation rather than etymology.

Which makes sense, in that consistently going by pronunciation probably makes reading easiest.  But it's also a bit ironic, in that English spelling is all about etymology over pronunciation.

Either approach is hard for computers to cope with, since they both require specific knowledge that's not directly evident from the text.  It's common to teach lists of rules, which computers do deal with reasonably well, but the problem with lists of rules for English is that they never, ever work.  For example, it's going to be hard to come up with a purely rule-based approach that divides "bark-ing" but also "bar-keeper".

This is why style guides tend to fall back on looser guidance like "divide the syllables as they're pronounced".  Except -- whose pronunciation?  When I was a kid I didn't pronounce an l in also or an n in government (I've since absorbed both of those from my surroundings).  I'm pretty sure most American speakers don't pronounce a t in often.  So how do you hyphenate those according to pronunciation?


Fortunately, computers don't have to figure this out.  A hyphenation dictionary for 100,000 words will cost somewhere around a megabyte, depending on how hard you try to compress it.  That's nothing in modern environments where a minimal "Hello world" program can run into megabytes all by itself (it doesn't have to, but it's very easy to eat a few megabytes on a trivial program without anyone noticing).

But what if the hyphenator runs across some new coinage or personal name that doesn't appear in the dictionary -- for example, whoever put the dictionary together didn't know about Josephine Bellver?  One option is just not to try to hyphenate those.  A refinement of that would be to allow the author to explicitly add a hyphen.  This should be the special "optional hyphen" character, so that you don't get hyphens showing up in the middle of lines if you later edit the text.  That way if you invent a really long neologism, it doesn't have to mess up your formatting.
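
In Unicode terms, the optional hyphen is usually the soft hyphen, U+00AD: a real character in the text that renderers display only if they break the word at that point:

    word = "floccinaucinihili\u00adpilification"   # soft hyphen in the middle
    print(len(word))   # 30 -- one more than the visible letters
    print(word)        # normally rendered with no hyphen showing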

If there's a point to any of this, it's that computers don't have to follow specific rules, except in the sense that anything a computer does follows specific rules.  While it might be natural for a compugeek to try to come up with the perfect hyphenation algorithm, the better engineering solution is probably to treat every known word as a special case and offer a fallback (or just punt) when that fails.
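
A minimal sketch of that approach, with a made-up three-word dictionary standing in for the real 100,000-word one:

    # Dictionary-first hyphenation: known words get their stored break
    # points, unknown words are left alone (punt rather than guess).
    BREAKS = {
        "biology": ["biol", "ogy"],
        "barking": ["bark", "ing"],
        "barkeeper": ["bar", "keep", "er"],
    }

    def hyphenation_points(word):
        return BREAKS.get(word.lower(), [word])

    print(hyphenation_points("barkeeper"))   # ['bar', 'keep', 'er']
    print(hyphenation_points("Bellver"))     # ['Bellver'] -- unknown, so punt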

This wasn't always the right tradeoff.  Memory used to be expensive, and a tightly-coded algorithm will be much smaller than a dictionary.  But even then, there are tricks to be employed.  One of my all-time favorite hacks compressed a spelling dictionary down to a small bitmap that didn't even try to represent the actual words.  I'd include a link, but the only reference I know for it, Programming Pearls by Jon Bentley, isn't online.
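
As I remember it, the trick was essentially what we'd now call a Bloom filter: hash each word to a few positions in a bitmap and set those bits, accepting a small chance of false positives in exchange for never storing the words at all.  A sketch of the idea (not Bentley's actual code):

    import hashlib

    BITS = 1 << 20                    # a 128 KB bitmap
    bitmap = bytearray(BITS // 8)

    def positions(word, k=3):
        for i in range(k):
            digest = hashlib.sha256(f"{i}:{word}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % BITS

    def add(word):
        for p in positions(word):
            bitmap[p // 8] |= 1 << (p % 8)

    def probably_known(word):
        # No false negatives; a rare false positive just means an
        # occasional misspelling slips through unflagged.
        return all(bitmap[p // 8] & (1 << (p % 8)) for p in positions(word))

    add("hyphen")
    print(probably_known("hyphen"), probably_known("hyhpen"))   # True False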

Saturday, November 4, 2017

Surrender, puny humans!

A while ago, DeepMind's AlphaGo beat human champion Lee Sedol at the game of go.  This wasn't just another case of machines beating humans at games of skill.

Granted, from a mathematical point of view it was nothing special.  Games like go, chess, checkers/draughts and tic-tac-toe can in theory be "solved" by simply bashing out all the possible combinations of moves and seeing which ones lead to wins for which players.

Naturally the technical definition of "games like go, etc." is a bit, well, technical, but the most important stipulations are
  • perfect information -- each player has the same knowledge of the game as the others
  • no random elements
That leaves out card games like poker and bridge (imperfect information, random element) and Parcheesi (random element) and which-hand-did-I-hide-the-stone-in (imperfect information), but it includes most board games (Reversi, Connect 4, Pente, that game where you draw lines to make squares on a field of dots, etc. -- please note that most of these are trademarked).

From a practical point of view, there is sort of a pecking order:
  • Tic-tac-toe is so simple that you can write down the best strategy on a piece of paper.   Most people grow bored of it quickly since the cat always wins if everyone plays correctly, and pretty much everyone can.
  • Games like ghost or Connect 4 have been "strongly solved", meaning that there's a known algorithm for determining whether a given position is a win, loss or draw for the player whose turn it is.  Typically the winning strategy is fairly complex, in some cases too complex for a human to reasonably memorize.  A human will have no chance of doing better than a computer for such games (unless the computer is programmed to make mistakes), but might be able to do as well.
  • Checkers is too complex for humans to play perfectly, but it has been "weakly solved".  This means that it's been proved that with perfect play, the result is always a draw, but not all legal positions have been analyzed, and there is currently nothing that will always be able to tell you if a particular position is a win for either side, or a draw.  In other words, for a weakly solved game, we can answer win/loss/draw for the initial position, and typically many others, but not for an arbitrary position.
  • Chess has not been solved, even in theory, but computer chess players that bash out large numbers of sequences of moves can consistently beat even the best human players.
In most cases the important factor in determining where a game fits in this order is the "branching factor", which is the average number of moves available at any given point.  In tic-tac-toe, there are nine first moves, eight second moves, and so on, and since the board is symmetrical there are effectively even fewer.  In many positions there's really only one (win with three-in-a-row or block your opponent from doing that).
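
For a game at the bottom of the pecking order, "bashing out all the combinations" really does fit in a few lines.  A sketch for tic-tac-toe (and, as promised, the cat wins):

    from functools import lru_cache

    LINES = [(0,1,2),(3,4,5),(6,7,8),(0,3,6),(1,4,7),(2,5,8),(0,4,8),(2,4,6)]

    def winner(board):
        for a, b, c in LINES:
            if board[a] != "." and board[a] == board[b] == board[c]:
                return board[a]
        return None

    @lru_cache(maxsize=None)
    def value(board, player):
        # +1 win, 0 draw, -1 loss for the player about to move.
        if winner(board):
            return -1                 # the previous player just won
        moves = [i for i, s in enumerate(board) if s == "."]
        if not moves:
            return 0                  # draw
        other = "O" if player == "X" else "X"
        return max(-value(board[:i] + player + board[i+1:], other)
                   for i in moves)

    print(value("." * 9, "X"))   # 0 -- perfect play is always a draw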

In Connect 4, there are up to seven legal moves in any position.  In checkers there can be a dozen or so.  In chess, a couple dozen is typical.  As with tic-tac-toe there are positions where there is only one legal move, or only one that makes sense, but those are relatively rare in most games.

In go, there are typically more than a hundred different possible moves, and go positions tend not to be symmetrical.  Most of the time a reasonably strong human player will only be looking at a small portion of the possible moves.  In order to have any hope of analyzing a situation, a computer has to be able to narrow down the possibilities by a similar amount.  But to beat a human, it has to be able to find plays that a human will miss.

I've seen go described as a more "strategic" game, one that humans can develop a "feel" for that computers can't emulate, but that's not entirely true.  Tactics can be quite important.  Much of the strategy revolves around deciding which tactical battles to pursue and which to leave for later or abandon entirely.  At least, that's my understanding.  I'm not really a go player.

AlphaGo, and programs like it, solved the narrowing-down problem by doing what humans do: collecting advice from strong human players and studying games played by them.  Historically this has meant a programmer working with an expert player to formulate rules that computers can interpret, along with people combing through games to glean more rules.

As I understand it (and I don't know anything more about DeepMind or AlphaGo than the public), AlphaGo used machine learning techniques to automate this process, but the source material was still games played by human players.  [Re-reading this in light of a more recent post, I see I left out a significant point: AlphaGo (and AlphaZero) encode their evaluation of positions -- their understanding of the game -- as neural networks rather than explicit rules.  While a competent coder could look at the code for explicit rules and figure out what they were doing, no one really knows how to decode what a neural network is doing, at least not to the same level of detail -- D.H. Jan 2019]

The latest iteration (AlphaGo Zero, of course) dispenses with human input.  Rather than studying human games, it plays against itself, notes what works and what doesn't, and tries again after incorporating that new knowledge.  Since it's running on a pretty hefty pile of hardware, it can do this over and over again very quickly.
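
Here's a toy version of that learn-by-self-play loop, at tic-tac-toe scale, with simple rollout statistics standing in for AlphaGo Zero's neural networks and search:

    import random
    from collections import Counter

    LINES = [(0,1,2),(3,4,5),(6,7,8),(0,3,6),(1,4,7),(2,5,8),(0,4,8),(2,4,6)]

    def winner(b):
        for i, j, k in LINES:
            if b[i] != "." and b[i] == b[j] == b[k]:
                return b[i]
        return None

    wins = Counter()
    for _ in range(20_000):              # 20,000 games of self-play
        board, player = ["."] * 9, "O"
        first = random.randrange(9)
        board[first] = "X"               # X's opening move, chosen at random
        while not winner(board) and "." in board:
            empty = [i for i, s in enumerate(board) if s == "."]
            board[random.choice(empty)] = player
            player = "X" if player == "O" else "O"
        if winner(board) == "X":
            wins[first] += 1

    # The center, which sits on the most winning lines, should come out
    # on top -- a very faint echo of rediscovering opening theory.
    print(wins.most_common(3))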

This approach worked out rather well.  AlphaGo Zero can beat the AlphaGo that beat Lee Sedol, making it presumably the strongest go player in the world.  [and it has since done the same thing with chess and shogi, though its superiority in chess is not clear-cut.  See the link above for more details.  -- D.H. Jan 2019]

On the one hand, this is not particularly surprising.  It's a classic example of what I call "dumb is smarter" on the other blog, where a relatively straightforward approach without a lot of built-in assumptions can outperform a carefully crafted system with lots of specialized knowledge baked in.  This doesn't mean that dumb is necessarily smartest, only that it often performs better than one might expect, because the downside to specialized knowledge is specialized blind spots.

On the other hand, this is all undeniably spooky.  An AI system with no baked-in knowledge of human thought is able, with remarkably little effort, to outperform even the very best of us at a problem that had long been held up as something unreachable by AI, something that only human judgement could deal with effectively.  If computers can beat us at being human, starting essentially from scratch (bear in mind that the hardware that all this is running on is largely designed and built by machine these days), then what, exactly, are we meat bags doing here?

So let's step back and look at the actual problem being solved: given a position on a go board, find the move that is most likely to lead to capturing the most stones and territory at the end of the game.

Put that way, this is a perfectly well-posed optimization problem of the sort that we've been using computers to solve for decades.  Generations, really, at this point.  Granted, one particular solution -- bashing out all possible continuations from a given position -- is clearly unsuited to go, but so what?  Finding the optimum shape -- or at least a better one -- for an airplane wing isn't well suited to that approach either, but we've made good progress on it anyway using different kinds of algorithms.

So "chess-style algorithms suck at go, therefore go is inherently hard" was a bad argument from the get-go.

From what I've seen in the press, even taking potential hype with a grain of salt, AlphaGo Zero is literally rewriting the books on go, having found opening moves that have escaped human notice for centuries.  But that doesn't mean this is an inherently hard problem.  Humans failing to find something they're looking for, even for centuries, means it's a hard problem for humans.

We humans are just really bad at predicting what kinds of problems are inherently hard, which I'd argue is the same as being hard to solve by machine*.  Not so long ago the stereotype of a genius was someone who "knew every word in the dictionary" or "could multiply ten-digit numbers immediately", both of which actually turned out to be pretty easy to solve by machine.

Once it was clear that some "genius" problems were easy for machines, attention turned to things that were easy for people but hard for machines.  There have been plenty of those -- walking, recognizing faces, translating between speech and text, finding the best move on the go board.  Those held out for quite a long time as "things machines will never be able to do", but the tide has been turning on them as well thanks, I think, to two main developments:
  • We can now build piles of hardware that have, in a meaningful sense, more processing power than human brains.
  • With these new piles of hardware, techniques that looked promising in the past but never really performed are now able to perform well, the main example being neural network-style algorithms.
At this point, I'm convinced that trying to come up with ever fuzzier and more human things that only human brains will ever be able to do is a losing bet.  Maybe not now, but in the long run.  I will not be surprised at all if I live to see, say:
  • Real time speech translation that does as well as a human interpreter.
  • Something that can write a Petrarchan sonnet on a topic of choice, say the futility of chasing perfection, that an experienced and impartial reviewer would describe as "moving", "profound" and "original".
  • Something that could read a novel and write a convincing essay on it comparing the story to specific experiences in the real world, and answer questions about it in a way that left no choice but to say that in some meaningful sense the thing "understood" what it read.
  • Something that it would be hard to argue didn't have emotions -- though the argument would certainly be made.
[On the other hand, I also won't be shocked if these don't totally pan out in the next few decades --D.H. Feb 2019]

These all shade into Turing test territory.  I've argued that, despite Alan Turing's genius and influence, the Turing test is not necessarily a great test of whatever we mean by intelligence, and in particular it's easy to game because people are predisposed to assume intelligence.  I've also argued that "the Singularity" is an ill-defined concept, but that's really a different thread.  Nevertheless, I expect that, sooner or later, we will be able to build things that pass a Turing test with no trickery, in a sense that most people can agree on.

And that's OK.

Or at least, we're going to have to figure out how to be OK with it.  Stopping it from happening doesn't seem like a realistic option.

This puts us firmly in the territory of I, Robot and other science fiction of its era and more recently (the modern Westworld reboot comes to mind), which is one reason I chose the cheesy title I did.  Machines can already do a lot of things better than we can, and the list will only grow over time.  At the moment we still have a lot of influence over how that happens, but that influence will almost certainly decrease over time (the idea behind the Singularity is that this will happen suddenly, in fact nearly instantaneously, once the conditions are right).

The question now is how to make best use of what influence we still have while we still have it.  I don't really have any good, sharp answers to that, but I'm pretty sure it's the right question.


* There's a very well-developed field, complexity theory, dealing in what kinds of problems are hard or easy for various models of computing in an absolute, quantifiable sense.  This is largely distinct from the question of what kinds of games or other tasks computers should be good at, or at least better than humans at, though some games give good examples of various complexity classes.  One interesting result is that it's often easy (in a certain technical sense) to produce good-enough approximate solutions to problems that are provably very hard to solve exactly.  Another interesting result is that it can be relatively tricky to find hard examples of problems that are known to be hard in general.

Saturday, July 22, 2017

Yep. Tron.

It was winter when I started writing this, but writing posts about physics is hard, at least if you're not a physicist.  This one was particularly hard because I had to re-learn what I thought I knew about the topic, and then realize that I'd never really understood it as well as I'd thought, then try to learn it correctly, then realize that I also needed to re-learn some of the prerequisites, which led to a whole other post ... but just for the sake of illustration, let's pretend it's still winter.

If you live near a modest-sized pond or lake, you might (depending on the weather) see it freeze over at night and thaw during the day.  Thermodynamically this can be described in terms of energy (specifically heat) and entropy.  At night, the water is giving off heat into the surrounding environment and losing entropy (while its temperature stays right at freezing).  The surrounding environment is taking on heat and gaining entropy.  The surroundings gain at least as much entropy as the pond loses, and ultimately the Earth will radiate just that bit more heat into space.  When you do all the accounting, the entropy of the universe increases by just a tiny bit, relatively speaking.

During the day, the process reverses.  The water takes on heat and gains entropy (while its temperature still stays right at freezing).  The surroundings give off heat, which ultimately came from the sun, and lose entropy.  The water gains at least as much entropy as the surroundings lose*, and again the entropy of the universe goes up by just that little, tiny bit, relatively speaking.

So what is this entropy of which we speak?  Originally entropy was defined in terms of heat and temperature.  One of the major achievements of modern physics was to reformulate entropy in a more powerful and elegant form, revealing deep and interesting connections, thereby leading to both enlightenment and confusion.  The connections were deep enough that Claude Shannon, in his founding work on information theory, defined a similar concept with the same name, leading to even more enlightenment and confusion.

The original thermodynamic definition relies on the distinction between heat and temperature.  Temperature, at least in the situations we'll be discussing here, is a measure of how energetic individual particles -- typically atoms or molecules -- are on average.  Heat is a form of energy, independent of how many particles are involved.

The air in an oven heated to 500K (that is, 500 Kelvin, about 227 degrees Celsius or 440 degrees Fahrenheit) and a pot full of oil at 500K are, of course, at the same temperature, but you can safely put your hand in the oven for a bit.  The oil, not so much.  Why?  Mainly because there's a lot more heat in the oil than in the air.  By definition the molecules in the oven air are just as energetic, on average, as the molecules in the oil, but there are a lot more molecules of oil, and therefore a lot more energy, which is to say heat.

At least, that's the quick explanation for purposes of illustration.  Going into the real details doesn't change the basic point: heat is different from temperature and changing the temperature of something requires transferring energy (heat) to or from it.  As in the case of the pond freezing and melting, there are also cases where you can transfer heat to or from something without changing its temperature.  This will be important in what follows.

Entropy was originally defined as part of understanding the Carnot cycle, which describes the ideal heat-driven engine (the efficiency of a real engine is usually given as a percentage of what the Carnot cycle would produce, not as a percentage of the energy it uses).  Among the principal results of classical thermodynamics is that the Carnot cycle is as good as you can get, and that not even it can be perfectly efficient, even in principle.

At this point it might be helpful to read that earlier post on energy, if you haven't already.  Particularly relevant parts here are that the state of the working fluid in a heat engine, such as the steam in a steam engine, can be described with two parameters, or, equivalently, as a point in a two-dimensional diagram, and that the cycle an engine goes through can be described by a path in that two-dimensional space.

Also keep in mind the ideal gas law: In an ideal gas, the temperature of a given amount of gas is proportional to pressure times volume.  Here and in the rest of this post, "gas" means "a substance without a fixed shape or volume" and not what people call "gasoline" or "petrol".
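As a quick numeric sanity check, here's a minimal sketch in Python (the numbers are rough textbook values, just for illustration):

    R = 8.314  # molar gas constant, J/(mol*K)

    def temperature(pressure, volume, moles):
        """Ideal gas law, p*V = n*R*T, solved for T."""
        return pressure * volume / (moles * R)

    # One mole of gas at atmospheric pressure in about 22.4 liters:
    print(temperature(101_325.0, 0.0224, 1.0))  # ~273 K, right around freezing
    # Halve the volume at the same pressure and the temperature halves too:
    print(temperature(101_325.0, 0.0112, 1.0))  # ~137 K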

If you've ever noticed a bicycle pump heat up as you pump up a tire, that's (more or less) why.  You're compressing air, that is, decreasing its volume, so (unless the pump is able to shed heat with perfect efficiency, which it isn't) the temperature has to go up.  For the same reason the air coming out of a can of compressed air is dangerously cold.  The air is expanding rapidly so the temperature drops sharply.

In the Carnot cycle you first supply heat to a gas (the "working fluid", for example steam in a steam engine) while maintaining a perfectly constant temperature by expanding the container it's in.  You're heating that gas, in the sense of supplying heat, but not in the sense of raising its temperature.  Again, heat and temperature are two different things.

To continue the Carnot cycle, let the container keep expanding, but now in such a way that it neither gains nor loses heat (in technical terms, adiabatically).  In these first two steps, you're getting work out of the engine (for example, by connecting a rod to the moving part of a piston and attaching the other end of that rod to a wheel).  The gas is losing energy since it's doing work on the piston, and it's also expanding, so the temperature and pressure are both dropping, but no heat is leaving the container in the adiabatic step.

Work is force times distance, and force in this case is pressure times the area of the surface that's moving.    Since the pressure, and therefore the force, is dropping during the second step you'll need to use calculus to figure out the exact amount of work, but people know how to do that.
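For the constant-temperature steps of an ideal gas, where pressure is nRT/V, the calculus works out to W = nRT·ln(V2/V1).  Here's a minimal sketch that checks the sum-of-little-slices version against that formula (the particular numbers are invented for illustration):

    import math

    R = 8.314            # molar gas constant, J/(mol*K)
    n, T = 1.0, 400.0    # one mole of gas held at 400 K
    v1, v2 = 0.01, 0.02  # expanding from 10 liters to 20 liters (in m^3)

    # Add up pressure * (small change in volume) over many thin slices...
    steps = 100_000
    dv = (v2 - v1) / steps
    work = sum(n * R * T / (v1 + (i + 0.5) * dv) * dv for i in range(steps))

    # ...and compare with the closed-form answer from the calculus:
    print(work, n * R * T * math.log(v2 / v1))  # both ~2305 J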

The last two steps of the cycle reverse the first two.  In step three you compress the gas, for example by changing the direction the piston is moving, while keeping the temperature the same.  This means the gas is cooling in the sense of giving off heat, but not in the sense of dropping in temperature.  Finally, in step four, compress the gas further, without letting it give off heat.  This raises the temperature.  The piston is doing work on the gas and the volume is decreasing.  In a perfect Carnot cycle the gas ends up in the same state -- same pressure, temperature and volume -- as it began and you can start it all over.

As mentioned in the previous post, you end up putting more heat in at the start than you end up getting back in the third step, and you end up getting more work out in the first two steps than you put in in the last two (because the pressure is higher in the first two steps).  Heat gets converted to work (or if you run the whole thing backwards, you end up with a refrigerator).

If you plot the Carnot cycle on a diagram of pressure versus volume, or the other two combinations of pressure, volume and temperature, you get a shape with at least two curved sides, and it's hard to tell whether you could do better.  Carnot proved that this cycle is the best you can do, in terms of how much work you can get out of a given amount of heat, by choosing two parameters that make the cycle into a rectangle.  One is temperature -- steps one and three maintain a constant temperature.

The other needs to make the other two steps straight lines.  To make this work out, the second quantity has to remain constant while the temperature is changing, and change when temperature is constant.  The solution is to define a quantity -- call it entropy -- that changes, when temperature is constant, by the amount of heat transferred, divided by that temperature (ΔS = ΔQ/T -- the deltas (Δ) say that we're relating changes in heat and entropy, not absolute quantities; Q stands for heat and S stands for entropy, because reasons).  When there's no heat transferred, entropy doesn't change.  In step one, temperature is constant and entropy increases.  In step two, temperature decreases while entropy remains constant, and so forth.
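In code form, the definition is almost embarrassingly small (a sketch, with invented numbers):

    def entropy_change(heat_in, temperature):
        """dS = dQ/T: heat transferred at a constant temperature."""
        return heat_in / temperature

    # Step one: take in 1000 J at a constant 500 K; entropy rises by 2 J/K.
    print(entropy_change(1000.0, 500.0))   # 2.0
    # Step three: give off 600 J at a constant 300 K; entropy falls by 2 J/K.
    print(entropy_change(-600.0, 300.0))   # -2.0

In a perfect cycle those two changes cancel exactly, which is why the rectangle closes back on itself.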

To be clear, entropy and temperature can, in general, both change at the same time.  For example, if you heat a gas at constant volume, then pressure, temperature and entropy all go up.  The Carnot cycle is a special case where only one changes at a time.

Knowing the definition of entropy, you can convert, say, a pressure/volume diagram to a temperature/entropy diagram and back.  In real systems, the temperature/entropy version won't show absolutely straight vertical and horizontal lines -- that is, there will be at least some places where both change at the same time.  The Carnot cycle is exactly the case where the lines are perfectly horizontal and vertical.

This definition of entropy in terms of heat and temperature says nothing at all about what's going on in the gas, but it's enough, along with some math I won't go into here (but which depends on the cycle being a rectangle), to prove Carnot's result: The portion of heat wasted in a Carnot cycle is the ratio of the cold temperature to the hot temperature (on an absolute temperature scale).  You can only have zero loss -- 100% efficiency -- if the cold temperature is absolute zero.  Which it won't be.
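Put as a one-liner (a sketch; the temperatures are arbitrary):

    def carnot_efficiency(t_hot, t_cold):
        """Largest possible fraction of input heat converted to work."""
        return 1.0 - t_cold / t_hot

    # An ideal engine running between 500 K and 300 K converts at most
    # 40% of its heat to work; the other 60% is the cold/hot ratio:
    print(carnot_efficiency(500.0, 300.0))  # 0.4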

Any cycle that deviates from a perfect rectangle will be less efficient yet.  In real life this is inevitable.  You can come pretty close on all the steps, but not perfectly close.  In real life you don't have an ideal gas, you can't magically switch from being able to put heat into the gas to perfectly insulating it, you won't be able to transfer all the heat from your heat source to the gas, you won't be able to capture all the heat from the third step of the cycle to reuse in the first step of the next cycle, some of the energy of the moving piston will be lost to friction (that is, dissipated into the surroundings as heat) and so on.

The problem-solving that goes into minimizing inefficiencies in real engines is why engineering came to be called engineering and why the hallmark of engineering is getting usefulness out of imperfection.



There are other cases where heat is transferred at a constant temperature, and we can define entropy in the same way as for a gas.  For example, temperature doesn't change during a phase change such as melting or freezing.  As our pond melts and freezes, the temperature stays right at freezing until the pond completely freezes, at which point it can get cooler, or melts entirely, at which point it can get warmer.

If all you know is that some water is at the freezing point, you can't say how much heat it will take to raise the temperature above freezing without knowing how much of it is frozen and how much is liquid.  The concept of entropy is perfectly valid here -- it relates directly to how much of the pond is liquid -- and we can define "entropy of fusion" to account for phase transitions.
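For water the relevant numbers are well known, so we can put a rough figure on that (a sketch using approximate textbook values):

    latent_heat_fusion = 334_000.0  # J to melt one kg of ice, roughly
    t_melting = 273.15              # K, the constant temperature during melting

    # Entropy gained by one kilogram of pond ice as it melts:
    print(latent_heat_fusion / t_melting)  # ~1223 J/K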

There are plenty of other cases that don't look quite so much like the ideal gas case but still involve changes of entropy.  Mixing two substances increases overall entropy.  Entropy is a determining factor in whether a chemical reaction will go forward or backward and in ice melting when you throw salt on it.


Before I go any further about thermodynamic entropy, let me throw in that Claude Shannon's definition of entropy in information theory is, informally, a measure of the number of distinct messages that could have been transmitted in a particular situation.  On the other blog, for example, I've ranted about bits of entropy for passwords.  This is exactly a measure of how many possible passwords there are in a given scheme for picking passwords.

What in the world does this have to do with transferring heat at a constant temperature?  Good question.

Just as the concept of energy underwent several shifts in understanding on the way to its current formulation, so did entropy.  The first major shift came with the development of statistical mechanics.  Here "mechanics" refers to the behavior of physical objects, and "statistical" means you've got enough of them that you're only concerned about their overall behavior.

Statistical mechanics models an ideal gas as a collection of particles bouncing around in a container.  You can think of this as a bunch of tiny balls bouncing around in a box, but there's a key difference from what you might expect from that image.  In an ideal gas, all the collisions are perfectly elastic, meaning that the energy of motion (called kinetic energy) remains the same before and after.  In a real box full of balls, the kinetic energy of the balls gets converted to heat as the balls bump into each other and push each other's molecules around, and sooner or later the balls stop bouncing.

But the whole point of the statistical view of thermodynamics is that heat is just the kinetic energy of the particles the system is made up of.  When actual bouncing balls lose energy to heat, that means that the kinetic energy of the large-scale motion of the balls themselves is getting converted into kinetic energy of the small-scale motion of the molecules the balls are made of, and of the air in the box, and of the walls of the box, and eventually the surroundings.  That is, the large scale motion we can see is getting converted into a lot of small-scale motion that we can't, which we call heat.

When two particles, say two oxygen molecules, bounce off each other, the kinetic energy of the moving particles just gets converted into kinetic energy of differently-moving particles, and that's it.  In the original formulation of statistical mechanics, there's simply no other place for that energy to go, no smaller-scale moving parts to transfer energy to (assuming there's no chemical reaction between the two -- if you prefer, put pure helium in the box).

When a particle bounces off the wall of the container, it imparts a small impulse -- a tiny, momentary push -- to the wall.  When a whole lot of particles continually bounce off the walls of a container, those tiny pushes add up to (for all practical purposes) a steady force on each unit of wall area, that is, pressure.

Temperature is the average kinetic energy of the particles and volume is, well, volume.  That gives us our basic parameters of temperature, pressure and volume.

But what is entropy, in this view?  In statistical mechanics, we're concerned about the large-scale (macroscopic) state of the system, but there are many different small-scale (microscopic) states that could give the same macroscopic picture.

Once you crank through all the math, it turns out that entropy is a measure of how many different microscopic states, which we can't measure, are consistent with the macroscopic state, which we can measure.  In fuller detail, entropy is actually proportional to the logarithm of that number -- the number of digits, more or less -- both because the raw numbers are ridiculously big, and because that way the entropy of two separate systems is the sum of the entropy of the individual systems.

The actual formula is S = k ln(W), where k is Boltzmann's constant and W is the total number of possible microstates, assuming they're all equally probable.  There's a slightly bigger formula if they're not.  Note that, unlike the original thermodynamic definition, this formula deals in absolute quantities, not changes.
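As a quick illustration of how the logarithm tames ridiculously big numbers, and why it makes entropies add (a sketch; the microstate counts are invented):

    import math

    k_B = 1.380649e-23  # Boltzmann's constant, J/K

    def boltzmann_entropy(microstates):
        """S = k ln(W), for W equally probable microstates."""
        return k_B * math.log(microstates)

    # Two identical systems side by side have the square of the number of
    # microstates, so the logarithm gives exactly double the entropy:
    print(boltzmann_entropy(10 ** 25))         # ~7.95e-22 J/K
    print(boltzmann_entropy((10 ** 25) ** 2))  # ~1.59e-21 J/K, double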

When ice melts, entropy increases.  Water molecules in ice are confined to fixed positions in a crystal.  We may not know the exact energy of each individual molecule, but we at least know more or less where it is, and we know that if the energy of such a molecule is too high, it will leave the crystal (if this happens on a large scale, the crystal melts).  Once it does, we know much less about its location or energy.

Even without a phase change, the same sort of reasoning applies.  As temperature -- the average energy of each particle -- increases, the range of energies each particle can have increases.  How to translate this continuous range of energies into a number we can count is a bit of a puzzle, but we can handwave around that for now.

Entropy is often called a measure of disorder, but more accurately it's a measure of uncertainty (as theoretical physicist Sabine Hossenfelder puts it: "a measure for unresolved microscopic details"), that is, how much we don't know.  That's why Shannon used the same term in information theory.  The entropy of a message measures how much we don't know about it just from knowing its size (and a couple of other macroscopic parameters).  Shannon entropy is also logarithmic, for the same reasons that thermodynamic entropy is.

The formula for Shannon entropy in the case that all possible messages are equally probable is H = k ln(M), where M is the number of messages.  I put k there to account for the logarithm usually being base 2 and because it emphasizes the similarity to the other definition.  Again, there's a slightly bigger formula if the various messages aren't all equally probable, and it too looks an awful lot like the corresponding formula for thermodynamic entropy.
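Here's the password example in that form (a sketch; the alphabet size and length are arbitrary):

    import math

    def shannon_entropy_bits(message_count):
        """H = log2(M) bits, for M equally probable messages."""
        return math.log2(message_count)

    # An 8-character password drawn from an alphabet of 64 symbols:
    print(shannon_entropy_bits(64 ** 8))  # 48.0 bits of entropy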

The original formulation of statistical mechanics assumed that physics at the microscopic scale followed Newton's laws of motion.  One indication that statistical mechanics was on to something is that when quantum mechanics completely reformulated what physics looks like at the microscopic scale, the statistical formulation not only held up, but became more accurate with the new information available.

In our current understanding, when two oxygen molecules bounce off each other, their electron shells interact (there's more going on, but let's start there), and eventually their energy gets redistributed into a new configuration.  This can mean the molecules traveling off in new paths, but it could also mean that some of the kinetic energy gets transferred to the electrons themselves, or some of the electrons' energy gets converted into kinetic energy.

Macroscopically this all looks the same as the old model, if you have huge numbers of molecules, but in the quantum formulation we have a more precise picture of entropy.  This makes a difference in extreme situations such as extremely cold crystals.  Since energy is quantized, there is a finite (though mind-bendingly huge) number of possible quantum states a typical system can have, and we can stop handwaving about how to handle ranges of possible energy.  This all works whether you have a gas, a liquid, an ordinary solid or some weird Bose-Einstein condensate.  Entropy measures that number of possible quantum states.

Thermodynamic entropy and information theoretic entropy are measuring basically the same thing, namely the number of specific possibilities consistent with what we know in general.  In fact, the modern definition of thermodynamic entropy specifically starts with a raw number of possible states and includes a constant factor to convert from the raw number to the units (energy over temperature) of classical thermodynamics.

This makes the two notions of entropy look even more alike -- they're both based on a count of possibilities, but with different scaling factors.  Below I'll even talk, loosely, of "bits worth of thermodynamic entropy", meaning the number of bits it would take to write out the count of possible quantum states in binary.

Nonetheless, they're not at all the same thing in practice.

Consider a molecule of DNA.  There are dozens of atoms, and hundreds of subatomic particles, in a base pair.  I really don't know how many possible states a phosphorus atom (say) could be in under typical conditions, but I'm going to guess that there are thousands of bits worth of entropy in a base pair at room temperature.  Even if each individual particle can only be in one of two possible states, you've still got hundreds of bits.

From an information-theoretic point of view, there are four possible states for a base pair, which is two bits, and because the genetic code actually includes a fair bit of redundancy in the form of different ways of coding the same amino acid and so forth, it's actually more like 10/6 of a bit, even without taking into account other sources of redundancy.

But there is a lot of redundancy in your genome, as far as we can tell, in the form of duplicated genes and stretches of DNA that might or might not do anything.  All in all, there is about a gigabyte worth of base pairs in a human genome, but the actual gene-coding information can compress down to a few megabytes.  The thermodynamic entropy of the molecule that encodes those megabytes is much, much, larger.  If each base pair represents about a thousand bits worth of thermodynamic entropy under typical conditions, then the whole strand is into the hundreds of gigabytes.
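Putting very rough numbers on that comparison (every figure below is a guess carried over from the text, not a measurement):

    base_pairs = 3_000_000_000        # order of magnitude of a human genome
    shannon_bits = 2 * base_pairs     # at most 2 bits of sequence per pair
    thermo_bits = 1_000 * base_pairs  # guessed "bits worth" of microstates per pair

    print(shannon_bits / 8 / 1e9)  # ~0.75 GB of raw sequence information
    print(thermo_bits / 8 / 1e9)   # ~375 GB-equivalent: hundreds of gigabytes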

I keep saying "under typical conditions" because thermodynamic entropy, being thermodynamic, depends on temperature.  If you have a fever, your body, including your DNA molecules in particular, has higher entropy than if you're sitting in an ice bath.  The information theoretic entropy, on the other hand, doesn't change.

But all this is dwarfed by another factor.  You have billions of cells in your body (and trillions of bacterial cells that don't have your DNA, but never mind that).  From a thermodynamic standpoint, each of those cells -- its DNA, its RNA, its proteins, lipids, water and so forth -- contributes to the overall entropy of your body.  A billion identical strands of DNA at a given temperature have the same information content as a single strand but a billion times the thermodynamic entropy.

If you want to compare bits to bits, the Shannon entropy of your DNA is inconsequential compared to the thermodynamic entropy of your body.  Even the change in the thermodynamic entropy of your body as you breathe is enormously bigger than the Shannon entropy of your DNA.

I mention all this because from time to time you'll see statements about genetics and the second law of thermodynamics.  The second law, which is very well established, states that the entropy of a closed system cannot decrease over time.  One implication of it is that heat doesn't flow from cold to hot, which is a key assumption in Carnot's proof.

Sometimes the second law is taken to mean that genomes can't get "more complex" over time, since that would violate the second law.  The usual response to this is that living cells aren't closed systems and therefore the second law doesn't apply.  That's perfectly valid.  However, I think a better answer is that this confuses two forms of entropy -- thermodynamic entropy and Shannon entropy -- which are just plain different.  In other words, thermodynamic entropy and the second law don't work that way.

From an information point of view, the entropy of a genome is just how many bits it encodes once you compress out any redundancy.  Longer genomes typically have more entropy.  From a thermodynamic point of view, at a given temperature, more of the same substance has higher entropy than less as well, but we're measuring different quantities.

A live elephant has much, much higher entropy than a live mouse, and likewise for a live human versus a live mouse.  As it happens, a mouse genome is roughly the same size as a human genome, even though there's a huge difference in thermodynamic entropy between a live human and a live mouse.  The mouse genome is slightly smaller than ours, but not a lot.  There's no reason it couldn't be larger, and certainly no thermodynamic reason.  Neither the mouse nor human genome is particularly large.  Several organisms have genomes dozens of times larger, at least in terms of raw base pairs.

From a thermodynamic point of view, it hardly matters what exact content a DNA molecule has.  There are some minor differences in thermodynamic behavior among the particular base pairs, and in some contexts it makes a slight difference what order they're arranged in, but overall the gene-copying machinery works the same whether the DNA is encoding a human digestive protein or nothing at all.  Differences in gene content are dwarfed by the thermodynamic entropy change of turning one strand of DNA and a supply of loose nucleotides into two strands, that in turn is dwarfed by everything else going on in the cell, and that in turn is dwarfed by the jump from one cell to billions.

For what it's worth, content makes even less thermodynamic difference in other forms of storage.  A RAM chip full of random numbers has essentially the same thermodynamic entropy, at a given temperature, as one containing all zeroes or all ones, even though those have drastically different Shannon entropies.  The thermodynamic entropy changes involved in writing a single bit to memory are going to equate to a lot more than one bit.

Again, this is all assuming it's valid to compare the two forms of entropy at all, based on their both being measures of uncertainty about what exact state a system is in, and again, the two are not actually comparable, even though they're similar in form.  Comparing the two is like trying to compare a football score to a basketball score on the basis that they're both counting the number of times the teams involved have scored goals.


There's a lot more to talk about here, for example the relation between symmetry and disorder (more disorder means more symmetry, which was not what I thought until I sat down to think about it), and the relationship between entropy and time (for example, as experimental physicist Richard Muller points out, local entropy decreases all the time without time appearing to flow backward), but for now I think I've hit the main points:
  • The second law of thermodynamics is just that -- a law of thermodynamics.
  • Thermodynamic entropy as currently defined and information-theoretic (Shannon) entropy are two distinct concepts, even though they're very similar in form and derivation.
  • The two are defined in different contexts and behave entirely differently, despite what we might think from them having the same name.
  • Back at the first point, the second law of thermodynamics says almost nothing about Shannon entropy, even though you can, if you like, use the same terminology in counting quantum states.
  • All this has even less to do with genetics.

* Strictly speaking, you need to take the Sun into account.  The Sun is generating entropy over time at a much, much higher rate than our little pond and its surroundings, and even so it's only an insignificantly tiny part of the universe.  But even if you had a closed system of a pond and surroundings that were sometimes warm and sometimes cold, for whatever reason, the result would be the same: the entropy of a closed system increases over time.

Wednesday, July 19, 2017

The human perspective and its limits

A couple more points occurred to me after I hit "publish" on the previous post.  Both of them revolve around subjectivity versus objectivity, and to what extent we might be limited by our human perspective.


In trying to define whether a kind of behavior is simple or complex, I gave two different notions which I claimed were equivalent: how hard it is to describe and how hard it is to build something to copy it.

The first is, in a sense, subjective, because it involves our ability to describe and understand things.  Since we describe things using language, it's tied to what fits well with language.  The second is much more objective.  If I build a chess-playing robot, something with no knowledge of human language or of chess could figure out what it was doing, at least in principle.

One of the most fundamental results in computer science is that there are a number of very simple computing models (stack machines, lambda calculus, combinators, Turing machines, cellular automata, C++ templates ... OK, maybe not always so simple) which are "Turing complete".  That means that any of them can compute any computable ("partial recursive") function.  This covers a wide range of problems, from adding numbers to playing chess to finding cute cat videos and beyond.

It doesn't matter which model you choose.  Any of them can be used to simulate any of the others.  Even a quantum computer is still computing the same kinds of functions [um ... not 100% sure about that ... should run that down some day --D.H.].  The fuss there is about the possibility that a quantum computer could compute certain difficult functions exponentially faster than a non-quantum computer.

Defining a computable function for a problem basically means translating it into mathematical terms, in other words, describing it objectively.  Computability theory says that if you can do that, you can write a program to compute it, essentially building something to perform the task (generally you tell a general-purpose computer to execute the code you wrote, but if you really want to you can build a physical circuit to do what the computer would do).
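As a tiny taste of that equivalence, here's addition done two ways in Python: the ordinary way, and via Church numerals, the lambda calculus trick of encoding the number n as "apply a function n times" (an illustration, not a proof of anything):

    zero = lambda f: lambda x: x
    succ = lambda n: lambda f: lambda x: f(n(f)(x))
    add = lambda m: lambda n: lambda f: lambda x: m(f)(n(f)(x))

    def to_int(church):
        """Decode a Church numeral by counting how many times it applies f."""
        return church(lambda k: k + 1)(0)

    two = succ(succ(zero))
    three = succ(two)
    print(to_int(add(two)(three)), 2 + 3)  # 5 5 -- same function, different model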

So the two notions, of describing a task clearly and producing something to perform it, are, provably, equivalent.  There are some technical issues with the notion of complexity here that I'm going to gloss over.  The whole P = NP question revolves around whether problems whose solutions are easy to check are also easy to solve, but when it comes to deciding whether recognizing faces is harder than walking, I'm going to claim we can leave that aside.

The catch here is that my notion of objectivity -- defining a computable function -- is ultimately based on mathematics, which in turn is based on our notion of what it means to prove something (the links between computing and theorem proving are interesting and deep, but we're already in deep enough as it is).  Proof, in turn, is -- at least historically -- based on how our minds work, and in particular how language works.  Which is what I called "subjective" at the top.

So, is our notion of how hard something is to do mechanically -- my ostensibly "objective" definition -- limited by our modes of reasoning, particularly verbal reasoning, or is verbal/mathematical reasoning a fundamentally powerful way of describing things that we happened to discover because we developed minds capable of apprehending it?  I'd tend to think the latter, but then maybe that's just a human bias.



Second, as to our tendency to think that particularly human things like language and house-building are special, that might not just be arrogance, even if we're not really as special as we'd like to think.  We have a theory of mind, and not just of human minds.  We attribute very human-like motivations to other animals, and I'd argue that in many, maybe most, cases we're right.  Moreover, we also attribute different levels of consciousness to different things (things includes machines, which we also anthropomorphize).

There's a big asymmetry there: we actually experience our own consciousness, and we assume other people share the same level of consciousness, at least under normal circumstances, and we have that confirmed as we communicate our consciousnesses to each other.  It's entirely natural, then, to see our own intelligence and consciousness, which we see from the inside in the case of ourselves and close up in the case of other people, as particularly richer and more vivid.  This is difficult to let go of when trying to study other kinds of mind, but it seems to me it's essential at least to try.