Thursday, December 7, 2017

Where should I file this, and do I care?

I used to love to browse the card catalog at the local library (so yep ... geek).  This wasn't just for the books, but for the way they were organized.  The local library, along with my middle and high school libraries, used the Dewey Decimal Classification (or "Dewey Decimal System" as I remember it actually being called).

This was, to my eyes, a beautiful way of sorting books.  The world was divided into ten categories, each given a range of a hundred numbers, from 000-099 for "information and general works" (now also including computer science) to 900-999 for history and geography.  Within those ranges, subjects were further divided by number.  Wikipedia gives a good example:
500 Natural sciences and mathematics
510 Mathematics
516 Geometry
516.3 Analytic geometries
516.37 Metric differential geometries
516.375 Finsler geometry
Finsler geometry is pretty specialized (a Finsler manifold is a differentiable manifold whose metric has particular properties -- I had to look that up).  Clearly you could keep adding digits as long as you like, slicing ever finer, though in practice there are never more than a few (maybe just three?) after the decimal point.
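
To make the containment concrete, here's a minimal sketch in Python (the table is just the handful of categories from the example above; a real one would be enormous) that walks a call number up through its containing categories:

    # A tiny slice of the Dewey hierarchy, keyed by call number.
    DEWEY = {
        "500": "Natural sciences and mathematics",
        "510": "Mathematics",
        "516": "Geometry",
        "516.3": "Analytic geometries",
        "516.37": "Metric differential geometries",
        "516.375": "Finsler geometry",
    }

    def ancestors(call_number):
        """The call number and its containing categories, narrowest first.
        Assumes the usual three digits before the decimal point."""
        chain = []
        num = call_number
        while "." in num:                       # peel off digits after the point
            chain.append(num)
            num = num[:-1].rstrip(".")
        for n in (num, num[:2] + "0", num[:1] + "00"):   # e.g. 516, 510, 500
            if n not in chain:
                chain.append(n)
        return chain

    def classify(call_number):
        """The chain of Dewey categories containing a call number, broadest first."""
        chain = [(n, DEWEY[n]) for n in ancestors(call_number) if n in DEWEY]
        return list(reversed(chain))

    print(classify("516.375"))
    # [('500', 'Natural sciences and mathematics'), ('510', 'Mathematics'), ...]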

With the Dewey classification in place, you could walk into libraries around the country, indeed around the world, and quickly figure out where, say, the books on gardening, medieval musical instruments or truck repair were located.  If you or the librarian found a book lying around, you could quickly put it back in its proper place on the shelves.  If you found a book you liked, you could find other books on related topics near it, whether on the shelves or in the card catalog (what's that, Grandpa?).

On top of that, the field of library science, in which the Dewey classification and others like it* play a central role, is one of the precursors of computer science as we know it.  This is true at several levels, from the very idea of organizing large amounts of information (and making it universally accessible and useful), to the idea of using an index that can easily be modified as new items are added.

There's one other very significant aspect of library classification systems like Dewey: hierarchy.

It's almost too obvious to mention, but in the Dewey Classification, and others like it, the world is organized into high-level categories (natural sciences and mathematics), which contain smaller, more specific categories (mathematics), and so on down to the bottom level (Finsler geometry).  There are lots and lots of systems like this -- federal/state/local government in the US and similar systems elsewhere; domain/kingdom/phylum/class/order/family/genus/species in taxonomy; supercluster/galaxy cluster/galaxy/star system/star in astronomy; top-level-domain/domain/.../host and so forth.

Strictly speaking, this sort of structure is a containment hierarchy, where higher levels contain lower levels.  There are other sorts of hierarchies, for example primary/secondary/tertiary colors.  However, containment hierarchies are the most prominent kind.  Even hierarchies such as rank generally have containment associated with them -- if a colonel reports to a general, then that general is ultimately in command of the colonel's units (and presumably others).  The term hierarchy itself comes from the Greek for "rule of a high priest".  One of the most notable examples, of course, is the hierarchy of the Catholic church.

Containment hierarchies organize the world into units that our minds seem pretty good at comprehending, which is probably why we're willing to overlook a major drawback: containment hierarchies almost always leak.

There are some possible exceptions.  One that comes to mind is the hierarchy of molecule/atom/subatomic particle/quark implied by the Standard Model.  Molecules are always composed of atoms and atoms of subatomic particles.  Of the subatomic particles in an atom, electrons (as far as we know) are elementary, having no simpler parts, while protons and neutrons are composed of quarks which (as far as we know) are also elementary.

Even here there are some wrinkles.  There are other elementary particles besides electrons and quarks that are not parts of atoms.  Electrons, protons and neutrons can all exist independently of atoms.  Some elements can exist without forming molecules.  Electrons in some types of molecule may not belong to particular atoms.  Even defining which atoms belong to which molecules can get tricky.

Perhaps a better example would be the classification of the types of elementary particles.  All (known) particles are unambiguously quarks, leptons, gauge bosons or scalar bosons.  Leptons and quarks are subdivided into generations, again with no room for ambiguity.  There are similar hierarchies in mathematics and other fields.

For most hierarchies, though, you have more than a bit of a mess.  Cities cross state lines, and while the different parts are administratively part of separate states, there will typically be citywide organizations, some with meaningful authority, that cross state lines.  Defining species and other taxonomic groups is notoriously contentious**.  One of the key points of Darwin's Origin is that you can't always find a satisfactory boundary -- indeed, much of Origin is devoted to explaining why we so often can.

In astronomy, the designations of supercluster, galaxy cluster, galaxy and star system can all become murky or even arbitrary when several are interacting -- is that one merged galaxy, or two galaxies in the process of merging?  The distinction between star and planet can be troublesome as well, so it may not always be clear whether you have a planet orbiting a star or two companion stars.

On the internet, the distinction in notation between nested domains and hosts is clear, but the same (physical or virtual) computer can have multiple identities, even in different domains, and multiple computers can share the same host identity.  On the internet, what matters is which packets you respond to (and no one knows you're a dog).

And, of course, organization charts, arguably the prototypical example of a containment hierarchy, are in real life more what you'd call guidelines.  Beyond "dotted-line reports" and such, most real work crosses team boundaries and if everyone waited for every decision to percolate up and down the chain of command appropriately, nothing would get done (I've seen this attempted.  It did not go well).


So why group things into hierarchies anyway?

Again, there's clearly something about our minds that finds them natural.  In the early days of PCs, some of the prominent players started out storing files in one "flat" space.  If a floppy disk typically only held a handful of files, or even a few dozen, there was no harm in just listing them all out.  It didn't take long, however, until that got unwieldy.  People wanted to group related files together and, just as importantly, keep unrelated files separate.  Before long, all the major players had ported the concept of a "directory" or "folder" from the earlier "mainframe" operating systems -- which had themselves gone through roughly the same evolution.

Since computer scientists love nothing more than recursion, folders themselves could contain folders, and so on as far as you liked.  Somehow it didn't seem to bother anyone that this couldn't possibly work in a physical folder in a physical file drawer.
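
The recursion really is the whole trick.  A few lines of Python can walk an arbitrarily nested structure; the nested dictionary here is just a stand-in for a real file system:

    # A folder is a dict whose values are either files (here, strings)
    # or more folders (more dicts) -- recursion all the way down.
    root = {
        "personal": {"diary.txt": "...", "photos": {"cat.jpg": "..."}},
        "financial": {"taxes_2017.xls": "..."},
    }

    def list_tree(folder, indent=0):
        for name, entry in sorted(folder.items()):
            print(" " * indent + name)
            if isinstance(entry, dict):        # a sub-folder: recurse
                list_tree(entry, indent + 2)

    list_tree(root)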

This all brought a new problem -- how to put things into folders.  There are at least two varieties of this problem (hmm ... problem subdivided into varieties ...).

For various reasons, some files needed to appear in multiple folders in identical form.  This is a problem not only for space reasons, but because you'd really like a change in a common file to show up everywhere instead of having to make the same change in an unknown number of copies.   This led to the rediscovery of "shortcuts" and "symbolic links", again already part of older operating systems, which allowed you to show the same physical file under multiple folders at the same time.
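
On a modern system you can see this directly.  A hedged sketch using Python's standard library (the file and folder names are made up, and on Windows creating symlinks may require extra permissions):

    import os

    # One real file...
    with open("budget.txt", "w") as f:
        f.write("the single authoritative copy\n")

    os.makedirs("personal", exist_ok=True)
    os.makedirs("financial", exist_ok=True)

    # ...visible from two folders.  Both links point at the same physical file,
    # so editing budget.txt changes what you see under either folder.
    # (Running this twice will complain that the links already exist.)
    os.symlink(os.path.abspath("budget.txt"), "personal/budget.txt")
    os.symlink(os.path.abspath("budget.txt"), "financial/budget.txt")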

When it comes to organizing human-readable information, there's a different problem -- it's not always clear what folder to put things in.  Does a personal financial document go in the "personal" folder or the "financial" folder?  This problem leads us right back to ontology (the study of, among other things, how to categorize things) and library science.  Library science has always had to deal with this problem as well.  Does a book on the history of mathematics go under history (900s) or mathematics (510s)?

There are always cases where you just have to decide one way or another, and then try to make the same arbitrary decision consistently in the future, hoping that some previously-unseen common thread will emerge that can then be codified into a rule.


The upshot, I think, is that hierarchies are a useful tool for organizing things for the convenience of human minds, not a property of the universe itself (except, arguably, in cases such as subatomic particles as discussed above).  As with any tool, there are costs and benefits to its use and it's best to weigh them before charging ahead.  Imposing a hierarchy that doesn't fit well isn't just wasted effort.   It can actively obscure what's really going on.

Interestingly enough, I now work for a company that takes an entirely different approach to organizing knowledge.  Don't worry about where something should be, or what grouping it should be in.  Just search for what's in it.

This has been remarkably successful.  It may be hard to remember, but for a while there was a brisk business in manually curating and categorizing information.  It's still done, of course, because it's still a useful exercise in some contexts, but it's no longer the primary way we find information on the web.  Now we just search.
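
Under the hood, "just search" comes down to an inverted index: instead of deciding which folder a document belongs in, record which documents each word appears in.  A toy sketch:

    from collections import defaultdict

    docs = {
        "post1": "where should I file this and do I care",
        "post2": "entropy is a measure of uncertainty",
    }

    # Build the index: word -> the set of documents containing it.
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)

    def search(query):
        """Return the documents containing every word in the query."""
        results = set(docs)
        for word in query.lower().split():
            results &= index.get(word, set())
        return results

    print(search("entropy measure"))    # {'post2'}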

OK, time to hit Publish.  Oh wait ... what labels should I put on this post?



* Dewey isn't the only game in town, just the one most widely used in US primary and secondary schools.  The local university library uses the Library of Congress Classification, which uses letters and numbers in a way that made my brain melt, not so much for looking more complex, I think, as for not being Dewey.

** My understanding is that the idea of a clade -- all (living) organisms descended from a given ancestor -- has come to be at least as important as the traditional taxonomic groupings, at least in some contexts, but I'm not a biologist.

Thursday, November 9, 2017

syl·lab·i·fi·ca·tion

[Author's note: When I started this, I thought it was going to touch on deep questions of language and cognition.  It ended up kinda meandering around some random bits of computer word-processing.  This happens sometimes.  I'm posting it anyway since, well, it's already written.  --D.H.]

Newspaper and magazine articles are traditionally typeset in narrow, justified columns. "Justified" here means that every line is the same width (unlike, say, with most blog posts).  If the words aren't big enough to fill out a line, the typesetter will widen the spaces to fill it out.  If the words are a bit too long, the typesetter might move the last word to the next line and then add space to what's left.

Originally, a typesetter was a person who physically inserted pieces of lead type into a form.  Later, it was a person operating a Linotype™ or similar machine to do the same thing.  These days it's mostly done by software.

Technically, laying out a paragraph to minimize the amount of extra space is not trivial, but certainly feasible, the kind of thing that would make a good undergraduate programming exercise.  Several algorithms are available.  They may not always produce results as nice as those of an experienced human typesetter, but they do well enough for most purposes.
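
For the curious, here's a sketch of the simplest (greedy) approach: pack words onto a line until the next one won't fit, then pad out the spaces to the target width.  Fancier algorithms, Knuth-Plass being the famous one, consider the whole paragraph at once, but this captures the basic idea:

    def justify(words, width):
        """Greedy line filling with crude space-padding; the last line stays ragged."""
        lines, line = [], []
        for word in words:
            if line and len(" ".join(line + [word])) > width:
                lines.append(pad(line, width))
                line = []
            line.append(word)
        lines.append(" ".join(line))           # last line: no padding
        return lines

    def pad(line, width):
        if len(line) == 1:
            return line[0]
        gaps = len(line) - 1
        spaces = width - sum(len(w) for w in line)
        text = ""
        for i, word in enumerate(line[:-1]):
            # spread the extra spaces as evenly as integer division allows
            text += word + " " * (spaces // gaps + (1 if i < spaces % gaps else 0))
        return text + line[-1]

    for l in justify("the quick brown fox jumps over the lazy dog".split(), 17):
        print(repr(l))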

One option for getting better line breaks and better-looking paragraphs is to hyphenate.  If your layout looks funny because you've got floccinaucinihilipilification in the middle of a line, you might try breaking it up as, say floccinaucinihili-
pilification.  It will probably be easier to lay out those two pieces rather than trying to make room for one large one.

You can't just throw a hyphen in anywhere.  There's a strong tendency to read whatever comes before and after the hyphen as independent units, so you don't want to break at wee-
knights or pre-
aches.

In many languages, probably most, this isn't a big problem.  For example, Spanish has an official set of rules that gives a clear hyphenation for any word (actually there are several of these, depending on what country you're in).  It's hard for English, though, for the same reason that spelling is hard for English -- English spelling is historical, not phonetic, and has so far resisted attempts at standardization and fonetissizing.

So instead we have the usual suspects, particularly style guides produced by various academic and media organizations.  This leads to statements like this one from the Chicago Manual of Style:
Chicago favors a system of word division based on pronunciation and more or less demonstrated by the recommendations in Webster’s tenth.
The FAQ list that quote comes from has a few interesting cases, though I'm not sure that "How should I hyphenate Josephine Bellver's last name?" actually qualifies as a frequently asked question.  The one that interests me here concerns whether it should be "bio-logy" or "biol-ogy".  CMOS opts for "biol-ogy", going by pronunciation rather than etymology.

Which makes sense, in that consistently going by pronunciation probably makes reading easiest.  But it's also a bit ironic, in that English spelling is all about etymology over pronunciation.

Either approach is hard for computers to cope with, since they both require specific knowledge that's not directly evident from the text.  It's common to teach lists of rules, which computers do deal with reasonably well, but the problem with lists of rules for English is that they never, ever work.  For example, it's going to be hard to come up with a purely rule-based approach that divides "bark-ing" but also "bar-keeper".

This is why style guides tend to fall back on looser guidance like "divide the syllables as they're pronounced".  Except -- whose pronunciation?  When I was a kid I didn't pronounce an l in also or an n in government (I've since absorbed both of those from my surroundings).  I'm pretty sure most American speakers don't pronounce a t in often.  So how do you hyphenate those according to pronunciation?


Fortunately, computers don't have to figure this out.  A hyphenation dictionary for 100,000 words will cost somewhere around a megabyte, depending on how hard you try to compress it.  That's nothing in modern environments where a minimal "Hello world" program can run into megabytes all by itself (it doesn't have to, but it's very easy to eat a few megabytes on a trivial program without anyone noticing).

But what if the hyphenator runs across some new coinage or personal name that doesn't appear in the dictionary -- for example, whoever put the dictionary together didn't know about Josephine Bellver?  One option is just not to try to hyphenate those.  A refinement of that would be to allow the author to explicitly add a hyphen.  This should be the special "optional hyphen" character, so that you don't get hyphens showing up in the middle of lines if you later edit the text.  That way if you invent a really long neologism, it doesn't have to mess up your formatting.
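
A sketch of that dictionary-first approach (the entries are made up, and in Unicode the usual choice for the optional hyphen is the "soft hyphen" character, U+00AD, which only shows up when the word actually breaks across lines):

    SOFT_HYPHEN = "\u00ad"     # the "optional hyphen": invisible unless the word breaks

    # A couple of hand-made entries standing in for a real hyphenation dictionary.
    HYPHENATION = {
        "biology": ["biol", "ogy"],
        "barkeeper": ["bar", "keep", "er"],
    }

    def hyphenate(word):
        """Dictionary lookup, falling back to "don't hyphenate" for unknown words."""
        pieces = HYPHENATION.get(word.lower())
        if pieces is None:
            return word                        # e.g. "Bellver": just leave it alone
        return SOFT_HYPHEN.join(pieces)

    print(hyphenate("biology"))                # biol-ogy, but only at a line break
    print(hyphenate("Bellver"))                # unchanged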

If there's a point to any of this, it's that computers don't have to follow specific rules, except in the sense that anything a computer does follows specific rules.  While it might be natural for a compugeek to try to come up with the perfect hyphenation algorithm, the better engineering solution is probably to treat every known word as a special case and offer a fallback (or just punt) when that fails.

This wasn't always the right tradeoff.  Memory used to be expensive, and a tightly-coded algorithm will be much smaller than a dictionary.  But even then, there are tricks to be employed.  One of my all-time favorite hacks compressed a spelling dictionary down to a small bitmap that didn't even try to represent the actual words.  I'd include a link, but the only reference I know for it, Programming Pearls by Jon Bentley, isn't online.
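
I won't try to reconstruct that hack from memory, but its general flavor survives in what we now call a Bloom filter: hash each word to a few bit positions, set those bits, and later answer "probably in the dictionary" or "definitely not" by checking them -- far smaller than the words themselves, at the cost of occasional false positives.  A rough sketch:

    import hashlib

    SIZE = 1 << 20                      # a one-megabit bitmap (128 KB)
    bitmap = bytearray(SIZE // 8)

    def _positions(word, k=3):
        """k bit positions derived from hashes of the word."""
        for i in range(k):
            digest = hashlib.sha256(f"{i}:{word}".encode()).digest()
            yield int.from_bytes(digest[:4], "big") % SIZE

    def add(word):
        for pos in _positions(word):
            bitmap[pos // 8] |= 1 << (pos % 8)

    def probably_known(word):
        return all(bitmap[pos // 8] & (1 << (pos % 8)) for pos in _positions(word))

    for w in ["hyphen", "entropy", "library"]:
        add(w)
    print(probably_known("entropy"))     # True
    print(probably_known("entorpy"))     # almost certainly False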

Saturday, November 4, 2017

Surrender, puny humans!

A while ago, DeepMind's AlphaGo beat human champion Lee Sedol at the game of go.  This wasn't just another case of machines beating humans at games of skill.

Granted, from a mathematical point of view it was nothing special.  Games like go, chess, checkers/draughts and tic-tac-toe can in theory be "solved" by simply bashing out all the possible combinations of moves and seeing which ones lead to wins for which players.

Naturally the technical definition of "games like go, etc." is a bit, well, technical, but the most important stipulations are
  • perfect information -- each player has the same knowledge of the game as the others
  • no random elements
That leaves out card games like poker and bridge (imperfect information, random element) and Parcheesi (random element) and which-hand-did-I-hide-the-stone-in (imperfect information), but it includes most board games (Reversi, Connect 4, Pente, that game where you draw lines to make squares on a field of dots, etc. -- please note that most of these are trademarked).
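
To make "bashing out all the possible combinations of moves" concrete, here's a sketch that solves tic-tac-toe exhaustively.  With at most nine moves to consider at each step it finishes instantly, which is exactly what stops working as games get bigger:

    LINES = [(0,1,2),(3,4,5),(6,7,8),(0,3,6),(1,4,7),(2,5,8),(0,4,8),(2,4,6)]

    def winner(board):
        for a, b, c in LINES:
            if board[a] and board[a] == board[b] == board[c]:
                return board[a]
        return None

    def solve(board, player):
        """Return +1/0/-1: win, draw or loss for the player about to move."""
        other = "O" if player == "X" else "X"
        if winner(board) == other:
            return -1                         # the previous move won the game
        if all(board):
            return 0                          # board full: a draw
        best = -1
        for i in range(9):                    # try every empty square
            if not board[i]:
                board[i] = player
                best = max(best, -solve(board, other))
                board[i] = None
                if best == 1:
                    break
        return best

    print(solve([None] * 9, "X"))             # 0: perfect play is always a draw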

From a practical point of view, there is sort of a pecking order:
  • Tic-tac-toe is so simple that you can write down the best strategy on a piece of paper.   Most people grow bored of it quickly since the cat always wins if everyone plays correctly, and pretty much everyone can.
  • Games like ghost or Connect 4 have been "strongly solved", meaning that there's a known algorithm for determining whether a given position is a win, loss or draw for the player whose turn it is.  Typically the winning strategy is fairly complex, in some cases too complex for a human to reasonably memorize.  A human will have no chance of doing better than a computer for such games (unless the computer is programmed to make mistakes), but might be able to do as well.
  • Checkers is too complex for humans to play perfectly, but it has been "weakly solved".  This means that it's been proved that with perfect play, the result is always a draw, but not all legal positions have been analyzed, and there is currently nothing that will always be able to tell you if a particular position is a win for either side, or a draw.  In other words, for a weakly solved game, we can answer win/loss/draw for the initial position, and typically many others, but not for an arbitrary position.
  • Chess has not been solved, even in theory, but computer chess players that bash out large numbers of sequences of moves can consistently beat even the best human players.
In most cases the important factor in determining where a game fits in this order is the "branching factor", which is the average number of moves available at any given point.  In tic-tac-toe, there are nine first moves, eight second moves, and so on, and since the board is symmetrical there are effectively even fewer.  In many positions there's really only one (win with three-in-a-row or block your opponent from doing that).

In Connect 4, there are up to seven legal moves in any position.  In checkers there can be a dozen or so.  In chess, a couple dozen is typical.  As with tic-tac-toe there are positions where there is only one legal move, or only one that makes sense, but those are relatively rare in most games.

In go, there are typically more than a hundred different possible moves, and go positions tend not to be symmetrical.  Most of the time a reasonably strong human player will only be looking at a small portion of the possible moves.  In order to have any hope of analyzing a situation, a computer has to be able to narrow down the possibilities by a similar amount.  But to beat a human, it has to be able to find plays that a human will miss.
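
The rough arithmetic behind all this: a game with branching factor b and typical length d has on the order of b-to-the-d lines of play.  The figures below are the usual ballpark guesses, not precise counts:

    # Very rough game-tree sizes: branching_factor ** typical_game_length.
    from math import log10

    for game, b, d in [("tic-tac-toe", 4, 9), ("chess", 35, 80), ("go", 250, 150)]:
        print(f"{game:12s} ~10^{d * log10(b):.0f} lines of play")
    # tic-tac-toe comes out around 10^5, chess around 10^124, go around 10^360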

I've seen go described as a more "strategic" game, one that humans can develop a "feel" for that computers can't emulate, but that's not entirely true.  Tactics can be quite important.  Much of the strategy revolves around deciding which tactical battles to pursue and which to leave for later or abandon entirely.  At least, that's my understanding.  I'm not really a go player.

AlphaGo, and programs like it, solved the narrowing-down problem by doing what humans do: collecting advice from strong human players and studying games played by them.  Historically this has meant a programmer working with an expert player to formulate rules that computers can interpret, along with people combing through games to glean more rules.

As I understand it (and I don't know anything more about DeepMind or AlphaGo than the public), AlphaGo used machine learning techniques to automate this process, but the source material was still games played by human players.  [Re-reading this in light of a more recent post, I see I left out a significant point: AlphaGo (and AlphaZero) encode their evaluation of positions -- their understanding of the game -- as neural networks rather than explicit rules.  While a competent coder could look at the code for explicit rules and figure out what they were doing, no one really knows how to decode what a neural network is doing, at least not to the same level of detail -- D.H. Jan 2019]

The latest iteration (AlphaGo Zero, of course) dispenses with human input.  Rather than studying human games, it plays against itself, notes what works and what doesn't, and tries again after incorporating that new knowledge.  Since it's running on a pretty hefty pile of hardware, it can do this over and over again very quickly.
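
None of the following is DeepMind's actual machinery -- AlphaGo Zero couples a neural network to Monte Carlo tree search -- but the self-play loop itself is simple enough to caricature.  Here's a toy version for tic-tac-toe that learns a value for each position purely from the outcomes of games it plays against itself.  The point isn't strength, just the shape of the loop: the training data is generated by the program itself.

    import random

    values = {}          # board position (a 9-character string) -> learned value for X

    def outcome(board):
        """+1 if X has won, -1 if O has won, 0 for a draw, None if the game isn't over."""
        lines = [(0,1,2),(3,4,5),(6,7,8),(0,3,6),(1,4,7),(2,5,8),(0,4,8),(2,4,6)]
        for a, b, c in lines:
            if board[a] != " " and board[a] == board[b] == board[c]:
                return 1 if board[a] == "X" else -1
        return 0 if " " not in board else None

    def play_one_game(explore=0.2):
        """Self-play one game, then nudge the values of the visited positions
        toward the final result.  Tabular learning, not a neural network."""
        board, player, history = [" "] * 9, "X", []
        while True:
            moves = [i for i, c in enumerate(board) if c == " "]
            if random.random() < explore:
                move = random.choice(moves)        # explore: try something random
            else:                                  # exploit: use what's been learned
                def score(m):
                    trial = board[:]
                    trial[m] = player
                    v = values.get("".join(trial), 0.0)
                    return v if player == "X" else -v
                move = max(moves, key=score)
            board[move] = player
            history.append("".join(board))
            result = outcome(board)
            if result is not None:
                for pos in history:                # learn from how the game ended
                    values[pos] = values.get(pos, 0.0) + 0.1 * (result - values.get(pos, 0.0))
                return result
            player = "O" if player == "X" else "X"

    for _ in range(20000):
        play_one_game()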

This approach worked out rather well.  AlphaGo Zero can beat the AlphaGo that beat Lee Sedol, making it presumably the strongest go player in the world.  [and it has since done the same thing with chess and shogi, though its superiority in chess is not clear-cut.  See the link above for more details.  -- D.H. Jan 2019]

On the one hand, this is not particularly surprising.  It's a classic example of what I call "dumb is smarter" on the other blog, where a relatively straightforward approach without a lot of built-in assumptions can outperform a carefully crafted system with lots of specialized knowledge baked in.  This doesn't mean that dumb is necessarily smartest, only that it often performs better than one might expect, because the downside to specialized knowledge is specialized blind spots.

On the other hand, this is all undeniably spooky.  An AI system with no baked-in knowledge of human thought is able, with remarkably little effort, to outperform even the very best of us at a problem that had long been held up as something unreachable by AI, something that only human judgement could deal with effectively.  If computers can beat us at being human, starting essentially from scratch (bear in mind that the hardware that all this is running on is largely designed and built by machine these days), then what, exactly, are we meat bags doing here?

So let's step back and look at the actual problem being solved: given a position on a go board, find the move that is most likely to lead to capturing the most stones and territory at the end of the game.

Put that way, this is a perfectly well-posed optimization problem of the sort that we've been using computers to solve for decades.  Generations, really, at this point.  Granted, one particular solution -- bashing out all possible continuations from a given position -- is clearly not well suited, but so what?  Finding the optimum shape -- or at least a better one -- for an airplane wing isn't well suited to that either, but we've made good progress on it anyway using different kinds of algorithms.

So "chess-style algorithms suck at go, therefore go is inherently hard" was a bad argument from the get-go.

From what I've seen in the press, even taking potential hype with a grain of salt, AlphaGo Zero is literally rewriting the books on go, having found opening moves that have escaped human notice for centuries.  But that doesn't mean this is an inherently hard problem.  If humans spend centuries failing to find something they're looking for, that means it's a hard problem for humans.

We humans are just really bad at predicting what kinds of problems are inherently hard, which I'd argue is the same as being hard to solve by machine*.  Not so long ago the stereotype of a genius was someone who "knew every word in the dictionary" or "could multiply ten-digit numbers immediately", both of which actually turned out to be pretty easy to solve by machine.

Once it was clear that some "genius" problems were easy for machines, attention turned to things that were easy for people but hard for machines.  There have been plenty of those -- walking, recognizing faces, translating between speech and text, finding the best move on the go board.  Those held out for quite a long time as "things machines will never be able to do", but the tide has been turning on them as well thanks, I think, to two main developments:
  • We can now build piles of hardware that have, in a meaningful sense, more processing power than human brains.
  • With these new piles of hardware, techniques that looked promising in the past but never really performed are now able to perform well, the main example being neural network-style algorithms.
At this point, I'm convinced that trying to come up with ever fuzzier and more human things that only human brains will ever be able to do is a losing bet.  Maybe not now, but in the long run.  I will not be surprised at all if I live to see, say:
  • Real time speech translation that does as well as a human interpreter.
  • Something that can write a Petrarchan sonnet on a topic of choice, say the futility of chasing perfection, that an experienced and impartial reviewer would describe as "moving", "profound" and "original".
  • Something that could read a novel and write a convincing essay on it comparing the story to specific experiences in the real world, and answer questions about it in a way that left no choice but to say that in some meaningful sense the thing "understood" what it read.
  • Something that it would be hard to argue didn't have emotions -- though the argument would certainly be made.
[On the other hand, I also won't be shocked if these don't totally pan out in the next few decades --D.H. Feb 2019]

These all shade into Turing test territory.  I've argued that, despite Alan Turing's genius and influence, the Turing test is not necessarily a great test of whatever we mean by intelligence, and in particular it's easy to game because people are predisposed to assume intelligence.  I've also argued that "the Singularity" is an ill-defined concept, but that's really a different thread.  Nevertheless, I expect that, sooner or later, we will be able to build things that pass a Turing test with no trickery, in a sense that most people can agree on.

And that's OK.

Or at least, we're going to have to figure out how to be OK with it.  Stopping it from happening doesn't seem like a realistic option.

This puts us firmly in the territory of I, Robot and other science fiction of its era and more recently (the modern Westworld reboot comes to mind), which is one reason I chose the cheesy title I did.  Machines can already do a lot of things better than we can, and the list will only grow over time.  At the moment we still have a lot of influence over how that happens, but that influence will almost certainly decrease over time (the idea behind the Singularity is that this will happen suddenly, in fact nearly instantaneously, once the conditions are right).

The question now is how to make best use of what influence we still have while we still have it.  I don't really have any good, sharp answers to that, but I'm pretty sure it's the right question.


* There's a very well-developed field, complexity theory, dealing in what kinds of problems are hard or easy for various models of computing in an absolute, quantifiable sense.  This is largely distinct from the question of what kinds of games or other tasks computers should be good at, or at least better than humans at, though some games give good examples of various complexity classes.  One interesting result is that it's often easy (in a certain technical sense) to produce good-enough approximate solutions to problems that are provably very hard to solve exactly.  Another interesting result is that it can be relatively tricky to find hard examples of problems that are known to be hard in general.

Saturday, July 22, 2017

Yep. Tron.

It was winter when I started writing this, but writing posts about physics is hard, at least if you're not a physicist.  This one was particularly hard because I had to re-learn what I thought I knew about the topic, and then realize that I'd never really understood it as well as I'd thought, then try to learn it correctly, then realize that I also needed to re-learn some of the prerequisites, which led to a whole other post ... but just for the sake of illustration, let's pretend it's still winter.

If you live near a modest-sized pond or lake, you might (depending on the weather) see it freeze over at night and thaw during the day.  Thermodynamically this can be described in terms of energy (specifically heat) and entropy.  At night, the water is giving off heat into the surrounding environment and losing entropy (while its temperature stays right at freezing).  The surrounding environment is taking on heat and gaining entropy.  The surroundings gain at least as much entropy as the pond loses, and ultimately the Earth will radiate just that bit more heat into space.  When you do all the accounting, the entropy of the universe increases by just a tiny bit, relatively speaking.

During the day, the process reverses.  The water takes on heat and gains entropy (while its temperature still stays right at freezing).  The surroundings give off heat, which ultimately came from the sun, and lose entropy.  The water gains at least as much entropy as the surroundings lose*, and again the entropy of the universe goes up by just that little, tiny bit, relatively speaking.

So what is this entropy of which we speak?  Originally entropy was defined in terms of heat and temperature.  One of the major achievements of modern physics was to reformulate entropy in a more powerful and elegant form, revealing deep and interesting connections, thereby leading to both enlightenment and confusion.  The connections were deep enough that Claude Shannon, in his founding work on information theory, defined a similar concept with the same name, leading to even more enlightenment and confusion.

The original thermodynamic definition relies on the distinction between heat and temperature.  Temperature, at least in the situations we'll be discussing here, is a measure of how energetic individual particles -- typically atoms or molecules -- are on average.  Heat is a form of energy, independent of how many particles are involved.

The air in an oven heated to 500K (that is, 500 Kelvin, about 227 degrees Celsius or 440 degrees Fahrenheit) and a pot full of oil at 500K are, of course, at the same temperature, but you can safely put your hand in the oven for a bit.  The oil, not so much.  Why?  Mainly because there's a lot more heat in the oil than in the air.  By definition the molecules in the oven air are just as energetic, on average, as the molecules in the oil, but there are a lot more molecules of oil, and therefore a lot more energy, which is to say heat.
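
Ballpark numbers (and they are only ballpark) make the point: a typical oven holds a few tens of grams of air, while a pot holds a couple of kilograms of oil, and oil also stores roughly twice as much heat per kilogram per degree:

    # Rough heat content above room temperature (about 300 K), Q = m * c * dT.
    # All figures are ballpark values for illustration only.
    dT = 200.0                    # K above room temperature, for a 500 K oven

    m_air, c_air = 0.04, 1000.0   # ~40 g of air in the oven; ~1.0 kJ/(kg K)
    m_oil, c_oil = 2.0, 2000.0    # ~2 kg of oil in the pot; ~2.0 kJ/(kg K)

    print(m_air * c_air * dT)     # ~8,000 J in the air
    print(m_oil * c_oil * dT)     # ~800,000 J in the oil -- about 100x the heat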

At least, that's the quick explanation for purposes of illustration.  Going into the real details doesn't change the basic point: heat is different from temperature and changing the temperature of something requires transferring energy (heat) to or from it.  As in the case of the pond freezing and melting, there are also cases where you can transfer heat to or from something without changing its temperature.  This will be important in what follows.

Entropy was originally defined as part of understanding the Carnot cycle, which describes the ideal heat-driven engine (the efficiency of a real engine is usually given as a percentage of what the Carnot cycle would produce, not as a percentage of the energy it uses).  Among the principal results of classical thermodynamics is that the Carnot cycle is as good as you can get even in principle, and that even it can never be perfectly efficient.

At this point it might be helpful to read that earlier post on energy, if you haven't already.  Particularly relevant parts here are that the state of the working fluid in a heat engine, such as the steam in a steam engine, can be described with two parameters, or, equivalently, as a point in a two-dimensional diagram, and that the cycle an engine goes through can be described by a path in that two-dimensional space.

Also keep in mind the ideal gas law: In an ideal gas, the temperature of a given amount of gas is proportional to pressure times volume.  Here and in the rest of this post, "gas" means "a substance without a fixed shape or volume" and not what people call "gasoline" or "petrol".

If you've ever noticed a bicycle pump heat up as you pump up a tire, that's (more or less) why.  You're compressing air, that is, decreasing its volume, so (unless the pump is able to spill heat with perfect efficiency, which it isn't) the temperature has to go up.  For the same reason the air coming out of a can of compressed air is dangerously cold.  The air is expanding rapidly so the temperature drops sharply.
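
To put a rough number on the pump: for an ideal gas compressed quickly enough that no heat escapes, the ideal gas law combined with the first law of thermodynamics gives T times V^(gamma - 1) staying constant, with gamma about 1.4 for air.  The figures here are purely illustrative:

    # Quick, perfectly insulated (adiabatic) compression of an ideal gas:
    # T * V**(gamma - 1) stays constant, gamma is about 1.4 for air.
    T1 = 300.0                     # K, roughly room temperature
    gamma = 1.4

    T2 = T1 * 2 ** (gamma - 1)     # squeeze the air into half its volume
    print(T2)                      # about 396 K -- the pump really does get hot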

In the Carnot cycle you first supply heat to a gas (the "working fluid", for example steam in a steam engine) while maintaining a perfectly constant temperature by expanding the container it's in.  You're heating that gas, in the sense of supplying heat, but not in the sense of raising its temperature.  Again, heat and temperature are two different things.

To continue the Carnot cycle, let the container keep expanding, but now in such a way that it neither gains nor loses heat (in technical terms, adiabatically).  In these first two steps, you're getting work out of the engine (for example, by connecting a rod to the moving part of a piston and attaching the other part of that rod to a wheel).  The gas is losing energy since it's doing work on the piston, and it's also expanding, so the temperature and pressure are both dropping, but no heat is leaving the container in the adiabatic step.

Work is force times distance, and force in this case is pressure times the area of the surface that's moving.    Since the pressure, and therefore the force, is dropping during the second step you'll need to use calculus to figure out the exact amount of work, but people know how to do that.

The last two steps of the cycle reverse the first two.  In step three you compress the gas, for example by changing the direction the piston is moving, while keeping the temperature the same.  This means the gas is cooling in the sense of giving off heat, but not in the sense of dropping in temperature.  Finally, in step four, compress the gas further, without letting it give off heat.  This raises the temperature.  The piston is doing work on the gas and the volume is decreasing.  In a perfect Carnot cycle the gas ends up in the same state -- same pressure, temperature and volume -- as it began and you can start it all over.

As mentioned in the previous post, you end up putting more heat in at the start than you end up getting back in the third step, and you end up getting more work out in the first two steps than you put in in the last two (because the pressure is higher in the first two steps).  Heat gets converted to work (or if you run the whole thing backwards, you end up with a refrigerator).

If you plot the Carnot cycle on a diagram of pressure versus volume, or the other two combinations of pressure, volume and temperature, you get a shape with at least two curved sides, and it's hard to tell whether you could do better.  Carnot proved that this cycle is the best you can do, in terms of how much work you can get out of a given amount of heat, by choosing two parameters that make the cycle into a rectangle.  One is temperature -- steps one and three maintain a constant temperature.

The other needs to make the other two steps straight lines.  To make this work out, the second quantity has to remain constant while the temperature is changing, and change when temperature is constant.  The solution is to define a quantity -- call it entropy -- that changes, when temperature is constant, by the amount of heat transferred, divided by that temperature (ΔS = ΔQ/T -- the deltas (Δ) say that we're relating changes in heat and entropy, not absolute quantities; Q stands for heat and S stands for entropy, because reasons).  When there's no heat transferred, entropy doesn't change.  In step one, temperature is constant and entropy increases.  In step two, temperature decreases while entropy remains constant, and so forth.

To be clear, entropy and temperature can, in general, both change at the same time.  For example, if you heat a gas at constant volume, then pressure, temperature and entropy all go up.  The Carnot cycle is a special case where only one changes at a time.

Knowing the definition of entropy, you can convert, say, a pressure/volume diagram to a temperature/entropy diagram and back.  In real systems, the temperature/entropy version won't show absolutely straight vertical and horizontal lines -- that is, there will be at least some places where both change at the same time.  The Carnot cycle is exactly the case where the lines are perfectly horizontal and vertical.

This definition of entropy in terms of heat and temperature says nothing at all about what's going on in the gas, but it's enough, along with some math I won't go into here (but which depends on the cycle being a rectangle), to prove Carnot's result: The portion of heat wasted in a Carnot cycle is the ratio of the cold temperature to the hot temperature (on an absolute temperature scale).  You can only have zero loss -- 100% efficiency -- if the cold temperature is absolute zero.  Which it won't be.
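
In numbers (the reservoir temperatures are made up, but the formula isn't): an engine running between 500K and 300K can't convert more than 40% of the heat it takes in, no matter how cleverly it's built:

    # Carnot's result: efficiency = 1 - T_cold / T_hot (temperatures in kelvin).
    T_hot, T_cold = 500.0, 300.0

    efficiency = 1 - T_cold / T_hot
    print(efficiency)          # 0.4: at most 40% of the heat becomes work
    print(T_cold / T_hot)      # 0.6: the rest is unavoidably dumped as waste heat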

Any cycle that deviates from a perfect rectangle will be less efficient yet.  In real life this is inevitable.  You can come pretty close on all the steps, but not perfectly close.  In real life you don't have an ideal gas, you can't magically switch from being able to put heat into the gas to perfectly insulating it, you won't be able to transfer all the heat from your heat source to the gas, you won't be able to capture all the heat from the third step of the cycle to reuse in the first step of the next cycle, some of the energy of the moving piston will be lost to friction (that is, dissipated into the surroundings as heat) and so on.

The problem-solving that goes into minimizing inefficiencies in real engines is why engineering came to be called engineering and why the hallmark of engineering is getting usefulness out of imperfection.



There are other cases where heat is transferred at a constant temperature, and we can define entropy in the same way as for a gas.  For example, temperature doesn't change during a phase change such as melting or freezing.  As our pond melts and freezes, the temperature stays right at freezing until the pond completely freezes, at which point it can get cooler, or melts entirely, at which point it can get warmer.

If all you know is that some water is at the freezing point, you can't say how much heat it will take to raise the temperature above freezing without knowing how much of it is frozen and how much is liquid.  The concept of entropy is perfectly valid here -- it relates directly to how much of the pond is liquid -- and we can define "entropy of fusion" to account for phase transitions.
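
To put illustrative numbers on the pond: freezing a kilogram of water releases about 334,000 joules at 273.15K, and if that heat flows into colder night air the books balance like this (the air temperature is made up; the latent heat figure is the standard one):

    # Entropy bookkeeping for freezing one kilogram of pond water overnight.
    Q = 334_000.0            # joules released as 1 kg of water freezes
    T_water = 273.15         # K: the water stays right at the freezing point
    T_air = 263.0            # K: a cold night, about -10 C

    dS_water = -Q / T_water  # the pond loses entropy as it freezes
    dS_air = Q / T_air       # the colder surroundings gain a bit more
    print(dS_water, dS_air, dS_water + dS_air)
    # roughly -1223 + 1270 = +47 J/K: the total only ever goes up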

There are plenty of other cases that don't look quite so much like the ideal gas case but still involve changes of entropy.  Mixing two substances increases overall entropy.  Entropy is a determining factor in whether a chemical reaction will go forward or backward and in ice melting when you throw salt on it.


Before I go any further about thermodynamic entropy, let me throw in that Claude Shannon's definition of entropy in information theory is, informally, a measure of the number of distinct messages that could have been transmitted in a particular situation.  On the other blog, for example, I've ranted about bits of entropy for passwords.  This is exactly a measure of how many possible passwords there are in a given scheme for picking passwords.

What in the world does this have to do with transferring heat at a constant temperature?  Good question.

Just as the concept of energy underwent several shifts in understanding on the way to its current formulation, so did entropy.  The first major shift came with the development of statistical mechanics.  Here "mechanics" refers to the behavior of physical objects, and "statistical" means you've got enough of them that you're only concerned about their overall behavior.

Statistical mechanics models an ideal gas as a collection of particles bouncing around in a container.  You can think of this as a bunch of tiny balls bouncing around in a box, but there's a key difference from what you might expect from that image.  In an ideal gas, all the collisions are perfectly elastic, meaning that the energy of motion (called kinetic energy) remains the same before and after.  In a real box full of balls, the kinetic energy of the balls gets converted to heat as the balls bump into each other and push each other's molecules around, and sooner or later the balls stop bouncing.

But the whole point of the statistical view of thermodynamics is that heat is just the kinetic energy of the particles the system is made up of.  When actual bouncing balls lose energy to heat, that means that the kinetic energy of the large-scale motion of the balls themselves is getting converted into kinetic energy of the small-scale motion of the molecules the balls are made of, and of the air in the box, and of the walls of the box, and eventually the surroundings.  That is, the large scale motion we can see is getting converted into a lot of small-scale motion that we can't, which we call heat.

When two particles, say two oxygen molecules, bounce off each other, the kinetic energy of the moving particles just gets converted into kinetic energy of differently-moving particles, and that's it.  In the original formulation of statistical mechanics, there's simply no other place for that energy to go, no smaller-scale moving parts to transfer energy to (assuming there's no chemical reaction between the two -- if you prefer, put pure helium in the box).

When a particle bounces off the wall of the container, it imparts a small impulse -- a tiny, nearly instantaneous push -- to the walls.  When a whole lot of particles continually bounce off the walls of a container, those tiny pushes add up to (for all practical purposes) a continuous force, that is, pressure.

Temperature is the average kinetic energy of the particles and volume is, well, volume.  That gives us our basic parameters of temperature, pressure and volume.
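
"Temperature is the average kinetic energy of the particles" can be made concrete: for an ideal gas that average works out to (3/2)kT per particle, which pins down how fast a typical molecule is moving.  For oxygen at room temperature, roughly:

    import math

    k = 1.380649e-23                 # Boltzmann's constant, J/K
    T = 300.0                        # K, about room temperature
    m = 32 * 1.66054e-27             # kg, mass of an O2 molecule (32 atomic mass units)

    # (1/2) m v^2 = (3/2) k T  =>  v_rms = sqrt(3 k T / m)
    v_rms = math.sqrt(3 * k * T / m)
    print(v_rms)                     # roughly 480 m/s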

But what is entropy, in this view?  In statistical mechanics, we're concerned about the large-scale (macroscopic) state of the system, but there are many different small-scale (microscopic) states that could give the same macroscopic picture.

Once you crank through all the math, it turns out that entropy is a measure of how many different microscopic states, which we can't measure, are consistent with the macroscopic state, which we can measure.  In fuller detail, entropy is actually proportional to the logarithm of that number -- the number of digits, more or less -- both because the raw numbers are ridiculously big, and because that way the entropy of two separate systems is the sum of the entropy of the individual systems.

The actual formula is S = k ln(W), where k is Boltzmann's constant and W is the total number of possible microstates, assuming they're all equally probable.  There's a slightly bigger formula if they're not.  Note that, unlike the original thermodynamic definition, this formula deals in absolute quantities, not changes.
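
In code, with Boltzmann's constant in its usual units, including a check of why the logarithm is what makes entropies add:

    import math

    k = 1.380649e-23            # Boltzmann's constant, J/K

    def entropy(W):
        """S = k ln W, for W equally probable microstates."""
        return k * math.log(W)

    # The logarithm makes entropy additive: a system with W1 states next to an
    # independent system with W2 states has W1 * W2 joint states.
    W1, W2 = 10**20, 10**30
    print(entropy(W1) + entropy(W2))   # the same as...
    print(entropy(W1 * W2))            # ...the entropy of the combined system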

When ice melts, entropy increases.  Water molecules in ice are confined to fixed positions in a crystal.  We may not know the exact energy of each individual molecule, but we at least know more or less where it is, and we know that if the energy of such a molecule is too high, it will leave the crystal (if this happens on a large scale, the crystal melts).  Once it does, we know much less about its location or energy.

Even without a phase change, the same sort of reasoning applies.  As temperature -- the average energy of each particle -- increases, the range of energies each particle can have increases.  How to translate this continuous range of energies into a number we can count is a bit of a puzzle, but we can handwave around that for now.

Entropy is often called a measure of disorder, but more accurately it's a measure of uncertainty (as theoretical physicist Sabine Hossenfelder puts it: "a measure for unresolved microscopic details"), that is, how much we don't know.  That's why Shannon used the same term in information theory.  The entropy of a message measures how much we don't know about it just from knowing its size (and a couple of other macroscopic parameters).  Shannon entropy is also logarithmic, for the same reasons that thermodynamic entropy is.

The formula for Shannon entropy in the case that all possible messages are equally probable is H = k ln(M), where M is the number of messages.  I put k there to account for the logarithm usually being base 2 and because it emphasizes the similarity to the other definition.  Again, there's a slightly bigger formula if the various messages aren't all equally probable, and it too looks an awful lot like the corresponding formula for thermodynamic entropy.
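
The same thing in code, using the usual convention of log base 2 so the answer comes out in bits, with a password-style example like the one from the other blog (the character counts are just the usual ballpark assumptions):

    import math

    def bits_of_entropy(num_possibilities):
        """Shannon entropy, in bits, when all possibilities are equally likely."""
        return math.log2(num_possibilities)

    # e.g. an 8-character password drawn uniformly from 62 characters (a-z, A-Z, 0-9)
    print(bits_of_entropy(62 ** 8))    # about 47.6 bits

    # The general formula when messages aren't equally likely: H = -sum(p * log2(p))
    def shannon(probabilities):
        return -sum(p * math.log2(p) for p in probabilities if p > 0)

    print(shannon([0.5, 0.25, 0.25]))  # 1.5 bits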

The original formulation of statistical mechanics assumed that physics at the microscopic scale followed Newton's laws of motion.  One indication that statistical mechanics was on to something is that when quantum mechanics completely reformulated what physics looks like at the microscopic scale, the statistical formulation not only held up, but became more accurate with the new information available.

In our current understanding, when two oxygen molecules bounce off each other, their electron shells interact (there's more going on, but let's start there), and eventually their energy gets redistributed into a new configuration.  This can mean the molecules traveling off in new paths, but it could also mean that some of the kinetic energy gets transferred to the electrons themselves, or some of the electrons' energy gets converted into kinetic energy.

Macroscopically this all looks the same as the old model, if you have huge numbers of molecules, but in the quantum formulation we have a more precise picture of entropy.  This makes a difference in extreme situations such as extremely cold crystals.  Since energy is quantized, there is a finite (though mind-bendingly huge) number of possible quantum states a typical system can have, and we can stop handwaving about how to handle ranges of possible energy.  This all works whether you have a gas, a liquid, an ordinary solid or some weird Bose-Einstein condensate.  Entropy measures that number of possible quantum states.

Thermodynamic entropy and information theoretic entropy are measuring basically the same thing, namely the number of specific possibilities consistent with what we know in general.  In fact, the modern definition of thermodynamic entropy specifically starts with a raw number of possible states and includes a constant factor to convert from the raw number to the units (energy over temperature) of classical thermodynamics.

This makes the two notions of entropy look even more alike -- they're both based on a count of possibilities, but with different scaling factors.  Below I'll even talk, loosely, of "bits worth of thermodynamic entropy" meaning the number of bits in the binary number for the number of possible quantum states.

Nonetheless, they're not at all the same thing in practice.

Consider a molecule of DNA.  There are dozens of atoms, and hundreds of subatomic particles, in a base pair.  I really don't know how many possible states a phosphorus atom (say) could be in under typical conditions, but I'm going to guess that there are thousands of bits worth of entropy in a base pair at room temperature.  Even if each individual particle can only be in one of two possible states, you've still got hundreds of bits.

From an information-theoretic point of view, there are four possible states for a base pair, which is two bits, and because the genetic code actually includes a fair bit of redundancy in the form of different ways of coding the same amino acid and so forth, it's actually more like 10/6 of a bit, even without taking into account other sources of redundancy.

But there is a lot of redundancy in your genome, as far as we can tell, in the form of duplicated genes and stretches of DNA that might or might not do anything.  All in all, there is about a gigabyte worth of base pairs in a human genome, but the actual gene-coding information can compress down to a few megabytes.  The thermodynamic entropy of the molecule that encodes those megabytes is much, much, larger.  If each base pair represents about a thousand bits worth of thermodynamic entropy under typical conditions, then the whole strand is into the hundreds of gigabytes.
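
Back-of-the-envelope, using the figures above (two bits per base pair is standard; the thousand-bits-of-thermodynamic-entropy figure is just my guess from earlier):

    # Rough comparison of Shannon vs. (guessed) thermodynamic entropy of a genome.
    base_pairs = 3.2e9              # roughly, for a human genome

    shannon_bits = base_pairs * 2           # 2 bits per base pair, before compression
    thermo_bits = base_pairs * 1000         # the guessed ~1000 "bits" per base pair

    print(shannon_bits / 8 / 1e9)   # ~0.8 GB of raw sequence -- "about a gigabyte"
    print(thermo_bits / 8 / 1e9)    # ~400 GB -- "into the hundreds of gigabytes"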

I keep saying "under typical conditions" because thermodynamic entropy, being thermodynamic, depends on temperature.  If you have a fever, your body, including your DNA molecules in particular, has higher entropy than if you're sitting in an ice bath.  The information theoretic entropy, on the other hand, doesn't change.

But all this is dwarfed by another factor.  You have billions of cells in your body (and trillions of bacterial cells that don't have your DNA, but never mind that).  From a thermodynamic standpoint, each of those cells -- its DNA, its RNA, its proteins, lipids, water and so forth -- contributes to the overall entropy of your body.  A billion identical strands of DNA at a given temperature have the same information content as a single strand but a billion times the thermodynamic entropy.

If you want to compare bits to bits, the Shannon entropy of your DNA is inconsequential compared to the thermodynamic entropy of your body.  Even the change in the thermodynamic entropy of your body as you breathe is enormously bigger than the Shannon entropy of your DNA.

I mention all this because from time to time you'll see statements about genetics and the second law of thermodynamics.  The second law, which is very well established, states that the entropy of a closed system cannot decrease over time.  One implication of it is that heat doesn't flow from cold to hot, which is a key assumption in Carnot's proof.

Sometimes the second law is taken to mean that genomes can't get "more complex" over time, since that would violate the second law.  The usual response to this is that living cells aren't closed systems and therefore the second law doesn't apply.  That's perfectly valid.  However, I think a better answer is that this confuses two forms of entropy -- thermodynamic entropy and Shannon entropy -- which are just plain different.  In other words, thermodynamic entropy and the second law don't work that way.

From an information point of view, the entropy of a genome is just how many bits it encodes once you compress out any redundancy.  Longer genomes typically have more entropy.  From a thermodynamic point of view, at a given temperature, more of the same substance has higher entropy than less as well, but we're measuring different quantities.

A live elephant has much, much higher entropy than a live mouse, and likewise for a live human versus a live mouse.  As it happens, a mouse genome is roughly the same size as a human genome, even though there's a huge difference in thermodynamic entropy between a live human and a live mouse.  The mouse genome is slightly smaller than ours, but not a lot.  There's no reason it couldn't be larger, and certainly no thermodynamic reason.  Neither the mouse nor human genome is particularly large.  Several organisms have genomes dozens of times larger, at least in terms of raw base pairs.

From a thermodynamic point of view, it hardly matters what exact content a DNA molecule has.  There are some minor differences in thermodynamic behavior among the particular base pairs, and in some contexts it makes a slight difference what order they're arranged in, but overall the gene-copying machinery works the same whether the DNA is encoding a human digestive protein or nothing at all.  Differences in gene content are dwarfed by the thermodynamic entropy change of turning one strand of DNA and a supply of loose nucleotides into two strands, that in turn is dwarfed by everything else going on in the cell, and that in turn is dwarfed by the jump from one cell to billions.

For what it's worth, content makes even less thermodynamic difference in other forms of storage.  A RAM chip full of random numbers has essentially the same thermodynamic entropy, at a given temperature, as one containing all zeroes or all ones, even though those have drastically different Shannon entropies.  The thermodynamic entropy changes involved in writing a single bit to memory are going to equate to a lot more than one bit.

Again, this is all assuming it's valid to compare the two forms of entropy at all, based on their both being measures of uncertainty about what exact state a system is in, and again, the two are not actually comparable, even though they're similar in form.  Comparing the two is like trying to compare a football score to a basketball score on the basis that they're both counting the number of times the teams involved have scored goals.


There's a lot more to talk about here, for example the relation between symmetry and disorder (more disorder means more symmetry, which was not what I thought until I sat down to think about it), and the relationship between entropy and time (for example, as experimental physicist Richard Muller points out, local entropy decreases all the time without time appearing to flow backward), but for now I think I've hit the main points:
  • The second law of thermodynamics is just that -- a law of thermodynamics
  • Thermodynamic entropy as currently defined and information-theoretic (Shannon) entropy are two distinct concepts, even though they're very similar in form and derivation.
  • The two are defined in different contexts and behave entirely differently, despite what we might think from them having the same name.
  • Back at the first point, the second law of thermodynamics says almost nothing about Shannon entropy, even though you can, if you like, use the same terminology in counting quantum states.
  • All this has even less to do with genetics.

* Strictly speaking, you need to take the Sun into account.  The Sun is gaining entropy over time, at a much, much higher rate than our little pond and its surroundings, and it's only an insignificantly tiny part of the universe.  But even if you had a closed system, of a pond and surroundings that were sometimes warm and sometimes cold, for whatever reason, the result would be the same: The entropy of a closed system increases over time.

Wednesday, July 19, 2017

The human perspective and its limits

A couple more points occurred to me after I hit "publish" on the previous post.  Both of them revolve around subjectivity versus objectivity, and to what extent we might be limited by our human perspective.


In trying to define whether a kind of behavior is simple or complex, I gave two different notions which I claimed were equivalent: how hard it is to describe and how hard it is to build something to copy it.

The first is, in a sense, subjective, because it involves our ability to describe and understand things.  Since we describe things using language, it's tied to what fits well with language.  The second is much more objective.  If I build a chess-playing robot, something with no knowledge of human language or of chess could figure out what it was doing, at least in principle.

One of the most fundamental results in computer science is that there are a number of very simple computing models (stack machines, lambda calculus, combinators, Turing machines, cellular automata, C++ templates ... OK, maybe not always so simple) which are "Turing complete".  That means that any of them can compute any "total recursive function".  This covers a wide range of problems, from adding numbers to playing chess to finding cute cat videos and beyond.
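For a tiny, concrete taste of one of those models, here's addition in the lambda calculus, written with nothing but Python's lambda (the names ZERO, SUCC, ADD and to_int are just labels I've picked for illustration):
ZERO = lambda f: lambda x: x                                  # apply f zero times
SUCC = lambda n: lambda f: lambda x: f(n(f)(x))               # one more application
ADD  = lambda m: lambda n: lambda f: lambda x: m(f)(n(f)(x))  # m applications after n
def to_int(n):
    # Convert a Church numeral back to an ordinary Python int for display.
    return n(lambda k: k + 1)(0)
TWO = SUCC(SUCC(ZERO))
THREE = SUCC(TWO)
print(to_int(ADD(TWO)(THREE)))   # prints 5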

It doesn't matter which model you choose.  Any of them can be used to simulate any of the others.  Even a quantum computer is still computing the same kinds of functions [um ... not 100% sure about that ... should run that down some day --D.H.].  The fuss there is about the possibility that a quantum computer could compute certain difficult functions exponentially faster than a non-quantum computer.

Defining a total recursive function for a problem basically means translating it into mathematical terms, in other words, describing it objectively.  Computability theory says that if you can do that, you can write a program to compute it, essentially building something to perform the task (generally you tell a general-purpose computer to execute the code you wrote, but if you really want to you can build a physical circuit to do what the computer would do).

So the two notions, of describing a task clearly and producing something to perform it, are, provably, equivalent.  There are some technical issues with the notion of complexity here that I'm going to gloss over.  The whole P = NP question revolves around whether problems whose solutions are easy to check are also easy to solve, but when it comes to deciding whether recognizing faces is harder than walking, I'm going to claim we can leave that aside.

The catch here is that my notion of objectivity -- defining a computable function -- is ultimately based on mathematics, which in turn is based on our notion of what it means to prove something (the links between computing and theorem proving are interesting and deep, but we're already in deep enough as it is).  Proof, in turn, is -- at least historically -- based on how our minds work, and in particular how language works.  Which is what I called "subjective" at the top.

So, is our notion of how hard something is to do mechanically -- my ostensibly "objective" definition -- limited by our modes of reasoning, particularly verbal reasoning, or is verbal/mathematical reasoning a fundamentally powerful way of describing things that we happened to discover because we developed minds capable of apprehending it?  I'd tend to think the latter, but then maybe that's just a human bias.



Second, as to our tendency to think that particularly human things like language and house-building are special, that might not just be arrogance, even if we're not really as special as we'd like to think.  We have a theory of mind, and not just of human minds.  We attribute very human-like motivations to other animals, and I'd argue that in many, maybe most, cases we're right.  Moreover, we also attribute different levels of consciousness to different things (and "things" here includes machines, which we also anthropomorphize).

There's a big asymmetry there: we actually experience our own consciousness, and we assume other people share the same level of consciousness, at least under normal circumstances, and we have that confirmed as we communicate our conscious experience to each other.  It's entirely natural, then, to see our own intelligence and consciousness, which we see from the inside in the case of ourselves and close up in the case of other people, as particularly rich and vivid.  This is difficult to let go of when trying to study other kinds of mind, but it seems to me it's essential at least to try.

Monday, July 17, 2017

Is recognizing faces all that special?

I've seen some headlines recently saying that fish can be taught to recognize human faces.  It's not clear why these would be circulating now, since the original paper appeared in 2016, but it's supposed to be newsworthy because fish weren't thought to have the neural structures needed to recognize faces.  In particular, they lack a neocortex (specifically, the fusiform gyrus), or anything clearly analogous to it, which is what humans and other primates use in recognizing faces.  Neither do the fish in question normally interact with humans, unlike, say, dogs, which might be expected to have developed an innate ability to recognize people.

The main thesis of the paper appears to be that there's nothing particularly special about recognizing faces.  As a compugeek, I'd say that the human brain is optimized for recognizing faces, but that doesn't mean that a more general approach can't work.  It makes sense that we'd have special machinery for faces.  Recognizing human faces is important to humans, though it's worth pointing out that there are plenty of people who don't seem to have this optimization (the technical term is prosopagnosia).

The authors of the paper also point out that recognizing faces is tricky:
[F]aces share the same basic components and individuals must be discriminated based on subtle differences in features or spatial relationships.
To be sure that the fish are performing the same recognition task we do, though presumably through different means, the experimental setup uses the same skin tone in all the images and crops them to a uniform oval.  Frankly, I found it hard to pick out the differences in what was left, but my facial recognition seems to be weaker than average in real life as well.

This is interesting work and the methodology seems solid, but should we really be surprised?  Yes, recognizing faces is tricky, but so is picking out a potential predator or prey, particularly if it's trying not to be found.

The archerfish used in the experiments normally provide for themselves by spitting jets of water at flies and small animals, then collecting them when they fall.  This means seeing the prey through the distortion of the air/water boundary, contracting various muscles at just the right rate and time, and finding the fallen prey.  For bonus points, don't waste energy shooting down dead leaves and such.

Doing all that requires the type of neural computation that seems easy until you actually try to duplicate it.  Did I mention that archerfish have a range on the order of meters, a dozen or so times their body length? It's not clear why recognizing faces should be particularly hard by comparison.

Computer neural networks can recognize faces using far fewer neurons than a fish has (Wikipedia says an adult zebrafish has around 10 million).  Granted, the fish has other things it needs to do with those neurons, and you can't necessarily compare virtual neurons directly to real ones, but virtual neurons are pretty simple -- they basically add a bunch of numbers, each multiplied by a "weight", and fiddle the result slightly.  Real neurons do much the same thing with electrical signals, hence the name "neural network".
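If "add a bunch of numbers, each multiplied by a weight, and fiddle the result slightly" sounds too glib, here's the whole thing in Python (the particular inputs, weights and sigmoid "fiddle" are just one common choice, picked for illustration):
import math
def neuron(inputs, weights, bias):
    # Add up the inputs, each multiplied by its weight, then "fiddle" the
    # result slightly by squashing it into the range 0..1 with a sigmoid.
    total = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1.0 / (1.0 + math.exp(-total))
print(neuron([0.5, 0.2, 0.9], [0.4, -0.6, 0.1], bias=0.05))   # about 0.55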

It doesn't seem like recognizing shapes as complex as human faces should require a huge number of neurons.  The question, rather, is what kinds of brains are flexible enough to repurpose their shape recognition to an arbitrary task like figuring out which image of a face to spit at in order to get a tasty treat.

Again, is it surprising that a variety of different brains should have that kind of flexibility?  Being able to recognize new types of shape in the wild has pretty clear adaptive value, as does having flexible brain wiring in general.  Arguably the surprise would be finding an animal that relies strongly on its visual system that couldn't learn to recognize subtle differences in arbitrary shapes.

And yet, this kind of result does seem counterintuitive to many, and I'd include myself if I hadn't already seen similar results.  Intuitively we believe that some things take a more powerful kind of intelligence than others.  Playing chess or computing the derivative of a function is hard.  Walking is easy.

We also have a natural understanding of what kinds of intelligence are particularly human.  We naturally want to draw a clear line between our sort of intelligence and everyone else's.  Clearly those uniquely human abilities must require some higher form of intelligence.  Language with features like pronouns, tenses and subordinate clauses seems unique to us (though there's a lot we don't know about communication in other species), so it must be very high level.  Likewise for whatever we want to call the kind of planning and coordination needed to, say, build a house.

Recognizing each other's faces is a very human thing to do -- notwithstanding that several other kinds of animal seem perfectly capable of it -- so it must require some higher level of intelligence as well.

Now, to be clear, I'm quite sure that there is a constellation of features that, taken together, is unique and mostly universal to humanity, even if we share a number of particular features in that constellation with other species.  No one else we're aware of produces the kind of artifacts we do ... jelly donuts, jet skis, jackhammers, jugs, jujubes, jazz ...  or forms quite the same kind of social structures, or any of a number of other things.

However, that doesn't mean that these things are particularly complex or special.  We're also much less hairy than other primates, but near-hairlessness isn't a complex trait.  Our feet (and much of the rest of our bodies) are specialized for standing up, but that doesn't seem particularly different from specializing to swing through trees, or gallop, or hop like a kangaroo, or whatever else.

Our intuitions about what kind of intelligence is complex, or "of a higher order" are just not very reliable.  Playing chess is not particularly complicated.  It just requires bashing out lots and lots of different possible moves.  Calculating derivatives from a general formula is easy.  Walking, on the other hand, is fiendishly hard.  Language is ... interesting ... but many of the features of language, particularly stringing together combinations of distinct elements in sequence, are quite simple.

What do I mean by "simple" here?  I mean one of two more or less equivalent things: how hard it is to describe accurately, and how hard it is to build something to perform the task.  In other words, how hard it is to objectively model something, in the sense that you'll get the same result no matter who or what is following the instructions.

This is not necessarily the same question as how complex a brain you need in order to perform the task, but this is partly because brains have developed in response to particular features of their environment.  Playing chess or taking the derivative of a polynomial shouldn't take a lot of neurons in principle, but it's hard for us because we don't have any neurons hardwired for those tasks.  Instead we have to use the less-hardwired parts of our brain to pull together pieces that originally arose for different purposes.

Recognizing faces seems like something that requires a modest amount of machinery of the type that most visually-oriented animals should have available, and probably available in a form that can be adapted to the task, even if recognizing human faces isn't something the animal would normally have to do.  Cataloging what sorts of animals do it well seems interesting and ultimately useful in helping us understand our own brains, but we shouldn't be surprised if that catalog turns out to be fairly large.


Sunday, July 16, 2017

Discovering energy

If you get an electricity bill, you're aware that energy is something that can be quantified, consumed, bought and sold.   It's something real, even if you can't touch it or see it.  You probably have a reasonable idea of what energy is, even if it's not a precisely scientific definition, and an idea of some of the things you can do with energy: move things around, heat them or cool them, produce light, transmit information and so forth.

When something's as everyday-familiar as energy it's easy to forget that it wasn't always this way, but in fact the concept of energy as a measurable quantity is only a couple of centuries old, and the closely related concept of work is even newer.

Energy is now commonly defined as the ability to do work, and work as a given force acting over a given distance.  For example, lifting a (metric) ton of something one meter in the Earth's surface gravity requires exerting approximately 9800 Newtons of force over that distance, or approximately 9800 Newton-meters of work altogether.  A Joule of energy is the ability to do one Newton-meter of work, so lifting one ton one meter requires approximately 9800 Joules of energy, plus whatever is lost to inefficiency.
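A quick back-of-the-envelope check of that arithmetic in Python (taking g to be roughly 9.8 m/s^2 and ignoring the losses):
mass = 1000.0    # one metric ton, in kilograms
g = 9.8          # Earth's surface gravity in m/s^2, approximately
height = 1.0     # meters
force = mass * g           # about 9800 Newtons
work = force * height      # about 9800 Newton-meters, i.e. about 9800 Joules
print(force, work)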

As always, there's quite a bit more going on if you start looking closely.  For one thing, the modern physical concept of energy is more subtle than the common definition, and for another energy "lost" to inefficiency is only "lost" in the sense that it's now in a form (heat) that can't directly do useful work.  I'll get into some, but by no means all, of that detail later in this post and probably in others as well.

I'm not going to try to give an exact history of thermodynamics or calorimetry here, but I do want to call out a few key developments in those fields.  My main aim is to trace the evolution of energy as a concept from a concrete, pragmatic working definition born out of the study of steam engines to the highly abstract concept that underpins the current theoretical understanding of the physical world.



The concept of energy as we know it dates to somewhere around the turn of the 19th century, that is, the late 1700s and early 1800s.  At that point practical steam engines had been around for several decades, though they only really took off when Watt's engine came along in 1781.  Over roughly the same period, a number of key experiments were done, heat came to be recognized as a form of energy, and a full theory of heat, work and the relationship between the two was formulated.

What makes things hot?  This is one of those "why is the sky blue?" questions that quickly leads into deep questions that take decades to answer properly.  The short answer, of course, is "heat", but what exactly is that?  A perfectly natural answer, and one of the first to be formalized into something like what we would call a theory, is that heat is some sort of substance, albeit not one that we can see, or weigh, or any of a number of other things one might expect to do with a substance.

This straightforward answer makes sense at first blush.  If you set a cup of hot tea on a table, the tea will get cooler and the spot where it's sitting on the table will get warmer.  The air around the cup also gets warmer, though maybe not so obviously.  It's completely reasonable to say that heat is flowing from the hot teacup to its surroundings, and to this day "heat flow" is still an academic subject.

With a little more thought it seems reasonable to say that heat is somehow trapped in, say, a stick of wood, and that burning the wood releases that heat, or that the Sun is a vast reservoir of heat, some of which is flowing toward us, or any of a number of quite reasonable statements about heat considered as a substance.  This notional substance came to be known as caloric, from the Latin for heat.

As so often happens, though, this perfectly natural idea gets harder and harder to defend as you look more closely.  For example, if you carefully weigh certain substances, such as phosphorus or sulfur, before and after burning them, as Lavoisier did starting in 1772, you'll find that they're actually heavier after burning.  If burning something releases the caloric in it, then does that mean that caloric has negative weight?  Or perhaps it's actually absorbing cold, and that's the real substance?

On the other hand, you can apparently create as much caloric as you want without changing the weight of anything.  In 1797 Benjamin Thompson, Count Rumford, immersed an unbored cannon in water, bored it with a specially dulled borer and observed that the water was boiling hot after about two and a half hours.  The metal bored from the cannon was not observably different from the remaining metal of the cannon, the total weight of the two together was the same as the original weight of the unbored cannon, and you could keep generating heat as long as you liked.  None of this could be easily explained in terms of heat as a substance.

Quite a while later, in the 1840s, James Joule made precise measurements of how much heat was generated by a falling weight powering a stirrer in a vat of water.  Joule determined that heating a pound of water one degree Fahrenheit requires 778.24 foot-pounds of work (e.g., letting a 778.24-pound weight fall one foot, or a 77.824-pound weight fall ten feet, etc.).  Ludwig Colding did similar research, and both Joule and Julius Robert von Mayer published the idea that heat and work can each be converted to the other.  This is not just significant theoretically.  Getting five digits of precision out of stirring water with a falling weight is pretty impressive in its own right.
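As a sanity check of my own, using modern conversion factors, that figure works out to almost exactly the familiar specific heat of water:
FT_LBF_TO_JOULES = 1.35582                # one foot-pound of work, in Joules
POUND_TO_GRAMS = 453.592                  # one pound of water, in grams
FAHRENHEIT_DEGREE_IN_CELSIUS = 5.0 / 9.0  # size of a Fahrenheit degree in Celsius degrees
joules_figure = 778.24                    # foot-pounds per (pound of water x degree Fahrenheit)
specific_heat = (joules_figure * FT_LBF_TO_JOULES
                 / (POUND_TO_GRAMS * FAHRENHEIT_DEGREE_IN_CELSIUS))
print(specific_heat)                      # about 4.19 Joules per gram per degree Celsius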

At this point we're well into the development of thermodynamics, which Lord Kelvin eventually defined in 1854 as "the subject of the relation of heat to forces acting between contiguous parts of bodies, and the relation of heat to electrical agency."  This is a fairly broad definition, and the specific mention of electricity is interesting, but a significant portion of thermodynamics and its development as a discipline centers around the behavior of gasses, particularly steam.


In 1662, Robert Boyle published his finding that the volume of a gas, say, air in a piston, is inversely proportional to the pressure exerted on it.  It's not news, and wasn't at the time, that a gas takes up less space if you put it under pressure.  Not having a fixed volume is a defining property of a gas.  However, "inversely proportional" says more.   It says if you double the pressure on a gas, its volume shrinks by half, and so forth.  Another way to put this is that pressure multiplied by volume remains constant.

In the 1780s, Jacques Charles formulated (but didn't publish) the idea that the volume of a gas, at constant pressure, is proportional to its temperature.  In 1801 and 1802, John Dalton and Joseph Louis Gay-Lussac published experimental results showing the same effect.  There was one catch: you had to measure temperature on the right scale.  A gas at 100 degrees Fahrenheit doesn't have twice the volume of a gas at 50 degrees, nor does it if you measure in Celsius.

However, if you plot volume vs. temperature on either scale you get a straight line, and if you put the zero point of your temperature scale where that line would show zero volume -- absolute zero -- then the proportionality holds.  Absolute zero is quite cold, as one might expect.  It's around 273 degrees below zero Celsius (about 460 degrees below zero Fahrenheit).  It's also unobtainable, though recent experiments in condensed matter physics have come quite close.
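Here's that extrapolation as a quick Python sketch, using idealized, made-up-but-plausible volume readings for a gas held at constant pressure:
t1, v1 = 0.0, 22.4      # liters of some gas at the freezing point of water
t2, v2 = 100.0, 30.6    # liters of the same gas at the boiling point
slope = (v2 - v1) / (t2 - t1)          # liters per degree Celsius
t_at_zero_volume = t1 - v1 / slope     # where the straight line hits zero volume
print(t_at_zero_volume)                # about -273 degrees Celsius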

Put those together and you get the combined gas law: Pressure times volume is proportional to temperature.

In 1811 Amedeo Avogadro hypothesized that two samples of gas at the same temperature, pressure and volume contain the same number of molecules, whether or not they're the same gas.  This came to be known as Avogadro's Law.  The number of molecules in a typical system is quite large.  It is usually expressed in terms of Avogadro's Number, approximately six followed by twenty-three zeroes, one of the larger numbers that one sees in regular use in science.

Put that together with the combined gas law and you have the ideal gas law:
PV = nRT
P is the pressure, V is the volume, n is the amount of gas (measured in moles -- one mole is Avogadro's number of molecules), T is the temperature and R is the gas constant that makes the numbers and units come out right.
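For example, plugging round numbers into the formula recovers the familiar result that a mole of gas at zero degrees Celsius and one atmosphere occupies about 22.4 liters (the values below are just those round numbers):
R = 8.314         # gas constant, in Joules per (mole * Kelvin)
n = 1.0           # amount of gas, in moles
T = 273.15        # temperature: 0 degrees Celsius, in Kelvin
P = 101325.0      # pressure: one standard atmosphere, in Pascals
V = n * R * T / P          # volume, in cubic meters
print(V * 1000.0)          # about 22.4 liters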

The really important abstraction here is state.  If you know the parameters in the ideal gas law -- the pressure, volume, temperature and how much gas there is -- then you know its state.  This is all you need to know, and all you can know, about that gas as far as thermodynamics is concerned.  Since the amount of gas doesn't change in a closed system like a steam engine (or at least an idealized one), you only need to know pressure, volume and temperature to know the state.

Since the ideal gas law relates those, you really only need to keep track of two of the three.  If you measure pressure, volume and temperature once to start with, and you observe that the volume remains constant while the pressure increases by 10%, you know that the temperature must be 10% higher than it was.  If the volume had increased by 20% but the pressure had dropped by 10%, the temperature must now be 8% higher (1.2 * 0.9 = 1.08).  And so forth.
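The same bookkeeping as a couple of lines of Python (the starting values are arbitrary; only the ratios matter):
P0, V0, T0 = 1.0, 1.0, 300.0      # some starting state (temperature in Kelvin)
P1, V1 = P0 * 0.9, V0 * 1.2       # pressure down 10%, volume up 20%
T1 = T0 * (P1 * V1) / (P0 * V0)   # ideal gas law, with n and R unchanged
print(T1 / T0)                    # about 1.08 -- the temperature is 8% higher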

You don't have to track pressure and volume particularly, or even two of {pressure, volume, temperature}.  There are other measures that will do just as well (we'll get to one of the important ones in a later post), but no matter how you define your measures you'll need two of them to account for the thermodynamic state of a gas, and (as long as they aren't essentially the same measure in disguise) two will be enough.  Technically, there are two degrees of freedom.

This means that you can trace the thermodynamic changes in a gas on a two-dimensional diagram called a phase diagram (in the context of engines, this is often a plot of pressure against volume, sometimes called an indicator diagram).  Let's pause for a second to take that in.  If you're studying a steam engine (or in general, a heat engine) that converts heat into work (or vice versa) you can reduce all the movements of all the machinery, all the heating and cooling, down to a path on a piece of paper.  That's a really significant simplification.


In theory, the steam in a steam engine (or generally the working fluid in a heat engine) will follow a cycle over and over, always returning to the same point in the phase diagram (that is, the same state).  In practice, the cycle won't repeat exactly, but it will still follow a path through the phase diagram that repeats the same basic pattern over and over, with minor variations.

The heat source heats the steam and the steam expands.  Expanding means exerting force against the walls of whatever container it's in, say the surface of a piston.  That is, it means doing work.  The steam is then cooled, removing heat from it, and the steam returns to its original pressure, volume and temperature.  At that point, from a thermodynamic point of view, that's all we know about the steam.  We can't know, just from taking measurements on the steam, how many times it's been heated and cooled, or anything else about its history or future.  All you know is its current thermodynamic state.

As the steam contracts back to its original volume, its surroundings are doing work on it.  The trick is to manipulate the pressure, temperature and volume in such a way that the pressure, and thus the force, is lower on the return stroke than on the power stroke, so that the steam does more work expanding than is done on it contracting.  Doing so, it turns out, will involve putting more heat in during the heating than comes out during the cooling.  Heat goes in, work comes out.


This leads us to one of the most important principles in science.  If you carefully measure what happens in real heat engines, and the ways you can trace through a path in a phase diagram, you find that you can convert heat to work, and work to heat, and that you will always lose some waste heat to the surroundings, but when you add it all up (in suitable units and paying careful attention to the signs of the quantities involved), the total amount of heat transferred and work done never changes.  If you put in 100 Joules' worth of heat, you won't get more than 100 Joules' worth of work out.  In fact, you'll get less.  The difference will be wasted heating the surroundings.

This is significant enough when it comes to heat engines, but that's only the beginning.  Heat isn't the only thing you can convert into work and vice versa.  You can use electricity to move things, and moving things to make electricity.  Chemical reactions can produce or absorb heat or produce electrical currents, or be driven by them.   You can spin up a rotating flywheel and then, say, connect it to a generator, or to a winch.

Many fascinating experiments were done, and the picture became clearer and clearer: Heat, electricity, the motion of an object, the potential for a chemical reaction, the stretch in a spring, the potential of a mass raised to a given height, among other quantities, can all be converted to each other, and if you measure carefully, you always find the total amount to be the same.

This leads to the conclusion that all of these are different forms of the same thing -- energy -- and that this thing is conserved, that is, never created or destroyed, only converted to different forms.


As far-reaching and powerful as this concept is, there were two other important shifts to come.  One was to take conservation of energy not as an empirical result of measuring the behavior of steam engines and chemical reactions, but as a fundamental law of the universe itself, something that could be used to evaluate new theories that had no direct connection to thermodynamics.

If you have a theory of how galaxies form over millions of years, or how electrons behave in an atom, and it predicts that energy isn't conserved, you're probably not going to get far.  That doesn't mean that all the cool scientist kids will point and laugh (though a certain amount of that has been known to happen).  It means that sooner or later your theory will hit a snag you hadn't thought of and the numbers won't match up with reality*.  When this happens over and over and over, people start talking about fundamental laws.


The second major shift in the understanding of energy came with the quantum theory, that great upsetter of scientific apple carts everywhere.  At a macroscopic scale, energy still behaves something like a substance, like the caloric originally used to explain heat transfer.  In Count Rumford's cannon-boring experiment, mechanical energy is being converted into heat energy.  Heat itself is not a substance, but one could imagine that energy is, just one that can change forms and lacks many of the qualities -- color, mass, shape, and so forth -- that one often associates with a substance.

In the quantum view, though, saying that energy is conserved doesn't assume some substance or pseudo-substance that's never created or destroyed.  Saying that energy is conserved is saying that the laws describing the universe are time-symmetric, meaning that they behave the same at all times.  This is a consequence of Noether's theorem (after the mathematician Emmy Noether), one of the deepest results in mathematical physics, which relates conservation laws in general to symmetries in the laws describing a system.  Time symmetry implies conservation of energy.  Directional symmetry -- the laws work the same no matter which way you point your x, y and z axes -- implies conservation of angular momentum.

Both of these are very abstract.  In the quantum world you can't really speak of a particle rotating on an axis, yet you can measure something that behaves like angular momentum, and which is conserved just like the momentum of spinning things is in the macroscopic world.  In the same way, energy in the quantum world has more to do with the rate at which the mathematical functions describing particles vary over time, but because of how the laws are structured it's conserved and, once you follow through all the implications, energy as we experience it on our scale is as well.

This is all a long way from electricity bills and the engines that drove the industrial revolution, but the connections are all there.  Putting them together is one of the great stories in human thought.

* I suppose I can't avoid at least mentioning virtual particles here.  From an informal description, of particles being created and destroyed spontaneously, it would seem that they violate conservation of energy (considering matter as a form of energy).  They don't, though.  Exactly why they don't is beyond my depth and touches on deeper questions of just how one should interpret quantum physics, but one informal way of putting it is that virtual particles are never around for long enough to be detectable.  Heisenberg uncertainty is often mentioned as well.

Friday, May 26, 2017

The value of the thing ...

... is what it will bring.

I've now seen several headlines along the lines of "NASA to explore  $10,000 quadrillion metal asteroid"

What does this even mean?

Two things, really:
  • NASA is planning a mission to the nickel-iron asteroid 16 Psyche, which is true
  • That asteroid contains $10 quintillion worth of metal, which is, um ...
I mean, on the one hand it's a simple calculation: Psyche contains X tons of nickel at $Y/ton, and likewise for iron.  Total value: $10 quintillion or, for whatever reason, $10,000 quadrillion.

Except that's about 100,000 times the world's GDP, so maybe we're missing something?

Suppose we could magically bring all the nickel and iron in Psyche to earth.  That's a ball about 200km across, so we'd have to be a bit careful, but say we break it down into a few million 1km heaps distributed strategically around the world.  How much is that really worth?

You might think "Yay, free iron and nickel!" but that's not quite right.  Even scrap iron, which has already been refined and packaged into usable pieces, costs something to buy, something to transport and something more to put to use, unless it happens to be in just the form you need.  More realistically, it would mean no more iron mining, which is great unless you happen to be in the iron/nickel mining business.  That's not nothing -- world iron production looks to be around $300 billion and nickel maybe more like $20 billion.  But it's not a trillion dollars, much less a quadrillion or quintillion.

Or look at it another way: We've got a rock out in space that's worth as much as the entire world economy would produce in 100,000 years at current rates.  The total budget of NASA is around $20 billion, with ESA, JAXA and the Russian space agency accounting for a few billion more.  Surely it would be worth it to throw the world's entire space budget into mining that rock.

Except, the question isn't whether there's a bunch of valuable metal to be mined. The question is whether it's worth mining.  It currently costs about $20,000 per kilogram to get a payload to low earth orbit.  It's anybody's guess what it would cost to actually mine a given amount of metal in the asteroid belt and bring it back to Earth safely -- though if you're transporting a hunk of metal I suppose you just have to make sure that it doesn't hit anything on the way in.  But bulk nickel from Earth runs more like $10/kg and iron is cheaper yet, so ... maybe not.
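Here's the back-of-the-envelope version in Python, using the rough figures above:
launch_cost_per_kg = 20000.0   # dollars per kilogram to low Earth orbit -- and that's
                               # just getting off the ground, not a round trip to Psyche
nickel_price_per_kg = 10.0     # rough bulk price of nickel on Earth, in dollars
print(nickel_price_per_kg / launch_cost_per_kg)   # 0.0005 -- the metal is worth roughly
                                                  # 1/2000 of the launch cost alone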


I don't really want to pick on NASA for trying to drum up a little interest in its latest mission -- though it's probably worth mentioning that the past couple of decades of unmanned missions by NASA and the other space agencies have been spectacularly successful in exploring the solar system and in an ideal world that would speak for itself.  If there's a point here, it's that it's a good idea not to take numbers, especially eye-catching dollar amounts, at face value without asking what they actually mean.

Thursday, April 6, 2017

Big vocabulary, or just big words?

The other day I was reading an article that used a couple of words I hadn't seen in a while, say anodyne or encomium.  I more-or-less remembered what they meant, and it was reasonably clear from context in any case, but I still ended up looking them up.  I had two feelings about this: on the one hand, did the author really have to drag those out?  Why not just use Plain English?  On the other hand, they were correctly used, and apt, so what's the big deal?  I'm sure I've thrown out a word or two here that I could have replaced with something more familiar, maybe with a little rewording.

But I'm not here to critique style.  What stuck in my head about this incident was how conspicuous an unusual word can be (and besides that, unusual words tend to stick out).  The article itself was probably a thousand words or so, maybe more, but it was those two that changed the whole reading experience.

This wasn't just because of the extra time it took to look the words up and make sure I knew what they meant.  That's a speed bump these days, reading an online article with search bar and dictionary app at the ready, maybe an extra minute, if that.  Even if I hadn't had a dictionary handy, I could have gotten the good out of the article without knowing exactly what those words meant.

The real issue lies deeper in human perception: We (and living things with recognizable brains in general) are finely tuned to notice discrepancies.  In a field of green grass it's the shape of that predator, or that prey,  or that particularly tasty plant, or whatever, that stands out.  In an article of a thousand words, it's the unusual ones that stand out.

I could go on and on, but it's worth particularly noting how important this is in social environments.  We can spot an unfamiliar accent in seconds.  We can spot someone dressed differently, or with different features than we usually see, well before we're even aware that we have.  The other night I was watching a TV show with a foreign actor playing an American, and everything was just fine until they said "not" with a British "o".   It didn't ruin the whole show -- this was a single vowel, not Dick Van Dyke in Mary Poppins -- but it was noticeable enough I still recall it out of an hour of tense drama.

(I have to say that dialect coaching has gotten a lot better over the past couple of decades.  Time was, movie stars talked like movie stars, with a kind of over-enunciated diction that didn't sound like anyone in real life, and if a character was meant to sound foreign, pretty much anything would do.   This is doubtless because in the early days of "talking pictures" the medium was still transitioning from the stage, a theatre actor was used to projecting up to the cheap seats and a fake accent was as good as a fake beard since everything was a hand-painted set and there probably weren't that many people in the audience who knew what a true Elbonian accent sounded like anyway.  Today pretty much every part of that is different, and we expect realism -- Billy Bob Thornton's all-too-valid complaint about "that Southern accent that no one in the South actually speaks with" notwithstanding.)

Where was I?

I've argued before that we often seem to care most about distinctions when they matter least.  Vocabulary is largely another example of that.  Unless you're reading Finnegans Wake or something equally chewy, you're probably OK just skimming over anything you don't know and looking it up later.  Even that blowhard commentator with the two-dollar words is trying to get a point across and isn't going to let the vocabulary get completely in the way.

As a corollary to that, you don't need to know very many unusual words in order to stand out.  If you know a few dozen and use them appropriately, you'll almost certainly draw attention (if you learn a few dozen and use them inappropriately you'll also draw attention, but probably not the kind you want).  This can happen naturally if you run across a rare-ish but useful word or two in your reading from time to time and hold onto it for future use.  There's something nice about, say, cogent that is hard to reword cleanly, the distinction between terse and concise is sometimes worth making, and so on.

Contrast that with the average human vocabulary.  This is a hard thing to measure, but if you've heard something on the order of "uneducated people have a vocabulary of 2000 words while educated people know 20,000", rest assured that's complete bunk.  If we're measuring vocabulary, we have to measure "listemes", that is, things that you just have to learn by rote because you can't work them out from their parts.

This includes all kinds of things:
  • proper names of people and places
  • distinct senses of words, particularly small words like out and by, which can have quite a few, depending on how you count.
  • idioms large and small, like in touch or look up (in its non-literal senses) to classics like red herring, two-dollar word and that's the way the cookie crumbles.
  • Cultural references, which are kind of like names and kind of like idioms
  • Fine points that we don't generally think of as idioms, but are idiomatic nonetheless, like fried egg meaning a particular way of frying an egg, as distinct from scrambling an egg or -- for whatever reason -- trying to fry a whole egg in a pan without removing the shell
I'm not trying to give a full taxonomy of things-that-you-just-have-to-learn, but I hope that gives the general idea.   The main point is that there are lots and lots of these, the categories they might fall into are somewhat arbitrary, and how many you know doesn't have a great deal to do with how many literary classics you've read.

I'm not really familiar with the research on this, but my understanding is that the average person knows somewhere in the hundreds of thousands of listemes, and a large portion of them are commonly understood.  On top of those, we can add a smaller portion of jargon, slang or sesquipedalianisms.  That part, people will notice.  But it's a relatively small part.