Tuesday, June 17, 2014

Reading University and Mr. Turing's test

The BBC reports that a chatbot called Eugene Goostman has passed the Turing test, marking the first time in history that this has happened.  Or at least, that's what you'd gather from the headline (granted, it doesn't give the name of the chatbot).  The BBC, not having taken complete leave of its senses, explains that this is the claim of the team at the University of Reading that ran the test, and then goes on to cast a little well-deserved doubt on the idea that anything historic is going on.


So what's a Turing test?

Sixty-four years ago, Alan Turing published "Computing Machinery and Intelligence", in which he posed a simple question: Can machines think?  He immediately dismissed the question as essentially meaningless and proposed an alternative: Can a machine be built which could fool a person into thinking that it (the machine) was a person?

In Turing's setup, which he called "the imitation game", there would be a judge who would communicate with two players, each claiming to be a person.  The judge and players would communicate via a "teleprinter" or other intermediary, so that it would not be possible to point at one of the players and say "That one's a machine, duh".  Turing goes into quite a bit of detail on points like this that we would take for granted now.  Your favorite instant messaging system is good enough for the task.  On the internet, nobody knows you're a dog.

Later in the paper Turing makes a pretty audacious claim, considering it was made in 1950 and the supercomputers of the time had somewhere on the order of 16K of memory.  In case you've forgotten how to count that low, that's 1/32 of a megabyte, or about a millionth of the capacity of a low-end smartphone.  Turing's prediction:
I believe that in about fifty years' time [that is, by around the year 2000] it will be possible to programme computers, with a storage capacity of about 10⁹ [bits], to make them play the imitation game so well that an average interrogator will not have more than 70 per cent chance of making the right identification after five minutes of questioning.
10⁹ bits is about 128 megabytes, not an unusual amount of RAM for a computer in the 2000s and a remarkably good prediction for someone writing in 1950.  Keep in mind that Turing wrote this well before Moore formulated Moore's law, itself a good source of misinterpretations.
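
If you want to check the arithmetic, here's a quick back-of-the-envelope sketch (in Python, assuming the usual 8 bits per byte):

    # Convert Turing's 10^9 bits into more familiar units.
    bits = 10**9
    megabytes = bits / 8 / 10**6    # 125.0 decimal megabytes
    mebibytes = bits / 8 / 2**20    # about 119.2 binary mebibytes
    print(megabytes, mebibytes)     # in the ballpark of the 128 MB of RAM a PC might have had around 2000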

Turing was a brilliant scientist.  He helped lay the groundwork for what we now call computer science, played a key role in pwning the German Enigma machine during World War II, and thought deeply about the question of intelligence and how it related to computing machinery.  However, he got this particular prediction spectacularly wrong.

It didn't take fifty years to beat the imitation game.  It took more like fifteen.

In the mid 1960s, Joseph Weizenbaum of MIT wrote ELIZA, which purported to be a Rogerian psychotherapist.  You can play with a version of it here.  To be clear, this program wasn't actually trying to do psychotherapy.  It was more like a parody of a "nondirective" therapist whose goal is to stay out of the way and let the patient do all the meaningful talking.  Was it able to fool anyone?  Yes indeed.  So much so that it inspired Weizenbaum, after seeing people confide their deepest secrets to the program, to write a book about the limitations of computers and artificial intelligence.

ELIZA neatly dodges the difficulties that Turing was trying to present to the developer by making the human do all the thinking.  Say "The kids at school don't like me" and ELIZA won't respond with "I know what you mean.  At my school there was this bully named ..." and give you a chance to probe for things only an actual human who had been to school would know.  It will respond with something like "Why do you think the kids at school don't like you?"  It's a perfectly reasonable response, but it reveals absolutely zilch about what the machine knows about the world.

That's fortunate, because the machine knows absolutely zilch about the world.  It's just taking what you type in, doing some simple pattern matching, and spitting back something based, in a fairly simple way, on whatever patterns it found.  This works great for a while, but you don't have to wander very far to see the man behind the curtain.  Answer "Because I am." to one of its "Why are you ...?" questions, and it is liable to answer "Do you enjoy being?", because it saw "I am X" and tried to respond "Do you enjoy being X?"  Except there is no X in this case.
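
ELIZA's real script was more elaborate than this (ranked keywords, pronoun swapping, canned fallbacks), but a toy sketch with made-up rules shows both the trick and the failure mode:

    import re

    # Toy ELIZA-style responder: find a pattern, echo a fragment of the input back.
    # These rules are invented for illustration; they are not Weizenbaum's actual script.
    RULES = [
        (re.compile(r"\bI am\b(?P<x>.*)", re.IGNORECASE), "Do you enjoy being {x}?"),
        (re.compile(r"\b(.+) don't like me\b", re.IGNORECASE), "Why do you think {x} don't like you?"),
    ]

    def respond(utterance):
        for pattern, template in RULES:
            match = pattern.search(utterance)
            if match:
                x = match.group(match.lastindex).strip(" .!?")
                return template.format(x=x)
        return "Please go on."  # default when nothing matches

    print(respond("The kids at school don't like me"))
    # -> Why do you think The kids at school don't like you?
    print(respond("Because I am."))
    # -> Do you enjoy being ?   (the missing-X problem described above)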

The Eugene Goostman chatbot likewise dodges the difficult questions, but as far as I can tell it does it by acting flat-out batty.  Its website says as much, advertising itself as "The weirdest creature in the world".  When I first saw the Reading story on my phone, there were transcripts included.  These are somehow missing from the version I've linked to, but there is a snippet of a screenshot:
  • Judge: What comes first to mind when you hear the word "toddler"?
  • Goostman: And second?
  • Judge: What comes to mind when you hear the word "grown-up"?
  • Goostman: Please repeat the word to me 5 times
Sure, if you're told that you're chatting with an eccentric 13-year-old boy with English as a second language, you could take pretty much any bizarre response and say "meh ... sure, that sounds like something a 13-year-old eccentric non-native speaker might say ... close enough."  But so what?

The transcripts I saw on my phone were of a similar nature.  Apparently the Goostman website had run the chatbot online for a while, and you can find transcripts of people's interactions with it on the web.  The online version was soon taken down, perhaps from the sheer volume of traffic or, a cynic might say, because the game was up.

This is not the first time people have mistaken a computer for a human behaving outside the norm.  Not long after ELIZA, in 1972, psychiatrist Kenneth Colby, then at Stanford, developed PARRY (this was evidently still before mixed-case text had become widespread).  Unlike ELIZA, PARRY wasn't basically trolling.  It was a serious attempt to mimic a paranoid schizophrenic and so, if I understand correctly, to learn something about the mind of a person in such a state.

Colby had a group of experienced psychiatrists interview both PARRY and actual paranoid schizophrenics.  He then gave the transcripts to a separate group of 33 experienced psychiatrists.  They identified the real schizophrenics with 48% accuracy -- basically random chance and far below Turing's 70%.  That is, PARRY could fool the psychiatrists about 50% of the time, while Turing only expected 30%.

This was from transcripts they got to read over, not from a quick five-minute exchange.  For my money this is a stronger test than Turing's original, and PARRY passed it with flying colors.  Over forty years ago.  Eugene Goostman fooled 33% of the judges (one suspects that the number of judges was a small multiple of three) in five-minute interviews by spouting malarkey.  Not even carefully constructed paranoia, just random balderdash.  Historic?  Give. Me. A. Break.

By the way, if you're thinking "ELIZA is pretending to be a psychotherapist, PARRY is pretending to be a person with mental issues ... hmm ..." ... it's been done.


Thing is, Turing's test just isn't very good.  In attempting to control for factors like appearance and tone of voice, it limits the communication to language, and printed language at that.  In doing so, it essentially assumes that facility in language is the same as intelligence.

But this is simply false.  A highly intelligent person can become aphasic, and there are cases in the literature of people who can speak highly complex sentences with a rich vocabulary, but show no other signs of above-average intelligence.  And, as we've seen, it's been feasible for decades to write a computer program that does a passable imitation of human language without understanding anything at all.  I believe there are also documented cases of humans failing Turing tests, but that's a different issue.

It turns out that we humans have a natural tendency to attribute at least some level of intelligence to anything that looks remotely purposeful.  For example, there is an experiment in which people watch two dots on a screen.  I don't recall the exact details, but I think the following gets the gist:

One dot approaches the other and stops.  It then backs off and approaches again, faster.  The first dot is now touching the second, and both move slowly in the direction the first dot had been going.  Ask a person for a description, and they'll likely say that the first dot was trying to get past the second and finally tried pushing it out of the way.

Throw in language and the urge to attribute intelligence is nearly overwhelming.  "OK", one finds oneself thinking, "it's maybe not completely grammatical, and it doesn't make much sense, but that's got to be because the person talking is a bit ... off, not because they're not intelligent at all.  They can talk, for goodness' sake."

Whether something passes the Turing test in practice comes down less to the machine's intelligence than to the judge's ability to set aside that intuition and look for artifacts of pattern-matching approaches, like the "Do you enjoy being?" example above.

This assumption that language facility was a good proxy for intelligence ran through a lot of early AI, leading to an emphasis on discrete symbol-smashing.  You have to start somewhere, it's clear that understanding language has a lot in common with other signs of intelligence, and a lot of useful work came out of efforts to develop good symbol-smashing tools, but to some extent this is more like looking for your lost car keys where the light is brightest.  Computers are good at smashing symbols, or more generally, dealing with discrete structures, which would include words and sentences.  That's basically their job.

It's now looking like probability and continuous math have more to do with how our minds actually work.  Being able to communicate in the (more-or-less) discrete medium of language came along relatively late in the evolutionary game, long after other aspects of intelligence, and language itself doesn't behave the way we assumed it did fifty years ago.  Science marches on.

There's another problem with the Turing test, something that looks like a strength at first:  It's free-form.  The judge is allowed to ask any questions that seem appropriate.  There is no checklist of abilities to test for.  If the respondent claims to have trainspotting as a hobby, there's no requirement to find out if they know anything about trains, or their schedules, or the sound of a locomotive or the smell of overheating brakes.

More generally, there is no requirement to test for, say, understanding of metaphor, or the ability to learn a new concept or glark the meaning of a word from context.  There is no requirement to determine if the respondent understands the basic properties of objects, space and time.  And so forth.

To be sure, there's an obvious objection to imposing requirements like this.  It would lead to "teaching to the test".  Contestants would naturally build systems aimed squarely at whatever the stated requirements happened to be.

But that could well be a good thing.  It's surely better than seeing people grab headlines by writing a bot that spouts gibberish.  As long as the requirements are phrased abstractly we can still leave it up to the judges' ingenuity to decide exactly what metaphor to try or what specific questions to ask about space, time and objects.  At the end of the test we can check that the requirements were actually covered, and invalidate the judge's result if they weren't -- something we can't do with a free-form test.

The particular list I gave doesn't necessarily cover everything we might want to associate with intelligence, but a system that can understand metaphors, space and time, and can learn new concepts, can reasonably be said to be "thinking" in a meaningful sense of the word.

Setting explicit requirements would also allow for variant tests that would accept forms of intelligence that were significantly different from ours.  For example, one very important part of being human is knowing what it's like to have a human body.   Being embodied as we are plays a large role in our cognition.  However, it's perfectly possible for something to be intelligent and, for example, not experience tastes and smells (indeed, some number of people have no such experience).

It seems reasonable to instruct the judges: "We know this might be a machine.  Don't ask it what things taste like."  In the original Turing test, if the program came up with some plausible explanation for lacking taste and smell, a natural follow-up might be "What's it like not to be able to taste and smell?"  It's not clear that a machine would need to have a good answer to that in order to be intelligent.  If it didn't, the judge might have a good reason to think it was a machine even if it did in fact have some meaningful form of intelligence.  Either way the line of questioning is not helpful as a way of testing for intelligence.  In other words, distinguishing human from machine is not quite the same as distinguishing intelligent from unintelligent.

Hiding behind all this is one more shaky assumption: Something is either intelligent or it isn't.  Even though Turing properly speaks of probabilities of guessing correctly, there is still the assumption that a machine is either successfully imitating a human or it isn't.  Suppose, though, that a machine is really good at some area of knowledge and the judges happen to ask about that area 31% of the time.  If it fools the judges who stay on that topic and nobody else, it would pass the Turing test (in the popular but not-quite-accurate sense), but what does that mean?  Is it 31% intelligent?
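
To put a toy number on that, here's a tiny simulation (the figures are invented purely for illustration): a bot that is convincing only on one pet subject, with judges wandering onto that subject 31% of the time.

    import random

    random.seed(0)

    def fooled(judge_picks_pet_topic):
        # Illustrative assumption: the bot fools any judge who stays on its
        # pet topic, and nobody else.
        return judge_picks_pet_topic

    def pass_rate(n_judges=10000, p_pet_topic=0.31):
        hits = sum(fooled(random.random() < p_pet_topic) for _ in range(n_judges))
        return hits / n_judges

    print(pass_rate())  # roughly 0.31 -- over the oft-quoted 30% line on topic luck alone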


I wouldn't lay much of this at Turing's feet.  He was doing pioneering work in a world that, at least as far as computing and our understanding of human cognition are concerned, was starkly different from the one we live in, and yet he managed to hit on themes and concepts that are still very much alive today.  Nor would I blame the general public for taking a claim of a historic breakthrough at face value.

But the claim itself?  Coming from a respected university?  Granted, they seem mostly hyped about the quality of their test and the notion that nothing else so far has passed a "true" Turing test.  But this seems disingenuous.  What we have here is, maybe, a more methodologically faithful version of Turing's test, which was passed by a mindless chatterbot.  The only real AI result here is that a Turing-style imitation-based test can be beaten by clearly unintelligent software.

This is not a new result.

[The Wikipedia article on Eugene Goostman makes a really good point that I never caught: Turing predicted a 30% success rate.  He didn't define that as some sort of threshold for intelligence.  Thus, fooling 30% of the judges doesn't mean that something "passes the Turing test and is therefore intelligent".  It's just confirming Turing's prediction about how well machines would be able to win the imitation game.]

Saturday, June 7, 2014

Words and their senses

What does "out" mean?  Well, it's the opposite of "in", whatever that may mean.  Except, not in all cases.  Sure, "I went out the door" means pretty much the opposite of "I went in the door", but "I've got an in at the State Department" means I know someone there and I might be able to use that connection to advantage, while "I have an out at the State Department" means ... maybe it means I have an escape route from my current predicament via the State Department, maybe because of the in I have there?

Just off the top of my head, "out" might mean any of a dozen things:

  1. adj Unconscious ("He's out like a light")
  2. adj No longer holding a position ("She's out as CEO of Frobco")
  3. adj Retired as a batter according to the rules of baseball ("He's out at third")
  4. n An instance of retiring a batter according to the rules of baseball ("That's the third out")
  5. adj Openly known to have a particular sexual orientation ("She's an out bisexual")
  6. v To reveal someone to have a particular sexual orientation ("He outed her as a bisexual")
  7. v To reveal someone to have a clandestine role or identity ("The administration inadvertently outed a CIA agent")
  8. adj Not a member of a particular social group ("They always make me feel so out")
  9. n A potential means of escape from a situation ("In tense negotiations, it's always good to have an out")
  10. adj Not in tune ("I think your A string is out")
  11. adv Deliberately in a different key and/or time signature from the rest of an ensemble ("Vernon likes to play really out when he solos")
  12. adj Not at home ("Sorry, Mr. Smith is out")
and so forth.  It's easy to come up with more.  I've by no means hit even all the usual senses.  My laptop's dictionary gives nine adverbial senses, nine adjectival senses, one prepositional, three nouns and two verbs.  I've heard of lists of over a hundred senses of the word.  It depends on how finely you want to slice it.  I'll come back to that.

Clearly some of these senses are more closely related than others.  The baseball senses 3 and 4 are clearly related, with the noun sense derived from the adjective.  Senses 5, 6 and 7 are clearly related: the verb sense has got to be derived from the adjective sense, and that must be related to phrases like "the secret is out".  On the other hand, it's hard to see how senses 3 and 4 have much to do with senses 5-7.

Nonetheless, the use of "out" (or any other word) is not completely arbitrary.  There is a well-developed linguistic theory behind this (well, probably several, but I happen to like this particular one).  At the core, a word has a small number of well-defined senses.  In the case of "out", there is a boundary enclosing something.  What's enclosed is in.  On the other side is out.  Thus when you go out a door, you are going from the space you were in, into another space (which may also be enclosed -- you can go out of one room and into another).  You can draw a circle on the ground and say someone or something is in the circle or out of it, and so forth.

From these basic senses, we build new senses metaphorically.  We can imagine a boundary between the members of a social group and anyone out of the group.  We can talk about entering or leaving a state of being ("Moishe led us out of slavery"), and so forth.

Crucially, these metaphors are not just literary fancy.  They are directly meaningful and productive (in the linguistic sense that we can spontaneously create new instances of the metaphor).  If a social group has a boundary, we don't just say someone is in the group or out of it.  We can welcome someone in.  We can kick someone out.  We can expand the group.  We can designate an inner circle, and so forth.  If we're feeling more creative, we can say something that sounds more "metaphorical" in the usual sense ("The boundaries of the group were porous.  People floated in and out with the tides.")

In the case of "out" we can start with the basic boundary-oriented sense and build, well, outwards:
  • A container can have something in it, and you can take things out of it
  • If the container holds a fluid, you can pour it out of the container
  • or it can run out (or flow out, or seep out, or pour out)
  • When there's no longer anything in the container, you have run out of whatever was in it (or you're just out of it)
  • Any resource can be considered to be a fluid, so you can run out of time (or say time is running out), run out of money, or energy, or whatever.
  • The resource need not be physical.  You can run out of patience.
  • A burning lamp or candle consumes its oil or wax.  When there's none left -- it has run out -- the lamp or candle has burned out.
  • Likewise, any flame can burn out as it runs out of fuel, for example a rocket can burn out
  • Someone exhausted is burned out
  • An electric light performs the same role as a lamp or candle.  When you shut off the current to it, it is out.
  • A person's consciousness is likened to a light or flame, and an unconscious person is out (sense 1 above).  You can also drift in and out of consciousness, but that's considering a state of being as something bounded -- a separate metaphor.
Coming back toward the core senses:
  • A social group or position is a bounded area, metaphorically.
  • As noted above, one can be in or out of a group or position, or leave it (senses 2 and 8; also, being in a job or out of it, leaving a job, etc.)
  • Information can be regarded as a substance (you can share it, hide it, have a lot or a little of it, etc.).  You can't see through an opaque container, but once the information is out of the container (out in the open), others can know it (senses 5-7 ... the secret is out)
  • Obviously, you can be in or out of your home (sense 12)
  • We can speak of some quantity being within or outside of given limits, a special case of a state of being as a bounded area.
  • A note too far from its correct pitch is out of tune (sense 10)
  • A musician diverging from the key and/or time signature of a piece is clearly crossing those limits (sense 11)
  • A difficult situation limits one's potential actions, like being in a confined space.  It's good to have a way out of such a space (sense 9)
That leaves baseball.  The baseball usage derives from the game of cricket, where a batsman stands in his ground, guarding a wicket.  When the wicket falls (for any of a number of reasons I won't even try to enumerate), the batsman is out of his ground, or just out (well, you can also be out of your ground but not out, for example if you're running between the wickets, but clearly out is related to in/out of one's ground).

So there you have it.  By a series of metaphors and analogies, we can connect seemingly unrelated senses of a word, like senses 1, 4 and 7 above, back to basic, familiar and physically-based concepts.
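
If it helps to see the structure at a glance, the chains above amount to a little derivation tree, with each sense pointing back at the more basic sense or metaphor it extends.  A rough sketch (the labels and groupings are my own shorthand, purely for illustration):

    # Each sense maps to the more basic sense or metaphor it extends (None marks the core sense).
    SENSE_TREE = {
        "outside a boundary": None,                           # core spatial sense
        "out of a container": "outside a boundary",
        "resource runs out": "out of a container",            # resources as fluids
        "flame or lamp burns out": "resource runs out",
        "light is out": "flame or lamp burns out",
        "unconscious (sense 1)": "light is out",              # consciousness as a light
        "out of a group or position (senses 2, 8)": "outside a boundary",
        "secret or identity is out (senses 5-7)": "out of a container",     # information as a substance
        "out of tune or playing out (senses 10, 11)": "outside a boundary", # limits as boundaries
        "out in cricket or baseball (senses 3, 4)": "outside a boundary",   # out of one's ground
    }

    def derivation(sense):
        """Walk from a sense back to the core spatial sense."""
        chain = [sense]
        while SENSE_TREE[chain[-1]] is not None:
            chain.append(SENSE_TREE[chain[-1]])
        return chain

    print(" <- ".join(derivation("unconscious (sense 1)")))
    # unconscious (sense 1) <- light is out <- flame or lamp burns out <- resource runs out <- out of a container <- outside a boundary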

Of course, if you try hard enough, you can connect anything to anything.  I remember learning the following chain of reasoning as a kid, explaining why fire engines were red, starting with a very basic premise:
  • 1+1 = 2
  • 2+2 = 4
  • 4+4 = 8
  • 8+4 = 12
  • There are 12 inches in a ruler (at least in the US)
  • Queen Mary was a ruler
  • The Queen Mary was a ship
  • Ships sail in the ocean
  • The ocean is full of fish
  • Fish have fins
  • The Finns fought the Russians
  • Russians are red (this was back in the Cold War days)
  • Fire engines are always rushin'
  • And that's why fire engines are red
If only fire engines were still red.

How do we know that this whole sense-extension exercise isn't just another chain of silly reasoning?  A metaphor doesn't just relate anything to anything else.  There are two basic rules:
  • One thing will be more concrete than the other.  E.g., holding a substance in a container is more concrete than holding information in one's mind.
  • The metaphor connects the two things coherently, in multiple places.  You can bring things you know about the more concrete thing to the more abstract thing, at least until the metaphor breaks down.  In an example from a previous post, "The stop sign was a fire engine" is not a coherent metaphor, even if both are red.  On the other hand, if you have information in your head, you can give it out.  You can have a lot or a little of it.  You can have information crammed into your head until it's about to explode, and so forth.
Even if the explanation above isn't airtight, I hope I've at least presented a plausible case that the senses in which we use words can be explained reasonably well by this sort of metaphoric extension, plus some other rules covering, for example, how we convert adjectival senses like sense 3 to noun senses like sense 4.  Real linguists have, of course, studied this in much more detail.


How many senses does a word have?  Some, like copernicium or Pinophyta, probably only have one, but even highly technical terms like algebraic can have multiple, related senses, following the same core-with-extensions pattern as anything else.  Some very basic words, like the prepositions, or head and go, have large numbers even by the relatively conservative standards of dictionaries, particularly when you count idioms like head out or go out.  Most words we commonly use have at least a few widely-used senses, and likely several more specialized senses, particularly if you include slang -- which you always should, if you're really trying to understand a language.

But how finely should we really slice?  Senses 6 and 7 above are very nearly the same thing.  Historically, as far as I'm aware, sense 6 precedes sense 7, but is this because outing someone's orientation is a different thing from outing someone's clandestine role, or because the sense of "revealing something that had been secret" was at first only applied to orientation?

I would lean toward the latter, if only for practical reasons.  Otherwise we would have to consider each application of a word its own sense.  Being out of the position of CEO would be a different sense from being out of the position of vice president, or janitor.

All right, then, but then couldn't we argue that all uses of "out" are just different applications of the basic idea of being outside a boundary?  It seems clear that there's something in common at the core of the various senses above, but saying that they're all just uses of the same basic word leaves something out [ahem].

This is especially apparent if you're trying to paraphrase or translate a word.  In senses 6 and 7, we can use basically the same paraphrase: He revealed her to be bisexual; The administration inadvertently revealed the identity of a CIA agent.  Sense 5, apart from using a different part of speech, more or less works too: She's a revealed bisexual sounds a bit funny, but it at least makes sense, as opposed to He's revealed like a light for sense 1, or He's revealed at third for sense 3.  On the other hand The secret is revealed works fine for The secret is out.  The paraphrase test isn't foolproof, but it seems like a good starting point.

From the point of view of metaphors, a sense of a word corresponds to a particular metaphor.  Senses 5-7 and The secret is out use the metaphor of information being a substance in an opaque container.  Sense 1 uses a series of metaphors:  consciousness is a light, lights can be out because lamps and candles have fuel and fuel can run out.  If different paraphrases or translations are different ways of expressing the metaphor, then the paraphrase test makes sense.  In short, if you're applying the same metaphor in a new context, you're using the word in the same sense, and if you're not, you're not.