Thursday, October 21, 2010

Parts of speech

We all learned about nouns, verbs and their friends in elementary school.  A typical list is
  • noun
  • pronoun
  • adjective
  • verb
  • adverb
  • preposition
  • conjunction
  • interjection
(See here for more detail)

These are generally useful categories, but if you're really trying to figure out a language, you have to slice a little finer.  For example, there are
  • Transitive verbs (ones that take an object, like hit)
  • Intransitive verbs (ones that don't take an object, at least not typically, like sit)
  • Modal auxiliary verbs (like can, could, may and might, which some dialects can stack up into lovely constructions like might could and may can)
  • Phrasal verbs, like get up
  • Countable nouns, like tree
  • Uncountable (or mass) nouns like water
  • Pluralia tanta (always-plural nouns -- singular plurale tantum) like scissors
  • Comparable adjectives, like tall
  • Uncomparable adjectives, like dead, or NP-complete
  • Determiners, including articles like the or an, but also adjectives like some, any, all, this or that
  • Comparators, like more, most, less and least
Such distinctions go some way towards predicting what words can and can't be used together.  For example, you don't normally use comparators like more with uncomparable adjectives:
  • Smith is more famous than Jones.
  • *Graph isomorphism is more NP-complete than 3-Sat.
(The * at the beginning indicates something that wouldn't normally be said.  I'm fudging with "wouldn't normally be said" instead of "incorrect" or "ungrammatical" as it is notoriously easy in general to invent contexts in which a given construct would make sense.)

This being language, the boundaries aren't perfectly crisp.  Mass nouns don't generally appear as plurals, but there are a few exceptions, for example
  • When referring to some standard serving, as in I ordered three waters.
  • When referring to different types of a given substance, as in She preferred the wines of Bordeaux.
Some nouns can work both ways, for example
  • Hand me a brick.
  • We need five tons of brick.
And let's not even get into whether it's OK to say "more unique" even though unique is supposed to be an absolute and therefore uncomparable.

The word that got me thinking of all this was summit, as a verb meaning "to reach the summit of".  As the "of" would suggest, this verb is generally transitive -- it takes an object, as in Apa Sherpa summited Everest for the twentieth time (which he actually did, last May).  However, the object is often omitted, as in Apa Sherpa summited for the twentieth time.  In contexts where this would be said, it would be abundantly clear that Everest was the peak in question.  In particular, it doesn't matter how many other peaks he might have climbed how many times.

So is summit then acting as an intransitive verb, or a transitive one with an implied object?  I tend towards the latter, as would most grammarians, I believe.  But what about more common cases like sing?  In I sang, there is no implication that I sang any particular song, so one would think sing is acting intransitively.  But I must have been singing something.  Is it really acting transitively, but with an implied, unspecified object?  At some point, such qualifications cease to pull their own weight.  As the man said, volleyball is technically racketless team ping-pong, played with an inflated ball and raised net while standing on the table, but what does that buy us?

What interests me here is how grammar, which is by definition pure syntax, seems unable to stay cleanly separated from semantics.  For example, some mass nouns resist the plural
  • * I would like three neutroniums.
  • * He was a connoisseur of neutroniums.
In the first example, one does not serve neutronium.  In the second, there is only one kind of neutronium.  How would we detect such errors?  I would think the process is something like
  • In a construction like three neutroniums, if the object is a substance, we expect it to mean a particular sort of container full of the substance.
  • But that doesn't make sense in the case of neutronium.
In that view, the syntax is fine and the error is semantic.  Mass nouns, then, are syntactically nouns, but ones whose plural forms have particular semantic features.  Similarly, whether a verb is used transitively or intransitively is a syntactic distinction, but whether there is an implied object is a semantic concern.

Except that "object" is a syntactic concept.  One way of reconciling this is to posit that the syntactic form Apa Sherpa summited, for example, is somehow transformed into the form Apa Sherpa summited Everest, with Everest as the object.  The choice of "transformed" here deliberately suggests transformational grammar, though I'm not sure that's completely appropriate.

Another would be to posit that the form Apa Sherpa summited gets transformed into some internal structure, in which the concept represented by summited requires something acting in the semantic role of "thing which is summited", which we may as well call an "object", albeit with some risk of confusion.  This putative internal structure would be describable in words, for example Apa Sherpa summited, or He summited, or Apa Sherpa summited Everest, or Everest was summited by Apa Sherpa and so on, but it would be an essentially different structure from any of those sentences.  As I very dimly understand it, this is more along the lines of cognitive grammar.

Thursday, October 7, 2010

How much do we know?

The question here is not how much does humanity know collectively, or how much do we know about some given topic compared to how much we don't, or what portion of things can we reasonably say we "know" as opposed to believing or being "fairly sure" or such.  Those are all interesting questions, but what I'm after here is more literal.  How much does a typical human being know, by some objective measure?

To get the flavor of the question, it has been estimated that the average high-school graduate knows about 40,000 vocabulary items, or listemes.  A listeme is a word, word part or collection of words that you have to memorize in order to understand, as opposed to something you can understand by breaking it into parts you already know. For example
  • There are two listemes in "listemes": listeme itself and the plural marker -s.  If you understand both of those, you can understand their combination [Or three: list, -eme and -s, if you're a linguist and familiar with morpheme, phoneme and such -- see below -- D.H.].
  • Typical acronyms and such are listemes: USA or LOL, for example, even though the parts they stand for are well known, because you have to know which words the letters stand for.
  • Idioms are listemes.  Knowing flying and saucer is not enough to know flying saucer.
  • Proper names are listemes.  You have to learn that Muskegon is a city and that Michael Jordan is a former NBA player, even if you already know that Michael and Jordan are names.
  • To some extent, different senses of words count as different listemes.  Knowing that you can eat off a plate doesn't tell you how to plate something in gold or what it means for a batter to step up to the plate.
  • Listemes are somewhat subjective.  Someone well-versed in Latin might see intermittent or conjecture as made up of simpler parts, while for most of us they're one listeme each, and of course different languages have largely different listemes.
Each listeme binds a largely arbitrary sign to a meaning.  At a bare minimum, then, our typical high school grad knows 40,000 items, however much knowledge an item might represent.  Now, I make no pretense of knowing how the mind really represents such things, but the title of this blog is Intermittent Conjecture, so it seems that by a miraculous coincidence I've left myself room to speculate.

I would guess that typical listemes are associated with bundles of memories and their relations to other memories.   For example, plate might perhaps conjure up images of typical dinner plates and memories of eating and setting the table and such; images of plated items one may have encountered or a representation of the plating process; images from a baseball game with a batter in stance or a runner sliding into home.

Similarly to how words may be defined with other words, these bundles of images will typically overlap.  A memory of a dinner plate may include an image of a table, or of eating, making "something you put on the table" or "something you eat off of" natural, if incomplete, answers to "What's a plate?"

I've used "memory" and "image" fairly interchangeably here, but I suspect that the images that concepts are built on are nothing like fully detailed pictures or movies.  Rather, they're highly abstracted, with only the relevant features retained.

By this line of speculation, those 40,000 listemes might represent 400,000 or 1,000,000 or more images, grouped into concepts and with arbitrary signs attached.  There is much, much more to the picture, of course, but again we're just trying to get a rough estimate of what's in a typical brain.

Words are only one window into the contents of the mind.  We also know things we can't easily put into words, which is one reason I had wanted to talk about different kinds of knowledge and formal vs informal education.  We learn to walk instinctively, and so it's much harder to characterize what sort of things one must "know" in order to walk, yet if we can learn something, there must be some kind of knowledge involved.  Likewise for other skills like skiing or playing the trumpet, which we learn consciously and in many cases formally, but without necessarily learning a lot of vocabulary in the process.

We can also make associations unconsciously and non-verbally.  When the pioneer Lucky Bill in the post I linked to above looks off and sees bad weather brewing in the clouds, he probably doesn't have words for what he's sensing, but it's definitely something he's learned and knows, just as he knows how to let his horse know it's time to go.  This knowledge may well be built on the same sort of memories and images that we pin language onto, but it's not readily accessible to language.

If we take a mental image -- an abstracted memory -- as the basic unit of knowledge, with images grouped into concepts which may or may not have language attached, then it seems plausible that an adult human could have millions or tens of millions of such images.  We must also allow some capacity for storing the relations among the various images, concepts, signs and so forth, but such "metadata" tends to be much smaller than the data it helps organize (see this post on the other blog for more on that).

Since I'm a compugeek, the handiest objective measure of information I have is the byte.  Leaving aside that images may differ widely in size and taking an image to be on the order of a megabyte -- a completely wild guess which may well be off by orders of magnitude -- that would put our mental storage capacity on the order of terabytes or dozens of terabytes.
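For anyone who wants to redo the arithmetic, here is a minimal back-of-envelope sketch.  The function name and the decimal definitions of megabyte and terabyte are my own assumptions for illustration, and the image counts are just the "millions or tens of millions" guessed at above, so treat the output as a rough order of magnitude and nothing more.

```python
# Back-of-envelope storage estimate. All figures are wild guesses
# taken from the text above, not measurements.
MEGABYTE = 10**6    # decimal megabyte, assumed for simplicity
TERABYTE = 10**12   # decimal terabyte

def storage_terabytes(image_count, bytes_per_image=MEGABYTE):
    """Return total storage in terabytes for image_count images of equal size."""
    return image_count * bytes_per_image / TERABYTE

# "Millions or tens of millions" of images at roughly a megabyte each.
for image_count in (1_000_000, 10_000_000):
    print(f"{image_count:,} images -> {storage_terabytes(image_count):g} TB")
```

Since the per-image size may well be off by orders of magnitude, passing a different bytes_per_image shifts the result by the same factor.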

Until fairly recently, that was a lot of storage, but these days it's not a staggering figure.  As far as putting together something of the same order as a human brain, we may just now be reaching a necessary, but not necessarily sufficient, technological milestone.

I'm happy to learn that the wild stab in the dark given above turns out to be reasonably in line with other wild stabs in the dark.  See for example this Google Answers page (I didn't have a lot of luck tracing this back to the literature, but since it's all guesswork I'm not going to worry about it).