True Wit is Nature to advantage dress'd
What oft was thought, but ne'er so well express'd
-- Alexander Pope
How did humans come to have language?
There is, to put it mildly, a lot we don't know about this. Apart from the traditional explanations from various cultures, which are interesting in their own right, academic fields including evolutionary biology, cognitive science and linguistics have had various things to say about the question, so why shouldn't random bloggers?
In what follows, please remember that the title of this blog is Intermittent Conjecture. I'm not an expert in any of those three fields, though I've had an amateur interest in all three for years and years. Real research requires careful gathering of evidence and checking of sources, detailed knowledge of the existing literature, extensive review and in general lots of time and effort. I can confidently state that none of those went into this post, and anything in here should be weighed accordingly. Also, I'm not claiming any original insight. Most likely, all the points here have already been made, and better made, by someone else already.
With that said ...
In order to talk about how humans came to have language, the first question to address is what does it mean to have language at all. Language is so pervasive in human existence that it's surprisingly hard to step back and come up with an objective definition that captures the important features of language and doesn't directly or indirectly amount to "It's that thing people do when they talk (or sign, or write, or ...) in order to communicate information."
We want to delimit, at least roughly, something that includes all the ways we use language, but excludes other activities, including things that we sometimes call "language", but that we somehow know aren't "really" language, say body language, the language of flowers or, ideally, even computer languages, which deliberately share a number of features with human natural languages.
Since language is often considered something unique to humans, or even something that makes us human, it might be tempting to actively try to exclude various ways that other animals communicate, but it seems better to me just to try to pin down what we mean by human language and let the chips fall where they may when it comes to other species.
For me, some of the interesting features of language are
- It can communicate complex, arbitrary structures from one mind to another, however imperfectly.
- It is robust in the face of noise and imperfection (think of shouting in a loud music venue or talking with someone struggling with a second language).
- It tolerates ambiguity, meaning that (unlike in computer languages and other formal systems) ambiguity doesn't bring a conversation to a halt. In some cases it's even a useful feature.
- Any given language provides multiple ways to express the same basic facts, each with its own particular connotations and emphasis.
- Different languages often express the same basic facts in very different ways.
- Related to these, language is fluid across time and populations. Usage changes over time and varies across populations.
- It can be communicated by a variety of means, notably speech, signing and writing.
- From an evolutionary point of view, it has survival value.
I'd call these functional properties, meaning that they relate mainly to what language does without saying anything concrete about how it does it. Structurally (from here on I'll tend to focus on spoken/written language, with the understanding that it's not the whole story):
- Language is linear.
That is, whatever the medium, words are produced and received one at a time, though there can be a number of "side channels" such as pitch and emphasis, facial expressions and hand gestures.
- The mapping between a word and its meaning is largely arbitrary (though you can generally trace a pretty elaborate history involving similar words with similar meanings).
- Vocabulary is extensible.
We can coin words for new concepts. This is true only for certain kinds of words, but where it can happen it happens easily.
- Meaning is also extensible.
We can apply existing words in new senses and again this happens easily.
- The forms used adjust to social conditions.
You speak differently with your peers after work than you would to your boss at work, or to your parents as a child, or to your prospective in-laws, and so forth.
- The forms used adjust to particular needs of the conversation, for example which details you want to emphasize (or obscure).
- Some concepts seem to be more tightly coupled to the structure of a particular language than others.
For example, when something happened or will happen in relation to when it is spoken of is generally part of the grammar, or marked by a small, closed set of words, or both.
- On the other hand, there is wide variety in exactly how such things are expressed.
Different languages emphasize different distinctions. For example, some languages don't specially mark singular/plural, or past/present, though of course they can still express that there was more than one of something or that something happened yesterday rather than today. Different languages use different devices to convey basic information like when something happened or what belongs to whom.
- Syntax, in the form of word order and inflection (changing the forms of words, as with changing dog to dogs or bark to barked or barking), collectively seem to matter in all languages, but the exact way in which they matter, and the degree to which each matters, seem to be unique to any given language. Even closely related languages generally differ in the exact details.
There are plenty of other features that could each merit a separate post, such as honorifics (Mr. Hull) and diminutives (Davey), or how accent and vocabulary are such devastatingly effective in-group markers, or how metaphors work, or what determines when and how we choose to move words around to focus on a topic, or why some languages build up long words that equate to whole sentences of short words in other languages, or why in some languages directional words like to and of take on grammatical meaning, or why different languages break down verb tenses in different ways, or can use different words for numbers depending on what's being counted, and so on and so on ...
Many of these features of language have to do with the interplay between cognition -- how we think -- and language -- how we express thoughts. The development of cognition must have been both a driver and a limiting factor in the development of language, but we are almost certainly still in the very early stages of understanding this relationship.
For example, languages generally seem to have a way of nesting one clause inside another, as in The fence that went around the house that was blue was red. How would this arise? In order to understand such a sentence, we need some way of setting aside The fence while we deal with that went around the house that was blue and then connecting was red back to the fence in order to understand that the fence is red and the house is blue. To a compugeek, this means something like a stack, a data structure for storing and retrieving things such that the last thing stored is the first thing retrieved.
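For what it's worth, the stack idea can be put in code. The following is a toy sketch of my own (the token format and the `understand` function are invented for illustration, not a real parsing algorithm): each noun is set aside on a stack, and each predicate attaches to the most recently set-aside noun, which is enough to untangle the fence/house sentence.

```python
# A toy sketch, not a real parser. Nouns are set aside on a stack;
# each predicate attaches to the most recently set-aside noun. To keep
# it minimal, the "went around" relation between fence and house is ignored.

def understand(tokens):
    stack = []   # nouns set aside while we handle a side trip
    facts = []   # (noun, predicate) pairs we work out
    for kind, word in tokens:
        if kind == "noun":
            stack.append(word)                   # set the noun aside
        else:                                    # kind == "pred"
            facts.append((stack.pop(), word))    # last noun stored, first retrieved
    return facts

# "The fence that went around the house that was blue was red", simplified:
sentence = [
    ("noun", "fence"),     # The fence ...
    ("noun", "house"),     # ... that went around the house ...
    ("pred", "was blue"),  # ... that was blue ...  -> applies to "house"
    ("pred", "was red"),   # ... was red.           -> applies to "fence"
]
print(understand(sentence))
# [('house', 'was blue'), ('fence', 'was red')]
```

The last-in-first-out behavior is exactly what gets the house blue and the fence red, rather than the other way around.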
Cognitively, handling such a sentence is like veering off a path on some side trip and returning to pick up where you left off, or setting aside a task to handle some interruption and then returning to the original task. Neither of these abilities is anywhere near unique to humans, so they must be older than humanity, even though we are the only animals that we know of that seem to use them in communication.
These cognitive abilities are also completely separate from a large number of individual adaptations of our vocal apparatus, which do seem to be unique to us, notably fine control of breathing and of the position of the tongue and shape of the mouth. While these adaptations are essential to our being able to speak as fluently as we do, they don't have anything to do with what kinds of sentences we can express, just how well we can do so using spoken words. Sign languages get along perfectly well without them.
In other words, it's quite possible we were able to conceive of structures like "I saw that the lion that killed the wildebeest went around behind that hill over there" without being able to put them into words, and that ability only came along later. There's certainly no shortage, even in modern humans, of things that are easy to think but hard to express (I'd give a few examples, but ...). The question here, then, is not "How did we develop the ability to think in nested clauses?" but "How did we come to use the grammatical structures we now see in languages to communicate such thoughts?"
There's a lot to evolution, and it has to be right up there with quantum mechanics among scientific theories that are easy to oversimplify, to draw unwarranted conclusions from, or to get outright wrong, so this next bit is even less precise than what I've already said. For example, I'm completely skirting around major issues of population genetics -- how a gene (useful or not) spreads (or doesn't) in a population.
Let's try to consider vocabulary in an evolutionary context. I pick vocabulary to start with because it's clearly distinct from grammar. Indeed, one of the useful features of a grammar is that you can plug an arbitrary set of words into it. Conversely, one requirement for developing language as we know it is the ability to learn and use a large and expandable vocabulary. Without that, no amount of grammatical apparatus accounts for the way people actually use language.
Suppose some animal has the ability to make one kind of call when it spots a particular predator and a different call for another predator, in such a way that its conspecifics (animals of the same species) can understand and react appropriately. That's two calls (three if you count not making any call) and it's easy to see how that could be useful in not getting eaten. Again, this is far from unique to us (see here, and search for "vervets" in the post, for example).
Now suppose some particular animal is born with the ability to make a third call for some other hazard, say a large branch falling (this is more than a bit contrived, but bear with me). A large branch falls, the animal cries out ... and no one does anything. The ability to make new calls isn't particularly useful without the ability to understand new calls. But suppose that nobody did anything because they didn't know what the new call meant, but they were able to connect "that oddball over there made a funny noise" with "a big branch fell". The next time a big branch falls and our three-call-making friend cries out, everyone looks out and scatters to safety. Progress.
I'm more than a bit skeptical that the ability to make three calls rather than two would arise by a lucky mutation, but I think there are still two valid points here:
First, the ability to comprehend probably runs ahead of the ability to express, and certainly new ways to express are much less likely to catch on if no one understands what they mean. Moreover, comprehension is useful in and of itself. Whether or not my species is able to make calls that signal specific scenarios, being able to understand other species' calls is very useful (when a vervet makes a predator call, other species will take appropriate action as well), as is the ability to match up new calls with their meanings from context and examples.
In other words, the ability to understand a large vocabulary is liable to develop even without the ability to express a large vocabulary. For a real-life example, at least some domestic dogs can understand far more human words than the number of distinct barks and similar sounds they can produce (as far as anyone can tell), and certainly more human words than they can produce themselves.
Second, this appears to be a very common pattern in evolution. Abilities that are useful in one context (distinguishing the different calls of animals around you) become useful in other contexts (developing a system of specialized calls within your own species). The general pattern is known as exaptation (or cooption, or formerly and more confusingly as pre-adaptation).
Let's suppose that the local population of some species can potentially understand, say, dozens of distinct calls (whether their own or those of other species), but its ability to produce distinct calls is limited. If some individual comes along with the gift of being able to produce more distinct calls, then that will probably increase that individual's chances of surviving -- because its conspecifics will learn the new calls and so increase everyone's chance of survival -- and at least potentially its chances of reproducing, if only because there will be more potential mates around if fewer of them get eaten.
If that particular individual fails to survive and reproduce, the conditions are still good for some other individual to come along with the ability to produce a bigger vocabulary, perhaps through some entirely different mechanism. This in turn doesn't preclude some future individual from being born with the ability to produce a larger vocabulary through yet a third mechanism, or either of the original two. If there are multiple mechanisms for doing something advantageous, the chances of it taking hold in the long run are better (I'm pretty sure, but I don't know if an actual biologist would agree. Also, this isn't particular to vocabulary.).
If the community as a whole develops the tendency to find larger vocabularies attractive, so much the better, though the math starts to get hairy at this point. Sexual selection is a pretty good way of driving traits to extremes -- think peacocks and male walruses -- so it's quite plausible that a species that starts to develop larger and larger vocabularies of calls could take this quite far, past the point of immediate usefulness. You then have a population with a large vocabulary ready for an environment where it makes more of a difference.
In short, even some ability to produce distinct calls for different situations is useful, and it's no surprise many animals have it. The ability to produce a large and expandable variety of distinct calls for different situations also looks useful, but also seems harder to evolve, considering that it's fairly rare. Taking this a step further, we appear to be unique in our ability to produce and distinguish thousands of distinct vocabulary items, though as always there's quite a bit we still don't know about communication in other species.
It's clear that other animals can distinguish, and in some cases produce, non-trivial vocabularies, even if it's not particularly common. How do you get from there to our as-far-as-we-know-unique abilities? The usual answer for how complex traits evolve is "a piece at a time".
In order to find a (very hypothetical) evolutionary pathway from an extensible collection of specialized calls to what we call language today, we want to find a series of small steps that each add something useful to what's already there without requiring major restructuring. Some of those, in no strict order except where logically necessary, might be:
- The ability to refer to a class of things without reference to a particular instance
This is one aspect of what one might call "abstract concepts". As such, it doesn't require any new linguistic machinery beyond the ability to make and distinguish a reasonably large set of calls (which I'll call "words" from here on out), but it does require a cognitive shift. The speaker has to be able to think of, say, "wolf" without referring to a particular wolf trying to sneak up. The listener has to realize that someone saying "wolf" may not be referring to a wolf currently sneaking up on them. Instead, if the speaker is pointing to a set of tracks it might mean "a wolf went here", or if pointing in a particular direction, maybe "wolves come from over there".
This may seem completely natural to us, but it's not clear what other animals, if any, can do this. Lots of animals can distinguish different types of things, but being able to classify is different from being aware that classes exist. An apple-sorting machine can sort big from small without understanding "big" or "small". I say "it's not clear" because devising an experiment to tell whether an animal does or doesn't understand some aspect of abstraction is difficult, in no small part because there's a lot of room for interpretation of the results.
[Re-reading this and taking Earl's comment into account, I think I've conflated two different kinds of abstraction. Classifying "wolf" as opposed to "a wolf" is probably more basic than I have it here. For example, a vervet will give the call for a particular kind of predator. It doesn't have to develop a call for each individual animal, and a good thing, because that would only work for individuals that it had seen before. A behaviorist would probably say this is all stimulus-response based on particular characteristics -- give the leopard call in response to the smell or sound of a leopard, and so on, and fair enough.
Classification, then, would be more a matter of connecting particular attributes -- sound, smell, shape or whatever -- to physical things with those attributes. The progression, as Earl suggests, would be leopard smell/sound --> leopard --> the leopard that just disappeared into the trees over there. That is, it requires more cognitive machinery to be able to conceive of leopard as any animal that smells this way and/or makes this sound and/or has pointy ears and a tail or whatever, and it requires a different piece of machinery -- a sort of object permanence -- to conceive of an absent leopard and connect that thing that was here but isn't any more to leopard and get that leopard that was here but isn't any more -- D.H 28 Oct. 2021]
- The ability to designate a quality such as "big" or "red" without reference to any particular thing with that quality.
This is similar to the previous item, but for adjectives rather than nouns. From a language standpoint it's important because it implies that you can mix and match qualities and things (adjectives and nouns). A tree can be big, a wolf can be big and a wolf can be gray without needing a separate notion of "big tree", "big wolf" and "gray wolf". An adjective is a predicate that applies to something rather than standing alone as a noun does.
As I understand it, the widely-recognized stages of language development in humans are babbling, single words, two-word sentences and "all hell breaks loose". A brain that can handle nouns and predicates is ready for two-word sentences consisting of a predicate and something it applies to. This is a very significant step in communication and it appears to be quite rare, but linguistically it's nearly trivial. A grammar to describe it has one rule and no recursion (rules that refer, directly or indirectly, to themselves).
As a practical matter, producing a two-word sentence means signifying a predicate and an object that it applies to (called an argument). Understanding it means understanding the predicate, understanding the argument and, crucially, understanding that the predicate applies to the argument. If you can distinguish predicates from objects, order doesn't even matter. "Big wolf!" is just as good as "Wolf big!" or even a panicked sequence of "Wolf wolf big wolf big big wolf!" (which, to be fair, would require recursion to describe in a phrase-structure grammar).
From a functional point of view, the limiting factor to communicating such concepts is not grammar but the ability to form and understand the concepts in the first place.
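To make the order-doesn't-matter point concrete, here's a toy sketch (the lexicon and the `interpret` function are my own inventions for illustration): as long as the hearer knows which words are predicates and which are things, any ordering or repetition yields the same meaning.

```python
# A toy sketch, invented for illustration. With a lexicon that classifies
# each word as a predicate or a thing, word order and repetition carry
# no information in predicate/argument utterances.

LEXICON = {"wolf": "thing", "tree": "thing", "big": "pred", "gray": "pred"}

def interpret(utterance):
    """Return the set of (predicate, thing) pairings, ignoring order and repetition."""
    words = utterance.split()
    things = {w for w in words if LEXICON[w] == "thing"}
    preds = {w for w in words if LEXICON[w] == "pred"}
    return {(p, t) for p in preds for t in things}

# "Big wolf!", "Wolf big!" and the panicked version all come out the same:
for u in ("big wolf", "wolf big", "wolf wolf big wolf big big wolf"):
    print(interpret(u))   # all three print {('big', 'wolf')}
```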
Where do we go from predicate/argument sentences to something resembling what we now call language? Some possible next steps might be
- Predicates with more than one argument.
The important part here is that you need a way to distinguish the arguments. In "wolf big", you know that "big" is the predicate and "wolf" is the argument and that's all you need, but in "see rabbit wolf", where "see" is the predicate and "rabbit" and "wolf" are arguments, how do we tell if the wolf sees the rabbit or the rabbit sees the wolf? There are two solutions, given that you're limited to putting words together in some particular order:
Either the order of words matters, so "see rabbit wolf" means one thing and "see wolf rabbit" means the other, or there's a way of marking words according to what role they play, so for example "see wolf-at rabbit" means the rabbit sees the wolf and "see wolf rabbit-at" means the wolf sees the rabbit. There are lots of possible variations, and the two approaches can be combined. Actual languages do both, in a wide variety of ways.
From a linguistic point of view, word order and inflection (ways of marking words) are the elements of syntax, which (roughly speaking) provides structure on top of a raw stream of words. Languages apply syntax in a number of ways, allowing us to put together complex sentences such as this one, but you need the same basic tools even for simple three-word sentences. Turning that around, if you can solve the problem of distinguishing the meaning of a predicate and two arguments, you have a significant portion of the machinery needed for more complex sentences.
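The two solutions can be sketched in code (again, `by_order` and `by_marking` are invented toys, not models of any real language): one reads roles off fixed positions, the other off a "-at" marker, and both recover the same fact.

```python
# Two toy strategies, invented for illustration, for telling who does
# what in a three-word predicate sentence.

def by_order(words):
    """Fixed word order: predicate, then subject, then object."""
    pred, subj, obj = words
    return (subj, pred, obj)

def by_marking(words):
    """Free word order: the word marked with '-at' is the one acted on."""
    pred, rest = words[0], words[1:]
    obj = next(w[:-len("-at")] for w in rest if w.endswith("-at"))
    subj = next(w for w in rest if not w.endswith("-at"))
    return (subj, pred, obj)

print(by_order(["see", "wolf", "rabbit"]))       # ('wolf', 'see', 'rabbit')
print(by_marking(["see", "wolf", "rabbit-at"]))  # ('wolf', 'see', 'rabbit'): same fact, marked
print(by_marking(["see", "rabbit-at", "wolf"]))  # ('wolf', 'see', 'rabbit'): order is now free
```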
- Pronouns, that is, a way to designate a placeholder for something without saying exactly what that something is, and connect it with a specific meaning separately.
Cognitively, pronouns imply some form of memory beyond the scope of a simple sentence. Linguistically, their key property is that their meaning can be redefined on the fly. A noun like wolf might refer to different specific wolves at different times, but it will always refer to some wolf. A pronoun like it is much less restrained. It could refer to any noun, depending on context.
Pronouns allow for more compact sentences, which is useful in itself since you don't have to repeat some long descriptive phrase every time you want to say something new about, say, the big red house across the street with the oak tree in the yard. You can just say that house or just it if the context is clear enough.
More than this, though, by equating two things in separate sentences they allow linear sequences of words to describe non-linear structures, for example I see a wolf and it sees me. By contrast, in I see a wolf and a wolf sees me it's not clear whether it's the same wolf and we don't necessarily have the circular structure of two things seeing each other.
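The rebinding-on-the-fly behavior is easy to sketch (a toy of my own; real pronoun resolution is far subtler): "it" is a slot that gets rebound to the most recently mentioned noun.

```python
# A toy sketch, invented for illustration. "it" is a placeholder that
# gets rebound on the fly to the most recently mentioned noun.

NOUNS = {"wolf", "rabbit", "house"}

def resolve(sentences):
    last = None                      # most recently mentioned noun
    resolved = []
    for sent in sentences:
        out = []
        for w in sent.split():
            if w == "it" and last is not None:
                w = last             # rebind the placeholder
            if w in NOUNS:
                last = w
            out.append(w)
        resolved.append(" ".join(out))
    return resolved

print(resolve(["I see a wolf", "it sees me"]))
# ['I see a wolf', 'wolf sees me']
```

Resolving "it" back to "wolf" is what recovers the circular structure of two things seeing each other from a flat sequence of words.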
- The ability to stack up arbitrarily many predicates: big dog, big red dog, big red hairy dog, etc.
I left this for last because it leads into a bit of a rabbit hole concerning the role of nesting and recursion in language. I'm going to dig into that a bit here by way of arguing that some of the analytic tools commonly used in analyzing language may not be particularly relevant to its development. Put another way, "how did language develop" is not the same question as "how did the structures we work with in analyzing language develop".
A common analysis of phrases like "big red hairy dog" uses a recursive set of rules like

a noun phrase can be a noun by itself, or
a noun phrase can be an adjective followed by a noun phrase

This is much simpler than a full definition of "noun phrase" in a real grammar, and it's not the only way to analyze noun phrases, but it shows the recursive pattern that's often used in such an analysis. The second definition of "noun phrase" refers to "noun phrase" recursively. The noun phrase on the right-hand side will be smaller, since it has one less adjective, so there's no infinite regress. The example, "big red hairy dog", breaks down to "big" modifying "red hairy dog", which breaks down to "red" modifying "hairy dog", which breaks down to "hairy" modifying "dog", and "dog" is a noun phrase by itself. In all there are four noun phrases, one by the first rule and three by the second.
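The two rules can be written directly as a toy recursive recognizer (a sketch of my own; the word lists exist only for this example). It finds exactly the four noun phrases, one by the first rule and three by the second.

```python
# The two rules as a toy recursive recognizer, invented for illustration:
#   noun_phrase -> noun
#   noun_phrase -> adjective noun_phrase

NOUNS = {"dog"}
ADJECTIVES = {"big", "red", "hairy"}

def noun_phrases(words):
    """Return every noun phrase the two rules find, outermost first."""
    if not words:
        return []
    head, rest = words[0], words[1:]
    if head in NOUNS and not rest:
        return [tuple(words)]              # rule 1: a noun by itself
    if head in ADJECTIVES:
        inner = noun_phrases(rest)         # rule 2: adjective + noun phrase
        if inner:
            return [tuple(words)] + inner
    return []

print(noun_phrases(["big", "red", "hairy", "dog"]))
# [('big', 'red', 'hairy', 'dog'), ('red', 'hairy', 'dog'), ('hairy', 'dog'), ('dog',)]
```

The recursion bottoms out at "dog" because each call drops one adjective, which is the no-infinite-regress point made above.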
On the other hand, if you can conceive of a dog being big, red and hairy at the same time, you can just as well express this with two-word sentences and a pronoun: "dog big. it red. it hairy." The same construction could even make sense without the pronouns: "dog big. red. hairy." Here a listener might naturally assume that "red" and "hairy" have to apply to something, and the last thing we were talking about was a dog, so the dog must be red and hairy as well as big.
This is not particularly different from someone saying "I saw the movie about the duck. Didn't like it", where the second sentence clearly means "I didn't like it", and you could even just say "Didn't like" and still be clearly understood, even if "Didn't like" by itself sounds a bit odd.
From a grammatical standpoint (at least for a constituency grammar) these all seem quite different. In "big red hairy dog", there's presumed to be a nested structure of noun phrases. In "dog big. it red. it hairy" you have three sentences with a simple noun-verb structure, and in "dog big. red. hairy." you have one two-word sentence and two fragments that aren't even sentences.
However, from the point of view of "I have some notion of predicates and arguments, and multiple predicates can apply to the same argument, now how do I put that in words?", they seem pretty similar. In all three cases you say the argument and the predicates that apply to it and the listener understands that the predicates apply to the argument because that's what predicates do.
I started this post with the idea of exploring how language as we now know it could develop from simpler pieces such as those we can see in other animals. The title is a nod to the question of "What good is half an eye?" regarding the evolution of complex eyes such as we see in several lineages, including our own and (in a different form) in cephalopods. In that case, it turns out that there are several intermediate forms which provide an advantage even though they're not what we would call fully-formed eyes, and it's not hard to trace a plausible pathway from basic light-sensitive "eye spots" to what we and many other animals have.
The case of language seems similar. I think the key points are
- Cognition is crucial. You can't express what you can't conceive of.
- The ability to understand almost certainly runs ahead of the ability to express.
- There are plausibly a number of intermediate stages between simple calls and complex language (again, the account above is completely speculative and I don't claim to have identified the actual steps precisely or completely).
- Full grammar, in the sense of nested structures described by recursive rules, may not be a particularly crucial step.
- A purely grammatical analysis may even obscure the picture, both by failing to make distinctions (as with the jump from "wolf" meaning "this wolf right there" to it meaning "wolf" in the abstract) and by drawing distinctions that aren't particularly relevant (as with the various forms of big red hairy dog).