In a previous post, entitled Experiences, mechanisms, behaviors and LLMs, I discussed a couple of strawman objections to the idea that an LLM is doing anything particularly intelligent: that it's "just manipulating text" and that it's "just doing calculations".
The main argument was that "just" is doing an awful lot of work there. Yes, an LLM is "just" calculating and manipulating text, but it's not "just" doing so in the same way as an early system like ELIZA, which just turned one sentence template into another, or even a 90s-era Markov chain, which just generated text based on how often each word appeared directly after another in a sample text.
In both of those cases, we can point at particular pieces of code or data and say "those are the templates it's using", or "there's the table of probabilities" and explain directly what's going on. Since we can point at the exact calculations going on, and the data driving them, and we understand how those work, it's easy to say that the earlier systems aren't understanding text the way we do.
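To make the contrast concrete, here's a minimal sketch of that kind of Markov chain in Python, with a made-up two-sentence corpus. The entire "model" is the follows table, which you can print out and inspect directly.

```python
import random
from collections import defaultdict

# Tiny made-up corpus, standing in for a real sample text.
corpus = "the cat sat on the mat . the dog sat on the rug .".split()

# The whole "model": for each word, how often each other word follows it.
follows = defaultdict(lambda: defaultdict(int))
for current, nxt in zip(corpus, corpus[1:]):
    follows[current][nxt] += 1

def generate(start, length=8):
    """Generate text by repeatedly sampling a next word in proportion
    to how often it followed the current word in the corpus."""
    word, output = start, [start]
    for _ in range(length):
        candidates = follows[word]
        if not candidates:
            break
        word = random.choices(list(candidates), weights=candidates.values())[0]
        output.append(word)
    return " ".join(output)

print(dict(follows["the"]))   # {'cat': 1, 'mat': 1, 'dog': 1, 'rug': 1}
print(generate("the"))
```

If the generator produces something odd, you can trace it back to a specific count in that table; there's nothing else going on.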
We can't do that with an LLM, even if an LLM generating text is doing the same general thing as a simple Markov chain. We can say "here's the code that's smashing tensors to produce output text from input text", and we understand the overall strategy, but the data feeding that strategy is far beyond our understanding. Unlike the earlier systems, there's way, way too much of it. It's structured, but that structure is much too complex to fit in a human brain, at least as a matter of conscious thought. Nonetheless, the actual behavior shows some sort of understanding of the text, without having to stretch the meaning of the word "understanding".
In the earlier post, I also said that even if an LLM encodes a lot about how words are used and in which contexts -- which it clearly does -- the LLM doesn't know the referents of those words -- it doesn't know what it means for water to be wet or what it feels like to be thirsty -- and so it doesn't understand text in the same sense we do.
This feels similar to appeals like "but a machine can't have feelings", which I generally find fairly weak, but that wasn't quite the argument I was trying to make. While cleaning up a different old post (I no longer remember which one), I ran across a reference that sharpens the picture by looking more closely at the calculations/manipulations an LLM is actually doing.
I think the first post I mentioned, on experiences and so forth, puts a pretty solid floor under what sort of understanding an LLM has of text, namely that it encodes some sort of understanding of how sentences are structured and how words (and somewhat larger units) associate with each other. Here, I hope to put a ceiling over that understanding by showing more precisely in what way LLMs don't understand the meaning of text the way that we do.
Taking these together, we can roughly say that LLMs understand the structure of text but not its meaning, though that understanding of structure is deep enough that an LLM can extract information from a large body of text that's meaningful to us.
In much of what follows, I'm making use of an article in Quanta Magazine that discusses how LLMs do embeddings, that is, how they turn a text (or other input) into a list of vectors to feed into the tensor-smashing machine. It matches up well with papers I've read and a course I've taken, and I found it well-written, so I'd recommend it even if you don't read any further here.
Despite the name, a Large Language Model doesn't process language directly. The core of an LLM drives the processing of a list of tokens. A token is a vector -- an ordered list of numbers of a given length -- that represents a piece of the actual input.
To totally make up an example, if vectors are three numbers long, and a maps to (1.2, 3.0, -7.5), list maps to (6.4, -3.2, 1.6), of maps to (27.5, 9.8, 2.0), and vectors maps to (0.7, 0.3, 6.8), then a list of vectors maps to [(1.2, 3.0, -7.5), (6.4, -3.2, 1.6), (27.5, 9.8, 2.0), (0.7, 0.3, 6.8)].
Here I'm using parentheses for vectors, which in this case always have three numbers, and square brackets for lists, which can have any length (including zero for the empty list, []). In practice, the vectors will have many more than three components. Thousands is typical. The list of vectors encoding a text will be however long the text is.
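As a sketch of what that mapping looks like in code, here's the made-up example above written out in Python (the words, vectors and numbers are invented purely for illustration; a real embedding table has thousands of components per vector and a vocabulary of tens of thousands of entries):

```python
# Toy embedding table: each word maps to a fixed-length vector.
# The numbers are made up purely for illustration.
embedding = {
    "a":       (1.2,  3.0, -7.5),
    "list":    (6.4, -3.2,  1.6),
    "of":      (27.5, 9.8,  2.0),
    "vectors": (0.7,  0.3,  6.8),
}

def embed(text):
    """Turn a text into the list of vectors the model actually processes."""
    return [embedding[word] for word in text.split()]

print(embed("a list of vectors"))
# [(1.2, 3.0, -7.5), (6.4, -3.2, 1.6), (27.5, 9.8, 2.0), (0.7, 0.3, 6.8)]
```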
The particular mapping from input to tokens is called the embedding*. The overall idea is to encode similarities along various dimensions. There are (practically) infinitely many ways to do this mapping. Over time this has evolved from a mostly-manual process, to an automated process using hand-written code, to the current state of the art, which uses machine learning techniques on large bodies of text. The first two approaches are pretty easy to understand.
An ML-produced embedding, on the other hand, is a mass of numbers created during a training phase. This mass of numbers drives a generic algorithm that turns words into large vectors. While the numbers themselves don't really lend themselves to easy analysis, people have noticed interesting patterns in the results of applying the embedding.
Because the model-building phase is looking at streams of text, it's not surprising that the embedding itself captures information about which words appear in which contexts in that text. For example, in typical training corpora, dog and cat appear much more often in contexts like my pet ___ than, say, chair does. They are also likely to occur in conjunction with terms like paw and fur, while other words won't, and so forth.
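A crude way to see how that kind of information falls out of raw text is simply to count contexts. The sketch below uses a tiny invented corpus and counts which words fill the blank in my pet ___; in anything like real training data, dog and cat would dominate the counts and chair would barely appear.

```python
from collections import Counter

# Tiny invented corpus, standing in for billions of words of training text.
corpus = [
    "my pet dog likes to chase the ball",
    "my pet cat sleeps on the chair",
    "i bought a new chair for the office",
    "my pet dog has soft fur and big paws",
]

# Count which words appear immediately after the context "my pet".
fills = Counter()
for sentence in corpus:
    words = sentence.split()
    for i in range(len(words) - 2):
        if words[i] == "my" and words[i + 1] == "pet":
            fills[words[i + 2]] += 1

print(fills)  # Counter({'dog': 2, 'cat': 1}) -- 'chair' never fills the slot
```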
While we don't really understand exactly how the embedding-building stage of training an LLM extracts relations like this, the article in Quanta gives the example that in one particular embedding the vector for king minus the one for man plus the one for woman is approximately equal to the one for queen (you add or subtract vectors component by component, so (1.2, 3.0, -7.5) + (6.4, -3.2, 1.6) = (7.6, -0.2, -5.9) and so on).
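Here's a sketch of that arithmetic in Python, with a handful of made-up low-dimensional vectors chosen so the analogy works out (real embeddings have thousands of learned dimensions): compute king minus man plus woman component by component, then look for the nearest remaining word by cosine similarity.

```python
import numpy as np

# Made-up low-dimensional vectors, chosen so the analogy works out;
# a real embedding has thousands of dimensions learned from text.
vectors = {
    "man":   np.array([1.0, 0.0, 0.0]),
    "woman": np.array([0.0, 1.0, 0.0]),
    "king":  np.array([1.0, 0.0, 1.0]),
    "queen": np.array([0.0, 1.0, 1.0]),
    "chair": np.array([0.2, 0.2, -1.0]),
}

def cosine(u, v):
    """Cosine similarity: 1.0 means the vectors point the same way."""
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

# king - man + woman, computed component by component.
target = vectors["king"] - vectors["man"] + vectors["woman"]

# Find the closest word by cosine similarity, excluding the inputs.
best = max(
    (w for w in vectors if w not in {"king", "man", "woman"}),
    key=lambda w: cosine(target, vectors[w]),
)
print(best)  # queen
```

In a real embedding the match is only approximate, which is why the answer is the nearest word rather than an exact equality.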
It's long been known that use in similar contexts correlates with similarity in meaning. But we're talking about implied similarities in meaning here, not actual meanings. You can know an analogy like cat : fur :: person : hair without knowing anything about what a cat is, or a person, or fur or hair.
That may seem odd from our perspective. A person would solve a problem like cat : fur :: person : ? by thinking about cats and people, and about what a person has that plays the role fur does for a cat, because we're embodied in the world and have experience of hair, cats, fur and so forth. Odd as it might seem to know that cat : fur :: person : hair without knowing what any of those things is, that's essentially what's going on with an LLM. It understands relations between words, based on how they appear in a mass of training text, but that's all it understands.
But what, exactly, is the difference between understanding how a word relates to other words and understanding what it means? There are schools of thought that claim there is no difference. The meaning of a word is how it relates to other words. If you believe that, then there's a strong argument that an LLM understands words the same way we do, and about as well as we do.
Personally, I don't think that's all there is to it. The words we use to express our reality are not our reality. For one thing, we can also use the same words to express completely different realities. We can use words in new ways, and the meaning of words can and does shift over time. There are experiences in our own reality that defy expression in words.
Words are something we use to convey meaning, but they aren't that meaning. Meaning ultimately comes from actual experiences in the real world. The way words relate to each other clearly captures something about what they actually mean -- quite a bit of it, by the looks of things -- but just as clearly it doesn't capture everything.
I have no trouble saying that the embeddings that current LLMs use encode something significant about how words relate to each other, and that the combination of the embedding and the LLM itself has a human-level understanding of how language works. That's not nothing. It's something that sets current LLMs apart from anything before them, and it's an interesting result. For one thing, it goes a long way toward clarifying what's understanding of the world and what's just understanding of how language works.
If an LLM is good at it, then it's something about how language works. If an LLM isn't good at it, then it's probably something about the world itself. I'll have a bit more to say about that in the next (shorter) post.
Because LLMs know about language, but not what it represents in the real world, we shouldn't be surprised that LLMs hallucinate, and we shouldn't expect them to stop hallucinating just because they're trained on larger and larger corpora of text.
The earlier post distinguished among behavior, mechanism and experience. An LLM is capable of linguistic behavior very similar to a person's.
The mechanism of an LLM may or may not be similar to ours when it comes to language processing. We may well learn rules, like how the is used in relation to nouns, in a way that's similar to how an LLM is trained. Whether that's the case or not, an LLM, by design, lacks a mechanism for tying words to anything in the real world. This probably accounts for much of the difference between what we would say and what an LLM would say.
All of this is separate from subjective experience. One could imagine a robot that builds up a store of interactions with the world, processes them into some more abstract representation and associates words with them. But even if that is more similar to what we do in terms of mechanism, it says nothing about what the robot might or might not be experiencing subjectively, even if it becomes harder to rule out the possibility that the robot is experiencing the world as we do.
* Wikipedia seems to think it's only an embedding if it's done using feature learning, but that seems overly strict. Mathematically, an embedding is a map from one domain into another that preserves some kind of structure, no matter how the map is produced.