Another sidebar working up to talking about the hide-and-seek demo.
Few words express more exasperation than "I just told you that!", and -- fairly or not -- there are few things that can lower someone's opinion of another person's cognitive function faster than not remembering simple things.
Ironically for systems that can remember much more data much more permanently and accurately than we ever could, computers often seem to remember very little. For example, I just tried a couple of online AI chatbots, including one that claimed to have passed a Turing test. The conversations went something like this:
Me: How are you?
Bot: I'm good.
Me: That's great to hear. My name is Fred. My cousin went to the store the other day and bought some soup.
<a bit of typical AI bot chat, pattern-matching what I said and parroting it back, trying stock phrases etc.>
Me: By the way, I just forgot my own name. What was it?
<some dodge, though one did note that it was a bit silly to forget one's own name>
Me: Do you remember what my cousin bought the other day?
<some other dodge with nothing to do with what I said>
The bots are not even trying to remember the conversation, even in the rudimentary sense of scanning back over the previous text. They appear to have little to no memory of anything before the last thing the human typed [This has changed with the advent of LLM-driven bots with large context windows. I argue in later posts that they're still not that smart, but the point here is that lack of short-term memory makes the previous generation of bots seem even less smart, and I think that point stands -- D.H. Sep 2024].
Conversely, web pages suddenly got a lot smarter when sites started using cookies to remember state between visits and again when browsers started to be able to remember things you'd typed in previously. There's absolutely nothing anyone would call AI going on, but it still makes the difference between "dumb computer" and "not so bad".
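To make that concrete, here's a minimal sketch (in Python, and not any real site's code -- the visitor id and visit counter are made up for illustration) of how a cookie lets a server "remember" someone across otherwise independent visits:

    # Minimal sketch: remember visitors between visits with a cookie.
    # No AI here, just state -- but it's the difference between
    # "dumb computer" and "not so bad".
    from http.server import BaseHTTPRequestHandler, HTTPServer
    from http.cookies import SimpleCookie
    import uuid

    visit_counts = {}  # visitor id -> number of visits (the server-side "memory")

    class Handler(BaseHTTPRequestHandler):
        def do_GET(self):
            cookie = SimpleCookie(self.headers.get("Cookie", ""))
            visitor = cookie["visitor"].value if "visitor" in cookie else str(uuid.uuid4())
            visit_counts[visitor] = visit_counts.get(visitor, 0) + 1

            self.send_response(200)
            self.send_header("Set-Cookie", f"visitor={visitor}")  # the browser sends this back next time
            self.send_header("Content-Type", "text/plain")
            self.end_headers()
            self.wfile.write(f"Visit number {visit_counts[visitor]} from you.\n".encode())

    if __name__ == "__main__":
        HTTPServer(("localhost", 8000), Handler).serve_forever()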
When I say "memory" here, I mean the memory of things that happen while the program is running. Chess engines often incorporate "opening books" of positions that have occurred in previous games, so they can play the first few moves of a typical game without doing any calculation. Neural networks go through a training phase (whether guided by humans or not). One way or another, that training data is incorporated into the weightings that determine the networks behavior.
In some sense, those are both a form of memory -- they certainly consume storage on the underlying hardware -- but they're both baked in beforehand. A chess engine in a tournament is not updating its opening book. As I understand it, neural network-based chess engines don't update their weights while playing in a tournament, but can do so between rounds (though if you're winning handily, how much do you really want to learn from your opponents' play?).
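For what it's worth, here's a toy illustration (not any real engine's code; the positions and moves are placeholders) of that "baked in beforehand" point: an opening book is just a table built before the game ever starts, consulted at runtime but never updated during play.

    # Toy opening book: precomputed before the tournament, read-only during it.
    OPENING_BOOK = {
        # position key (abbreviated placeholder) -> book move
        "startpos": "e2e4",
        "startpos e2e4 e7e5": "g1f3",
    }

    def choose_move(position, search_fn):
        """Play from the book if the position is known; otherwise actually calculate."""
        if position in OPENING_BOOK:
            return OPENING_BOOK[position]   # no calculation needed
        return search_fn(position)          # fall back to real analysis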
Likewise, a face recognizer will have been trained on some particular set of faces and non-faces before being set loose on your photo collection. For better or worse, its choices are not going to change until the next release (unless there's randomization involved).
Chess engines do use memory to their advantage in one way: they tend to remember a "cache" of positions they've already evaluated in determining previous moves. If you play a response that the engine has already evaluated in detail, it will have a head start in calculating its next move. This is standard in AB (alpha-beta) engines, at least (though it may be turned off during tournaments). I'm not sure how much it applies to NN engines. To the extent it does apply, I'd say this absolutely counts as "memory makes you smarter".
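In case it helps, here's a rough sketch of that cache idea (a "transposition table" in chess-engine terms). The function names are stand-ins, not any particular engine's API:

    # Sketch of a position cache: remember evaluations computed while thinking
    # about earlier moves, so later searches get a head start.
    transposition_table = {}  # position key -> (depth searched, evaluation)

    def evaluate(position, depth, search_deeper):
        cached = transposition_table.get(position)
        if cached is not None and cached[0] >= depth:
            return cached[1]                     # already analyzed at least this deeply
        score = search_deeper(position, depth)   # the expensive part
        transposition_table[position] = (depth, score)
        return score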
Overall, though, examples of what we would typically call memory seem to be fairly rare in AI applications. Most current applications can be framed as processing a particular state of the world without reference to what happened before. Recognizing a face is just recognizing a face.
Getting a robot moving on a slippery surface is similar, as I understand it. You take a number of inputs regarding the position and velocity of the various members and whatever visual input you have, and from that calculate what signals to send to the actuators. There's (probably?) a buffer remembering a small number of seconds worth of inputs, but beyond that, what's past is past (for that matter, there's some evidence that what we perceive as "the present" is basically a buffer of what happened in the past few seconds).
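Here's a minimal sketch of that "last few seconds" buffer, with made-up names for the sensor and actuator functions (this isn't a real robotics API):

    # The controller sees only a short, fixed window of recent readings,
    # not the whole history; older readings silently fall off the back.
    from collections import deque

    WINDOW = 200                   # e.g. 2 seconds of readings at an assumed 100 Hz
    recent = deque(maxlen=WINDOW)

    def control_step(read_sensors, compute_actuator_commands):
        recent.append(read_sensors())
        return compute_actuator_commands(list(recent))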
Translating speech to text works well enough a word or phrase at a time, even if remembering more context might (or might not) help with sorting out homonyms and such. In any case, translators that I'm familiar with clearly aren't gathering context from previous sentences. It's not even clear they can remember all of the current sentence.
One of the most interesting things about the hide-and-seek demo is that its agents are capable of some sort of more sophisticated memory. In particular, they can be taught some notion of object permanence, usually defined as the ability to remember that objects exist even when you can't see them directly, as when something is moved behind an opaque barrier. In purely behavioral terms, you might analyze it as the ability to change behavior in response to objects that aren't directly visible, and the hide-and-seek agents can definitely do that. Exactly how they do that and what that might imply is what I'm really trying to get to here ...
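(For the record, here's a deliberately crude, hand-written sketch of that behavioral definition -- react to an object you can no longer see by remembering where it last was. This is emphatically not how the hide-and-seek agents do it; that's exactly the question being raised here. The object names are placeholders.)

    last_seen = {}  # object id -> position where it was last observed

    def update_and_act(visible_objects, move_toward):
        for obj_id, position in visible_objects.items():
            last_seen[obj_id] = position
        if "box" in visible_objects:
            return move_toward(visible_objects["box"])   # can see it: go to it
        if "box" in last_seen:
            return move_toward(last_seen["box"])         # out of sight, not out of "mind"
        return None                                      # never seen it; nothing to react to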
"sentence" in the next to last line of the next to last para should be "sentences."
Not to the point, but I've thought for some time that a great deal of what we think of as intelligence (in the "Jack is not as smart as Jill" sense) is really intellectual honesty: the ability and willingness to recognize that something is true even when we really would rather it weren't.
Object permanence is one of the milestones in early childhood cognitive development. Not clear that this is exactly a memory thing.
Typo fixed, thanks.
If knowing things are still there even when you can't directly sense them isn't memory, what is it then?