Wednesday, September 25, 2024

AGI, goals and influence

While putting together "What would superhuman intelligence even mean?", I took out a few paragraphs that seemed redundant at the time.  While I think that post is better for the edit, when I re-read the deleted material, I realized that there was one point in them that I didn't explicitly make in the finished post.  Here's the argument (if you have "chess engine" on your AI-post bingo card, yep, you can mark it off yet again; I really do think it's an apt example, but I'm even getting tired of mentioning it):

When it comes to the question of what the implications of AGI are, actual intelligence is one factor among many.  A superhuman chess engine poses little if any risk.  A simple non-linear control system that can behave chaotically is a major risk if it's controlling something dangerous.
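To make the "simple non-linear" point concrete, here's a minimal sketch in Python (the logistic map is my illustration, not something from the original argument, and the function name is mine).  The logistic map is a textbook one-line update rule that is chaotic at r = 4: two starting states that differ by one part in two million become completely uncorrelated within a few dozen steps.

    # Logistic map: x_{n+1} = r * x_n * (1 - x_n).
    # At r = 4 this one-line non-linear update is chaotic: trajectories
    # that start almost identically become completely uncorrelated
    # within a few dozen iterations.

    def logistic_trajectory(x0, r=4.0, steps=50):
        """Iterate the logistic map from x0 and return the whole trajectory."""
        xs = [x0]
        for _ in range(steps):
            xs.append(r * xs[-1] * (1.0 - xs[-1]))
        return xs

    a = logistic_trajectory(0.2)
    b = logistic_trajectory(0.2000001)  # differs from a by one part in two million

    for n in (0, 10, 20, 30, 40, 50):
        print(f"step {n:2d}: {a[n]:.6f} vs {b[n]:.6f}  (diff {abs(a[n] - b[n]):.6f})")

The point isn't the math itself; it's that unpredictability doesn't require anything like intelligence, so even a system this trivial is too erratic to put in charge of anything dangerous.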

To the extent that a control system with some sort of general superintelligence is hard to predict and may make decisions that don't align with our priorities, it would be foolhardy to put it directly in charge of something dangerous.  Someone might do that anyway, but that's a hazard of our imperfect human judgment.  A superhuman AI is just one more dangerous thing that humans have the potential to misuse.

The more interesting risk is that an AI with limited control of something innocuous could leverage that into more and more control, maybe through extortion (give the system control of the power plants or it will destroy all the banking data) or persuasion (someone hooks a system up to social media, where its accounts convince people in power to put it in charge of the power plants).

These are worthy scenarios to contemplate.  History is full of examples of human intelligences extorting or persuading people to do horribly destructive things, so why would an AGI be any different? Nonetheless, in my personal estimation, we're still quite a ways from this actually happening.

Current LLMs can sound persuasive if you don't fact-check them and don't let them go on long enough to say something dumb -- which in my experience is not very long -- but what would a chatbot ask for?  Whom would it ask?  How would the person or persons carry out its instructions?  (I initially said "its will" rather than "its instructions", but there's nothing at all to indicate that a chatbot has anything resembling will.)

You could imagine some sort of goal-directed agent using a chatbot to generate persuasive arguments on its behalf, but, at least as it stands, I'd say the most likely goal-directed agent for this would be a human being using a chatbot to generate a convincing web of deception.  But human beings are already highly skilled at conning other human beings.  It's not clear what new risk generative AI presents here.

Certainly, an autonomous general AI won't trigger a cataclysm in the real world if it doesn't exist, so in that sense, the world is safer without it.  Eventually, though, the odds are good that something will come along that meets DeepMind's definition of AGI (or ASI).  Will that AI's skills include parlaying whatever small amount of influence it starts with into something more dangerous?  Will its goals include expanding its influence, even if we don't think they do at first?

The idea of an AI with seemingly harmless goals becoming an existential threat to humanity is a staple in fiction (and the occasional computer game).  It's good that people have been exploring it, but it's not clear what conclusions to draw from those explorations, beyond a general agreement that existential threats to humanity are bad.  Personally, I'm not worried yet, at least not about AGI itself, but I've been wrong many times before.
