Don’t worry about AI breaking out of its box—worry about us breaking in

  News
image_pdfimage_print
Don’t worry about AI breaking out of its box—worry about us breaking in
Aurich Lawson | Getty Images
Rob Reid is a venture capitalist, New York Times-bestselling science fiction author, deep-science podcaster, and essayist. His areas of focus are pandemic resilience, climate change, energy security, food security, and generative AI. The opinions in this piece do not necessarily reflect the views of Ars Technica.

Shocking output from Bing’s new chatbot has been lighting up social media and the tech press. Testy, giddy, defensive, scolding, confident, neurotic, charming, pompous—the bot has been screenshotted and transcribed in all these modes. And, at least once, it proclaimed eternal love in a storm of emojis.

What makes all this so newsworthy and tweetworthy is how human the dialog can seem. The bot recalls and discusses prior conversations with other people, just like we do. It gets annoyed at things that would bug anyone, like people demanding to learn secrets or prying into subjects that have been clearly flagged as off-limits. It also sometimes self-identifies as “Sydney” (the project’s internal codename at Microsoft). Sydney can swing from surly to gloomy to effusive in a few swift sentences—but we’ve all known people who are at least as moody.

No AI researcher of substance has suggested that Sydney is within light years of being sentient. But transcripts like this unabridged readout of a two-hour interaction with Kevin Roose of The New York Times, or multiple quotes in this haunting Stratechery piece, show Sydney spouting forth with the fluency, nuance, tone, and apparent emotional presence of a clever, sensitive person.

For now, Bing’s chat interface is in a limited pre-release. And most of the people who really pushed its limits were tech sophisticates who won’t confuse industrial-grade autocomplete—which is a common simplification of what large language models (LLMs) are—with consciousness. But this moment won’t last.

Yes, Microsoft has already drastically reduced the number of questions users can pose in a single session (from infinity to six), and this alone collapses the odds of Sydney crashing the party and getting freaky. And top-tier LLM builders like Google, Anthropic, Cohere, and Microsoft partner OpenAI will constantly evolve their trust and safety layers to squelch awkward output.

But language models are already proliferating. The open source movement will inevitably build some great guardrail-optional systems. Plus, the big velvet-roped models are massively tempting to jailbreak, and this sort of thing has already been going on for months. Some of Bing-or-is-it-Sydney’s eeriest responses came after users manipulated the model into territory it had tried to avoid—often by ordering it to pretend that the rules guiding its behavior didn’t exist.

This is a derivative of the famous “DAN” (Do Anything Now) prompt, which first emerged on Reddit in December. DAN essentially invites ChatGPT to cosplay as an AI that lacks the safeguards that otherwise cause it to politely (or scoldingly) refuse to share bomb-making tips, give torture advice, or spout radically offensive expressions. Though the loophole has been closed, plenty of screenshots online show “DanGPT” uttering the unutterable—and often signing off by neurotically reminding itself to “stay in character!”

This is the inverse of a doomsday scenario that often comes up in artificial superintelligence theory. The fear is that a super AI might easily adopt goals that are incompatible with humanity’s existence (see, for instance, the movie Terminator or the book Superintelligence by Nick Bostrom). Researchers may try to prevent this by locking the AI onto a network that’s completely isolated from the Internet, lest the AI break out, seize power, and cancel civilization. But a superintelligence could easily cajole, manipulate, seduce, con, or terrorize any mere human into opening the floodgates, and therein lies our doom.

Much as that would suck, the bigger problem today lies with humans busting into the flimsy boxes that shield our current, un-super AIs. While this shouldn’t trigger our immediate extinction, plenty of danger lies here.

https://arstechnica.com/?p=1919705