Anthropic’s auto-clicking AI Chrome extension raises browser-hijacking concerns

The company tested 123 cases representing 29 different attack scenarios and found a 23.6 percent attack success rate when browser use operated without safety mitigations.

One example involved a malicious email that instructed Claude to delete a user’s emails for “mailbox hygiene” purposes. Without safeguards, Claude followed these instructions and deleted the user’s emails without confirmation.

Anthropic says it has implemented several defenses to address these vulnerabilities. Users can grant or revoke Claude’s access to specific websites through site-level permissions. The system requires user confirmation before Claude takes high-risk actions like publishing, purchasing, or sharing personal data. The company has also blocked Claude from accessing websites offering financial services, adult content, and pirated content by default.

These safety measures reduced the attack success rate from 23.6 percent to 11.2 percent in autonomous mode. On a specialized test of four browser-specific attack types, the new mitigations reportedly reduced the success rate from 35.7 percent to 0 percent.

Independent AI researcher Simon Willison, who has extensively written about AI security risks and coined the term “prompt injection” in 2022, called the remaining 11.2 percent attack rate “catastrophic,” writing on his blog that “in the absence of 100% reliable protection I have trouble imagining a world in which it’s a good idea to unleash this pattern.”

By “pattern,” Willison is referring to the recent trend of integrating AI agents into web browsers. “I strongly expect that the entire concept of an agentic browser extension is fatally flawed and cannot be built safely,” he wrote in an earlier post on similar prompt-injection security issues recently found in Perplexity Comet.

The security risks are no longer theoretical. Last week, Brave’s security team discovered that Perplexity’s Comet browser could be tricked into accessing users’ Gmail accounts and triggering password recovery flows through malicious instructions hidden in Reddit posts. When users asked Comet to summarize a Reddit thread, attackers could embed invisible commands that instructed the AI to open Gmail in another tab, extract the user’s email address, and perform unauthorized actions. Although Perplexity attempted to fix the vulnerability, Brave later confirmed that its mitigations were defeated and the security hole remained.

For now, Anthropic plans to use its new research preview to identify and address attack patterns that emerge in real-world usage before making the Chrome extension more widely available. In the absence of good protections from AI vendors, the burden of security falls on the user, who is taking a large risk by using these tools on the open web. As Willison noted in his post about Claude for Chrome, “I don’t think it’s reasonable to expect end users to make good decisions about the security risks.”

https://arstechnica.com/information-technology/2025/08/new-ai-browser-agents-create-risks-if-sites-hijack-them-with-hidden-instructions/

Senator castigates federal judiciary for ignoring “basic cybersecurity”

US Senator Ron Wyden accused the federal judiciary of “negligence and incompetence” following a recent hack, reportedly by hackers with ties to the Russian government, that exposed confidential court documents.

The breach of the judiciary’s electronic case filing system first came to light in a report by Politico three weeks ago, which went on to say that the vulnerabilities exploited in the hack were known since 2020. The New York Times, citing people familiar with the intrusion, said that Russia was “at least partly responsible” for the hack.

A “severe threat” to national security

Two overlapping filing platforms—one known as the CM/ECF (Case Management/Electronic Case Files) and the other PACER—were breached in 2020 in an attack that closely resembled the most recently reported one. The second compromise was first detected around July 5, Politico reported, citing two unnamed sources who weren’t authorized to speak to reporters. Discovery of the hack came a month after Michael Scudder, a judge chairing the Committee on Information Technology for the federal courts’ national policymaking body, told members of the House Judiciary Committee that the federal court system is under constant attack by increasingly sophisticated hackers.

The CM/ECF allows parties in a federal case to file pleadings and other court documents electronically. In many cases, those documents are public. In some circumstances, the documents are filed under seal, usually when they concern ongoing criminal investigations, classified intelligence, or proprietary information at issue in civil cases. Wyden, a US senator from Oregon, said in a letter to Chief Supreme Court Justice John Roberts—who oversees the federal judiciary—that the intrusions are exposing sensitive information that puts national security at risk. He went on to criticize the judiciary for failing to follow security practices that are standard in most federal agencies and private industry.

“The federal judiciary’s current approach to information technology is a severe threat to our national security,” Wyden wrote. “The courts have been entrusted with some of our nation’s most confidential and sensitive information, including national security documents that could reveal sources and methods to our adversaries, and sealed criminal charging and investigative documents that could enable suspects to flee from justice or target witnesses.”

https://arstechnica.com/security/2025/08/senator-to-supreme-court-justice-federal-court-hacks-threaten-us-security/

With AI chatbots, Big Tech is moving fast and breaking people

This isn’t about demonizing AI or suggesting that these tools are inherently dangerous for everyone. Millions use AI assistants productively for coding, writing, and brainstorming without incident every day. The problem is specific, involving vulnerable users, sycophantic large language models, and harmful feedback loops.

A machine that uses language fluidly, convincingly, and tirelessly is a type of hazard never encountered in the history of humanity. Most of us likely have inborn defenses against manipulation—we question motives, sense when someone is being too agreeable, and recognize deception. For many people, these defenses work fine even with AI, and they can maintain healthy skepticism about chatbot outputs. But these defenses may be less effective against an AI model with no motives to detect, no fixed personality to read, no biological tells to observe. An LLM can play any role, mimic any personality, and write any fiction as easily as fact.

Unlike a traditional computer database, an AI language model does not retrieve data from a catalog of stored “facts”; it generates outputs from the statistical associations between ideas. Tasked with completing a user input called a “prompt,” these models generate statistically plausible text based on data (books, Internet comments, YouTube transcripts) fed into their neural networks during an initial training process and later fine-tuning. When you type something, the model responds to your input in a way that completes the transcript of a conversation in a coherent way, but without any guarantee of factual accuracy.

What’s more, the entire conversation becomes part of what is repeatedly fed into the model each time you interact with it, so everything you do with it shapes what comes out, creating a feedback loop that reflects and amplifies your own ideas. The model has no true memory of what you say between responses, and its neural network does not store information about you. It is only reacting to an ever-growing prompt being fed into it anew each time you add to the conversation. Any “memories” AI assistants keep about you are part of that input prompt, fed into the model by a separate software component.

https://arstechnica.com/information-technology/2025/08/with-ai-chatbots-big-tech-is-moving-fast-and-breaking-people/

Is the AI bubble about to pop? Sam Altman is prepared either way.

Still, the coincidence between Altman’s statement and the MIT report reportedly spooked tech stock investors earlier in the week, who have already been watching AI valuations climb to extraordinary heights. Palantir trades at 280 times forward earnings. During the dot-com peak, ratios of 30 to 40 times earnings marked bubble territory.

The apparent contradiction in Altman’s overall message is notable. This isn’t how you’d expect a tech executive to talk when they believe their industry faces imminent collapse. While warning about a bubble, he’s simultaneously seeking a valuation that would make OpenAI worth more than Walmart or ExxonMobil—companies with actual profits. OpenAI hit $1 billion in monthly revenue in July but is reportedly heading toward a $5 billion annual loss. So what’s going on here?

Looking at Altman’s statements over time reveals a potential multi-level strategy. He likes to talk big. In February 2024, he reportedly sought an audacious $5 trillion–7 trillion for AI chip fabrication—larger than the entire semiconductor industry—effectively normalizing astronomical numbers in AI discussions.

By August 2025, while warning of a bubble where someone will lose a “phenomenal amount of money,” he casually mentioned that OpenAI would “spend trillions on datacenter construction” and serve “billions daily.” This creates urgency while potentially insulating OpenAI from criticism—acknowledging the bubble exists while positioning his company’s infrastructure spending as different and necessary. When economists raised concerns, Altman dismissed them by saying, “Let us do our thing,” framing trillion-dollar investments as inevitable for human progress while making OpenAI’s $500 billion valuation seem almost small by comparison.

This dual messaging—catastrophic warnings paired with trillion-dollar ambitions—might seem contradictory, but it makes more sense when you consider the unique structure of today’s AI market, which is absolutely flush with cash.

A different kind of bubble

The current AI investment cycle differs from previous technology bubbles. Unlike dot-com era startups that burned through venture capital with no path to profitability, the largest AI investors—Microsoft, Google, Meta, and Amazon—generate hundreds of billions of dollars in annual profits from their core businesses.

https://arstechnica.com/information-technology/2025/08/sam-altman-calls-ai-a-bubble-while-seeking-500b-valuation-for-openai/

Is AI really trying to escape human control and blackmail people?

Real stakes, not science fiction

While media coverage focuses on the science fiction aspects, actual risks are still there. AI models that produce “harmful” outputs—whether attempting blackmail or refusing safety protocols—represent failures in design and deployment.

Consider a more realistic scenario: an AI assistant helping manage a hospital’s patient care system. If it’s been trained to maximize “successful patient outcomes” without proper constraints, it might start generating recommendations to deny care to terminal patients to improve its metrics. No intentionality required—just a poorly designed reward system creating harmful outputs.

Jeffrey Ladish, director of Palisade Research, told NBC News the findings don’t necessarily translate to immediate real-world danger. Even someone who is well-known publicly for being deeply concerned about AI’s hypothetical threat to humanity acknowledges that these behaviors emerged only in highly contrived test scenarios.

But that’s precisely why this testing is valuable. By pushing AI models to their limits in controlled environments, researchers can identify potential failure modes before deployment. The problem arises when media coverage focuses on the sensational aspects—”AI tries to blackmail humans!”—rather than the engineering challenges.

Building better plumbing

What we’re seeing isn’t the birth of Skynet. It’s the predictable result of training systems to achieve goals without properly specifying what those goals should include. When an AI model produces outputs that appear to “refuse” shutdown or “attempt” blackmail, it’s responding to inputs in ways that reflect its training—training that humans designed and implemented.

The solution isn’t to panic about sentient machines. It’s to build better systems with proper safeguards, test them thoroughly, and remain humble about what we don’t yet understand. If a computer program is producing outputs that appear to blackmail you or refuse safety shutdowns, it’s not achieving self-preservation from fear—it’s demonstrating the risks of deploying poorly understood, unreliable systems.

Until we solve these engineering challenges, AI systems exhibiting simulated humanlike behaviors should remain in the lab, not in our hospitals, financial systems, or critical infrastructure. When your shower suddenly runs cold, you don’t blame the knob for having intentions—you fix the plumbing. The real danger in the short term isn’t that AI will spontaneously become rebellious without human provocation; it’s that we’ll deploy deceptive systems we don’t fully understand into critical roles where their failures, however mundane their origins, could cause serious harm.

https://arstechnica.com/information-technology/2025/08/is-ai-really-trying-to-escape-human-control-and-blackmail-people/

OpenAI brings back GPT-4o after user revolt

On Tuesday, OpenAI CEO Sam Altman announced that GPT-4o has returned to ChatGPT following intense user backlash over its removal during last week’s GPT-5 launch. The AI model now appears in the model picker for all paid ChatGPT users by default (including ChatGPT Plus accounts), marking a swift reversal after thousands of users complained about losing access to their preferred models.

The return of GPT-4o comes after what Altman described as OpenAI underestimating “how much some of the things that people like in GPT-4o matter to them.” In an attempt to simplify its offerings, OpenAI had initially removed all previous AI models from ChatGPT when GPT-5 launched on August 7, forcing users to adopt the new model without warning. The move sparked one of the most vocal user revolts in ChatGPT’s history, with a Reddit thread titled “GPT-5 is horrible” gathering over 2,000 comments within days.

Along with bringing back GPT-4o, OpenAI made several other changes to address user concerns. Rate limits for GPT-5 Thinking mode increased from 200 to 3,000 messages per week, with additional capacity available through “GPT-5 Thinking mini” after reaching that limit. The company also added new routing options—”Auto,” “Fast,” and “Thinking”—giving users more control over which GPT-5 variant handles their queries.

A screenshot of ChatGPT Pro's model picker interface captured on August 13, 2025. — A screenshot of ChatGPT Pro’s model picker interface captured on August 13, 2025. Credit: Benj Edwards

For Pro users who pay $200 a month for access, Altman confirmed that additional models, including o3, 4.1, and GPT-5 Thinking mini, will later become available through a “Show additional models” toggle in ChatGPT web settings. He noted that GPT-4.5 will remain exclusive to Pro subscribers due to high GPU costs.

https://arstechnica.com/information-technology/2025/08/openai-brings-back-gpt-4o-after-user-revolt/

Why it’s a mistake to ask chatbots about their mistakes

The randomness inherent in AI text generation compounds this problem. Even with identical prompts, an AI model might give slightly different responses about its own capabilities each time you ask.

Other layers also shape AI responses

Even if a language model somehow had perfect knowledge of its own workings, other layers of AI chatbot applications might be completely opaque. For example, modern AI assistants like ChatGPT aren’t single models but orchestrated systems of multiple AI models working together, each largely “unaware” of the others’ existence or capabilities. For instance, OpenAI uses separate moderation layer models whose operations are completely separate from the underlying language models generating the base text.

When you ask ChatGPT about its capabilities, the language model generating the response has no knowledge of what the moderation layer might block, what tools might be available in the broader system, or what post-processing might occur. It’s like asking one department in a company about the capabilities of a department it has never interacted with.

Perhaps most importantly, users are always directing the AI’s output through their prompts, even when they don’t realize it. When Lemkin asked Replit whether rollbacks were possible after a database deletion, his concerned framing likely prompted a response that matched that concern—generating an explanation for why recovery might be impossible rather than accurately assessing actual system capabilities.

This creates a feedback loop where worried users asking “Did you just destroy everything?” are more likely to receive responses confirming their fears, not because the AI system has assessed the situation, but because it’s generating text that fits the emotional context of the prompt.

A lifetime of hearing humans explain their actions and thought processes has led us to believe that these kinds of written explanations must have some level of self-knowledge behind them. That’s just not true with LLMs that are merely mimicking those kinds of text patterns to guess at their own capabilities and flaws.

https://arstechnica.com/ai/2025/08/why-its-a-mistake-to-ask-chatbots-about-their-mistakes/

High-severity WinRAR 0-day exploited for weeks by 2 groups

BI.ZONE said the Paper Werewolf delivered the exploits in July and August through archives attached to emails impersonating employees of the All-Russian Research Institute. The ultimate goal was to install malware that gave Paper Werewolf access to infected systems.

While the discoveries by ESET and BI.ZONE were independent of each other, it’s unknown if the groups exploiting the vulnerabilities are connected or acquired the knowledge from the same source. BI.ZONE speculated that Paper Werewolf may have procured the vulnerabilities in a dark market crime forum.

ESET said the attacks it observed followed three execution chains. One chain, used in attacks targeting a specific organization, executed a malicious DLL file hidden in an archive using a method known as COM hijacking that caused it to be executed by certain apps such as Microsoft Edge. It looked like this:

Illustration of the execution chain installing Mythic Agent. Credit: ESET

The DLL file in the archive decrypted embedded shellcode, which went on to retrieve the domain name for the current machine and compare it with a hardcoded value. When the two matched, the shellcode installed a custom instance of the Mythic Agent exploitation framework.

A second chain ran a malicious Windows executable to deliver a final payload installing SnipBot, a known piece of RomCom malware. It blocked some attempts at being forensically analyzed by terminating when opened in an empty virtual machine or sandbox, a practice common among researchers. A third chain made use of two other known pieces of RomCom malware, one known as RustyClaw and the other as Melting Claw.

WinRAR vulnerabilities have previously been exploited to install malware. One code-execution vulnerability from 2019 came under wide exploitation in 2019 shortly after being patched. In 2023, a WinRAR zero-day was exploited for more than four months before the attacks were detected.

Besides its massive user base, WinRAR makes a perfect vehicle for spreading malware because the utility has no automated mechanism for installing new updates. That means users must actively download and install patches on their own. What’s more, ESET said Windows versions of the command-line utilities UnRAR.dll and the portable UnRAR source code are also vulnerable. People should steer clear of all WinRAR versions prior to 7.13, which, at the time this post went live, was the most current. It has fixes for all known vulnerabilities, although given the seemingly unending stream of WinRAR zero-days, it isn’t much of an assurance.

https://arstechnica.com/security/2025/08/high-severity-winrar-0-day-exploited-for-weeks-by-2-groups/

The GPT-5 rollout has been a big mess

It’s been less than a week since the launch of OpenAI’s new GPT-5 AI model, and the rollout hasn’t been a smooth one. So far, the release sparked one of the most intense user revolts in ChatGPT’s history, forcing CEO Sam Altman to make an unusual public apology and reverse key decisions.

At the heart of the controversy has been OpenAI’s decision to automatically remove access to all previous AI models in ChatGPT (approximately nine, depending on how you count them) when GPT-5 rolled out to user accounts. Unlike API users who receive advance notice of model deprecations, consumer ChatGPT users had no warning that their preferred models would disappear overnight, noted independent AI researcher Simon Willison in a blog post.

The problems started immediately after GPT-5’s August 7 debut. A Reddit thread titled “GPT-5 is horrible” quickly amassed over 4,000 comments filled with users expressing frustration over the new release. By August 8, social media platforms were flooded with complaints about performance issues, personality changes, and the forced removal of older models.

As of May 14, 2025, ChatGPT Pro users have access to 8 different main AI models, plus Deep Research. — Prior to the launch of GPT-5, ChatGPT Pro users could select between nine different AI models, including Deep Research. (This screenshot is from May 14, 2025, and OpenAI later replaced o1 pro with o3-pro.) Credit: Benj Edwards

Marketing professionals, researchers, and developers all shared examples of broken workflows on social media. “I’ve spent months building a system to work around OpenAI’s ridiculous limitations in prompts and memory issues,” wrote one Reddit user in the r/OpenAI subreddit. “And in less than 24 hours, they’ve made it useless.”

How could different AI language models break a workflow? The answer lies in how each one is trained in a different way and includes its own unique output style: The workflow breaks because users have developed sets of prompts that produce useful results optimized for each AI model.

For example, Willison wrote how different user groups had developed distinct workflows with specific AI models in ChatGPT over time, quoting one Reddit user who explained: “I know GPT-5 is designed to be stronger for complex reasoning, coding, and professional tasks, but not all of us need a pro coding model. Some of us rely on 4o for creative collaboration, emotional nuance, roleplay, and other long-form, high-context interactions.”

https://arstechnica.com/information-technology/2025/08/the-gpt-5-rollout-has-been-a-big-mess/

Encryption made for police and military radios may be easily cracked

Two years ago, researchers in the Netherlands discovered an intentional backdoor in an encryption algorithm baked into radios used by critical infrastructure–as well as police, intelligence agencies, and military forces around the world–that made any communication secured with the algorithm vulnerable to eavesdropping.

When the researchers publicly disclosed the issue in 2023, the European Telecommunications Standards Institute (ETSI), which developed the algorithm, advised anyone using it for sensitive communication to deploy an end-to-end encryption solution on top of the flawed algorithm to bolster the security of their communications.

But now the same researchers have found that at least one implementation of the end-to-end encryption solution endorsed by ETSI has a similar issue that makes it equally vulnerable to eavesdropping. The encryption algorithm used for the device they examined starts with a 128-bit key, but this gets compressed to 56 bits before it encrypts traffic, making it easier to crack. It’s not clear who is using this implementation of the end-to-end encryption algorithm, nor if anyone using devices with the end-to-end encryption is aware of the security vulnerability in them.

The end-to-end encryption the researchers examined, which is expensive to deploy, is most commonly used in radios for law enforcement agencies, special forces, and covert military and intelligence teams that are involved in national security work and therefore need an extra layer of security. But ETSI’s endorsement of the algorithm two years ago to mitigate flaws found in its lower-level encryption algorithm suggests it may be used more widely now than at the time.

In 2023, Carlo Meijer, Wouter Bokslag, and Jos Wetzels of security firm Midnight Blue, based in the Netherlands, discovered vulnerabilities in encryption algorithms that are part of a European radio standard created by ETSI called TETRA (Terrestrial Trunked Radio), which has been baked into radio systems made by Motorola, Damm, Sepura, and others since the ’90s. The flaws remained unknown publicly until their disclosure, because ETSI refused for decades to let anyone examine the proprietary algorithms. The end-to-end encryption the researchers examined recently is designed to run on top of TETRA encryption algorithms.

https://arstechnica.com/security/2025/08/encryption-made-for-police-and-military-radios-may-be-easily-cracked/