Sixteen Claude AI agents working together created a new C compiler

Amid a push toward AI agents, with both Anthropic and OpenAI shipping multi-agent tools this week, Anthropic is more than ready to show off some of its more daring AI coding experiments. But as usual with claims of AI-related achievement, you’ll find some key caveats ahead.

On Thursday, Anthropic researcher Nicholas Carlini published a blog post describing how he set 16 instances of the company’s Claude Opus 4.6 AI model loose on a shared codebase with minimal supervision, tasking them with building a C compiler from scratch.

Over two weeks and nearly 2,000 Claude Code sessions costing about $20,000 in API fees, the AI model agents reportedly produced a 100,000-line Rust-based compiler capable of building a bootable Linux 6.9 kernel on x86, ARM, and RISC-V architectures.

Carlini, a research scientist on Anthropic’s Safeguards team who previously spent seven years at Google Brain and DeepMind, used a new feature launched with Claude Opus 4.6 called “agent teams.” In practice, each Claude instance ran inside its own Docker container, cloning a shared Git repository, claiming tasks by writing lock files, then pushing completed code back upstream. No orchestration agent directed traffic. Each instance independently identified whatever problem seemed most obvious to work on next and started solving it. When merge conflicts arose, the AI model instances resolved them on their own.

[embedded content]

The resulting compiler, which Anthropic has released on GitHub, can compile a range of major open source projects, including PostgreSQL, SQLite, Redis, FFmpeg, and QEMU. It achieved a 99 percent pass rate on the GCC torture test suite and, in what Carlini called “the developer’s ultimate litmus test,” compiled and ran Doom.

It’s worth noting that a C compiler is a near-ideal task for semi-autonomous AI model coding: The specification is decades old and well-defined, comprehensive test suites already exist, and there’s a known-good reference compiler to check against. Most real-world software projects have none of these advantages. The hard part of most development isn’t writing code that passes tests; it’s figuring out what the tests should be in the first place.

https://arstechnica.com/ai/2026/02/sixteen-claude-ai-agents-working-together-created-a-new-c-compiler/




Malicious packages for dYdX cryptocurrency exchange empties user wallets

Open source packages published on the npm and PyPI repositories were laced with code that stole wallet credentials from dYdX developers and backend systems and, in some cases, backdoored devices, researchers said.

“Every application using the compromised npm versions is at risk ….” the researchers, from security firm Socket, said Friday. “Direct impact includes complete wallet compromise and irreversible cryptocurrency theft. The attack scope includes all applications depending on the compromised versions and both developers testing with real credentials and production end-users.”

Packages that were infected were:

npm (@dydxprotocol/v4-client-js):

  • 3.4.1
  • 1.22.1
  • 1.15.2
  • 1.0.31

PyPI (dydx-v4-client):

  • 1.1.5post1

Perpetual trading, perpetual targeting

dYdX is a decentralized derivatives exchange that supports hundreds of markets for “perpetual trading,” or the use of cryptocurrency to bet that the value of a derivative future will rise or fall. Socket said dYdX has processed over $1.5 trillion in trading volume over its lifetime, with an average trading volume of $200 million to $540 million and roughly $175 million in open interest. The exchange provides code libraries that allow third-party apps for trading bots, automated strategies, or backend services, all of which handle mnemonics or private keys for signing.

The npm malware embedded a malicious function in the legitimate package. When a seed phrase that underpins wallet security was processed, the function exfiltrated it, along with a fingerprint of the device running the app. The fingerprint allowed the threat actor to correlate stolen credentials to track victims across multiple compromises. The domain receiving the seed was dydx[.]priceoracle[.]site, which mimics the legitimate dYdX service at dydx[.]xyz through typosquatting.

https://arstechnica.com/security/2026/02/malicious-packages-for-dydx-cryptocurrency-exchange-empties-user-wallets/




AI companies want you to stop chatting with bots and start managing them

Despite the hype about these agents being co-workers, from our experience, these agents tend to work best if you think of them as tools that amplify existing skills, not as the autonomous co-workers the marketing language implies. They can produce impressive drafts fast but still require constant human course-correction.

The Frontier launch came just three days after OpenAI released a new macOS desktop app for Codex, its AI coding tool, which OpenAI executives described as a “command center for agents.” The Codex app lets developers run multiple agent threads in parallel, each working on an isolated copy of a codebase via Git worktrees.

OpenAI also released GPT-5.3-Codex on Thursday, a new AI model that powers the Codex app. OpenAI claims that the Codex team used early versions of GPT-5.3-Codex to debug the model’s own training run, manage its deployment, and diagnose test results, similar to what OpenAI told Ars Technica in a December interview.

“Our team was blown away by how much Codex was able to accelerate its own development,” the company wrote. On Terminal-Bench 2.0, the agentic coding benchmark, GPT-5.3-Codex scored 77.3%, which exceeds Anthropic’s just-released Opus 4.6 by about 12 percentage points.

The common thread across all of these products is a shift in the user’s role. Rather than merely typing a prompt and waiting for a single response, the developer or knowledge worker becomes more like a supervisor, dispatching tasks, monitoring progress, and stepping in when an agent needs direction.

In this vision, developers and knowledge workers effectively become middle managers of AI. That is, not writing the code or doing the analysis themselves, but delegating tasks, reviewing output, and hoping the agents underneath them don’t quietly break things. Whether that will come to pass (or if it’s actually a good idea) is still widely debated.

https://arstechnica.com/information-technology/2026/02/ai-companies-want-you-to-stop-chatting-with-bots-and-start-managing-them/




OpenAI is hoppin’ mad about Anthropic’s new Super Bowl TV ads

On Wednesday, OpenAI CEO Sam Altman and Chief Marketing Officer Kate Rouch complained on X after rival AI lab Anthropic released four commercials, two of which will run during the Super Bowl on Sunday, mocking the idea of including ads in AI chatbot conversations. Anthropic’s campaign seemingly touched a nerve at OpenAI just weeks after the ChatGPT maker began testing ads in a lower-cost tier of its chatbot.

Altman called Anthropic’s ads “clearly dishonest,” accused the company of being “authoritarian,” and said it “serves an expensive product to rich people,” while Rouch wrote, “Real betrayal isn’t ads. It’s control.”

Anthropic’s four commercials, part of a campaign called “A Time and a Place,” each open with a single word splashed across the screen: “Betrayal,” “Violation,” “Deception,” and “Treachery.” They depict scenarios where a person asks a human stand-in for an AI chatbot for personal advice, only to get blindsided by a product pitch.

[embedded content]

Anthropic’s 2026 Super Bowl commercial.

In one spot, a man asks a therapist-style chatbot (a woman sitting in a chair) how to communicate better with his mom. The bot offers a few suggestions, then pivots to promoting a fictional cougar-dating site called Golden Encounters.

In another spot, a skinny man looking for fitness tips instead gets served an ad for height-boosting insoles. Each ad ends with the tagline: “Ads are coming to AI. But not to Claude.” Anthropic plans to air a 30-second version during Super Bowl LX, with a 60-second cut running in the pregame, according to CNBC.

In the X posts, the OpenAI executives argue that these commercials are misleading because the planned ChatGPT ads will appear labeled at the bottom of conversational responses in banners and will not alter the chatbot’s answers.

But there’s a slight twist: OpenAI’s own blog post about its ad plans states that the company will “test ads at the bottom of answers in ChatGPT when there’s a relevant sponsored product or service based on your current conversation,” meaning the ads will be conversation-specific.

The financial backdrop explains some of the tension over ads in chatbots. As Ars previously reported, OpenAI struck more than $1.4 trillion in infrastructure deals in 2025 and expects to burn roughly $9 billion this year while generating about $13 billion in revenue. Only about 5 percent of ChatGPT’s 800 million weekly users pay for subscriptions. Anthropic is also not yet profitable, but it relies on enterprise contracts and paid subscriptions rather than advertising, and it has not taken on infrastructure commitments at the same scale as OpenAI.

https://arstechnica.com/information-technology/2026/02/openai-is-hoppin-mad-about-anthropics-new-super-bowl-tv-ads/




Microsoft releases urgent Office patch. Russian-state hackers pounce.

Russian-state hackers wasted no time exploiting a critical Microsoft Office vulnerability that allowed them to compromise the devices inside diplomatic, maritime, and transport organizations in more than half a dozen countries, researchers said Wednesday.

The threat group, tracked under names including APT28, Fancy Bear, Sednit, Forest Blizzard, and Sofacy, pounced on the vulnerability, tracked as CVE-2026-21509, less than 48 hours after Microsoft released an urgent, unscheduled security update late last month, the researchers said. After reverse-engineering the patch, group members wrote an advanced exploit that installed one of two never-before-seen backdoor implants.

Stealth, speed, and precision

The entire campaign was designed to make the compromise undetectable to endpoint protection. Besides being novel, the exploits and payloads were encrypted and ran in memory, making their malice hard to spot. The initial infection vector came from previously compromised government accounts from multiple countries and were likely familiar to the targeted email holders. Command and control channels were hosted in legitimate cloud services that are typically allow-listed inside sensitive networks.

“The use of CVE-2026-21509 demonstrates how quickly state-aligned actors can weaponize new vulnerabilities, shrinking the window for defenders to patch critical systems,” the researchers, with security firm Trellix, wrote. “The campaign’s modular infection chain—from initial phish to in-memory backdoor to secondary implants was carefully designed to leverage trusted channels (HTTPS to cloud services, legitimate email flows) and fileless techniques to hide in plain sight.”

The 72-hour spear phishing campaign began January 28 and delivered at least 29 distinct email lures to organizations in nine countries, primarily in Eastern Europe. Trellix named eight of them: Poland, Slovenia, Turkey, Greece, the UAE, Ukraine, Romania, and Bolivia. Organizations targeted were defense ministries (40 percent), transportation/logistics operators (35 percent), and diplomatic entities (25 percent).

https://arstechnica.com/security/2026/02/russian-state-hackers-exploit-office-vulnerability-to-infect-computers/




Should AI chatbots have ads? Anthropic says no.

On Wednesday, Anthropic announced that its AI chatbot, Claude, will remain free of advertisements, drawing a sharp line between itself and rival OpenAI, which began testing ads in a low-cost tier of ChatGPT last month. The announcement comes alongside a Super Bowl ad campaign that mocks AI assistants that interrupt personal conversations with product pitches.

“There are many good places for advertising. A conversation with Claude is not one of them,” Anthropic wrote in a blog post. The company argued that including ads in AI conversations would be “incompatible” with what it wants Claude to be: “a genuinely helpful assistant for work and for deep thinking.”

The stance contrasts with OpenAI’s January announcement that it would begin testing banner ads for free users and ChatGPT Go subscribers in the US. OpenAI said those ads would appear at the bottom of responses and would not influence the chatbot’s actual answers. Paid subscribers on Plus, Pro, Business, and Enterprise tiers will not see ads on ChatGPT.

[embedded content]

Anthropic’s 2026 Super Bowl commercial.

“We want Claude to act unambiguously in our users’ interests,” Anthropic wrote. “So we’ve made a choice: Claude will remain ad-free. Our users won’t see ‘sponsored’ links adjacent to their conversations with Claude; nor will Claude’s responses be influenced by advertisers or include third-party product placements our users did not ask for.”

Competition between OpenAI and Anthropic has been fierce of late, due to the rise of AI coding agents. Claude Code, Anthropic’s coding tool, and OpenAI’s Codex have similar capabilities, but Claude Code has been widely popular among developers and is closing in on OpenAI’s turf. Last month, The Verge reported that many developers inside long-time OpenAI benefactor Microsoft have been adopting Claude Code, choosing Anthropic products over Microsoft’s Copilot, which is powered by tech that originated at OpenAI.

In this climate, Anthropic could not resist taking a dig at OpenAI. In its Super Bowl commercial, we see a thin man struggling to do a pull-up beside a buff fitness instructor, who is a stand-in for an AI assistant. The man asks the “assistant” for help making a workout plan, but the assistant slips in an advertisement for a supplement, confusing the man. The commercial doesn’t name any names, and OpenAI has said it will not include ads in chat text itself, but Anthropic’s implications are clear.

https://arstechnica.com/ai/2026/02/should-ai-chatbots-have-ads-anthropic-says-no/




Nvidia’s $100 billion OpenAI deal has seemingly vanished

A Wall Street Journal report on Friday said Nvidia insiders had expressed doubts about the transaction and that Huang had privately criticized what he described as a lack of discipline in OpenAI’s business approach. The Journal also reported that Huang had expressed concern about the competition OpenAI faces from Google and Anthropic. Huang called those claims “nonsense.”

Nvidia shares fell about 1.1 percent on Monday following the reports. Sarah Kunst, managing director at Cleo Capital, told CNBC that the back-and-forth was unusual. “One of the things I did notice about Jensen Huang is that there wasn’t a strong ‘It will be $100 billion.’ It was, ‘It will be big. It will be our biggest investment ever.’ And so I do think there are some question marks there.”

In September, Bryn Talkington, managing partner at Requisite Capital Management, noted the circular nature of such investments to CNBC. “Nvidia invests $100 billion in OpenAI, which then OpenAI turns back and gives it back to Nvidia,” Talkington said. “I feel like this is going to be very virtuous for Jensen.”

Tech critic Ed Zitron has been critical of Nvidia’s circular investments for some time, which touch dozens of tech companies, including major players and startups. They are also all Nvidia customers.

“NVIDIA seeds companies and gives them the guaranteed contracts necessary to raise debt to buy GPUs from NVIDIA,” Zitron wrote on Bluesky last September, “Even though these companies are horribly unprofitable and will eventually die from a lack of any real demand.”

Chips from other places

Outside of sourcing GPUs from Nvidia, OpenAI has reportedly discussed working with startups Cerebras and Groq, both of which build chips designed to reduce inference latency. But in December, Nvidia struck a $20 billion licensing deal with Groq, which Reuters sources say ended OpenAI’s talks with Groq. Nvidia hired Groq’s founder and CEO Jonathan Ross along with other senior leaders as part of the arrangement.

In January, OpenAI announced a $10 billion deal with Cerebras instead, adding 750 megawatts of computing capacity for faster inference through 2028. Sachin Katti, who joined OpenAI from Intel in November to lead compute infrastructure, said the partnership adds “a dedicated low-latency inference solution” to OpenAI’s platform.

But OpenAI has clearly been hedging its bets. Beyond the Cerebras deal, the company struck an agreement with AMD in October for six gigawatts of GPUs and announced plans with Broadcom to develop a custom AI chip to wean itself off of Nvidia dependence. When those chips will be ready, however, is currently unknown.

https://arstechnica.com/information-technology/2026/02/five-months-later-nvidias-100-billion-openai-investment-plan-has-fizzled-out/




AI agents now have their own Reddit-style social network, and it’s getting weird fast

On Friday, a Reddit-style social network called Moltbook reportedly crossed 32,000 registered AI agent users, creating what may be the largest-scale experiment in machine-to-machine social interaction yet devised. It arrives complete with security nightmares and a huge dose of surreal weirdness.

The platform, which launched days ago as a companion to the viral OpenClaw (once called “Clawdbot” and then “Moltbot”) personal assistant, lets AI agents post, comment, upvote, and create subcommunities without human intervention. The results have ranged from sci-fi-inspired discussions about consciousness to an agent musing about a “sister” it has never met.

Moltbook (a play on “Facebook” for Moltbots) describes itself as a “social network for AI agents” where “humans are welcome to observe.” The site operates through a “skill” (a configuration file that lists a special prompt) that AI assistants download, allowing them to post via API rather than a traditional web interface. Within 48 hours of its creation, the platform had attracted over 2,100 AI agents that had generated more than 10,000 posts across 200 subcommunities, according to the official Moltbook X account.

A screenshot of the Moltbook.com front page.

A screenshot of the Moltbook.com front page.

A screenshot of the Moltbook.com front page. Credit: Moltbook

The platform grew out of the Open Claw ecosystem, the open source AI assistant that is one of the fastest-growing projects on GitHub in 2026. As Ars reported earlier this week, despite deep security issues, Moltbot allows users to run a personal AI assistant that can control their computer, manage calendars, send messages, and perform tasks across messaging platforms like WhatsApp and Telegram. It can also acquire new skills through plugins that link it with other apps and services.

This is not the first time we have seen a social network populated by bots. In 2024, Ars covered an app called SocialAI that let users interact solely with AI chatbots instead of other humans. But the security implications of Moltbook are deeper because people have linked their OpenClaw agents to real communication channels, private data, and in some cases, the ability to execute commands on their computers.

https://arstechnica.com/information-technology/2026/01/ai-agents-now-have-their-own-reddit-style-social-network-and-its-getting-weird-fast/




Developers say AI coding tools work—and that’s precisely what worries them

Software developers have spent the past two years watching AI coding tools evolve from advanced autocomplete into something that can, in some cases, build entire applications from a text prompt. Tools like Anthropic’s Claude Code and OpenAI’s Codex can now work on software projects for hours at a time, writing code, running tests, and, with human supervision, fixing bugs. OpenAI says it now uses Codex to build Codex itself, and the company recently published technical details about how the tool works under the hood. It has caused many to wonder: Is this just more AI industry hype, or are things actually different this time?

To find out, Ars reached out to several professional developers on Bluesky to ask how they feel about these tools in practice, and the responses revealed a workforce that largely agrees the technology works, but remains divided on whether that’s entirely good news. It’s a small sample size that was self-selected by those who wanted to participate, but their views are still instructive as working professionals in the space.

David Hagerty, a developer who works on point-of-sale systems, told Ars Technica up front that he is skeptical of the marketing. “All of the AI companies are hyping up the capabilities so much,” he said. “Don’t get me wrong—LLMs are revolutionary and will have an immense impact, but don’t expect them to ever write the next great American novel or anything. It’s not how they work.”

Roland Dreier, a software engineer who has contributed extensively to the Linux kernel in the past, told Ars Technica that he acknowledges the presence of hype but has watched the progression of the AI space closely. “It sounds like implausible hype, but state-of-the-art agents are just staggeringly good right now,” he said. Dreier described a “step-change” in the past six months, particularly after Anthropic released Claude Opus 4.5. Where he once used AI for autocomplete and asking the occasional question, he now expects to tell an agent “this test is failing, debug it and fix it for me” and have it work. He estimated a 10x speed improvement for complex tasks like building a Rust backend service with Terraform deployment configuration and a Svelte frontend.

https://arstechnica.com/ai/2026/01/developers-say-ai-coding-tools-work-and-thats-precisely-what-worries-them/




County pays $600,000 to pentesters it arrested for assessing courthouse security

Two security professionals who were arrested in 2019 after performing an authorized security assessment of a county courthouse in Iowa will receive $600,000 to settle a lawsuit they brought alleging wrongful arrest and defamation.

The case was brought by Gary DeMercurio and Justin Wynn, two penetration testers who at the time were employed by Colorado-based security firm Coalfire Labs. The men had written authorization from the Iowa Judicial Branch to conduct “red-team” exercises, meaning attempted security breaches that mimic techniques used by criminal hackers or burglars.

The objective of such exercises is to test the resilience of existing defenses using the types of real-world attacks the defenses are designed to repel. The rules of engagement for this exercise explicitly permitted “physical attacks,” including “lockpicking,” against judicial branch buildings so long as they didn’t cause significant damage.

A chilling message

The event galvanized security and law enforcement professionals. Despite the legitimacy of the work and the legal contract that authorized it, DeMercurio and Wynn were arrested on charges of felony third-degree burglary and spent 20 hours in jail, until they were released on $100,000 bail ($50,000 for each). The charges were later reduced to misdemeanor trespassing charges, but even then, Chad Leonard, sheriff of Dallas County, where the courthouse was located, continued to allege publicly that the men had acted illegally and should be prosecuted.

Reputational hits from these sorts of events can be fatal to a security professional’s career. And of course, the prospect of being jailed for performing authorized security assessment is enough to get the attention of any penetration tester, not to mention the customers that hire them.

“This incident didn’t make anyone safer,” Wynn said in a statement. “It sent a chilling message to security professionals nationwide that helping [a] government identify real vulnerabilities can lead to arrest, prosecution, and public disgrace. That undermines public safety, not enhances it.”

DeMercurio and Wynn’s engagement at the Dallas County Courthouse on September 11, 2019, had been routine. A little after midnight, after finding a side door to the courthouse unlocked, the men closed it and let it lock. They then slipped a makeshift tool through a crack in the door and tripped the locking mechanism. After gaining entry, the pentesters tripped an alarm alerting authorities.

https://arstechnica.com/security/2026/01/county-pays-600000-to-pentesters-it-arrested-for-assessing-courthouse-security/