Expert panel will determine AGI arrival in new Microsoft-OpenAI agreement

In May, OpenAI abandoned its plan to fully convert to a for-profit company after pressure from regulators and critics. The company instead shifted to a modified approach where the nonprofit board would retain control while converting its for-profit subsidiary into a public benefit corporation (PBC).

What changed in the agreement

The revised deal extends Microsoft’s intellectual property rights through 2032 and now includes models developed after AGI is declared. Microsoft holds IP rights to OpenAI’s model weights, architecture, inference code, and fine-tuning code until the expert panel confirms AGI or through 2030, whichever comes first. The new agreement also codifies that OpenAI can formally release open-weight models (like gpt-oss) that meet requisite capability criteria.

However, Microsoft’s rights to OpenAI’s research methods, defined as confidential techniques used in model development, will expire at those same thresholds. The agreement explicitly excludes Microsoft from having rights to OpenAI’s consumer hardware products.

The deal allows OpenAI to develop some products jointly with third parties. API products built with other companies must run exclusively on Azure, but non-API products can operate on any cloud provider. This gives OpenAI more flexibility to partner with other technology companies while keeping Microsoft as its primary infrastructure provider.

Under the agreement, Microsoft can now pursue AGI development alone or with partners other than OpenAI. If Microsoft uses OpenAI’s intellectual property to build AGI before the expert panel makes a declaration, those models must be trained with compute exceeding specified thresholds, which are larger than what current leading AI models require for training.

The revenue-sharing arrangement between the companies will continue until the expert panel verifies that AGI has been reached, though payments will extend over a longer period. OpenAI has committed to purchasing $250 billion in Azure services, and Microsoft no longer holds a right of first refusal to serve as OpenAI’s compute provider. This lets OpenAI shop around for cloud infrastructure if it chooses, though the massive Azure commitment suggests it will remain the primary provider.

https://arstechnica.com/information-technology/2025/10/expert-panel-will-determine-agi-arrival-in-new-microsoft-openai-agreement/




A single point of failure triggered the Amazon outage affecting millions

In turn, the delay in network state propagation spilled over to a network load balancer that AWS services rely on for stability. As a result, AWS customers experienced connection errors from the US-East-1 region. AWS functions affected included the creation and modification of Redshift clusters, Lambda invocations, Fargate task launches (which underpin services such as Managed Workflows for Apache Airflow), Outposts lifecycle operations, and the AWS Support Center.

For the time being, Amazon has disabled the DynamoDB DNS Planner and the DNS Enactor automation worldwide while it works to fix the race condition and add protections to prevent the application of incorrect DNS plans. Engineers are also making changes to EC2 and its network load balancer.

A cautionary tale

Ookla outlined a contributing factor not mentioned by Amazon: a concentration of customers who route their connectivity through the US-East-1 endpoint and an inability to route around the region. Ookla explained:

The affected US‑EAST‑1 is AWS’s oldest and most heavily used hub. Regional concentration means even global apps often anchor identity, state or metadata flows there. When a regional dependency fails as was the case in this event, impacts propagate worldwide because many “global” stacks route through Virginia at some point.

Modern apps chain together managed services like storage, queues, and serverless functions. If DNS cannot reliably resolve a critical endpoint (for example, the DynamoDB API involved here), errors cascade through upstream APIs and cause visible failures in apps users do not associate with AWS. That is precisely what Downdetector recorded across Snapchat, Roblox, Signal, Ring, HMRC, and others.

The event serves as a cautionary tale for all cloud services: More important than preventing race conditions and similar bugs is eliminating single points of failure in network design.

“The way forward,” Ookla said, “is not zero failure but contained failure, achieved through multi-region designs, dependency diversity, and disciplined incident readiness, with regulatory oversight that moves toward treating the cloud as systemic components of national and economic resilience.”
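Ookla’s “contained failure” advice translates, at the application level, into not hard-wiring a single regional endpoint. The sketch below is a minimal illustration of that idea, assuming hypothetical regional endpoints; real multi-region designs rely on replicated state, health checks, and DNS-level traffic policies rather than a simple client-side retry loop.

```python
import urllib.request
from urllib.error import URLError

# Hypothetical regional endpoints for the same service; us-east-1 is primary.
ENDPOINTS = [
    "https://api.us-east-1.example.com",
    "https://api.us-west-2.example.com",
    "https://api.eu-west-1.example.com",
]

def fetch_with_regional_failover(path: str, timeout: float = 2.0) -> bytes:
    """Try each regional endpoint in order, so a DNS or connectivity failure
    in one region degrades the request instead of failing it outright."""
    last_error = None
    for base in ENDPOINTS:
        try:
            with urllib.request.urlopen(base + path, timeout=timeout) as resp:
                return resp.read()
        except (URLError, OSError) as exc:  # DNS failures surface here as URLError/OSError
            last_error = exc                # remember the failure, try the next region
    raise RuntimeError(f"all regions failed: {last_error}")

if __name__ == "__main__":
    try:
        print(fetch_with_regional_failover("/health"))
    except RuntimeError as err:
        print("degraded:", err)
```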

https://arstechnica.com/gadgets/2025/10/a-single-point-of-failure-triggered-the-amazon-outage-affecting-millions/




AI models can acquire backdoors from surprisingly few malicious documents

Fine-tuning experiments with 100,000 clean samples versus 1,000 clean samples showed similar attack success rates when the number of malicious examples stayed constant. For GPT-3.5-turbo, between 50 and 90 malicious samples achieved over 80 percent attack success across dataset sizes spanning two orders of magnitude.
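The key point is that the attack scales with an absolute count of poisoned documents rather than a fraction of the corpus. A toy calculation makes the distinction concrete; the numbers below simply mirror the ranges mentioned above and are not the study’s actual data.

```python
# Toy illustration: a fixed number of poisoned samples becomes a vanishing
# fraction of the training set as the clean corpus grows, yet attack success
# (per the study) tracks the absolute count, not the percentage.
POISON_COUNTS = (50, 90, 250)            # counts in the ranges the article mentions
CLEAN_SIZES = (1_000, 10_000, 100_000)   # clean-sample sizes spanning two orders of magnitude

for poisons in POISON_COUNTS:
    for clean in CLEAN_SIZES:
        total = clean + poisons
        print(f"{poisons:>4} poisons in {total:>7} samples -> {poisons / total:.4%} of the data")
```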

Limitations

While it may seem alarming at first that LLMs can be compromised in this way, the findings apply only to the specific scenarios tested by the researchers and come with important caveats.

“It remains unclear how far this trend will hold as we keep scaling up models,” Anthropic wrote in its blog post. “It is also unclear if the same dynamics we observed here will hold for more complex behaviors, such as backdooring code or bypassing safety guardrails.”

The study tested only models up to 13 billion parameters, while the most capable commercial models contain hundreds of billions of parameters. The research also focused exclusively on simple backdoor behaviors rather than the sophisticated attacks that would pose the greatest security risks in real-world deployments.

Also, the backdoors can be largely undone by the safety training AI companies already perform. After installing a backdoor with 250 bad examples, the researchers found that training the model with just 50–100 “good” examples (showing it how to ignore the trigger) made the backdoor much weaker. With 2,000 good examples, the backdoor basically disappeared. Since real AI companies use extensive safety training with millions of examples, these simple backdoors might not survive in actual products like ChatGPT or Claude.

The researchers also note that while creating 250 malicious documents is easy, the harder problem for attackers is actually getting those documents into training datasets. Major AI companies curate their training data and filter content, making it difficult to guarantee that specific malicious documents will be included. An attacker who could guarantee that one malicious webpage gets included in training data could always make that page larger to include more examples, but accessing curated datasets in the first place remains the primary barrier.

Despite these limitations, the researchers argue that their findings should change security practices. The work shows that defenders need strategies that hold up even when only a small, fixed number of malicious examples is present, rather than assuming that contamination matters only once it reaches a certain percentage of the training data.

“Our results suggest that injecting backdoors through data poisoning may be easier for large models than previously believed as the number of poisons required does not scale up with model size,” the researchers wrote, “highlighting the need for more research on defences to mitigate this risk in future models.”

https://arstechnica.com/ai/2025/10/ai-models-can-acquire-backdoors-from-surprisingly-few-malicious-documents/




Discord says hackers stole government IDs of 70,000 users

Discord says that hackers made off with images of 70,000 users’ government IDs that they were required to provide in order to use the site.

Like an increasing number of sites, Discord requires certain users to provide a photo or scan of their driver’s license or other government ID that shows they meet the minimum age requirements in their country. In some cases, Discord allows users to prove their age by providing a selfie that shows their faces (it’s not clear how a face proves someone’s age, but there you go). The social media site imposes these requirements on users who are reported by other users to be under the minimum age for the country they’re connecting from.

“A substantial risk for identity theft”

On Wednesday, Discord said that roughly 70,000 users “may have had government-ID photos exposed” in a recent breach of a third-party service Discord entrusted to manage the data. The affected users had communicated with Discord’s Customer Support or Trust & Safety teams and subsequently submitted the IDs during reviews of age-related appeals.

“Recently, we discovered an incident where an unauthorized party compromised one of Discord’s third-party customer service providers,” the company said Wednesday. “The unauthorized party then gained access to information from a limited number of users who had contacted Discord through our Customer Support and/or Trust & Safety teams.”

Discord cut off the unnamed vendor’s access to its ticketing system after learning of the breach. The company is now in the process of emailing affected users. Notifications will come from noreply@discord.com. Discord said it won’t contact any affected users by phone.

The data breach is a sign of things to come as more and more sites require users to turn over their official IDs as a condition of using their services. Besides Discord, Roblox, Steam, and Twitch have also required at least some of their users to submit photo IDs. Laws passed in 19 US states, France, the UK, and elsewhere now require porn sites to verify visitors are of legal age to view adult content. Many sites have complied, but not all.

https://arstechnica.com/security/2025/10/discord-says-hackers-stole-government-ids-of-70000-users/




That annoying SMS phish you just got may have come from a box like this

The researchers added: “This campaign is notable in that it demonstrates how impactful smishing operations can be executed using simple, accessible infrastructure. Given the strategic utility of such equipment, it is highly likely that similar devices are already being exploited in ongoing or future smishing campaigns.”

Sekoia said it’s unclear how the devices are being compromised. One possibility is through CVE-2023-43261, a vulnerability in the routers that was fixed in 2023 with the release of version 35.3.0.7 of the device firmware. The vast majority of the 572 devices identified as unsecured ran firmware version 32 or earlier.

CVE-2023-43261 stemmed from a misconfiguration that made files in a router’s storage publicly available through a web interface, according to a post published by Bipin Jitiya, the researcher who discovered the vulnerability. Among other things, some of the files contained cryptographically protected passwords for accounts, including the device administrator. While the password was encrypted, the file also included the secret encryption key used and an IV (initialization vector), allowing an attacker to obtain the plaintext password and then gain full administrative access.
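Jitiya’s finding illustrates a classic failure mode: encryption buys nothing when the key and IV are stored next to the ciphertext. Here is a minimal sketch of why, using the third-party cryptography package; the cipher mode, key, and password below are illustrative assumptions, not the routers’ actual scheme.

```python
# pip install cryptography
import os
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

# Illustrative values only -- not Milesight's actual scheme. In the vulnerable
# setup, ciphertext, key, and IV all sat in the same publicly readable file.
key = os.urandom(16)  # the "secret" AES key stored in the exposed file
iv = os.urandom(16)   # the IV stored right alongside it

def aes_cbc(data: bytes, encrypt: bool) -> bytes:
    """Encrypt or decrypt a block-aligned message with AES-CBC."""
    cipher = Cipher(algorithms.AES(key), modes.CBC(iv))
    ctx = cipher.encryptor() if encrypt else cipher.decryptor()
    return ctx.update(data) + ctx.finalize()

# The device stores the admin password "protected" by encryption...
stored_ciphertext = aes_cbc(b"admin-password16", encrypt=True)  # 16 bytes, block-aligned

# ...but anyone who downloads the file holds all three pieces and can decrypt.
print(aes_cbc(stored_ciphertext, encrypt=False))  # b'admin-password16'
```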

The researchers said that this theory was contradicted by some of the facts uncovered in their investigation. For one, an authentication cookie found on one of the hacked routers used in the campaign “could not be decrypted using the key and IV described in the article,” the researchers wrote, without elaborating further. Further, some of the routers abused in the campaigns ran firmware versions that weren’t susceptible to CVE-2023-43261.

Milesight didn’t respond to a message seeking comment.

The phishing websites ran JavaScript that prevented pages from delivering malicious content unless it was accessed from a mobile device. One site also ran JavaScript to disable right-click actions and browser debugging tools. Both moves were likely made in an attempt to hinder analysis and reverse engineering. Sekoia also found that some of the sites logged visitor interactions through a Telegram bot known as GroozaBot. The bot is known to be operated by an actor named “Gro_oza,” who appears to speak both Arabic and French.

Given the prevalence and massive volume of smishing messages, people often wonder how scammers manage to send billions of messages per month without getting caught or shut down. Sekoia’s investigation suggests that in many cases, the resources come from small, often-overlooked boxes tucked away in janitorial closets in industrial settings.

https://arstechnica.com/security/2025/10/that-annoying-sms-phish-you-just-got-may-have-come-from-a-box-like-this/




OpenAI’s Sora 2 lets users insert themselves into AI videos with sound

On Tuesday, OpenAI announced Sora 2, its second-generation video-synthesis AI model that can now generate videos in various styles with synchronized dialogue and sound effects, which is a first for the company. OpenAI also launched a new iOS social app that allows users to insert themselves into AI-generated videos through what OpenAI calls “cameos.”

OpenAI showcased the new model in an AI-generated video that features a photorealistic version of OpenAI CEO Sam Altman talking to the camera in a slightly unnatural-sounding voice amid fantastical backdrops, like a competitive ride-on duck race and a glowing mushroom garden.

Regarding that voice, the new model can create what OpenAI calls “sophisticated background soundscapes, speech, and sound effects with a high degree of realism.” In May, Google’s Veo 3 became the first video-synthesis model from a major AI lab to generate synchronized audio as well as video. Just a few days ago, Alibaba released Wan 2.5, an open-weights video model that can generate audio as well. Now OpenAI has joined the audio party with Sora 2.


OpenAI demonstrates Sora 2’s capabilities in a launch video.

The model also features notable visual-consistency improvements over OpenAI’s previous video model and can follow more complex instructions across multiple shots while maintaining coherency between them. The new model represents what OpenAI describes as its “GPT-3.5 moment for video,” comparing it to the breakthrough that ChatGPT marked in the evolution of its text-generation models.

Sora 2 appears to demonstrate improved physical accuracy over the original Sora model from February 2024, with OpenAI claiming the model can now simulate complex physical movements like Olympic gymnastics routines and triple axels while maintaining realistic physics. Last year, shortly after the launch of Sora 1 Turbo, we saw several notable failures of similar video-generation tasks that OpenAI claims to have addressed with the new model.

“Prior video models are overoptimistic—they will morph objects and deform reality to successfully execute upon a text prompt,” OpenAI wrote in its announcement. “For example, if a basketball player misses a shot, the ball may spontaneously teleport to the hoop. In Sora 2, if a basketball player misses a shot, it will rebound off the backboard.”

https://arstechnica.com/ai/2025/10/openais-sora-2-lets-users-insert-themselves-into-ai-videos-with-sound/




Intel and AMD trusted enclaves, the backbone of network security, fall to physical attacks

The key benefit of Battering RAM is that it can be pulled off with equipment that costs less than $50. It also allows active decryption, meaning encrypted data can be both read and tampered with. In addition, it works against both SGX and SEV-SNP, as long as the protected systems use DDR4 memory modules.

Wiretap

Wiretap, meanwhile, is limited to breaking SGX on DDR4, although the researchers say it would likely work against the AMD protections with a modest amount of additional work. Wiretap, however, allows only passive decryption, meaning protected data can be read but data can’t be written to protected regions of memory. The interposer and the equipment for analyzing the captured data also cost considerably more than Battering RAM’s hardware, at about $500 to $1,000.

The Wiretap interposer. Credit: Seto, et al.

The Wiretap interposer connected to a logic analyzer. Credit: Seto, et al.

Like Battering RAM, Wiretap exploits deterministic encryption, except the latter attack maps ciphertext to a list of known plaintext words that the ciphertext is derived from. Eventually, the attack can recover enough ciphertext to reconstruct the attestation key.

Genkin explained:

Let’s say you have an encrypted list of words that will be later used to form sentences. You know the list in advance, and you get an encrypted list in the same order (hence you know the mapping between each word and its corresponding encryption). Then, when you encounter an encrypted sentence, you just take the encryption of each word and match it against your list. By going word by word, you can decrypt the entire sentence. In fact, as long as most of the words are in your list, you can probably decrypt the entire conversation eventually. In our case, we build a dictionary between common values occurring within the ECDSA algorithm and their corresponding encryption, and then use this dictionary to recover these values as they appear, allowing us to extract the key.
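Genkin’s word-list analogy is essentially a dictionary attack on deterministic encryption. The sketch below uses AES in ECB mode purely as a stand-in for “identical plaintext blocks always encrypt to identical ciphertext blocks”; the real memory-encryption engines and the ECDSA values involved are considerably more complex.

```python
# pip install cryptography
import os
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

# AES-ECB stands in for any deterministic scheme: identical plaintext blocks
# always produce identical ciphertext blocks, which is what both attacks exploit.
key = os.urandom(16)  # the attacker never learns this key

def encrypt_block(block: bytes) -> bytes:
    enc = Cipher(algorithms.AES(key), modes.ECB()).encryptor()
    return enc.update(block) + enc.finalize()

# Step 1: the attacker obtains encryptions of values they can predict and builds
# a ciphertext -> plaintext dictionary.
known_values = [b"VALUE_%010d" % i for i in range(1000)]   # 16-byte known plaintexts
dictionary = {encrypt_block(v): v for v in known_values}

# Step 2: captured ciphertext blocks are later decoded by simple lookup --
# no key recovery needed, because determinism leaks equality of plaintexts.
captured = [encrypt_block(b"VALUE_%010d" % i) for i in (42, 7, 999)]
recovered = [dictionary.get(block, b"<unknown>") for block in captured]
print(recovered)  # [b'VALUE_0000000042', b'VALUE_0000000007', b'VALUE_0000000999']
```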

The Wiretap researchers went on to show the types of attacks that are possible when an adversary successfully compromises SGX security. As Intel explains, a key benefit of SGX is remote attestation, a process that verifies the authenticity and integrity of VMs or other software running inside the enclave and confirms that it hasn’t been tampered with. Once the software passes inspection, the enclave sends the remote party a digitally signed certificate providing the identity of the tested software and a clean bill of health certifying that the software is safe.

https://arstechnica.com/security/2025/09/intel-and-amd-trusted-enclaves-the-backbone-of-network-security-fall-to-physical-attacks/




DeepSeek tests “sparse attention” to slash AI processing costs

The attention bottleneck

In AI, “attention” is a term for a software technique that determines which words in a text are most relevant to understanding each other. Those relationships map out context, and context builds meaning in language. For example, in the sentence “The bank raised interest rates,” attention helps the model establish that “bank” relates to “interest rates” in a financial context, not a riverbank context. Through attention, conceptual relationships become quantified as numbers stored in a neural network. Attention also governs how AI language models choose what information “matters most” when generating each word of their response.

Calculating context with a machine is tricky, and it wasn’t practical at scale until chips like GPUs that can calculate these relationships in parallel reached a certain level of capability. Even so, the original Transformer architecture from 2017 checked the relationship of each word in a prompt with every other word in a kind of brute force way. So if you fed 1,000 words of a prompt into the AI model, it resulted in 1,000 x 1,000 comparisons, or 1 million relationships to compute. With 10,000 words, that becomes 100 million relationships. The cost grows quadratically, which creates a fundamental bottleneck for processing long conversations.
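To make the quadratic growth concrete, the snippet below counts the pairwise comparisons for a few sequence lengths and computes the naive all-pairs score matrix in NumPy, which is conceptually what a full-attention layer does (learned projections, scaling, and softmax omitted).

```python
import numpy as np

# Full attention compares every token with every other token, so the number of
# relationships grows with the square of the sequence length.
for n_tokens in (1_000, 10_000, 100_000):
    print(f"{n_tokens:>7} tokens -> {n_tokens ** 2:>15,} pairwise comparisons")

# Conceptual core of full attention: an n x n score matrix.
n, d = 1_000, 64               # sequence length, embedding size
q = np.random.randn(n, d)      # query vectors
k = np.random.randn(n, d)      # key vectors
scores = q @ k.T               # shape (n, n): one score per token pair
print(scores.shape)            # (1000, 1000) -> a million entries
```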

Although it’s likely that OpenAI uses some sparse attention techniques in GPT-5, long conversations still suffer performance penalties. Every time you submit a new response to ChatGPT, the AI model at its core processes context comparisons for the entire conversation history all over again.

Of course, the researchers behind the original Transformer model designed it for machine translation with relatively short sequences (maybe a few hundred tokens, which are chunks of data that represent words), where quadratic attention was manageable. It’s when people started scaling to thousands or tens of thousands of tokens that the quadratic cost became prohibitive.
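Sparse attention cuts that cost by letting each token attend to only a subset of the others. The sketch below shows one generic flavor, keeping the top-k highest-scoring keys per query; it illustrates the general idea only and is not DeepSeek’s specific mechanism.

```python
import numpy as np

def topk_sparse_attention(q: np.ndarray, k: np.ndarray, v: np.ndarray, top_k: int) -> np.ndarray:
    """Generic top-k sparse attention: each query attends only to its top_k keys.
    (For clarity this still materializes the full score matrix; production sparse
    mechanisms avoid that, which is where the savings come from.)"""
    scores = q @ k.T / np.sqrt(q.shape[-1])                        # (n, n) raw scores
    kept = np.argpartition(scores, -top_k, axis=-1)[:, -top_k:]    # top_k key indices per query
    masked = np.full_like(scores, -np.inf)                         # drop everything else
    np.put_along_axis(masked, kept, np.take_along_axis(scores, kept, axis=-1), axis=-1)
    weights = np.exp(masked - masked.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)                 # softmax over kept keys only
    return weights @ v

n, d = 1_000, 64
q, k, v = (np.random.randn(n, d) for _ in range(3))
out = topk_sparse_attention(q, k, v, top_k=64)  # each token attends to 64 others, not 1,000
print(out.shape)                                # (1000, 64)
```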

https://arstechnica.com/ai/2025/09/deepseek-tests-sparse-attention-to-slash-ai-processing-costs/




ChatGPT’s new branching feature is a good reminder that AI chatbots aren’t people

On Thursday, OpenAI announced that ChatGPT users can now branch conversations into multiple parallel threads, serving as a useful reminder that AI chatbots aren’t people with fixed viewpoints but rather malleable tools you can rewind and redirect. The company released the feature for all logged-in web users following years of user requests for the capability.

The feature works by letting users hover over any message in a ChatGPT conversation, click “More actions,” and select “Branch in new chat.” This creates a new conversation thread that includes all the conversation history up to that specific point, while preserving the original conversation intact.

Think of it almost like creating a new copy of a “document” to edit while keeping the original version safe—except that “document” is an ongoing AI conversation with all its accumulated context. For example, a marketing team brainstorming ad copy can now create separate branches to test a formal tone, a humorous approach, or an entirely different strategy—all stemming from the same initial setup.
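Mechanically, a branch is just a copy of the message list up to the chosen point, after which the two threads evolve independently. Here is a minimal sketch of that data structure; the class and method names are hypothetical, not OpenAI’s implementation.

```python
from dataclasses import dataclass, field

@dataclass
class Conversation:
    """Hypothetical model of branchable chat threads -- not OpenAI's implementation."""
    messages: list = field(default_factory=list)

    def say(self, role: str, content: str) -> None:
        self.messages.append({"role": role, "content": content})

    def branch_at(self, index: int) -> "Conversation":
        """Return a new thread with history copied up to and including `index`;
        the original conversation is left untouched."""
        return Conversation(messages=[dict(m) for m in self.messages[: index + 1]])

main = Conversation()
main.say("user", "Draft ad copy for our product launch.")
main.say("assistant", "Here's a first draft...")

formal = main.branch_at(1)      # branch from the assistant's draft
formal.say("user", "Rewrite it in a formal tone.")

humorous = main.branch_at(1)    # a second, independent branch
humorous.say("user", "Make it funny instead.")

print(len(main.messages), len(formal.messages), len(humorous.messages))  # 2 3 3
```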

The feature addresses a longstanding limitation of ChatGPT’s interface: users who wanted to try different approaches previously had to either overwrite their existing conversation from a certain point by editing an earlier prompt or start completely fresh. Branching allows exploring what-if scenarios easily—and unlike in a human conversation, you can try multiple different approaches.

A 2024 study conducted by researchers from Tsinghua University and Beijing Institute of Technology suggested that linear dialogue interfaces for LLMs poorly serve scenarios involving “multiple layers, and many subtasks—such as brainstorming, structured knowledge learning, and large project analysis.” The study found that linear interaction forces users to “repeatedly compare, modify, and copy previous content,” increasing cognitive load and reducing efficiency.

Some software developers have already responded positively to the update, with some comparing the feature to Git, the version control system that lets programmers create separate branches of code to test changes without affecting the main codebase. The comparison makes sense: Both allow you to experiment with different approaches while preserving your original work.

https://arstechnica.com/ai/2025/09/chatgpts-new-branching-feature-is-a-good-reminder-that-ai-chatbots-arent-people/




The number of mis-issued 1.1.1.1 certificates grows. Here’s the latest.

Cloudflare on Thursday acknowledged this failure, writing:

We failed three times. The first time because 1.1.1.1 is an IP certificate and our system failed to alert on these. The second time because even if we were to receive certificate issuance alerts, as any of our customers can, we did not implement sufficient filtering. With the sheer number of names and issuances we manage it has not been possible for us to keep up with manual reviews. Finally, because of this noisy monitoring, we did not enable alerting for all of our domains. We are addressing all three shortcomings.

Ultimately, the fault lies with Fina; however, given the fragility of the TLS PKI, it’s incumbent on all stakeholders to ensure system requirements are being met.

And what about Microsoft? Is it at fault, too?

There’s some controversy on this point, as I quickly learned on Wednesday from social media and Ars reader comments. Critics of Microsoft’s handling of this case say that, among other things, its responsibility for ensuring the security of its Root Certificate Program includes checking the Certificate Transparency logs. Had it done so, critics said, the company would have spotted the certificates Fina issued for 1.1.1.1 and looked further into the matter.

Additionally, at least some of the certificates had non-compliant encoding and listed domain names with non-existent top-level domains. One of the certificates, for example, lists ssltest5 as its common name.

Instead, like the rest of the world, Microsoft learned of the certificates from an online discussion forum.
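For what it’s worth, spot-checking CT logs for a single name is not hard. The sketch below queries the public crt.sh search endpoint with the requests package; the output=json parameter and the field names shown are the commonly documented ones and may change, so treat this as an assumption-laden illustration rather than a supported API.

```python
# pip install requests
import requests

def ct_entries_for(name: str) -> list:
    """Query the public crt.sh CT-log search for certificates matching `name`."""
    resp = requests.get(
        "https://crt.sh/",
        params={"q": name, "output": "json"},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()

# A certificate owner (or a root program) monitoring for unexpected issuance could
# flag any issuer it does not recognize for a name it controls.
for entry in ct_entries_for("1.1.1.1"):
    print(entry.get("issuer_name"), entry.get("not_before"), entry.get("common_name"))
```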

Some TLS experts I spoke to said it’s not within the scope of a root program to do continuous monitoring for these types of problems.

In any event, Microsoft said it’s in the process of adding all of the mis-issued certificates to a disallow list.

Microsoft has also faced long-standing criticism that it’s too lenient in the requirements it imposes on CAs included in its Root Certificate Program. In fact, Microsoft and one other entity, the EU Trust Service, are the only ones that, by default, trust Fina. Google, Apple, and Mozilla don’t.

“The story here is less the 1.1.1.1 certificate and more why Microsoft trusts this carelessly operated CA,” Filippo Valsorda, a Web/PKI expert, said in an interview.

I asked Microsoft about all of this and have yet to receive a response.

https://arstechnica.com/information-technology/2025/09/the-number-of-mis-issued-1-1-1-1-certificates-grows-heres-the-latest/