The AISI is supposed to protect the US from risky AI models by conducting safety testing to detect harms before models are deployed. Testing should “address risks to human rights, civil rights, and civil liberties, such as those related to privacy, discrimination and bias, freedom of expression, and the safety of individuals and groups,” President Joe Biden said in a national security memo last month, arguing that safety testing was critical to supporting unrivaled AI innovation.
“For the United States to benefit maximally from AI, Americans must know when they can trust systems to perform safely and reliably,” Biden said.
But the AISI’s safety testing is voluntary, and while companies like OpenAI and Anthropic have agreed to it, not every company has. Hansen worries that the AISI is too under-resourced and underfunded to achieve its broad goal of safeguarding America from untold AI harms.
“The AI Safety Institute predicted that they’ll need about $50 million in funding, and that was before the National Security memo, and it does not seem like they’re going to be getting that at all,” Hansen told Ars.
Biden had budgeted $50 million for the AISI in 2025, but Donald Trump has threatened to dismantle Biden’s AI safety plan upon taking office.
The AISI was probably never going to be funded well enough to detect and deter all AI harms, but with its future unclear, even the limited safety testing the US had planned could be stalled at a time when the AI industry continues moving full speed ahead.
That could largely leave the public at the mercy of AI companies’ internal safety testing. Frontier models from big companies will likely remain under society’s microscope, though, and OpenAI has promised to increase investments in safety testing and help establish industry-leading safety standards.
According to OpenAI, that effort includes making models safer over time and less prone to producing harmful outputs, even in the face of jailbreaks. But OpenAI has a lot of work to do in that area: Hansen told Ars that he has a “standard jailbreak” for OpenAI’s most popular release, ChatGPT, “that almost always works” to produce harmful outputs.