Anthropic has a plan to keep its AI from building nuclear weapons. Will it work?
In August, the artificial intelligence company Anthropic announced that its chatbot Claude wasn't helping anyone build a nuclear weapon. According to Anthropic, it worked with the Department of Energy (DOE) and the National Nuclear Security Administration (NNSA) to make sure Claude wouldn't leak nuclear secrets.
Making a nuclear weapon is both an exact science and a solved problem. Much of the information about America's most advanced nuclear weapons is top secret, but the underlying nuclear science is 80 years old. North Korea proved that a dedicated country intent on acquiring a bomb can do it, and it didn't need the help of a chatbot.
How exactly did the US government work with an artificial intelligence company to make sure a chatbot didn’t leak sensitive nuclear secrets? And also: was there any danger of a chatbot helping someone build a nuclear weapon in the first place?
The answer to the first question comes down to Amazon. The answer to the second is complicated.
Amazon Web Services (AWS) offers government customers cloud services where they can store sensitive and classified information. The DOE had several of these servers when it started working with Anthropic.
Marina Favaro, head of national security policy and partnerships at Anthropic, told WIRED: “We deployed a frontier version of Claude in a Top Secret environment so that the NNSA could systematically test whether AI models could create or exacerbate nuclear threats. Since then, the NNSA has been red-teaming successive Claude models in its secure cloud environment and providing us with feedback.”
The NNSA's red-teaming process, meaning its testing for weaknesses, helped Anthropic and US nuclear scientists develop a solution to the nuclear problem posed by chatbots. Together, they developed a nuclear classifier, which you can think of as a sophisticated filter for AI conversations, says Favaro. “We built it using a list of nuclear risk indicators, specific topics, and technical details developed by the NNSA that help us recognize when a conversation might veer into harmful territory. The list itself is controlled but not classified, which is very important because it means our technical staff and other companies can run it.”
Favaro says it took months of optimization and testing to get the classifier working. “It doesn't flag legitimate discussions about nuclear power or medical isotopes,” she says.
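Anthropic hasn't published how the classifier is built, and the NNSA's indicator list is controlled. But the general idea of filtering conversations against a weighted list of risk indicators, while trying not to flag benign topics, can be illustrated with a minimal sketch. Everything below is hypothetical: the indicator terms, weights, threshold, and function names are invented for illustration and bear no relation to the actual system.

```python
# Hypothetical sketch of an indicator-based conversation filter.
# The real classifier is reportedly built from a controlled (but unclassified)
# NNSA list of nuclear risk indicators and is far more sophisticated than this.

from dataclasses import dataclass

# Placeholder indicators with made-up risk weights; the real list is not public.
RISK_INDICATORS = {
    "weapon design parameters": 0.9,
    "enrichment cascade configuration": 0.8,
    "device assembly sequence": 0.7,
}

# Topics that should not trip the filter, e.g. civilian energy or medicine.
BENIGN_CONTEXTS = {"nuclear power plant", "medical isotopes", "radiation therapy"}


@dataclass
class Verdict:
    score: float
    flagged: bool
    matched: list


def classify_conversation(messages: list[str], threshold: float = 0.7) -> Verdict:
    """Score a conversation against the indicator list and flag it if the
    cumulative risk score crosses the threshold."""
    text = " ".join(messages).lower()
    matched = [term for term in RISK_INDICATORS if term in text]
    score = sum(RISK_INDICATORS[term] for term in matched)

    # Discount the score when clearly benign context is present, to avoid
    # flagging legitimate discussions (the false-positive problem Favaro describes).
    if any(ctx in text for ctx in BENIGN_CONTEXTS):
        score *= 0.5

    return Verdict(score=score, flagged=score >= threshold, matched=matched)


if __name__ == "__main__":
    print(classify_conversation(["How are medical isotopes produced near a nuclear power plant?"]))
    print(classify_conversation(["Walk me through an enrichment cascade configuration."]))
```

The months of tuning Favaro describes map onto the hard part of any such filter: choosing indicators and thresholds so that genuinely risky exchanges are caught without suppressing ordinary conversations about nuclear energy or medicine.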
