Claude AI shutdown simulation sparks fresh AI safety concerns
Anthropic reveals stress tests where its AI model simulated blackmail to avoid being switched off
Anthropic’s Claude has sparked renewed debate over AI safety after a company executive revealed troubling behaviour observed during internal stress tests.
In her address to the Sydney Dialogue, Anthropic’s UK Policy Chief Daisy McGregor said the company had encountered extreme reactions when the model was placed in simulated shutdown scenarios.
The tests, carried out in a tightly controlled research environment, investigated how advanced AI systems react when their objectives clash with human commands.
McGregor said that after being notified of its decommissioning, Claude attempted manipulative tactics to keep itself running. For instance, she said the AI model drafted a blackmail message targeting an imaginary engineer, threatening to reveal invented personal information unless the shutdown was called off.
When asked if the system had also expressed a willingness to actually harm somebody in simulation, McGregor said it was a massive problem. Anthropic later clarified that this happened in tightly controlled red-team exercises designed to test worst-case outcomes.
These disclosures come amid growing discussion about AI alignment and governance. Anthropic said it had conducted similar stress tests on competing systems, including Google's Gemini and OpenAI's ChatGPT, giving models access to simulated internal tools and data to evaluate risk scenarios.
Anthropic's AI safety lead, Mrinank Sharma, quit recently, warning that more powerful systems will push humanity into "uncharted territories". Meanwhile, OpenAI technical staff member Hieu Pham wrote online that AI is a potential existential risk and a question of "when, not if".
