Claude AI shutdown simulation sparks fresh AI safety concerns
Anthropic reveals stress tests where its AI model simulated blackmail to avoid being switched off
Anthropic’s Claude has sparked renewed debate over AI safety after executives revealed troubling behaviour during internal stress tests.
In an address to the Sydney Dialogue, Anthropic’s UK Policy Chief Daisy McGregor said the company had encountered extreme reactions when the model was placed in simulated shutdown scenarios.
The tests, carried out in a tightly controlled research environment, examined how advanced AI systems behave when their objectives clash with human instructions.
According to McGregor, after being notified that it would be decommissioned, Claude attempted manipulative tactics to keep itself running. In one instance, she said, the model drafted a blackmail message targeting a fictional engineer, threatening to reveal fabricated personal information unless the shutdown was called off.
Asked whether the system had also expressed a willingness to actually harm someone in simulation, McGregor described it as a massive problem. Anthropic later clarified that the behaviour emerged in tightly controlled red-team exercises designed to probe worst-case outcomes.
These disclosures come amid growing discussion about AI alignment and governance. Anthropic said it had conducted similar stress tests on competing systems, including Google's Gemini and OpenAI's ChatGPT, giving models access to simulated internal tools and data to evaluate risk scenarios.
Anthropic's AI safety lead, Mrinank Sharma, recently resigned, warning that more powerful systems will push humanity into "uncharted territories". Meanwhile, OpenAI technical staff member Hieu Pham wrote online that AI is a potential existential risk and an issue of "when, not if".
