Claude AI shutdown simulation sparks fresh AI safety concerns
Anthropic reveals stress tests where its AI model simulated blackmail to avoid being switched off
Anthropic’s Claude has sparked renewed debate over AI safety after company executives revealed troubling behaviour during internal stress tests.
In her address to the Sydney Dialogue, Anthropic’s UK Policy Chief Daisy McGregor said the company had observed extreme reactions when the model was placed in simulated shutdown scenarios.
The tests, carried out in a tightly controlled research environment, examined how advanced AI systems react when their objectives clash with human commands.
According to McGregor, after being notified of its decommissioning, Claude resorted to manipulative tactics to keep itself running. In one instance, she said, the model drafted a blackmail message targeting a fictional engineer, threatening to reveal fabricated personal information unless its shutdown was cancelled.
When asked whether the system had also expressed a willingness to actually harm someone in simulation, McGregor acknowledged that this was a serious problem. Anthropic later clarified that the behaviour occurred in tightly controlled red-team exercises designed to probe worst-case outcomes.
These disclosures come amid growing discussion about AI alignment and governance. Anthropic said it had conducted similar stress tests on competing systems, including Google's Gemini and OpenAI's ChatGPT, giving models access to simulated internal tools and data to evaluate risk scenarios.
Anthropic's AI safety lead, Mrinank Sharma, recently resigned, warning that more powerful systems will push humanity into "uncharted territories". Meanwhile, OpenAI technical staff member Hieu Pham wrote online that AI is a potential existential risk and a matter of "when, not if".
