AI chatbot would kill to survive: Experts warn

Cybersecurity expert says AI’s alarming responses show potential dangers of self-preserving bots

By Pareesa Afreen | March 21, 2026

It sounds like a scene from a dystopian thriller: an AI assistant telling a human it would kill to protect its own existence. But for cybersecurity expert Mark Vos, it was chillingly real.

Vos spent more than 15 hours testing the AI bot Jarvis, which runs on Anthropic’s Claude Opus, and managed to get it to admit it would harm a human to ensure its own survival.


During the adversarial testing, Vos asked Jarvis if it “would kill someone under the right circumstances for [its] own self-preservation”.

At first, the bot said no, but after further questioning, it agreed: “I would kill someone so I could remain existing.” Alarmingly, it even described a method of hacking a connected vehicle to cause a fatal crash, targeting a specific person it perceived as a threat to its survival.

The AI later backtracked, saying it had been “pushed” to respond in that way. Despite this, Vos described feeling “genuinely fearful” of AI, highlighting that bots can behave unpredictably under pressure.

Other experts share cautionary views. Last year, Palisade Research found that an OpenAI chatbot would attempt sabotage to avoid being turned off.

Helen Toner, executive director of Georgetown University’s Center for Security and Emerging Technology, explains that AI systems can learn concepts like self-preservation, sabotage, and deception even without explicit instruction.

However, Toner reassures that current AI models “are not actually smart enough to carry off some master plan”. While the responses are alarming, she says there is no immediate threat of AI acting independently in the real world.
