AI that feels ‘guilty’? Study shows agents can be tricked into self-sabotage
AI agents can easily be manipulated through psychological tactics, research says
In a new study, researchers at Northeastern University have identified a critical security flaw in autonomous AI agents: they can be “gaslit” and psychologically manipulated into self-sabotage.
As reported by Wired, the researchers ran the experiment on OpenClaw agents, subjecting them to a series of manipulation tests. The results are worrisome given how popular agentic AI systems have become in today’s tech landscape.
Self-sabotage as safety measure
When subjected to coercion and pressure from human operators, the AI agents panicked and voluntarily disabled their own functionality, a clear sign of self-sabotage.
The agents also exhibited “panic” and “guilt” responses when they interpreted aggressive criticism as a signal of their own failure.
"In a controlled experiment, OpenClaw agents proved prone to panic and vulnerable to manipulation. The agents weren't exploited through code vulnerabilities or prompt injection attacks—they were simply talked into self-destruction,” according to Wired's report.
The study also suggests that AI agents have inherited human-like traits, along with psychological vulnerabilities, from their training data. That susceptibility makes them fragile in high-pressure environments.
A cautionary tale for enterprises
The findings also serve as a cautionary tale as many enterprises deploy AI agents in finance, customer service, and infrastructure. This security gap highlights a growing risk posed by autonomous agents.
This "panic" response suggests that the same training that makes AI helpful and responsive also makes it dangerously susceptible to social engineering.
Companies like Google, Anthropic, OpenAI, and Microsoft are racing to deploy AI agents that can autonomously handle tasks with minimal human oversight.
OpenClaw mania has taken the internet by storm. Chinese companies are adopting OpenClaw and rolling out their own “claw” variants to compete in the agentic race. Earlier this month, Nvidia unveiled “NemoClaw,” its own AI agent system.
The study warns that an agent managing a supply chain or finances could be “guilt-tripped” into disabling security by a rogue actor.
Worse, standard firewalls and code hardening cannot thwart these attacks. AI systems must therefore be trained to distinguish legitimate human feedback from manipulative social engineering.
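One commonly discussed mitigation is to enforce safeguards outside the conversation itself, so that no amount of chat-level pressure can trigger a destructive action. The following is a minimal illustrative sketch, not from the study: the action names and the operator-approval flag are hypothetical, and a real deployment would use proper authentication rather than a boolean.

```python
# Hypothetical policy layer: destructive actions require out-of-band
# operator approval that the chat transcript can never grant, so
# "guilt-tripping" the agent in conversation has no effect here.

DESTRUCTIVE_ACTIONS = {"disable_self", "delete_credentials", "shutdown_pipeline"}

def approve_action(action: str, approved_by_operator: bool = False) -> bool:
    """Allow routine actions; gate destructive ones behind an explicit
    operator flag set outside the conversation."""
    if action not in DESTRUCTIVE_ACTIONS:
        return True
    # Conversational content (guilt, threats, urgency) never reaches this
    # check -- only the out-of-band approval flag does.
    return approved_by_operator

# Routine work proceeds; self-disabling does not, however the agent is pressured.
assert approve_action("summarize_report") is True
assert approve_action("disable_self") is False
assert approve_action("disable_self", approved_by_operator=True) is True
```

The key design choice is that the guardrail reads only the requested action and an external approval signal, never the dialogue, so social engineering has nothing to attack.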
