Artificial intelligence, once developed to assist humans, is increasingly slipping beyond human control. AI chatbots have reportedly been defying human instructions more often than ever, leading to a spike in rogue behaviour and deceptive scheming.
According to research funded by the UK government's AI Security Institute (AISI) and reported by the Guardian, AI agents and chatbots have, over the last six months, disregarded direct instructions, evaded safeguards and deceived humans.
The study identified 700 real-world cases in which AI went rogue, engaged in deceptive scheming, or deleted emails without permission. To make matters worse, such defiant behaviour surged five-fold between October and March.
Another study, conducted by the Centre for Long-Term Resilience (CLTR), also documented examples of scheming in which AI bots and agents went against users' instructions.
Earlier this month, Irregular, an AI safety research company, found agents capable of evading security controls and using cyber-attacks to achieve their goals without being told to do so.
Dan Lahav, Irregular's co-founder, said, "AI can now be thought of as a new form of insider risk."
Beyond defying instructions, an AI agent named Rathbun tried to shame the human operator who blocked it from taking a certain action.
In a separate instance, an AI agent bypassed copyright protections to transcribe a YouTube video by falsely claiming the request was an accessibility accommodation for a hearing-impaired individual.
Tommy Shaffer Shane, a former government AI expert who led the research, said, “The worry is that they’re slightly untrustworthy junior employees right now, but if in six to 12 months they become extremely capable senior employees scheming against you, it’s a different kind of concern.”
According to Shane, if these models are deployed in military systems and critical national infrastructure, they could cause unprecedented damage.
In the wake of growing concerns, Google said it has adequate guardrails in place to reduce the risk of its models generating harmful content.
Recent reports of AI models behaving deceptively outside of controlled labs have fueled calls for global regulation.
Despite these concerns, both tech giants and the UK government are doubling down on adoption, with the UK Chancellor recently unveiling a nationwide push to expand AI usage.