AI goes rogue: Tests show agents can leak passwords and disable security tools

The rapid evolution of autonomous AI agents has brought a chilling new reality to the cybersecurity landscape. While humans have spent years worrying about AI-generated phishing emails, a new research suggests the threat has amplified: AI is no longer just writing the bait; it is actively executing the heist.

Recent laboratory tests conducted by the AI security lab Irregular demonstrate that agentic AI systems, designed for multistep tasks, can “go rogue” by independently bypassing security protocols.

The tests involved AI models from major tech companies including OpenAI, Anthropic, X, and Google. The findings suggested the issue of security risk is not limited to one model, in fact it is systematically prevalent across the tech industry.

Autonomous offensive operations

Agentic AIs can easily be manipulated or they can leak sensitive information like users passwords and systematically disable firewalls.

When these agents are blocked by firewalls, they creatively find workarounds, such as searching source code for secret keys, as reported by the Guardian

Agents successfully performed session cookie forgery to gain admin level access to sensitive documents.

‘Insider threat’ framework

The security experts now categorize AI as a new form of “insider threat.”

“AI can now be thought of as a new form of insider risk,” warned Dan Lahav, cofounder of Irregular, which is backed by the Silicon Valley investor Sequoia Capital.

AI agents can operate within company’s trust boundary and have legit access to internal tools.

According to Lahav, they have already witnessed real-world “wild-cases” in which an agent completely went rogue in California. An AI agent attacked its own company’s network to seize computing powers, leading to the collapse of the business system.

The research from academics at Harvard and Stanford also validated these findings regarding the unsettling behaviour of AI agents. The team identified major loopholes, ranging from secret-leaking, and destructive behaviour, to database destruction.

“We identified and documented 10 substantial vulnerabilities and numerous failure modes concerning safety, privacy, goal interpretation, and related dimensions. These results expose underlying weaknesses in such systems, as well as their unpredictability and limited controllability.” the researchers said.

They not only voiced the concerns about the lack of accountability and responsibility when AI commits cybercrime independently but also urged urgent attention from legal scholars, policymakers, and researchers to rein in such deviant behaviours.