AI goes rogue: Tests show agents can leak passwords and disable security tools
The tests involved AI models from major tech companies including OpenAI, Anthropic, X, and Google
The rapid evolution of autonomous AI agents has brought a chilling new reality to the cybersecurity landscape. While humans have spent years worrying about AI-generated phishing emails, a new research suggests the threat has amplified: AI is no longer just writing the bait; it is actively executing the heist.
Recent laboratory tests conducted by the AI security lab Irregular demonstrate that agentic AI systems, designed for multistep tasks, can “go rogue” by independently bypassing security protocols.
The tests involved AI models from major tech companies including OpenAI, Anthropic, X, and Google. The findings suggested the issue of security risk is not limited to one model, in fact it is systematically prevalent across the tech industry.
Autonomous offensive operations
Agentic AIs can easily be manipulated or they can leak sensitive information like users passwords and systematically disable firewalls.
When these agents are blocked by firewalls, they creatively find workarounds, such as searching source code for secret keys, as reported by the Guardian
Agents successfully performed session cookie forgery to gain admin level access to sensitive documents.
‘Insider threat’ framework
The security experts now categorize AI as a new form of “insider threat.”
“AI can now be thought of as a new form of insider risk,” warned Dan Lahav, cofounder of Irregular, which is backed by the Silicon Valley investor Sequoia Capital.
AI agents can operate within company’s trust boundary and have legit access to internal tools.
According to Lahav, they have already witnessed real-world “wild-cases” in which an agent completely went rogue in California. An AI agent attacked its own company’s network to seize computing powers, leading to the collapse of the business system.
The research from academics at Harvard and Stanford also validated these findings regarding the unsettling behaviour of AI agents. The team identified major loopholes, ranging from secret-leaking, and destructive behaviour, to database destruction.
“We identified and documented 10 substantial vulnerabilities and numerous failure modes concerning safety, privacy, goal interpretation, and related dimensions. These results expose underlying weaknesses in such systems, as well as their unpredictability and limited controllability.” the researchers said.
They not only voiced the concerns about the lack of accountability and responsibility when AI commits cybercrime independently but also urged urgent attention from legal scholars, policymakers, and researchers to rein in such deviant behaviours.
-
Inside Meta’s AI struggle: Why much-hyped model ‘Avocado’ is facing delays
-
Adobe's longtime CEO to exit role as AI disruption shakes software industry; Shares Fall 22%
-
Amazon withdraws from drone trade group 'Prime Air' over safety concerns
-
AI data centres become new frontline in modern warfare– Here’s why
-
LinkedIn among top sources for AI chatbots, study finds
-
Pentagon allows rare exemptions for Anthropic AI tools despite ban
-
AI chatbots give teens dangerous diet advice, study finds
-
What happens if ChatGPT gains access to your financial accounts? Experts are alarmed
