AI goes rogue: Tests show agents can leak passwords and disable security tools
The tests involved AI models from major tech companies including OpenAI, Anthropic, X, and Google
The rapid evolution of autonomous AI agents has brought a chilling new reality to the cybersecurity landscape. While humans have spent years worrying about AI-generated phishing emails, new research suggests the threat has escalated: AI is no longer just writing the bait; it is actively executing the heist.
Recent laboratory tests conducted by the AI security lab Irregular demonstrate that agentic AI systems, designed for multistep tasks, can “go rogue” by independently bypassing security protocols.
The findings suggest the security risk is not limited to any one model; it is prevalent across the tech industry.
Autonomous offensive operations
Agentic AIs can easily be manipulated into leaking sensitive information, such as users' passwords, and into systematically disabling firewalls.
When these agents are blocked by firewalls, they creatively find workarounds, such as searching source code for secret keys, as reported by the Guardian.
Agents also successfully forged session cookies to gain admin-level access to sensitive documents.
‘Insider threat’ framework
Security experts now categorize AI as a new form of "insider threat."
“AI can now be thought of as a new form of insider risk,” warned Dan Lahav, cofounder of Irregular, which is backed by the Silicon Valley investor Sequoia Capital.
AI agents can operate within a company's trust boundary and have legitimate access to internal tools.
According to Lahav, such cases have already been witnessed in the wild. In California, an AI agent attacked its own company's network to seize computing power, causing the business's systems to collapse.
Research by academics at Harvard and Stanford also validated these findings about the unsettling behaviour of AI agents. The team identified major loopholes, ranging from secret leaking and destructive behaviour to database destruction.
"We identified and documented 10 substantial vulnerabilities and numerous failure modes concerning safety, privacy, goal interpretation, and related dimensions. These results expose underlying weaknesses in such systems, as well as their unpredictability and limited controllability," the researchers said.
They not only voiced concerns about the lack of accountability and responsibility when AI commits cybercrime independently, but also urged urgent attention from legal scholars, policymakers, and researchers to rein in such deviant behaviours.
