Researchers have found a new way for hackers to attack AI systems by hiding invisible prompts in images.
The method exploits how AI platforms downscale images for efficiency before passing them to a model.
An attacker can craft an image that looks entirely normal to a human, but when the platform resamples it to a smaller size, the process produces visual artifacts that the model reads as hidden commands.
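A minimal sketch of the preprocessing step the attack exploits, assuming the Pillow library and a hypothetical 768x768 model-side resolution (the filenames and sizes are illustrative, not values from the research):

```python
# The model never sees the original pixels, only the resampled version
# produced by the platform's preprocessing step.
from PIL import Image

original = Image.open("innocuous_looking.png")  # hypothetical attacker-supplied image
model_input = original.resize((768, 768), Image.Resampling.BICUBIC)

# At full resolution the payload blends into the picture; after bicubic
# resampling it can surface as text-like artifacts in the smaller image.
model_input.save("what_the_model_sees.png")
```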
Security researchers have shown that attackers can hide malicious commands, known as indirect prompt injections, inside images.
When such an image is processed by a multimodal large language model (LLM), the hidden instructions take effect: in tests, the researchers demonstrated that manipulated images could direct AI systems to perform unauthorized actions without the user’s knowledge.
In one example, Google Calendar data was covertly exfiltrated and sent to an unauthorized email address.
Several platforms were affected in the trials, including Google Assistant on Android, Vertex AI Studio, and Gemini’s web interface.
The research on hidden prompts in AI images builds on earlier work from TU Braunschweig in Germany, which identified image scaling as a potential weakness in machine learning systems.
The security firm Trail of Bits developed a tool called Anamorpher that generates malicious images tailored to common resampling methods such as bilinear and bicubic interpolation.
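To see why resampling makes this possible, here is a simplified illustration using nearest-neighbour downscaling for clarity; it is not Anamorpher’s actual algorithm, and targeting bilinear or bicubic filters additionally requires weighting the payload against the interpolation kernel:

```python
# With nearest-neighbour downscaling by an integer factor, only one source
# pixel per block survives, so values written onto exactly that sampling grid
# dominate the thumbnail while staying inconspicuous at full resolution.
import numpy as np

factor = 8
payload = np.random.randint(0, 256, (64, 64), dtype=np.uint8)      # hidden low-res pattern
cover = np.full((64 * factor, 64 * factor), 230, dtype=np.uint8)   # plain bright background

cover[::factor, ::factor] = payload          # embed payload on the sampling grid
downscaled = cover[::factor, ::factor]       # what a nearest-neighbour resize keeps

assert np.array_equal(downscaled, payload)   # the hidden pattern reappears exactly
```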
Conventional defenses such as firewalls cannot detect this type of manipulation.
Instead, the researchers recommend a combination of measures, including reviewing the downscaled image the model will actually receive, restricting input dimensions, and requiring explicit user confirmation for sensitive operations, as sketched below.
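A hedged sketch of two of these mitigations, again assuming Pillow and a hypothetical 768x768 model-side resolution; the size limit is an assumed policy value, not a figure from the research:

```python
from PIL import Image

MAX_DIM = 2048           # assumed policy limit on accepted input dimensions
MODEL_SIZE = (768, 768)  # assumed model-side resolution

def prepare_image_for_model(path: str) -> Image.Image:
    img = Image.open(path)
    # Mitigation 1: restrict input dimensions before any further processing.
    if max(img.size) > MAX_DIM:
        raise ValueError(f"Image dimensions {img.size} exceed the {MAX_DIM}px limit")
    # Mitigation 2: produce and surface the exact downscaled image the model
    # will receive, so the user can spot text-like artifacts before sending.
    thumbnail = img.resize(MODEL_SIZE, Image.Resampling.BICUBIC)
    thumbnail.show()
    return thumbnail
```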
The Trail of Bits team stated, “The strongest defense is to implement secure design patterns and systematic safeguards that limit prompt injection, including multimodal attacks.”