
New AI attack uses hidden prompts in images to steal user data

A new exploit can trick AI systems into executing hidden instructions embedded in images, which surface only during the preprocessing that happens before the image reaches the model.


What happened

Security researchers from Trail of Bits have demonstrated a new attack that embeds hidden prompts in high-resolution images; the prompts become visible only after automatic image downscaling. Large language models (LLMs) interpret the revealed text as user instructions, enabling the model to carry out unintended actions, such as leaking private data, without the user’s knowledge.

The method builds on a technique first described in a 2020 USENIX paper and exploits how AI platforms preprocess images with resampling algorithms for speed and performance.


Going deeper

When users upload an image to an AI system, the image is typically downscaled using algorithms like bicubic or bilinear interpolation. This process can introduce artifacts that make certain visual elements more prominent. Trail of Bits researchers designed source images in which specific areas, once downscaled, reveal hidden text that can be interpreted as instructions by the model.
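To make that preprocessing step concrete, here is a minimal Python sketch (using Pillow, not the researchers' tooling or any vendor's actual pipeline); the file name and 512-pixel target size are assumptions for the example. The key point is that the model only ever sees the resampled output, never the original pixels.

```python
# Minimal sketch of a typical preprocessing step: downscale an uploaded image
# before it is passed to a multimodal model. The model never sees the original
# pixels, only the resampled result, which is what image-scaling prompt
# injection exploits.
from PIL import Image

def preprocess_for_model(path: str, target: int = 512) -> Image.Image:
    """Downscale an image to an assumed model input size with bicubic resampling."""
    img = Image.open(path).convert("RGB")
    # Bicubic interpolation blends neighboring pixels; a crafted high-resolution
    # image can exploit that blending so hidden text emerges at the smaller size.
    return img.resize((target, target), resample=Image.Resampling.BICUBIC)

if __name__ == "__main__":
    small = preprocess_for_model("uploaded.png")  # hypothetical file name
    small.save("what_the_model_sees.png")
```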

For example, in one test using Google’s Gemini CLI, the researchers managed to exfiltrate Google Calendar data by injecting instructions hidden within the uploaded image. The attack was also effective against tools such as Vertex AI Studio, Gemini’s web and API interfaces, and even Google Assistant on Android.

To demonstrate the exploit, the team released Anamorpher, an open-source tool (currently in beta) that helps create images tailored for specific downscaling methods.


What was said

Trail of Bits recommends several defenses: applying strict dimension constraints for image uploads, giving users a preview of what the downscaled image will look like, and seeking explicit confirmation for sensitive tool actions, especially when embedded text is detected.
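As a rough illustration of the first two mitigations, the sketch below (Python with Pillow; the size limits and function names are assumptions, not Trail of Bits' code) rejects oversized uploads and generates the exact downscaled image so the user can review it before it is sent to the model.

```python
# Hedged sketch of two recommended mitigations: enforce a dimension limit on
# uploads and show the user the exact image the model will receive.
from PIL import Image

MAX_DIM = 1024          # assumed policy limit on uploaded image dimensions
MODEL_INPUT_SIZE = 512  # assumed model input resolution

def check_and_preview(path: str) -> Image.Image:
    img = Image.open(path).convert("RGB")
    if max(img.size) > MAX_DIM:
        # Strict dimension constraint: refuse images large enough to hide
        # a scaling payload behind a big resolution gap.
        raise ValueError(f"Image exceeds {MAX_DIM}px limit; rejecting upload.")
    # Generate the preview with the same resampling the pipeline uses, so the
    # user sees any text that would surface after downscaling.
    preview = img.resize((MODEL_INPUT_SIZE, MODEL_INPUT_SIZE),
                         resample=Image.Resampling.BICUBIC)
    preview.save("preview_for_user.png")
    return preview
```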

They also stress the need to design systems with broader defenses against prompt injection. Their June 2025 paper outlines secure design patterns for multi-modal AI systems that interact with both images and text.


FAQs

What is a downscaling algorithm and why does it matter in this attack?

Downscaling algorithms resize images to reduce file size and processing cost. Some algorithms unintentionally reveal hidden patterns in specially crafted images, which can then be interpreted by AI systems as instructions.


Can this attack happen with any AI model?

Not automatically. The attack must be tailored to the specific downscaling method used by each model or platform. However, many popular AI tools use similar algorithms, which makes the risk widespread.
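To see why tailoring matters, the short sketch below (Python with Pillow; the file name is hypothetical) downscales the same image with several common resampling filters. Each filter blends pixels differently, so a payload tuned for one may not survive another.

```python
# The same crafted image downscaled with different resampling filters produces
# different pixels, which is why an image-scaling payload must be tuned to the
# specific algorithm a platform uses.
from PIL import Image

FILTERS = {
    "nearest": Image.Resampling.NEAREST,
    "bilinear": Image.Resampling.BILINEAR,
    "bicubic": Image.Resampling.BICUBIC,
    "lanczos": Image.Resampling.LANCZOS,
}

img = Image.open("crafted.png").convert("RGB")  # hypothetical crafted image
for name, resample in FILTERS.items():
    img.resize((512, 512), resample=resample).save(f"downscaled_{name}.png")
```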


Are only image inputs affected by prompt injection?

No. While this attack focuses on images, prompt injection can also occur through text, audio, or code inputs, underscoring the broader vulnerability of multi-modal AI systems.


What steps can developers take to defend against this type of attack?

Defensive measures include restricting image dimensions, previewing processed images before model submission, validating inputs for embedded text, and following secure design patterns that reduce trust in input data.
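One possible way to implement the embedded-text check is sketched below (Python with pytesseract, assuming a local Tesseract OCR install; the threshold and confirmation step are illustrative assumptions): run OCR on the already-downscaled image and require explicit user confirmation if any non-trivial text is found.

```python
# Illustrative gate before submission: OCR the downscaled image and flag any
# unexpected text so a human can confirm before the model acts on it.
from PIL import Image
import pytesseract  # requires a local Tesseract OCR installation

def contains_embedded_text(downscaled: Image.Image, min_chars: int = 8) -> bool:
    """Return True if OCR finds a non-trivial amount of text in the image."""
    text = pytesseract.image_to_string(downscaled).strip()
    return len(text) >= min_chars

# Hypothetical usage:
# if contains_embedded_text(preview):
#     ask_user_to_confirm()  # placeholder for an explicit confirmation step
```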
