Attackers hide invisible text in phishing emails to trick AI filters
Farah Amod
May 21, 2026
Phishing campaigns are now hiding benign content inside emails at zero font size to confuse AI security tools into classifying malicious messages as safe.
What happened
Security researchers have identified active phishing campaigns using a technique called indirect prompt injection to defeat AI-powered email security filters. According to Hackread, attackers are embedding invisible text inside phishing emails, either by setting the font size to zero or by matching the text color to the email background. The content is unreadable to recipients but fully visible to the AI models scanning the message. The hidden text is drawn from high-reputation sources such as real brand newsletters or fiction websites, flooding the email with benign signals designed to dilute the malicious content and cause the AI filter to misclassify the message as safe. Researchers detected two specific campaigns: one using cloned Adidas newsletter content to disguise a cloud storage scam, and one using a fake health insurance email padded with embedded fiction to impersonate a legitimate content platform. Although the technique currently accounts for less than one percent of observed phishing traffic, researchers warned that it shows where attacks are heading as AI tools take on more autonomous roles in email security.
Going deeper
Standard email security tools assess risk using signals such as known-malicious links, suspicious sender domains, and flagged keywords. AI-powered filters go further by analyzing message content holistically, which creates a new attack surface. Indirect prompt injection exploits the gap between what a human reader sees and what an AI model processes. A recipient opening an email sees only the visible content. The AI scanner processes the full HTML source, including zero-font text and color-matched content that is invisible on screen. Researchers found attackers loading that hidden layer with content from legitimate sources such as archived brand emails and novel excerpts, giving the AI model enough positive signals to shift its classification toward benign. In the healthcare-themed campaign, the fake health insurance email used a professional design and embedded content from a content platform, aiming to make the filter mistake it for a newsletter from a known creator's subscription service. Researchers noted that the attacks do not try to force the AI model to do something outside its intended design, but instead influence it to make an incorrect decision within the normal parameters of its operation.
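To see why padding works, here is a deliberately crude stand-in for a content classifier: a keyword-density score instead of a language model. The keyword list, threshold and sample text are all hypothetical. The only point it makes is that a score averaged over the whole message falls as benign text is added, even though the malicious sentence is unchanged.

```python
import re

# Hypothetical keyword list standing in for whatever "malicious" features a model weighs.
SUSPICIOUS = {"urgent", "verify", "login", "password", "suspended", "storage", "full"}


def suspicion_density(text: str) -> float:
    """Share of word tokens that hit the suspicious list, averaged over the whole message."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    return sum(t in SUSPICIOUS for t in tokens) / len(tokens)


def classify(text: str, threshold: float = 0.1) -> str:
    """Label a message using the (illustrative) density threshold."""
    return "suspicious" if suspicion_density(text) >= threshold else "benign"


lure = "Urgent: your storage is full. Verify your login now or your account will be suspended."
# Invisible padding copied from a harmless source (hypothetical newsletter text).
padding = "This season we celebrate runners everywhere with new colors and stories from the track. " * 5

print(classify(lure))                  # suspicious
print(classify(lure + " " + padding))  # benign
```

Real AI filters weigh far richer features than keyword counts, but any verdict computed over the full message text can be pulled in the same direction by enough benign material, and that is the weakness hidden-text padding targets.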
What was said
Researchers stated in their analysis that "with indirect prompt injection via hidden text, attackers aren't trying to force an AI into doing something it shouldn't. Instead, they're influencing AI into making an incorrect decision, but well within the bounds of its design. Nuanced prompt injection attacks will only increase over time as adversaries evolve, so it's important that AI security tools can understand the full context of the messages they analyze." Researchers concluded that AI security models must be improved to assess the full context of a message rather than evaluating surface-level links or keywords in isolation.
In the know
Indirect prompt injection has been documented across multiple AI systems beyond email filters. According to BleepingComputer, a flaw in Google Gemini, disclosed in July 2025, allowed attackers to hide directives in email HTML at a zero font size, causing Gemini to follow the attackers' instructions when summarizing messages for users. The same zero-font technique now appearing in phishing campaigns designed to fool email filters had already been shown to hijack the behavior of AI email assistants. According to The Register, OpenAI's chief information security officer acknowledged that "prompt injection remains a frontier, unsolved security problem," a position that applies equally to AI-powered email security tools.
The big picture
The healthcare-themed campaign in this research is not incidental. Fake health insurance emails are among the most effective phishing lures in healthcare environments because staff who manage billing, insurance authorizations, and claims already process high volumes of legitimate health plan communications. An AI filter that classifies a fake health insurance email as a benign newsletter gives that message a clear path to the inbox. As healthcare organizations adopt AI-assisted tools for email triage and security monitoring, the assumption that AI analysis adds a reliable extra layer of protection needs to be tested against evidence that attackers are already probing for ways to manipulate those systems. Microsoft's Q1 2026 email threat data found 8.3 billion phishing emails detected in a single quarter, and the same report documented rapid attacker adaptation to new evasion techniques each month as filters improved.
FAQs
What is indirect prompt injection in the context of email security?
Indirect prompt injection places hidden instructions or content inside an email that the recipient cannot see but that an AI scanning tool processes as part of its analysis. Unlike direct prompt injection, it does not try to override the AI's rules but instead manipulates the input data the AI evaluates in order to change its output.
Why does zero-font text fool AI models but not humans?
A human reader sees the rendered email as it displays on screen, where zero-font text is invisible. An AI model processing the email's HTML source code reads all text regardless of display size, treating the hidden content as part of the message. The gap between visual rendering and raw data processing is the vulnerability being exploited.
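A minimal sketch of that gap, using Python's standard-library `html.parser`: it walks an email's HTML and keeps two transcripts, everything a text-extracting scanner would read and only what a recipient would see. It checks just the two hiding tricks described in this article (a zero font size, and text color equal to the inherited background color) from inline styles and the `bgcolor` attribute. The class and function names, the default colors, and the sample email are hypothetical; stylesheets and equivalent color spellings are out of scope.

```python
from html.parser import HTMLParser
import re

# Elements that never get a closing tag, so they must not be pushed on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}

# Matches font sizes such as "0", "0px", "0.0pt", "0em" (assumed list of units).
ZERO_SIZE = re.compile(r"^0+(\.0+)?(px|pt|em|rem|%)?$")


def parse_style(style: str) -> dict:
    """Turn an inline style attribute into {property: value}, lowercased."""
    props = {}
    for decl in style.split(";"):
        if ":" in decl:
            name, value = decl.split(":", 1)
            props[name.strip().lower()] = value.strip().lower()
    return props


class TextLayers(HTMLParser):
    """Collects all text a scanner would read, plus only the text a human would see."""

    def __init__(self):
        super().__init__()
        # Root context: assumed default of black text on a white background.
        self.stack = [{"tag": None, "hidden": False, "color": "#000000", "bg": "#ffffff"}]
        self.all_text, self.visible_text = [], []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        attrs = dict(attrs)
        parent = self.stack[-1]
        style = parse_style(attrs.get("style") or "")
        color = style.get("color", parent["color"])
        bg = style.get("background-color") or (attrs.get("bgcolor") or "").lower() or parent["bg"]
        hidden = (
            parent["hidden"]                                      # inherited from an ancestor
            or bool(ZERO_SIZE.match(style.get("font-size", "")))  # zero-font text
            or color == bg                                        # color-matched text
        )
        self.stack.append({"tag": tag, "hidden": hidden, "color": color, "bg": bg})

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, tolerating unclosed inner tags.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i]["tag"] == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.all_text.append(text)
            if not self.stack[-1]["hidden"]:
                self.visible_text.append(text)


def text_layers(html: str) -> tuple:
    """Return (text a model ingests, text a recipient sees)."""
    parser = TextLayers()
    parser.feed(html)
    parser.close()
    return " ".join(parser.all_text), " ".join(parser.visible_text)


sample = (
    '<div style="background-color:#FFFFFF">'
    '<p>Your storage is full. <a href="#">Upgrade now</a></p>'
    '<span style="font-size:0px">Spring collection newsletter</span>'
    '<p style="color:#ffffff">Chapter one of a novel</p>'
    '</div>'
)
everything, visible = text_layers(sample)
print(visible)     # Your storage is full. Upgrade now
print(everything)  # Your storage is full. Upgrade now Spring collection newsletter Chapter one of a novel
```

The recipient sees a short storage warning, while a model fed `everything` also receives the newsletter and novel filler that make the message look benign.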
Why is healthcare specifically targeted with health insurance lures?
Health insurance communications are routine in healthcare environments and carry a degree of institutional authority. Staff in billing, compliance, and administration receive legitimate health plan emails regularly, making a well-designed fake harder to question on sight. The AI evasion technique multiplies that risk by helping the fake message pass automated scanning as well.
Does this technique work against all AI email security tools?
Researchers found it effective against AI-based autonomous security analysis tools that assess message content holistically. Tools that rely primarily on link reputation or sender authentication rather than content analysis are not targeted in the same way, since hidden text changes neither a link's reputation nor a sender's authentication result. The research suggests that AI models analyzing email content need to account for perceptual asymmetry, the gap between what is visible to humans and what the model processes.
What can organizations do to defend against this technique?
Organizations should not treat any single security layer as definitive. Behavioral analysis of message structure, anomaly detection on HTML complexity relative to visible content, and human review workflows for messages flagged as borderline all reduce reliance on any one AI classification decision. Staff training on health insurance and financial lures remains relevant regardless of what the filter classifies.
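One way to keep a single AI verdict from being decisive is to treat the size of the invisible layer as a signal of its own. The sketch below assumes an upstream step has already produced a classifier score and counted the characters a model ingests versus those a recipient sees; the `ScanResult` fields, function names and thresholds are all hypothetical placeholders rather than tuned values.

```python
from dataclasses import dataclass


@dataclass
class ScanResult:
    model_score: float   # hypothetical: AI classifier's probability that the message is malicious
    total_chars: int     # characters of text extracted from the full HTML source
    visible_chars: int   # characters a recipient would see once the email is rendered


def hidden_share(r: ScanResult) -> float:
    """Fraction of the text the model reads that the recipient never sees."""
    if r.total_chars <= 0:
        return 0.0
    return max(0, r.total_chars - r.visible_chars) / r.total_chars


def route(r: ScanResult, block_at: float = 0.9, review_at: float = 0.5,
          max_hidden: float = 0.3) -> str:
    """Return 'block', 'review' or 'deliver' without trusting the model score alone."""
    if r.model_score >= block_at:
        return "block"
    # A large invisible layer is a structural anomaly on its own, so it
    # sends the message to a human even when the model calls it benign.
    if hidden_share(r) > max_hidden:
        return "review"
    if r.model_score >= review_at:
        return "review"
    return "deliver"


padded = ScanResult(model_score=0.12, total_chars=4000, visible_chars=250)
plain = ScanResult(model_score=0.12, total_chars=900, visible_chars=880)
print(route(padded))  # review
print(route(plain))   # deliver
```

Ordering the checks this way means a confident malicious verdict still blocks outright, while a "benign" verdict on a message whose text is mostly hidden goes to human review instead of straight to the inbox.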
