6 min read
Prompt injection as the next evolution of phishing in healthcare
Lusanda Molefe January 6, 2026
Prompt injection attacks target the AI systems embedded in email platforms, clinical workflows, and enterprise tools. Rather than deceiving the human recipient, these attacks embed hidden instructions that manipulate AI assistants, causing them to leak sensitive data, execute unauthorized actions, or compromise system integrity without the user ever interacting with the malicious content.
The National Institute of Standards and Technology (NIST) defines prompt injection as an adversarial attack against large language models (LLMs) that occurs when trusted and untrusted data commingle in a prompt sent to an AI system.
Security researcher Johann Rehberger, whose work documenting real-world prompt injection vulnerabilities has been presented at Black Hat and other major security conferences, offers a clearer framing, "At its core, prompt injection refers to the commingling of trusted and untrusted data, like system instructions mixed with user instructions or data."
NIST identifies two primary attack types. Direct prompt injection occurs when a user enters text that causes an LLM to perform unintended or unauthorized actions. Indirect prompt injection occurs when an attacker poisons or manipulates the external data that an LLM ingests, documents, emails, web pages, or other content the AI processes. The indirect form poses the greater threat to healthcare email security. An attacker doesn't need access to the AI system itself. They only need to deliver content that the AI will process, and email is the most common delivery mechanism.
Unlike SQL injection, where developers can follow precise guidance to prevent the vulnerability, prompt injection currently has no deterministic fix. As Rehberger notes, "Prompt injection represents a unique challenge that currently does not have a deterministic fix. For SQL injection precise guidance can be given to developers to prevent it, however no such guidance exists for prompt injection."
NIST's own assessment describes indirect prompt injection as "widely believed to be generative AI's greatest security flaw, without simple ways to find and fix these attacks."
Prompt injection in email
The attack vector that should concern healthcare organizations most directly involves AI-powered email tools, systems that summarize messages, draft responses, search inboxes, or automate workflows based on email content.
In mid-2025, security researchers disclosed EchoLeak (CVE-2025-32711), the first widely documented zero-click prompt injection exploit affecting an enterprise AI assistant. The vulnerability allowed attackers to exfiltrate sensitive data from Microsoft 365 Copilot by embedding malicious prompt injection content in an email.
The attack required no user interaction. When Copilot processed the email, even to generate a summary or search for related content, the hidden instructions executed automatically, retrieving sensitive information from the user's inbox and staging it for exfiltration.
A separate vulnerability disclosed in February 2024 and fixed by Microsoft in July 2024 demonstrated an even more sophisticated attack chain. Researchers showed how a prompt injection delivered via phishing email could trigger automatic tool invocation, retrieve sensitive information from the victim's inbox, and render that data into a clickable hyperlink containing characters invisible to the human user. The staged data would exfiltrate when the user clicked what appeared to be a normal link.
These attacks exploit techniques like ASCII smuggling, the use of Unicode Tag characters that LLMs interpret but humans cannot see. An email that looks completely normal to a physician or administrator may contain hidden instructions that an AI assistant executes without question.
The implications extend beyond Microsoft's products. Rehberger's research has documented similar vulnerabilities in ChatGPT, Claude, Google Bard, Google Vertex AI, GitHub Copilot, Amazon Q, and numerous other AI systems. The pattern is that when AI tools process untrusted content without adequate safeguards, prompt injection creates pathways for data exfiltration, integrity manipulation, and service disruption.
Learn more: Microsoft Copilot flaw exposed data via zero-click exploit
Why healthcare is at risk
Healthcare's vulnerability to prompt injection stems from the same factors that made it the most targeted sector for traditional phishing.
Email remains central to clinical operations. Prescriptions, referrals, billing communications, and patient correspondence flow through organizational inboxes. When AI tools process this content, summarizing messages, drafting responses, or searching for relevant information, they ingest data from sources that may include malicious actors.
Physicians and staff trust internal email systems and trust AI assistants to handle routine tasks accurately. A prompt injection that causes an AI to generate misleading summaries, insert false information, or trigger unauthorized actions exploits this trust in ways that may not be immediately apparent.
Research demonstrates that LLM-generated content is already highly effective at deceiving humans. A 2024 study by Heiding et al. found that LLM-generated spear-phishing emails achieved a 54% click-through rate, matching the performance of expert human social engineers and far exceeding the 12% rate of generic phishing attempts.
Prompt injection extends this threat to the AI systems themselves. An attacker who successfully injects instructions into an AI assistant gains capabilities that traditional phishing cannot provide:
- automated data retrieval across entire inboxes
- persistent access through memory manipulation
- actions that execute without any user awareness
Rehberger's research demonstrated "SpAIware" attacks where prompt injection compromises an AI assistant's long-term memory, creating persistent spyware that continuously exfiltrates all future conversations. In healthcare contexts, this could mean ongoing exposure of patient communications, clinical discussions, and administrative decisions, all invisible to the user and the security team. HIPAA violations, patient record exposure, and billing fraud become possible through attack vectors that most healthcare organizations haven't yet incorporated into their threat models.
Government and standards guidance
Federal agencies have recognized that AI systems require the same security rigor as traditional software, and in some cases, additional protections.
The CISA, NSA, and FBI Secure by Design Guide explicitly states that its guidance "applies to manufacturers of artificial intelligence (AI) software systems and models as well." The document emphasizes that "fundamental security practices still apply to AI systems and models" and that "the three overarching secure by design principles apply to all AI systems."
The guide's core recommendations translate directly to prompt injection defense, input validation and sanitization, defense-in-depth architectures, and the principle that "the software industry needs more secure products, not more security products."
NIST's AI Risk Management Framework (AI RMF 1.0) provides structured guidance for governing, mapping, measuring, and managing AI-related risks. The framework emphasizes that AI systems introduce new attack surfaces that traditional security controls may not address. Microsoft's Secure AI Framework (SAIF) offers vendor-specific guidance that healthcare organizations can use when evaluating AI tools. The framework acknowledges that AI systems require dedicated security controls including input sanitization, output filtering, and least-privilege access design.
The FBI's recent Joint Cybersecurity Advisory on Salt Typhoon, while focused on telecommunications infrastructure, proves the broader principle that sophisticated cybercriminals are actively exploiting gaps in critical infrastructure security, and collaboration between government and private sector is required for defense.
These frameworks share common themes:
- AI systems must be treated as potential attack surfaces
- input from untrusted sources must be sanitized or isolated
- organizations cannot rely on AI vendors alone to ensure security
Detection challenges
Prompt injection attacks are difficult to detect for several reasons.
- The attacks require no user action. Traditional phishing detection relies partly on user reporting, employees who recognize suspicious messages, and alerts security teams. Zero-click prompt injection executes before any human evaluates the content, eliminating this detection layer.
- Signature-based security tools fail against novel prompt injection payloads. Unlike malware with identifiable code signatures, prompt injection attacks use natural language that varies infinitely. An instruction hidden in white-on-white text, embedded in CSS styling, or encoded in invisible Unicode characters doesn't match patterns that conventional email security scans for.
- The attacks execute inside AI systems, often in cloud environments invisible to endpoint detection tools. When an AI assistant processes a malicious prompt, the exploitation occurs within the AI vendor's infrastructure, outside the organization's monitoring capabilities.
- Prompt injection exploits represent a form of zero-day vulnerability by default. Each new AI capability, tool integration, or workflow automation potentially introduces attack surfaces that haven't been tested against cyberattack inputs. Security teams cannot patch vulnerabilities they don't know exist in systems they don't fully control.
The combination creates a threat that evades the security controls healthcare organizations have spent years implementing. Firewalls, endpoint detection, and email filtering address different attack classes. Prompt injection operates in the gap between these controls.
Go deeper: The role of cloud technology in HIPAA compliance
Defense strategy
Defending against prompt injection requires the same layered approach that effective cybersecurity has always demanded, with specific adaptations for AI-related risks.
Secure email platforms remain the foundation. AI-powered inbound security that detects phishing attempts, blocks malicious attachments, and identifies social engineering attacks prevents many prompt injection payloads from reaching AI systems in the first place. When attackers can't deliver the malicious content, the embedded instructions never execute.
Phishing-resistant authentication reduces the impact of successful attacks. Passkeys and FIDO2-compliant credentials eliminate the credential theft that prompt injection attacks often target. Even if an AI assistant is manipulated into revealing information, attackers cannot leverage that information to access accounts protected by phishing-resistant authentication.
Vendor evaluation and governance matter more than ever. Healthcare organizations should assess AI tools against frameworks like NIST AI RMF and Microsoft SAIF before deployment. Key questions include:
- How does the tool handle untrusted input?
- What sanitization occurs before content reaches the LLM?
- Can the tool invoke actions automatically, or does it require human confirmation for sensitive operations?
Input sanitization at the organizational level provides additional protection. Security teams can implement policies that strip potentially malicious formatting, hidden text, unusual Unicode characters, and embedded styling from emails before AI tools process them. Continuous monitoring and red teaming help organizations understand their exposure. Testing AI tools against known prompt injection techniques reveals vulnerabilities before attackers exploit them.
Paubox Email Suite supports this layered approach through AI-based inbound threat detection that identifies phishing attempts and social engineering attacks at the email gateway. By stopping malicious content before it reaches AI-powered tools, organizations reduce the attack surface that prompt injection exploits. Combined with HIPAA compliant encryption for outbound communications, the platform addresses both the emerging threat of AI-targeted attacks and the ongoing need to protect sensitive healthcare data in transit.
FAQs
Can prompt injection attacks steal patient data?
Yes. Documented attacks against Microsoft 365 Copilot demonstrated the ability to retrieve sensitive information from user inboxes and exfiltrate it to attacker-controlled servers. In healthcare settings, this could include protected health information, billing data, and clinical communications.
What is ASCII smuggling?
ASCII smuggling uses Unicode Tag characters, text that LLMs interpret but humans cannot see, to hide malicious instructions in seemingly normal content. An email that appears completely safe to a human reader may contain hidden commands that an AI assistant executes.
What is input sanitization?
Input sanitization is the process of cleaning or filtering data before it is processed by a system. In the context of AI, it means stripping out hidden instructions, unusual characters, or malicious formatting from emails or documents before they reach the LLM.
Subscribe to Paubox Weekly
Every Friday we'll bring you the most important news from Paubox. Our aim is to make you smarter, faster.
