3 min read

How prompt injections are used against AI platforms

How prompt injections are used against AI platforms

Prompt injection makes cyberattacks on AI platforms easier by turning ordinary input into an attack path. A poisoned document or image can slip instructions to the model’s context and make it ignore its original protections, expose private data, or create an unsafe output. A JAMA Network 2025 open study demonstrated that prompt injection attacks on commercial language learning models used for medical advice worked in 94.4% of simulated conversations. It shows that attackers do not require special access to change suggestions.

A Journal of Medical Internet Research review also says that threat actors employ prompt-based systems because they can confuse user content for instructions. Cloud deployment and massive training groups also make it more likely that protected health information (PHI) will be exposed, unauthorized access will happen, and models will be used for malicious purposes.

 

How prompt injections work

Prompt injections occur when hostile language or images are added to the material that an AI system reads, causing the model to treat hostile input as instructions and produce outputs that conflict with its intended task or safety rules. While the previously mentioned JAMA Network Open study on medical large learning models (LLMs) reports that these attacks do not require privileged access and can be carried out by external adversaries, indirect injections are especially dangerous because they can alter outputs without the end user realizing it.

The study directly notes, “In health care settings where LLMs retrieve content from electronic health records or patient documents, such attacks could systematically manipulate patient decision-making without privileged access. Current commercial LLMs implement safety guardrails, yet these defenses remain insufficient against sophisticated prompt injections.” AI platforms are attractive targets because they are becoming more and more linked to sensitive data, cloud infrastructure, and downstream processes. It means that a changed output can have a big impact on advising, retrieval, documentation, or decision support.

 

How email becomes a delivery path for prompt injection

Open Worldwide Application Security Project (OWASP) shares, “Prompt Injection vulnerabilities exist in how models process prompts, and how input may force the model to incorrectly pass prompt data to other parts of the model, potentially causing them to violate guidelines, generate harmful content, enable unauthorized access, or influence critical decisions.” Microsoft's advice also says that emails and documents are third-party content that can cause a model to misinterpret text controlled by an attacker as something it should follow.

A normal-looking email can have a hidden command in the body, an attachment, or even formatting that the user may not notice. Once an AI email tool summarizes, classifies, drafts, or takes action on that message, the model may be encouraged to ignore its original rules, leak information, or give a false output.

 

How generative AI is used to mitigate risk

One generative model or AI-assisted guard layer can verify incoming text for hostile instructions, make sure untrusted content does not break system rules, and then act as a critic to check whether the final solution strayed from the goal at hand. A 2026 PromptGuard study describes a design like that, with input gatekeeping, structured prompt formatting, semantic output validation, and adaptive response refinement.

A Springer study put forward that in generative AI software, “Guardrail systems can add structure, type, and quality constraints to outputs, use classifier models to inspect inputs and outputs, and automatically generate corrective prompts when a response fails policy checks.”

 

Why Paubox is the solution

Paubox helps stop prompt injections by making it harder for malicious content to get to an AI tool through email. OWASP suggests filtering input and output, giving people the least amount of access they need, separating external content, and getting human approval for high-risk actions as some of the best ways to protect against this.

Paubox’s own email product materials fit with that defensive model at the email layer. The company says its Email Suite protects against spam, ransomware, and phishing as well as malware and viruses, and generative AI threat detection to stop attacks before they reach the inbox.

See also: HIPAA Compliant Email: The Definitive Guide (2026 Update)

 

FAQs

What is direct prompt injection?

Direct prompt injection happens when a user deliberately types instructions meant to override the AI’s safety controls, priorities, or task boundaries.

 

What is indirect prompt injection?

Indirect prompt injection happens when the malicious instruction is placed inside third-party content that the AI later reads, such as a PDF, support ticket, email, or webpage.

 

What is the best way to reduce prompt injection risk?

Organizations should treat outside content as untrusted, limit AI permissions, add human review for sensitive actions, and secure the email and file channels that feed content into AI systems.

Subscribe to Paubox Weekly

Every Friday we bring you the most important news from Paubox. Our aim is to make you smarter, faster.