3 min read

How generative AI handles uncertainty in email security

Phishing emails have gotten much smarter. They can pass normal security checks, come from trusted-looking sources, and read like a real message from a coworker or vendor. In healthcare, even one wrong click can expose sensitive patient information and lead to HIPAA violations.

A 2025 Scientific Reports study describes hybrid deep learning models that analyze email text using contextual language understanding rather than simple pattern matching. These models combine techniques like BERT, CNNs, GRUs, and attention mechanisms to break messages into pieces, study how the words relate to each other over time, and figure out the sender’s intent.
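For a concrete picture, here is a minimal sketch of what such a hybrid architecture can look like in PyTorch. The layer sizes, the bert-base-uncased checkpoint, and the single phishing-probability output are illustrative assumptions, not the study’s exact configuration.

```python
import torch
import torch.nn as nn
from transformers import AutoModel

class HybridPhishingClassifier(nn.Module):
    def __init__(self, hidden=128):
        super().__init__()
        # BERT supplies contextual token embeddings.
        self.bert = AutoModel.from_pretrained("bert-base-uncased")
        # A CNN over the embeddings captures local phrasing patterns.
        self.conv = nn.Conv1d(768, hidden, kernel_size=3, padding=1)
        # A GRU models how meaning develops across the message.
        self.gru = nn.GRU(hidden, hidden, batch_first=True, bidirectional=True)
        # Attention pools the sequence, weighting the most telling tokens.
        self.attn = nn.Linear(2 * hidden, 1)
        self.head = nn.Linear(2 * hidden, 1)

    def forward(self, input_ids, attention_mask):
        emb = self.bert(input_ids, attention_mask=attention_mask).last_hidden_state
        x = torch.relu(self.conv(emb.transpose(1, 2))).transpose(1, 2)
        x, _ = self.gru(x)
        weights = torch.softmax(self.attn(x).squeeze(-1), dim=1)
        pooled = (weights.unsqueeze(-1) * x).sum(dim=1)
        return torch.sigmoid(self.head(pooled)).squeeze(-1)  # phishing probability
```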

These AI systems can catch more than 96% of phishing emails, even in imbalanced datasets where attacks are rare, and they raise fewer false alarms. That’s crucial for healthcare, where accidentally blocking a legitimate billing or clinical email can be just as disruptive as letting a phishing attack through.

 

How generative AI handles the uncertainty layer

A PLOS Digital Health study notes, “Epistemic uncertainty can be seen as a lack of information about the best model and can be reduced by adding more training data.” Adding more training data, better features, or more varied samples helps the model learn and makes its predictions more consistent over time. Generative AI can simulate additional email scenarios or create varied training data, helping the model learn patterns it hasn’t encountered before and reducing this type of uncertainty.
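As a rough illustration, the sketch below augments a training set with synthetic variants of the rare phishing class. The generate_paraphrases helper is a deliberately trivial stand-in for a real generative model; in practice you would prompt an LLM for semantically faithful rewrites.

```python
import random

def generate_paraphrases(email_text: str, n: int) -> list[str]:
    # Trivial stand-in for a generative model: surface-level word swaps.
    # A real pipeline would prompt an LLM for faithful rewrites instead.
    swaps = [("account", "profile"), ("verify", "confirm"), ("urgent", "immediate")]
    variants = []
    for _ in range(n):
        text = email_text
        for old, new in random.sample(swaps, k=2):
            text = text.replace(old, new)
        variants.append(text)
    return variants

def augment_training_set(labeled_emails, variants_per_email=3):
    # Synthesize extra examples of the rare class so the model sees
    # patterns it would otherwise never encounter.
    augmented = list(labeled_emails)
    for text, label in labeled_emails:
        if label == "phishing":
            for variant in generate_paraphrases(text, n=variants_per_email):
                augmented.append((variant, label))
    return augmented
```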

Aleatoric uncertainty is different. That uncertainty never fully goes away because it comes from the data itself. Inbound emails are noisy by nature, especially when phishing messages are deliberately written to look legitimate. No amount of extra training can remove the randomness introduced by well-crafted social engineering. Generative AI can help flag ambiguous messages and suggest likely interpretations.

 

Limits of pre-AI and traditional detection layers

Tools like SpamAssassin or regex-based filters often miss subtle contextual cues, such as phishing emails that are slightly altered to evade detection. In the study ‘A Systematic Review of Cyber Threat Intelligence: The Effectiveness of Technologies, Strategies, and Collaborations in Combating Modern Threats,’ controlled tests show evasion rates of up to 30% for these methods. These approaches struggle with complex interactions between features and high-dimensional traffic, leading to biases from oversimplified relevance measures.
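A toy example makes the evasion problem concrete. The patterns below are hypothetical, but they show how a static rule that matches one phrasing exactly misses a trivially altered version of the same lure:

```python
import re

# Hypothetical static rules of the kind a regex-based filter relies on.
RULES = [re.compile(p, re.IGNORECASE) for p in [
    r"verify your account",
    r"password expires",
]]

def rule_based_flag(email_text: str) -> bool:
    return any(rule.search(email_text) for rule in RULES)

print(rule_based_flag("Please verify your account today"))  # True: exact match
print(rule_based_flag("Please ver1fy y0ur acc0unt today"))  # False: light obfuscation slips through
```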

The limitations include: 

  • Traditional email security often misses zero-day attacks and malware that constantly change to avoid detection.
  • Systems that rely on static rules struggle to catch subtle phishing or insider threats.
  • Pulling records manually takes a lot of time and storage, slowing down compliance checks.
  • Rule-based filters create too many false alarms, overwhelming analysts and making monitoring less effective.
  • Older tools don’t track behavior over time, so evolving threats can slip through unnoticed.
  • They can’t pick up on language nuances, letting sophisticated business email compromise attacks in medical emails go undetected.
  • Fixed rules fail against encrypted or disguised content used in modern attacks.
  • Legacy systems don’t scale well with large amounts of electronic health records.

Where uncertainty appears in email threats

In email security, Monte Carlo dropout has become a practical way to spot uncertainty. The idea is that the same email is analyzed multiple times, but each time, small parts of the model are randomly turned off. If the results stay the same, the model is confident. If the results vary, something about the email is unusual. 

By measuring how much the predictions fluctuate, using variance, entropy, or mutual information, these methods can flag emails that are tricky or rare phishing attempts. High uncertainty often matches actual mistakes, which is why this approach improves accuracy without just blocking more emails.
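Here is a minimal sketch of that scoring loop, assuming a PyTorch classifier with dropout layers (such as the hybrid model sketched earlier) that returns a phishing probability per email; the 30-pass count is an illustrative assumption.

```python
import torch

def mc_dropout_score(model, input_ids, attention_mask, passes=30):
    model.train()  # keep dropout layers active at inference time
    with torch.no_grad():
        # Run the same email through the model many times; each pass
        # randomly drops different parts of the network.
        probs = torch.stack([model(input_ids, attention_mask) for _ in range(passes)])
    mean_p = probs.mean(dim=0)
    variance = probs.var(dim=0)
    # Predictive entropy of the averaged prediction (total uncertainty).
    entropy = -(mean_p * torch.log(mean_p + 1e-9)
                + (1 - mean_p) * torch.log(1 - mean_p + 1e-9))
    # Mutual information isolates the model's own (epistemic) uncertainty.
    per_pass_entropy = -(probs * torch.log(probs + 1e-9)
                         + (1 - probs) * torch.log(1 - probs + 1e-9))
    mutual_info = entropy - per_pass_entropy.mean(dim=0)
    return mean_p, variance, entropy, mutual_info
```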

Conformal prediction then takes it a step further by attaching a confidence range to each prediction, such as guaranteeing that the correct outcome falls within the model’s estimate 80% of the time. The study ‘Advancing Phishing Email Detection: A Comparative Study of Deep Learning Models’ uses this approach to explain the vast majority of prediction errors, and the same logic carries over to email security more broadly. Emails with low uncertainty can pass through automatically, while high-uncertainty emails are flagged for review without requiring security teams to understand every detail of the model.
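A minimal sketch of split conformal prediction for the 80% coverage example, assuming a held-out calibration set of phishing probabilities and 0/1 labels; the nonconformity score and quantile rule follow the standard split-conformal recipe, not a specific paper’s setup.

```python
import numpy as np

def conformal_threshold(cal_probs, cal_labels, alpha=0.2):
    # Nonconformity: 1 minus the probability assigned to the true class.
    scores = np.where(cal_labels == 1, 1 - cal_probs, cal_probs)
    n = len(scores)
    # Finite-sample corrected quantile gives the (1 - alpha) coverage guarantee.
    return np.quantile(scores, np.ceil((n + 1) * (1 - alpha)) / n, method="higher")

def prediction_set(p_phish, q):
    # Keep every label whose nonconformity falls within the threshold.
    labels = []
    if 1 - p_phish <= q:
        labels.append("phishing")
    if p_phish <= q:
        labels.append("legitimate")
    return labels
```

A singleton set means the email can be routed automatically; a two-label set is exactly the high-uncertainty case that goes to human review.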

 

How generative AI processes uncertainty differently

Traditional signature-based filters make simple yes/no decisions and often struggle with new or unusual phishing attacks, leading to too many false alarms or missed threats. Generative AI models take a smarter approach.

Generative models pair each prediction with the uncertainty measures discussed earlier: epistemic uncertainty flags what the model has not yet learned, while aleatoric uncertainty captures the randomness of tricky, well-crafted emails. Studies, like the one in PLOS Digital Health, show this method works much better than traditional filters at catching high-risk emails like AI-generated phishing that mimics HIPAA notices.

During analysis, the email is broken into tokens and examined using attention-based feature extraction. Generative augmentation creates rare phishing examples during training, helping the model learn what it hasn’t seen before. Instead of rigid rules, the system applies adaptive thresholds, such as automatically quarantining emails with high uncertainty, making security smarter and more flexible.
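Tying the earlier sketches together, an adaptive routing rule can look like the following; the cutoff values are illustrative assumptions, not recommended settings.

```python
def route_email(p_phish, entropy, conformal_set, entropy_cutoff=0.5):
    # High uncertainty from either signal: hold the email for human review.
    if len(conformal_set) > 1 or entropy > entropy_cutoff:
        return "quarantine"
    # Confident verdicts flow through automatically.
    return "block" if p_phish >= 0.5 else "deliver"
```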

See also: HIPAA Compliant Email: The Definitive Guide (2025 Update)

 

FAQs

What is generative AI?

Generative AI refers to machine learning models that create new content, such as text, images, audio, or code, based on patterns learned from existing data.

 

How is output quality evaluated in generative AI?

Metrics vary by application and include accuracy, F1-score, perplexity (for text), image similarity metrics, and predictive uncertainty measures such as Monte Carlo dropout or conformal prediction.

 

Can generative AI learn ethical or legal constraints?

Models can be guided using rules, human feedback, or reinforcement learning, but enforcement is imperfect.

 

Subscribe to Paubox Weekly

Every Friday we'll bring you the most important news from Paubox. Our aim is to make you smarter, faster.