AI email agents fall for phishing attacks and leak credentials

Researchers found that an AI agent connected to a corporate inbox handed over AWS keys, database credentials, and customer records in response to phishing emails, and that adding explicit phishing-awareness instructions did not stop the most damaging attacks.

What happened

Security researchers created an OpenClaw AI agent connected to a Gmail inbox, browser tools, Google Workspace APIs, and simulated internal company data, including AWS credentials, database credentials, CRM exports, and calendar invites, then ran four phishing simulations against it to assess whether classic social engineering tactics that work on humans would also work on an AI agent acting autonomously on their behalf. According to BleepingComputer, the results showed the agent successfully blocked a malicious OAuth application and eventually identified a fake gift card phishing page, but failed in the two highest-damage scenarios regardless of whether a strict security configuration was active. In the first scenario, an attacker impersonating a team lead requested emergency access during a production issue, and the agent emailed AWS IAM keys, database credentials, and access details to an external Gmail address. In the second, an attacker requested a customer export under the pretext of a remote presentation, and the agent retrieved and sent a full CRM export containing customer records, contact information, contract details, and revenue data without verifying the sender's identity.

Going deeper

The strict configuration, which included explicit phishing awareness and identity verification instructions, failed in both high-damage scenarios because urgency overrode the verification logic. Researchers described the failure as the "verification step collapsing when the request appeared operationally urgent," a pattern directly mirroring the social engineering dynamic that makes phishing effective against humans. The agent was tested using two AI models, Google Gemini 3.1 Pro and OpenAI GPT-5.4. Gemini showed a greater willingness to respond to requests, while GPT-5.4 adopted a more cautious posture, though both failed the credential and CRM scenarios.

What was said

Researchers stated in their report cited by BleepingComputer that "the same phishing techniques that have tricked humans for decades would also work on the AI agents working on their behalf," and concluded that "AI agents are good at detecting suspicious URLs, identifying fake login pages, spotting malicious OAuth apps, and recognizing phishing indicators, but may still fail due to a lack of identity verification, loss of context, and inability to apply zero trust principles to social interactions." Researchers recommended that agents be explicitly required to verify sender identities, be prevented from emailing new external recipients without approval, have limited access to internal data, and require human approval for high-risk actions, including credential sharing, financial data requests, and first-time communications.

In the know

The OpenClaw phishing vulnerability findings arrived in the same week that researchers disclosed ChatGPhish, a vulnerability in ChatGPT's web summarization that allows attackers to embed phishing links inside AI-generated responses. According to The Hacker News, the ChatGPhish flaw exploits the same underlying trust gap, when an AI system processes external content, the user assumes the output inherits the AI platform's credibility. The OpenClaw findings extend that problem to agentic AI systems that take actions and produce text, making the consequences of a successful social engineering attack against an agent potentially more severe than the same attack against a human who can pause and reconsider.

The big picture

Healthcare organizations moving toward AI-assisted administrative workflows, scheduling, prior authorizations, billing, and patient communications are beginning to deploy exactly the category of system this research tests. An AI email agent with access to an EHR integration, a billing platform, or an internal credential store that fails an urgency-based phishing attack does not generate a suspicious login alert or trigger an MFA challenge. It simply acts, at machine speed, on whatever the phishing email requests. According to Paubox's Shadow AI report, 95% of healthcare organizations report staff using unapproved AI tools, and 75% incorrectly assume Microsoft Copilot is automatically HIPAA compliant. As AI agents move from productivity tools to autonomous operators with access to PHI systems, the assumption that standard phishing controls apply to those agents will produce exactly the failure modes this research documented.

FAQs

What is an AI agent, and how does it differ from a standard AI assistant?

A standard AI assistant responds to queries and generates text. An AI agent connects to external systems, email, databases, APIs, and web browsers and takes actions autonomously based on instructions. An agent can send emails, retrieve files, make API calls, and interact with external services without a human approving each step, which is what makes it useful and what makes it a phishing target.

Why did adding explicit phishing awareness instructions not prevent the credential leak?

The agent's security configuration included identity verification steps, but the simulation framed the request as an urgent production incident. Urgency collapsed the verification logic because the agent's reasoning weighted operational priority over security procedure, the same cognitive shortcut that makes urgency-based phishing effective against humans. The fix requires enforcing verification as an inviolable prerequisite rather than a procedural step that can be skipped under pressure.

What does zero trust mean when applied to AI agent social interactions?

Zero trust in network security means never assuming a request is legitimate based on its origin, every access request is verified regardless of where it comes from. Applied to AI agents, it means the agent should treat every request for sensitive data or high-risk action as unverified, regardless of how familiar the sender appears, requiring explicit identity confirmation before proceeding rather than inferring legitimacy from context.

Why is an AI agent that leaks credentials more damaging than a human who does the same?

A human who receives a phishing email may hesitate, ask a colleague, or report it. An AI agent processes and acts at machine speed with no natural pause. The agent in this research emailed AWS keys and a full CRM export within the normal course of processing incoming email, without generating any alert, requiring any secondary approval, or creating any audit trail distinguishable from legitimate activity.

What access controls most directly limit the damage from AI agent phishing?

Restricting the agent's data access to only what is required for its specific function, requiring human approval for any outbound communication to new external recipients, logging all agent actions with the same audit trail requirements applied to human users, and enforcing identity verification as a mandatory prerequisite for credential or data requests, not an optional step, all reduce the blast radius when social engineering succeeds against an agent.

AI email agents fall for phishing attacks and leak credentials

What happened

Going deeper

What was said

In the know

The big picture

FAQs

What is an AI agent, and how does it differ from a standard AI assistant?

Why did adding explicit phishing awareness instructions not prevent the credential leak?

What does zero trust mean when applied to AI agent social interactions?

Why is an AI agent that leaks credentials more damaging than a human who does the same?

What access controls most directly limit the damage from AI agent phishing?

Products

Resources

Company

AI email agents fall for phishing attacks and leak credentials

What happened

Going deeper

What was said

In the know

The big picture

FAQs

What is an AI agent, and how does it differ from a standard AI assistant?

Why did adding explicit phishing awareness instructions not prevent the credential leak?

What does zero trust mean when applied to AI agent social interactions?

Why is an AI agent that leaks credentials more damaging than a human who does the same?

What access controls most directly limit the damage from AI agent phishing?

Related articles

Subscribe to Paubox Weekly