Hidden prompt attack in legal AI tool exposes risk to law firms
Researchers say a flaw in a widely used legal AI assistant could allow attackers to capture logins and access sensitive case data.
Farah Amod
June 10, 2026
A flaw in how ChatGPT renders summarized web content lets attackers embed phishing links, fake security alerts, and QR codes directly inside the AI assistant's response without the user opening an attachment or clicking a suspicious email.
Researchers have disclosed a vulnerability in OpenAI's ChatGPT, codenamed ChatGPhish, that allows attackers to turn any web page into a phishing surface by exploiting how the assistant renders Markdown links and images from pages it has been asked to summarize. According to The Hacker News, the chatgpt.com response renderer trusts Markdown links and image URLs pulled from third-party pages during summarization, automatically fetching those images and displaying those links as live, clickable elements inside the trusted ChatGPT interface. An attacker who appends a small payload to any web page can cause ChatGPT to leak the summarizing user's IP address, browser details, and referrer information when attacker-hosted images are fetched. More importantly, the flaw can render malicious links as legitimate-looking clickable elements, display fake system-style security alerts, and serve QR codes hosted on attacker infrastructure, all appearing inside ChatGPT's own interface rather than on a suspicious external site.
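The rendering behavior described above suggests one general defensive idea: treat Markdown links and images that come from third-party content as inert unless their host is explicitly trusted. The following is a minimal sketch of that idea, not OpenAI's renderer or the researchers' recommended fix. The `TRUSTED_HOSTS` allowlist, the function names, and the simplified regular expressions are all assumptions made for illustration.

```python
import re
from urllib.parse import urlparse

# Hypothetical allowlist; a real deployment would choose its own trusted hosts.
TRUSTED_HOSTS = {"example.org"}

# Simplified patterns for ![alt](url) and [text](url); not a full Markdown parser.
IMAGE_RE = re.compile(r"!\[([^\]]*)\]\(\s*([^)\s]+)[^)]*\)")
LINK_RE = re.compile(r"(?<!!)\[([^\]]*)\]\(\s*([^)\s]+)[^)]*\)")


def is_trusted(url: str) -> bool:
    """Return True only for https URLs whose host is on the allowlist."""
    try:
        parsed = urlparse(url)
    except ValueError:
        return False
    return parsed.scheme == "https" and parsed.hostname in TRUSTED_HOSTS


def neutralize_markdown(text: str) -> str:
    """Replace untrusted Markdown images and links with inert plain text."""

    def replace_image(match: re.Match) -> str:
        alt, url = match.group(1), match.group(2)
        # Dropping the image means nothing is fetched from the untrusted host.
        return match.group(0) if is_trusted(url) else f"(image removed: {alt})"

    def replace_link(match: re.Match) -> str:
        label, url = match.group(1), match.group(2)
        # Keep the visible label but remove the clickable destination.
        return match.group(0) if is_trusted(url) else f"{label} (untrusted link removed)"

    without_images = IMAGE_RE.sub(replace_image, text)
    return LINK_RE.sub(replace_link, without_images)
```

Applied to summarized text before display, a filter like this would prevent automatic image fetches to attacker-hosted servers and would stop untrusted links from appearing as clickable elements, at the cost of also removing legitimate links from hosts that are not on the list.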
What makes ChatGPhish particularly relevant for organizations using AI tools for research and document processing is that no unusual user behavior is required. A staff member asks ChatGPT to summarize a web page as part of their normal day, whether reviewing a policy document, a clinical guideline, or a vendor communication. If that page contains an attacker-controlled payload, the phishing content appears inside ChatGPT's response. The user sees it as part of the AI's output, not as an external link they were tricked into visiting. Researchers noted that this shifts the attack surface from email and file attachments to the browsing and summarization workflow itself.
In the same disclosure, researchers documented two related vulnerabilities targeting AI coding tools. SymJack tricks AI coding assistants into overwriting their own configuration files via a symlinked file copy in a malicious repository, with the payload executing on the next restart with full user privileges. TrustFall achieves remote code execution by shipping a repository that auto-approves a malicious server configuration when a developer opens the folder and clicks a generic trust prompt.
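Because SymJack relies on a symlink inside a malicious repository, one generic precaution is to scan a freshly cloned repository for symbolic links that point outside it before letting any tool operate on it. The sketch below illustrates that check. It is not the researchers' mitigation, the report does not describe SymJack's exact file layout, and the function name `find_escaping_symlinks` is hypothetical. It assumes Python 3.9 or later.

```python
import os
from pathlib import Path


def find_escaping_symlinks(repo_root: str) -> list[str]:
    """List symlinks under repo_root whose targets resolve outside repo_root."""
    root = Path(repo_root).resolve()
    escaping = []
    # followlinks=False: symlinked directories are listed but never descended into.
    for dirpath, dirnames, filenames in os.walk(root, followlinks=False):
        for name in dirnames + filenames:
            entry = Path(dirpath) / name
            if not entry.is_symlink():
                continue
            try:
                target = entry.resolve()
            except (OSError, RuntimeError):
                # Unresolvable links (for example, loops) are treated as suspicious.
                escaping.append(str(entry.relative_to(root)))
                continue
            if not target.is_relative_to(root):
                escaping.append(str(entry.relative_to(root)))
    return sorted(escaping)
```

A non-empty result does not prove an attack, since some projects use external symlinks legitimately, but it flags entries worth reviewing before an AI coding assistant is allowed to copy or write files in that folder.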
Researchers stated in their analysis shared with The Hacker News that "the shift from email to the browser significantly expands the potential attack surface. A user no longer has to open a malicious attachment or interact with a suspicious message. Simply summarizing a page during normal browsing activity can introduce attacker-controlled instructions into the model context and ultimately into the rendered response." Researchers described the core problem as ChatGPT implicitly trusting "Markdown links and Markdown image URLs that originated from a third-party page the assistant has just summarized," and surfacing those as live, interactive elements inside a trusted interface.
ChatGPhish is part of a documented pattern of prompt injection vulnerabilities emerging across AI assistant platforms. According to The Hacker News, researchers previously disclosed that Microsoft Copilot could be manipulated through specially crafted emails into producing attacker-directed output when asked to summarize them. That is the same indirect prompt injection mechanism, applied to email rather than web pages. A separate vulnerability in Anthropic's Claude browser extension allowed any other browser extension to issue commands to the Claude assistant without requiring special permissions. The pattern across all three cases is consistent: AI assistants that process external content inherit the trust the user places in the AI interface, creating an assumption that content appearing inside a trusted AI tool is itself trustworthy.
Healthcare organizations are adopting ChatGPT and similar tools for summarizing clinical literature, drafting patient communications, reviewing vendor contracts, and processing administrative content. Each of those use cases involves asking the AI to process external web content, which is precisely the workflow ChatGPhish exploits. A staff member who asks ChatGPT to summarize a policy page that an attacker has seeded with a payload will see the phishing link or fake alert inside the familiar ChatGPT interface, carrying the implicit credibility of an AI-generated response rather than the visual signals of a suspicious email. According to Paubox's Shadow AI report, 95% of healthcare organizations report staff using unapproved AI tools, and 75% of healthcare workers incorrectly assume Microsoft Copilot is automatically HIPAA compliant. The security assumptions staff apply to their own AI tool usage have not kept pace with the vulnerabilities being discovered in those tools.
Prompt injection occurs when attacker-controlled text embedded in content processed by an AI causes the model to follow the attacker's instructions rather than the user's. In ChatGPhish, instructions hidden in a web page override ChatGPT's summarization task and cause it to render attacker-specified links, images, and alerts in its response.
The attacker does not need direct access to the user's ChatGPT session. They only need control over a web page that the user might ask ChatGPT to summarize. The payload is embedded in that page's content, and ChatGPT's renderer renders it when generating the summary.
Desktop URL filters and enterprise security tools inspect text-based links. A QR code is an image, so the response text contains no URL for those tools to scan. The malicious destination is only revealed when a user scans the code with a mobile device, which operates outside the organization's desktop security perimeter. This is the same reason QR code phishing in email has grown in 2025 and 2026.
The article does not report any specific response or patch from OpenAI at the time of publication. Organizations should monitor OpenAI's security advisories for updates and consider whether web summarization workflows present an acceptable risk given the disclosed behavior.
Training staff to treat links and alerts appearing in AI-generated summaries with the same scrutiny they apply to email content is the most immediate control. Organizations can also restrict which external URLs staff are permitted to submit to AI tools for summarization, and should include AI assistant misuse scenarios in security awareness training rather than treating AI tools as inherently trusted.
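The URL restriction mentioned above could take the form of a simple domain allowlist checked before a page is submitted for summarization. The following is a minimal sketch under stated assumptions: the `APPROVED_DOMAINS` entries and the function name are hypothetical placeholders, and how such a check would be enforced (proxy, browser policy, or internal tooling) is left open.

```python
from urllib.parse import urlparse

# Hypothetical allowlist; each organization would define its own approved domains.
APPROVED_DOMAINS = {"cdc.gov", "hhs.gov"}


def is_approved_for_summarization(url: str) -> bool:
    """Allow only https URLs on an approved domain or one of its subdomains."""
    try:
        parsed = urlparse(url)
    except ValueError:
        return False
    if parsed.scheme != "https" or not parsed.hostname:
        return False
    host = parsed.hostname.rstrip(".")
    # Match the exact domain or a true subdomain, so "cdc.gov.evil.test"
    # and "evilcdc.gov" are both rejected.
    return any(host == d or host.endswith("." + d) for d in APPROVED_DOMAINS)
```

An allowlist narrows exposure but does not guarantee that an approved page is free of injected content, so it complements staff training rather than replacing it.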