
Why employees should never paste PHI into AI tools

Mara Ellis March 20, 2026

Generative AI systems need large amounts of data to train and improve, which means that protected health information (PHI) copied and pasted into these tools can leak to systems that should never have access to it. A 2024 Journal of Medical Internet Research study found that exposure risk is not limited to one moment but can emerge during multiple stages of data collection.

Employees should therefore never enter PHI, or any data that could allow reidentification through inference attacks or other means, such as tracing electronic health record (EHR) patterns.

Inputting PHI can quickly lead to privacy violations, especially when using tools not designed for healthcare.

Why pasting information into AI tools creates business risk

When generative AI models are given PHI, they may expose it by memorizing and repeating it. That can enable reconstruction attacks, in which attackers recover the original data from model outputs. When confidential data such as EHR patterns, drug formulas, or genetic sequences is used to train public models, businesses are harmed as well: membership inference attacks can reveal what was in the training dataset, leaking intellectual property.

The 2024 Change Healthcare attack shows how serious that damage can become. Attackers used stolen login credentials to access a remote portal that lacked multifactor authentication. The result was a ransomware attack that disrupted claims processing, pharmacy services, and payments across the U.S. healthcare system.

Change Healthcare processes about 50% of U.S. medical claims, and UnitedHealth told Congress it had to provide more than $6.5 billion in accelerated payments and no-interest loans to help affected providers manage the fallout. When AI systems generate false or misleading outputs from bad or exposed data, the organization's own content is also at risk of being inaccurate.

Other data employees should avoid inputting into AI

A chapter from Generative Artificial Intelligence in Health and Medicine: Opportunities and Responsibilities for Transformative Innovation notes, “Because the output of GenAI algorithms is fed back into the transformer engine to refine future output, there is a significant risk of sensitive information disclosure to non-authorized parties.” Generative AI risks exposing PHI through data leakage, reconstruction attacks, and regurgitation, as models recycle inputs into public training corpora.

Employees should avoid pasting:

Patient identifiers like names, SSNs, medical record numbers, or contact details.
Clinical data, including diagnoses, treatments, lab results, or imaging metadata, which can enable reidentification.
Billing codes, provider notes, or any other PHI, since an impermissible disclosure risks HIPAA penalties.
Credentials or access tokens that fuel ransomware and unauthorized system breaches.
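
One way to make the list above practical is to screen text before it ever reaches a chat box. The sketch below is only an illustration: the pattern names, the regular expressions, and the `find_identifiers` and `safe_to_paste` helpers are assumptions invented for this example, and a handful of regexes cannot catch names, dates, or free-text clinical details. A clean result does not mean the text is free of PHI.

```python
import re

# Hypothetical patterns for illustration only. Real PHI screening needs far
# broader coverage than these four formats.
PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "phone": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
    "mrn": re.compile(r"\bMRN[:#\s]*\d{6,10}\b", re.IGNORECASE),
}


def find_identifiers(text):
    """Return a dict mapping each pattern name to the matches found in text."""
    hits = {}
    for name, pattern in PATTERNS.items():
        matches = pattern.findall(text)
        if matches:
            hits[name] = matches
    return hits


def safe_to_paste(text):
    """Return True only if none of the illustrative patterns match."""
    return not find_identifiers(text)


if __name__ == "__main__":
    note = "Patient MRN: 12345678, SSN 123-45-6789, call 555-123-4567"
    print(find_identifiers(note))
    print(safe_to_paste("Summarize our visitor parking policy."))
```

A check like this works best as a last-line warning, not as permission: it flags the obvious identifiers and leaves the judgment call about everything else to the employee and the organization's policy.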

Grey areas employees often underestimate

People pay attention to obvious PHI but overlook deidentified clinical data that can still be used to reidentify people through linkage attacks, because models can infer demographics from trends. As a study from the Indian Dermatology Online Journal explains, “Even though such data is necessarily de-identified before sharing with a third-party data aggregator, the risk that new ways of data linkage may be developed, which may end up recognizing the sources, remains real.”
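
To see why linkage matters, consider a toy sketch. Every record below is invented for illustration, and the field names, the `QUASI_IDENTIFIERS` tuple, and the `link_records` helper are assumptions made for this example, not a description of any real attack tool. The idea is simply that a dataset with names removed can be joined to an outside list on shared attributes such as ZIP code, birth year, and sex.

```python
# Toy records invented for illustration. No real data is used.
deidentified_visits = [
    {"zip": "02139", "birth_year": 1984, "sex": "F", "diagnosis": "asthma"},
    {"zip": "02139", "birth_year": 1990, "sex": "M", "diagnosis": "diabetes"},
    {"zip": "94110", "birth_year": 1984, "sex": "F", "diagnosis": "migraine"},
]

public_roster = [
    {"name": "A. Example", "zip": "02139", "birth_year": 1984, "sex": "F"},
    {"name": "B. Sample", "zip": "94110", "birth_year": 1971, "sex": "M"},
]

# Attributes that are not names, yet can single a person out in combination.
QUASI_IDENTIFIERS = ("zip", "birth_year", "sex")


def key_of(row, keys):
    """Build a tuple of the quasi-identifier values for one row."""
    return tuple(row[k] for k in keys)


def link_records(records, roster, keys=QUASI_IDENTIFIERS):
    """Return (name, diagnosis) pairs where exactly one record matches a person."""
    index = {}
    for rec in records:
        index.setdefault(key_of(rec, keys), []).append(rec)

    linked = []
    for person in roster:
        candidates = index.get(key_of(person, keys), [])
        if len(candidates) == 1:  # a unique match reidentifies the record
            linked.append((person["name"], candidates[0]["diagnosis"]))
    return linked


if __name__ == "__main__":
    print(link_records(deidentified_visits, public_roster))
```

In this toy data, one roster entry matches exactly one "deidentified" visit, so a diagnosis is reattached to a name without any direct identifier ever being shared.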

Genomic sequences or medicine formulations may seem anonymized, but they increase the risk of PHI reconstruction because threat actors can use model inversion to recover original data from outputs.

Why model inversion poses such a risk

Model inversion is a privacy attack where someone uses access to a machine-learning model (even just via an API) to reconstruct sensitive information the model learned from its training data. Instead of stealing a database directly, an attacker probes the model, studies the outputs, and works backward until the recovered content resembles real underlying records or attributes. The study, Federated Learning Attacks Revisited: A Critical Discussion of Gaps, Assumptions, and Evaluation Setups states, “Model inversion is one of the most severe attacks, since the adversary, in some cases, can fully reconstruct the client data.”

The risk maps to employee behavior in a very practical way: pasting internal text (client lists, contracts, tickets, customer identifiers, case notes, credentials, screenshots, or cleaned summaries) into AI tools creates new copies of sensitive data outside controlled systems, often in vendor environments that the organization does not fully govern.

The problem with AI tools and data privacy

OpenAI’s consumer guidance says, “Please do not enter sensitive information that you would not want reviewed or used,” yet employees still paste client emails, incident tickets, credentials, screenshots, and patient-related notes into chat boxes to summarize, rewrite, or classify them.

In HIPAA environments, that copy-paste habit can turn into an impermissible disclosure if the AI provider is not acting as a business associate under a BAA, because HIPAA restricts using or disclosing PHI beyond what the rules allow and expects controls over where ePHI is stored, received, maintained, or transmitted.
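
Some organizations turn this rule into a simple gate in front of their AI tools. The following is a minimal sketch under stated assumptions: the tool names, the `AiTool` record, the `APPROVED_TOOLS` inventory, and the `may_send` function are all hypothetical, and a signed BAA is treated here as one necessary condition, not as proof that a given use is permitted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AiTool:
    name: str
    has_baa: bool  # True if a signed business associate agreement is on file


# Hypothetical inventory; a real one would come from compliance records.
APPROVED_TOOLS = {
    "internal-scribe": AiTool("internal-scribe", has_baa=True),
    "public-chatbot": AiTool("public-chatbot", has_baa=False),
}


def may_send(tool_name, contains_phi):
    """Decide whether text may be sent to a tool under this sketch's policy."""
    tool = APPROVED_TOOLS.get(tool_name)
    if tool is None:
        return False  # unknown tools are denied by default
    if contains_phi:
        return tool.has_baa  # PHI only goes to tools covered by a BAA
    return True


if __name__ == "__main__":
    print(may_send("public-chatbot", contains_phi=True))
    print(may_send("internal-scribe", contains_phi=True))
```

Denying unknown tools by default is the important design choice: new AI services appear faster than any inventory can be updated.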

FAQs

If an AI vendor handles PHI for a provider, does that vendor need a BAA?

Usually, yes, if the vendor is creating, receiving, maintaining, or transmitting PHI on behalf of the covered entity.

Can a provider use a public AI chatbot with de-identified data?

Potentially, but only if the data truly meets HIPAA’s de-identification standard. Under HHS guidance, information is not individually identifiable only if it does not identify the person and the covered entity has no reasonable basis to believe it can be used to identify the person.

Can AI be used for treatment, payment, or healthcare operations without a separate patient authorization?

Sometimes. HIPAA permits use and disclosure of PHI without separate authorization for treatment, payment, and healthcare operations.
