
Why HIPAA alone will not settle the healthcare AI debate

HIPAA is the most well-known health privacy law in the US, and it shapes daily compliance in mainstream healthcare through privacy notices, vendor contracts, breach reporting, and audits. Substantively, it sets rules for when protected health information (PHI) can be used or shared, who counts as a covered entity or business associate, what de-identification paths are available, and what privacy, security, and breach-response duties come with each role.

HIPAA is necessary, but it is not sufficient: a health system can handle PHI carefully and still deploy the wrong model. According to a study published in the Journal of Law, Medicine & Ethics, “The deployment of AI chatbots in the healthcare industry can be accompanied by certain privacy risks both for data subjects and the developers and vendors of these AI-driven tools.”

A vendor can sign a business associate agreement and still leave model-governance problems unaddressed. A tool can be secure and respect privacy yet still reproduce racial or socioeconomic bias. And a consumer-facing health AI product can sit largely outside HIPAA while remaining subject to FTC oversight and state privacy laws. The debate about AI in healthcare starts with HIPAA, but it cannot stop there.

 

Why HIPAA leads the AI debate

A journal article titled Health Insurance Portability and Accountability Act Liability in the Age of Generative Artificial Intelligence notes, “AI tools such as LLMs challenge existing HIPAA frameworks, particularly when used without appropriate institutional safeguards.”

HIPAA is at the center of the debate for historical, cultural, and legal reasons. Historically, it has been the main federal privacy law for the flow of normal healthcare data in the US. Culturally, it is the law that patients know about and the compliance framework that clinicians and administrators see all the time through notices, contracts, breach reporting, and enforcement.

For many AI deployments, the first legal question is whether the information is PHI and whether the proposed use fits treatment, payment, health care operations, research, or a de-identification pathway. For example, if a hospital wants to use clinical notes, images, claims, portal messages, or encounter recordings to evaluate or train a tool, the first question is usually whether that information is PHI.

A second reason is that HIPAA sits at the very beginning of many health AI projects: data access. Large models and predictive systems need a lot of data. Within covered entities, the primary operational challenge is frequently lawful access, data sharing, minimization, vendor classification, and safeguards, rather than model architecture.

So even when the harder downstream questions concern model behavior, fairness, drift, and accountability, HIPAA remains the threshold issue. That front-door role explains why vendor marketing still leans so heavily on “HIPAA compliant” language, even though the phrase often says much less than buyers assume.

 

What HIPAA covers in AI use

HIPAA lets covered entities use or share PHI without individual authorization for treatment, payment, and health care operations. Research is handled differently: if a project is intended to produce generalizable knowledge, HIPAA usually requires authorization, an IRB or Privacy Board waiver, preparatory-to-research or decedent-research representations, or use of a limited data set under a data-use agreement.

For AI, this means HIPAA can cover many common deployment scenarios even though the rule never mentions AI. If a health system uses PHI to evaluate or run a documentation assistant, utilization tool, coding support system, triage workflow, or quality-improvement model, the most likely HIPAA pathway is treatment, payment, or health care operations, provided the minimum-necessary and security rules are followed.

The previously referenced Journal of Law, Medicine & Ethics study states that "protection of PHI under HIPAA from use or disclosure ranges on a spectrum. The highest type of protection offered is when HIPAA requires the covered entity or business associate to obtain the patient’s written authorization to use or disclose a recording and limits the use or disclosure to the extent ‘minimum necessary.’ The lowest level of protection is use or disclosure of the PHI without any restrictions. Furthermore, there are certain situations where disclosure of PHI is mandatory.”

If the organization is training a model for a research study intended to produce generalizable knowledge, the research provisions take priority. The analysis gets stricter if a vendor wants to use that data for its own broader product development goals. The answer often depends on whether the data is genuinely de-identified, whether the contract permits the use, and whether a research or authorization pathway is required.

HIPAA’s Security Rule matters more in AI than some early privacy-first debates acknowledged. AI deployments routinely rely on cloud infrastructure, APIs, logging systems, embeddings, backups, model repositories, and vendor sub-processors. The Security Rule requires covered entities and business associates to protect the confidentiality, integrity, and availability of ePHI, conduct risk analysis, and reassess threats over time. OCR’s December 2024 proposed Security Rule update underscores that HHS now sees cybersecurity modernization as a major regulatory priority, especially given repeated ransomware and third-party risk events in healthcare.

 

Where HIPAA falls short

HIPAA falls short, first, on model outputs. It is mostly a rule about how information is handled, not a way to judge how clinically valid an algorithm's recommendation is. A hospital can lawfully feed PHI into a HIPAA-governed workflow and still get output that is unsafe or fabricated. The gap shows up clearly in a PLoS Digital Health scoping review of generative AI in medicine: 64% of the articles reviewed addressed hallucinations, 33% addressed privacy, and 31% addressed regulation. Reviews of generative AI in healthcare consistently flag hallucinations, omissions, privacy risks, and prompt misinterpretation as fundamental challenges.

HIPAA also handles inferential re-identification poorly. Its de-identification framework is legally relevant and often helpful, but modern AI makes de-identified data far less safe than many decision-makers assume. HHS itself notes that Safe Harbor fails if the discloser has actual knowledge that residual data could identify a person, and expert determination accepts context-specific judgments rather than a fixed numerical threshold.
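To make the mechanics concrete, here is a minimal, illustrative sketch of the kind of field removal and coarsening Safe Harbor contemplates. This is not a compliance tool: real Safe Harbor covers 18 identifier categories, attaches conditions (for example, 3-digit ZIP codes are only permitted for sufficiently populous areas), and still fails if the discloser has actual knowledge of residual identifiability. The field names and record below are hypothetical.

```python
# Illustrative sketch only, not a compliance tool. Field names are hypothetical.
DIRECT_IDENTIFIERS = {"name", "mrn", "email", "phone", "street_address"}

def deidentify(record: dict) -> dict:
    """Drop direct identifiers and coarsen quasi-identifiers."""
    out = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    # Safe Harbor generally allows only the year of dates related to an individual.
    if "birth_date" in out:
        out["birth_year"] = out.pop("birth_date")[:4]
    # 3-digit ZIP is allowed only for areas above a population threshold;
    # that check is omitted here for brevity.
    if "zip" in out:
        out["zip3"] = out.pop("zip")[:3]
    return out

record = {"name": "Jane Doe", "mrn": "12345", "zip": "94110",
          "birth_date": "1950-06-01", "diagnosis": "E11.9"}
print(deidentify(record))
```

Even a correct version of this transformation does not address the inferential risk discussed above: rare diagnoses, treatment patterns, and linked external datasets can still single a person out.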

The same scoping review helps show why privacy concerns do not disappear when direct identifiers are removed: 33% of the articles reviewed raised privacy as a concern, and 13% specifically argued that models should be trained on synthetic data rather than real patient health information.

See also: HIPAA Compliant Email: The Definitive Guide (2026 Update)

 

FAQs

What does de-identification mean in healthcare AI?

De-identification means removing or reducing identifying details in health data so the information cannot reasonably be linked back to a specific person.

 

How is de-identified data different from anonymized data?

De-identified usually means identifiers have been removed under a legal or technical standard, such as HIPAA's Safe Harbor or expert determination. Anonymized is generally a stronger claim, implying the data can no longer be linked back to an individual at all; in practice, de-identified data can sometimes still be re-identified.

 

Why does de-identification matter so much for AI?

AI systems need large datasets, and health data is highly sensitive. De-identification helps organizations use data more responsibly while reducing the legal and ethical risks tied to direct patient identification.

 

Can AI re-identify de-identified healthcare data?

Sometimes, yes. Powerful models, large linked datasets, and external data sources can make it easier to infer identity even when direct identifiers are gone.

 

Does removing names and medical record numbers make data safe enough for AI training?

Not always. Dates, locations, rare conditions, treatment patterns, and combinations of fields can still make someone identifiable, especially in smaller populations or specialized datasets.
