Ethics of AI that analyze communications involving patient data
Generative AI systems need large amounts of data to train and improve, so when protected health information (PHI) is copied and pasted into these tools, it may be retained by or exposed to systems that should not have access to it. A 2024 study in the Journal of Medical Internet Research found that exposure risk is not limited to a single moment but can arise at multiple stages of data collection.
Employees should therefore never enter PHI that could allow reidentification, whether through inference attacks or other means such as tracing electronic health record (EHR) patterns.
Inputting PHI can quickly lead to privacy violations, especially when using tools not designed for healthcare.
When generative AI models are given PHI, they may memorize it and later repeat it in their outputs. This can also enable reconstruction attacks, in which attackers recover the original data from a model's outputs. When confidential data such as EHR patterns, drug formulas, or genetic sequences is used to train public models, it harms the business as well, because it can leak intellectual property through membership inference attacks, which reveal whether specific records were part of a model's training data.
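To make membership inference concrete, the sketch below shows the simple loss-threshold variant: an attacker who can query a model guesses that a record was in the training set when the model's loss on it is unusually low, because overfit models tend to be very confident on data they memorized. The confidence values, record IDs, and threshold are hypothetical; real attacks calibrate the threshold (for example, with shadow models) rather than picking it by hand.

```python
import math


def cross_entropy(confidence: float) -> float:
    """Loss for the true label, given the model's confidence in that label."""
    return -math.log(max(confidence, 1e-12))  # clamp to avoid log(0)


def infer_membership(confidences: dict, threshold: float) -> dict:
    """Loss-threshold attack: predict 'member' when the loss falls below threshold."""
    return {record_id: cross_entropy(c) < threshold for record_id, c in confidences.items()}


# Hypothetical confidences a queried model assigns to each record's true label.
observed = {"record_a": 0.99, "record_b": 0.97, "record_c": 0.55, "record_d": 0.40}
guesses = infer_membership(observed, threshold=0.1)
print(guesses)  # {'record_a': True, 'record_b': True, 'record_c': False, 'record_d': False}
```

The attack needs only query access: the gap in the model's confidence between records it has seen and records it has not is what leaks membership.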
The 2024 Change Healthcare attack shows how serious that damage can become. Attackers used stolen login credentials to access a remote portal that lacked multifactor authentication. The result was a ransomware attack that disrupted claims processing, pharmacy services, and payments across the U.S. healthcare system.
Change Healthcare processes about 50% of U.S. medical claims, and UnitedHealth told Congress it provided more than $6.5 billion in accelerated payments and no-interest loans to help affected providers manage the fallout.
Exposure is not the only risk. When AI systems generate false or misleading outputs from poor-quality or compromised data, the organization's content is at risk of being inaccurate.
A chapter from Generative Artificial Intelligence in Health and Medicine: Opportunities and Responsibilities for Transformative Innovation notes, “Because the output of GenAI algorithms is fed back into the transformer engine to refine future output, there is a significant risk of sensitive information disclosure to non-authorized parties.” Generative AI can therefore expose PHI through data leakage, reconstruction attacks, and regurgitation, particularly when inputs are recycled into public training corpora.
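Regurgitation can be illustrated with a deliberately tiny model. The sketch below trains a word-level bigram model on a single hypothetical clinical note and then generates text greedily from a short prompt. Large language models work very differently internally, so treat this only as an illustration of the failure mode: a rare sequence seen in training, such as an identifier, can be reproduced verbatim from a partial prompt.

```python
from collections import Counter, defaultdict


def train_bigram(corpus):
    """Count which word follows which across all training lines."""
    counts = defaultdict(Counter)
    for line in corpus:
        words = line.split()
        for current, following in zip(words, words[1:]):
            counts[current][following] += 1
    return counts


def generate(model, prompt, max_words=10):
    """Greedily append the most frequent next word until none is known."""
    words = prompt.split()
    for _ in range(max_words):
        candidates = model.get(words[-1])  # .get does not add keys to the defaultdict
        if not candidates:
            break
        words.append(candidates.most_common(1)[0][0])
    return " ".join(words)


# Hypothetical note pasted into a tool that later trains on its inputs.
corpus = ["patient Jane Roe SSN 123-45-6789 presented with chest pain"]
model = train_bigram(corpus)
print(generate(model, "patient Jane"))
# patient Jane Roe SSN 123-45-6789 presented with chest pain
```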
Employees should avoid pasting:
People tend to focus on obvious PHI and overlook deidentified clinical data, which can still be used to reidentify individuals through linkage attacks, because models can infer demographics from patterns in the data. As a study from the Indian Dermatology Online Journal explains, “Even though such data is necessarily de-identified before sharing with a third-party data aggregator, the risk that new ways of data linkage may be developed, which may end up recognizing the sources, remains real.”
Genomic sequences and drug formulations may appear anonymized, but they raise the risk of PHI reconstruction, because threat actors can use model inversion to recover original data from model outputs.
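A linkage attack needs no machine learning at all. The sketch below joins a hypothetical "de-identified" clinical extract to a hypothetical public list (for example, a voter roll) on quasi-identifiers: ZIP code, birth date, and sex. Any combination that matches exactly one named person reidentifies that row. All records and names are invented for illustration.

```python
from collections import defaultdict

# Hypothetical "de-identified" extract: names removed, quasi-identifiers kept.
clinical = [
    {"zip": "02138", "birth_date": "1945-07-31", "sex": "F", "diagnosis": "hypertension"},
    {"zip": "02139", "birth_date": "1980-01-15", "sex": "M", "diagnosis": "asthma"},
    {"zip": "02139", "birth_date": "1980-01-15", "sex": "M", "diagnosis": "diabetes"},
]

# Hypothetical public dataset that carries names alongside the same fields.
public = [
    {"name": "Jane Roe", "zip": "02138", "birth_date": "1945-07-31", "sex": "F"},
    {"name": "John Doe", "zip": "02139", "birth_date": "1980-01-15", "sex": "M"},
    {"name": "Rick Poe", "zip": "02139", "birth_date": "1980-01-15", "sex": "M"},
]

QUASI_IDENTIFIERS = ("zip", "birth_date", "sex")


def link(clinical_rows, public_rows, keys=QUASI_IDENTIFIERS):
    """Return (name, diagnosis) pairs where the quasi-identifiers match one person."""
    index = defaultdict(list)
    for person in public_rows:
        index[tuple(person[k] for k in keys)].append(person["name"])
    matches = []
    for row in clinical_rows:
        names = index.get(tuple(row[k] for k in keys), [])
        if len(names) == 1:  # a unique match reidentifies the row
            matches.append((names[0], row["diagnosis"]))
    return matches


print(link(clinical, public))  # [('Jane Roe', 'hypertension')]
```

The asthma and diabetes rows stay ambiguous only because two public records share their quasi-identifiers; generalizing those fields (for example, truncating ZIP codes or keeping only birth year) makes unique matches less likely.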
Model inversion is a privacy attack where someone uses access to a machine-learning model (even just via an API) to reconstruct sensitive information the model learned from its training data. Instead of stealing a database directly, an attacker probes the model, studies the outputs, and works backward until the recovered content resembles real underlying records or attributes. The study Federated Learning Attacks Revisited: A Critical Discussion of Gaps, Assumptions, and Evaluation Setups states, “Model inversion is one of the most severe attacks, since the adversary, in some cases, can fully reconstruct the client data.”
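One simple form of model inversion, attribute inference, can be sketched with a toy model. Assume an attacker has query access to a hypothetical dosing model that takes age, weight, and a sensitive genotype code, and knows a target's age, weight, and the dose the model produced. By trying each possible genotype and keeping the one whose prediction matches the observed dose, the attacker recovers the hidden attribute. The model, its weights, and the genotype encoding are invented for illustration.

```python
# Hypothetical trained model; weights are made up for illustration.
WEIGHTS = {"age": 0.05, "weight": 0.02, "genotype": -1.5}
BIAS = 5.0
GENOTYPES = (0, 1, 2)  # hypothetical encoding of a sensitive genetic variant


def predict_dose(age: float, weight: float, genotype: int) -> float:
    """The only interface the attacker needs: query in, prediction out."""
    return (BIAS + WEIGHTS["age"] * age + WEIGHTS["weight"] * weight
            + WEIGHTS["genotype"] * genotype)


def invert_genotype(age: float, weight: float, observed_dose: float) -> int:
    """Return the genotype whose predicted dose is closest to the observed dose."""
    return min(GENOTYPES, key=lambda g: abs(predict_dose(age, weight, g) - observed_dose))


# The attacker sees the dose produced for a target whose genotype is secret.
observed_dose = predict_dose(age=60, weight=80, genotype=2)
print(invert_genotype(age=60, weight=80, observed_dose=observed_dose))  # 2
```

Real attacks face noise, more candidate values, and less direct outputs, but the principle is the same: a model's predictions encode information about its inputs.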
The risk maps to employee behavior in a very practical way: pasting internal text (client lists, contracts, tickets, customer identifiers, case notes, credentials, screenshots, or cleaned summaries) into AI tools creates new copies of sensitive data outside controlled systems, often in vendor environments the organization does not fully govern.
OpenAI’s consumer guidance says, “Please do not enter sensitive information that you would not want reviewed or used,” yet employees still paste client emails, incident tickets, credentials, screenshots, and patient-related notes into chat boxes to summarize, rewrite, or classify them.
In HIPAA environments, that copy-paste habit can become an impermissible disclosure if the AI provider is not acting as a business associate under a business associate agreement (BAA), because HIPAA restricts uses and disclosures of PHI to those the rules permit and expects safeguards wherever ePHI is created, received, maintained, or transmitted.
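Organizations sometimes add a pre-submission check that scans text for obvious identifiers before it can be pasted into an external AI tool. The sketch below is a minimal version using regular expressions for Social Security numbers, email addresses, U.S. phone numbers, and a hypothetical "MRN-" medical record number format. It is not a HIPAA de-identification method and would not make a disclosure permissible on its own.

```python
import re

# Illustrative patterns only; the "MRN-" format is hypothetical and real
# record-number formats vary by organization.
PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "phone": re.compile(r"\(?\b\d{3}\)?[-. ]\d{3}[-. ]\d{4}\b"),
    "mrn": re.compile(r"\bMRN-\d{6,10}\b"),
}


def find_identifiers(text: str) -> list:
    """Return the sorted names of identifier patterns found in text."""
    return sorted(name for name, pattern in PATTERNS.items() if pattern.search(text))


def redact(text: str) -> str:
    """Replace every pattern match with a placeholder such as [EMAIL]."""
    for name, pattern in PATTERNS.items():
        text = pattern.sub(f"[{name.upper()}]", text)
    return text


note = "Pt Jane Roe, MRN-00123456, call (617) 555-0123 or jane.roe@example.com"
print(find_identifiers(note))  # ['email', 'mrn', 'phone']
print(redact(note))  # Pt Jane Roe, [MRN], call [PHONE] or [EMAIL]
```

The patient's name passes through untouched: pattern matching catches formatted identifiers but misses names, free-text clues, and the quasi-identifiers that enable linkage attacks, so a check like this can only back up policy and approved, BAA-covered tools.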
See also: HIPAA Compliant Email: The Definitive Guide (2026 Update)
An AI vendor is usually a business associate that needs a BAA if it creates, receives, maintains, or transmits PHI on behalf of the covered entity.
De-identified data can potentially be used with AI tools, but only if it truly meets HIPAA’s de-identification standard. Under HHS guidance, information is not individually identifiable only if it does not identify the person and the covered entity has no reasonable basis to believe it can be used to identify the person.
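HHS describes two ways to meet that standard: Expert Determination and the Safe Harbor method, which requires removing 18 categories of identifiers and having no actual knowledge that the remaining information could identify the person. The sketch below applies only a few Safe Harbor rules to one record: it drops the name and record number, keeps only the first three ZIP digits (or "000" when that three-digit area has 20,000 or fewer people), reduces dates to the year, and collapses ages over 89 into a single "90+" category. The ZIP population figures and field names are hypothetical, and a real pipeline would need to cover every identifier category.

```python
from datetime import date

# Hypothetical populations for three-digit ZIP prefixes (not Census data).
ZIP3_POPULATION = {"021": 1_500_000, "036": 15_000}


def generalize_zip(zip_code: str) -> str:
    """Keep the first three digits only if that area has more than 20,000 people."""
    prefix = zip_code[:3]
    return prefix if ZIP3_POPULATION.get(prefix, 0) > 20_000 else "000"


def age_on(birth: date, today: date) -> int:
    """Whole years between birth and today."""
    return today.year - birth.year - ((today.month, today.day) < (birth.month, birth.day))


def safe_harbor_subset(record: dict, today: date) -> dict:
    """Apply a few Safe Harbor rules; name and MRN are dropped by never copying them."""
    out = {
        "zip3": generalize_zip(record["zip"]),
        "admit_year": record["admit_date"].year,  # dates reduced to year
        "diagnosis": record["diagnosis"],
    }
    age = age_on(record["birth_date"], today)
    if age > 89:
        out["age"] = "90+"  # birth year is also withheld because it reveals the age
    else:
        out["age"] = age
        out["birth_year"] = record["birth_date"].year
    return out


record = {"name": "Jane Roe", "mrn": "MRN-00123456", "zip": "03601",
          "birth_date": date(1930, 5, 2), "admit_date": date(2024, 3, 14),
          "diagnosis": "hypertension"}
print(safe_harbor_subset(record, today=date(2025, 1, 1)))
# {'zip3': '000', 'admit_year': 2024, 'diagnosis': 'hypertension', 'age': '90+'}
```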
Some uses of PHI with AI tools may be permitted: HIPAA allows use and disclosure of PHI without separate patient authorization for treatment, payment, and healthcare operations.