The Privacy Rule doesn’t create an affirmative duty to deidentify and disclose data. Instead, it sets out when health information is no longer individually identifiable and therefore falls outside HIPAA’s protections. Once data is properly deidentified, either by removing the 18 Safe Harbor identifiers or through an Expert Determination finding a very small risk of re-identification, it is no longer regulated as protected health information (PHI).
At that point, organizations may choose to share it without patient authorization for purposes such as secondary research, quality improvement, or certain public health uses. Deidentification permits sharing, but it does not compel it.
As a paper from the American Heart Association explains, “Broad data sharing can be an enabling driver of progress by, for example, providing data to develop, test, and benchmark innovative methods, scalable insights, and potential new paradigms for data storage and workflow… Along with this promise come concerns about the sensitive nature of some health data, equity considerations about the involvement of historically excluded communities, and the complex intersection of laws attempting to govern behavior in this space.”
What counts as deidentified data
Deidentified data is information that does not identify a person and offers no reasonable basis for working out who it came from, even when combined with other available information. As one study, ‘Modes of De-identification,’ explains, deidentification is “a process of detecting identifiers (e.g., personal names and social security numbers) that directly or indirectly point to a person … and deleting those identifiers from the data,” with the goal of minimizing the chance that a patient can be re-identified during later use of the data.
HIPAA recognizes two ways to meet that standard. The first is the Safe Harbor method. Under Safe Harbor, an organization removes all 18 categories of identifiers. It must also have no actual knowledge that the remaining information could be used, alone or in combination with other information, to identify someone. The same study describes Safe Harbor plainly: “The Safe Harbor method requires all 18 personal identifiers to be eliminated,” making it a clear, rules-based approach that leaves little room for interpretation.
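To illustrate how mechanical Safe Harbor is, the Python sketch below applies a few of its transformations to a single record. The field names, the `DIRECT_IDENTIFIER_FIELDS` set, and the `safe_harbor` function are hypothetical. A real schema has to be mapped against all 18 categories, free-text fields need separate handling, and the list of restricted three-digit ZIP prefixes (areas with 20,000 or fewer people) must come from current Census data rather than being hard-coded. Running code like this does not by itself satisfy the actual-knowledge requirement.

```python
from datetime import date

# Hypothetical field names for direct identifiers in an example schema.
# A real schema must be mapped against all 18 Safe Harbor categories.
DIRECT_IDENTIFIER_FIELDS = {
    "name", "street_address", "city", "phone", "fax", "email", "ssn",
    "mrn", "health_plan_id", "account_number", "license_number",
    "vehicle_id", "device_id", "url", "ip_address", "biometric_id", "photo",
}


def safe_harbor(record: dict, restricted_zip3: frozenset = frozenset()) -> dict:
    """Return a copy of `record` with some Safe Harbor-style transformations applied."""
    out = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIER_FIELDS}

    # ZIP codes: keep only the first three digits, or "000" when that
    # three-digit area has 20,000 or fewer people (caller supplies the list).
    if "zip" in out:
        zip3 = str(out["zip"])[:3]
        out["zip"] = "000" if zip3 in restricted_zip3 else zip3

    # Dates directly related to the individual: keep the year only.
    for key, value in list(out.items()):
        if isinstance(value, date):
            out[key] = value.year

    # Ages over 89 collapse into one "90 or older" category, and a birth
    # year that would reveal the exact age is dropped as well.
    if "age" in out and out["age"] > 89:
        out["age"] = "90+"
        out.pop("birth_date", None)
    return out
```

The rules are applied field by field with no statistical judgment, which is what makes Safe Harbor predictable and also why it can strip out more analytic detail than a study needs.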
The second option is Expert Determination. Instead of stripping data down to the minimum, a qualified expert applies accepted statistical or scientific methods to evaluate the likelihood of re-identification and determines that the risk is very small.
The study notes that this approach “uses the preservation of certain personal identifiers (usually dates and demographics) combined with an expert’s assurance that these identifiers could not be used to re-identify the patient.” Expert Determination is often used when data utility matters, but it requires defensible methodology and documentation to hold up under scrutiny.
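HHS does not prescribe a single method for Expert Determination, and experts draw on a range of statistical techniques. One widely used building block is k-anonymity: group records by their quasi-identifiers and check how small the smallest group is. The sketch below computes a few such measures. The function names, the choice of quasi-identifiers, and the default `k=5` are assumptions for illustration, and what counts as a “very small” risk is a judgment the expert has to justify and document.

```python
from collections import Counter


def equivalence_class_sizes(records, quasi_identifiers):
    """Count how many records share each combination of quasi-identifier values."""
    return Counter(tuple(r[q] for q in quasi_identifiers) for r in records)


def risk_summary(records, quasi_identifiers, k=5):
    """Summarize re-identification risk with k-anonymity-style measures.

    Returns the smallest equivalence class size, the share of records in
    classes smaller than k, and the maximum per-record risk (1 / class size)
    under a simple prosecutor-style model. Assumes `records` is non-empty.
    """
    sizes = equivalence_class_sizes(records, quasi_identifiers)
    per_record = [sizes[tuple(r[q] for q in quasi_identifiers)] for r in records]
    smallest = min(per_record)
    share_below_k = sum(1 for s in per_record if s < k) / len(records)
    return {
        "min_class_size": smallest,
        "share_below_k": share_below_k,
        "max_risk": 1 / smallest,
    }
```

A record that sits alone in its class has a per-record risk of 1, which is exactly the situation that retained dates and demographics can create.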
See also: What is the difference between anonymization and de-identification?
When deidentification is needed
Most Institutional Review Boards and research ethics committees expect data to be deidentified before it is reused for secondary research. The expectation is especially strong when obtaining consent is impractical, or when a data repository has promised participants and contributors that only deidentified information will be shared. In those settings, deidentification is treated as a baseline safeguard, not an optional extra.
As the ‘Modes of De-identification’ study explains, “De-identification of protected health information is an essential method for protecting patient privacy. Most institutes require de-identification of patient data before conducting scientific studies; therefore, clinical scientists need to be cognizant of all modes of de-identification and all services provided by their de-identification tools.”
Deidentification also matters when organizations build open or long-running datasets meant for broader use, such as public challenges, registries, or multi-institution research networks. In those environments, the risk of re-identification through data linkage is higher, and governance documents often explicitly limit releases to deidentified data to manage that risk.
Situations where deidentification is not required
Healthcare organizations do not need to deidentify PHI when they are using it for treatment, payment, or healthcare operations. Day-to-day functions like coordinating care, processing claims, conducting internal quality reviews, or managing staff all rely on identifiable information and are expected to do so. Privacy risk remains controlled because the data stays within established, trusted workflows.
Deidentification is also not required when disclosures are required by law or serve public health purposes. Reporting certain conditions to public health authorities, responding to outbreaks, or complying with regulatory reporting obligations often requires identifiable data. The same is true for research that has been approved by an Institutional Review Board when participants have provided consent, or when a waiver has been granted and identifiability is necessary to meet the study’s goals.
As the analysis ‘Is Deidentification Sufficient to Protect Health Privacy in Research?’ explains, “Under the current regulatory framework in the United States, studies involving deidentified health records are exempt from regulations governing research with human subjects… [and] deidentified health records are outside the definition of ‘protected health information’ and therefore are exempt from federal privacy protections.”
Limited Data Sets fall into a similar category. HIPAA allows certain identifiers, such as dates and limited geographic information, to remain when data is shared under a data use agreement for research, public health, or healthcare operations, so full deidentification is not required in that context. Those retained fields still carry risk: the same analysis notes that between 63% and 87% of the U.S. population can be uniquely identified using only gender, ZIP code, and date of birth.
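The gender, ZIP code, and date of birth finding is easiest to see as a linkage attack. The sketch below joins a hypothetical limited data set to a hypothetical public list, such as a voter roll, on those three fields and reports rows that match exactly one named person. All records and names are invented, and `link_records` is an illustrative helper, not a tool from the cited analysis.

```python
def link_records(limited_rows, public_rows, keys=("sex", "zip", "birth_date")):
    """Join two datasets on shared quasi-identifiers and return unique matches.

    Each match pairs a name from `public_rows` with the diagnosis from a
    `limited_rows` record whose quasi-identifiers point to exactly one person.
    """
    # Index the public list by its quasi-identifier combination.
    index = {}
    for person in public_rows:
        index.setdefault(tuple(person[k] for k in keys), []).append(person["name"])

    matches = []
    for row in limited_rows:
        candidates = index.get(tuple(row[k] for k in keys), [])
        if len(candidates) == 1:  # a single candidate means the row is re-identified
            matches.append((candidates[0], row["diagnosis"]))
    return matches
```

Rows whose combination matches several people stay ambiguous, which is why data use agreements typically prohibit recipients from attempting this kind of join.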
The middle ground
Limited Data Sets sit in the middle ground between fully identifiable PHI, which generally requires authorization, and fully deidentified data. They allow organizations to use and share information for purposes like quality improvement, analytics, and trend tracking without seeking individual consent or an IRB waiver. At the same time, Limited Data Sets still contain certain quasi-identifiers, such as dates or demographic details, so the risk of re-identification through data linkage is higher. That risk is managed through data use agreements, access controls, and clear governance rules.
One common strategy discussed in ‘Strategies for de-identification and anonymization of electronic health record data for use in multicenter research studies’ combines partial suppression of higher-risk fields with an Expert Determination. That approach allows key time or location details to remain available for cohort analysis while statistically verifying that re-identification risk stays below acceptable levels.
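A minimal sketch of the generalization-plus-suppression step, assuming a record schema with hypothetical `visit_date` and `zip` fields: dates are coarsened to year and month, ZIP codes to their first three digits, and any record whose quasi-identifier combination is still shared by fewer than `k` records is suppressed. This is one common pattern, not the specific procedure from the cited paper, and the statistical verification would still be the expert’s job.

```python
from collections import Counter
from datetime import date


def generalize(record):
    """Coarsen higher-risk fields in a hypothetical record schema."""
    out = {k: v for k, v in record.items() if k not in ("visit_date", "zip")}
    out["visit_month"] = record["visit_date"].strftime("%Y-%m")  # keep year and month only
    out["zip3"] = record["zip"][:3]  # keep the first three ZIP digits
    return out


def generalize_and_suppress(records, quasi_identifiers, k):
    """Generalize every record, then drop records in groups smaller than k."""
    generalized = [generalize(r) for r in records]

    def group_key(r):
        return tuple(r[q] for q in quasi_identifiers)

    counts = Counter(group_key(r) for r in generalized)
    kept = [r for r in generalized if counts[group_key(r)] >= k]
    return kept, len(generalized) - len(kept)


# Invented example visits; the third stays unique after generalization.
visits = [
    {"visit_date": date(2024, 3, 1), "zip": "94107", "sex": "F", "dx": "A"},
    {"visit_date": date(2024, 3, 20), "zip": "94110", "sex": "F", "dx": "B"},
    {"visit_date": date(2024, 7, 4), "zip": "10001", "sex": "M", "dx": "C"},
]
kept, suppressed = generalize_and_suppress(visits, ["visit_month", "zip3", "sex"], k=2)
```

Month-level dates survive for cohort timing, while the one record that would remain unique is removed rather than released.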
Pseudonymization is another option often used in biobanks and clinical trials. In these models, direct identifiers are replaced with reversible codes that are tightly controlled. An independent “honest broker” holds the re-identification key, making it possible to reconnect data to individuals when ethically justified, without exposing identities during routine analysis.
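The honest-broker model can be sketched as a small class that issues codes and alone holds the lookup table. The `HonestBroker` class, the `P-` code prefix, the `mrn` field, and the `approved` flag are hypothetical. The codes are random rather than hashed from the identifier, which keeps them from being derived from patient information, a condition HIPAA attaches to re-identification codes kept with deidentified data. A production system would also need persistent, access-controlled storage for the key.

```python
import secrets


class HonestBroker:
    """Hypothetical broker that issues pseudonyms and alone holds the re-identification key."""

    def __init__(self):
        self._code_for_id = {}  # patient identifier -> pseudonym
        self._id_for_code = {}  # pseudonym -> patient identifier (the "key")

    def pseudonymize(self, patient_id: str) -> str:
        # Reuse an existing code so the same patient links across datasets.
        if patient_id not in self._code_for_id:
            code = "P-" + secrets.token_hex(8)
            while code in self._id_for_code:  # regenerate on the rare collision
                code = "P-" + secrets.token_hex(8)
            self._code_for_id[patient_id] = code
            self._id_for_code[code] = patient_id
        return self._code_for_id[patient_id]

    def reidentify(self, code: str, approved: bool) -> str:
        # Re-linking happens only through the broker, and only when justified.
        if not approved:
            raise PermissionError("re-identification not approved")
        return self._id_for_code[code]


def prepare_for_analysis(records, broker):
    """Replace the direct identifier with a pseudonym before data reaches analysts."""
    shared = []
    for r in records:
        row = {k: v for k, v in r.items() if k != "mrn"}
        row["pseudonym"] = broker.pseudonymize(r["mrn"])
        shared.append(row)
    return shared
```

Analysts can still follow one patient across records through the pseudonym, but only the broker can map it back to a medical record number.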
Are there instances where email communications have to be deidentified?
Email counts as PHI under HIPAA when it contains health information that can be tied back to an individual, for example through a patient’s name, contact details, or another of the 18 identifiers. Deidentification becomes relevant only when those emails are shared for secondary purposes outside treatment, payment, or healthcare operations (TPO), such as research collaboration, data sharing with external partners, or broader analytical use without patient authorization. In those cases, Safe Harbor or Expert Determination may be required to reduce re-identification risk.
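For the secondary-use case, a first pass at scrubbing message text might look like the sketch below. It replaces pattern-matched email addresses, Social Security numbers, phone numbers, and numeric dates, plus a caller-supplied list of names, with category tags. The patterns and the `redact_email_body` function are illustrative assumptions. Regular expressions miss unformatted identifiers, street addresses, and most names, so output like this would still need Safe Harbor review or an Expert Determination before release.

```python
import re

# Illustrative patterns only; real messages need far broader coverage
# (addresses, MRN formats, signatures, attachments, unlisted names).
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "PHONE": re.compile(r"\(?\b\d{3}\)?[-.\s]?\d{3}[-.\s]\d{4}\b"),
    "DATE": re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b"),
}


def redact_email_body(text, known_names=()):
    """Replace known names and pattern-matched identifiers with category tags."""
    for name in known_names:
        text = re.sub(re.escape(name), "[NAME]", text, flags=re.IGNORECASE)
    # SSNs are checked before phone numbers so they keep their own tag.
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text
```

Tagging by category rather than deleting outright keeps the message readable for analysis while showing reviewers where identifiers were removed.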
As one clinical trial research paper, ‘Concepts and Methods for De-identifying Clinical Trial Data,’ explains, “Very detailed health information about participants is collected during clinical trials. Several different stakeholders would typically have access to individual-level participant data (IPD), including the study sites, the sponsor of the study, statisticians, Institutional Review Boards (IRBs), and regulators. By IPD, we mean individual-level data on trial participants, which is more than the information that is typically included, for example, in clinical study reports (CSRs).”
Routine internal email use does not trigger the same requirement. Messages exchanged for care coordination, billing, or operational tasks fall squarely within TPO and can include identifiers, provided appropriate safeguards are in place. The same applies when emails involve business associates operating under valid agreements, since the data remains within a controlled and regulated environment.
See also: HIPAA Compliant Email: The Definitive Guide (2025 Update)
FAQs
Can deidentified data ever be re-identified?
Properly deidentified data carries a very low risk of re-identification, but identities can still be exposed if the data is combined with external datasets or if the deidentification method was applied insufficiently.
Does deidentification remove all clinical usefulness of data?
Not necessarily, especially when using Expert Determination or pseudonymization, which preserve key analytic or research-relevant fields while minimizing identification risk.
Are there legal consequences for failing to properly deidentify data?
Yes, if PHI is improperly labeled as deidentified and disclosed, organizations can face HIPAA enforcement actions, including fines or corrective mandates.
Is deidentification required for AI training on healthcare datasets?
HIPAA does not mandate deidentification for AI development as such. It becomes necessary when identifiable PHI would otherwise be used outside TPO or without proper authorization.
