How software can assist with the de-identification process of PHI

Kapua Iao July 3, 2023

HIPAA's Privacy Rule defines protected health information (PHI) as any individually identifiable health information, including demographic data. It can relate to an individual's physical or mental health condition, the provision of healthcare, or payment for healthcare. Healthcare organizations must safeguard PHI under HIPAA.

Covered entities and their business associates must remain HIPAA compliant whenever they use or disclose PHI. In practice, that means making sure PHI is not exposed, whether accidentally or maliciously. One way to do so is to put PHI through a de-identification process before it is shared.

A recap of PHI

PHI refers to personally identifiable information (PII) used or stored for patient care. But HIPAA does not confine PHI to medical records and test results. PHI is any information that doctors use or disclose during care and that can identify a patient. HIPAA lists 18 identifiers that may distinguish a patient:

Patient names	Geographical elements	Dates related to the health or identity of individuals
Telephone numbers	Fax numbers	Email addresses
Social Security numbers	Medical record numbers	Health insurance beneficiary numbers
Account numbers	Certificate/license numbers	Vehicle identifiers
Device attributes or serial numbers	Digital identifiers, such as website URLs	IP addresses
Biometric elements (e.g., fingerprints)	Photographs of a patient’s face	Other identifying numbers or codes

According to the Privacy Rule, covered entities can use and disclose PHI for treatment, when required by law, and for public health purposes. Any of the identifiers above counts as PHI when it is combined with medical information, such as the details of a patient's mental or physical health.

PHI identification and de-identification

The growth of healthcare technology accelerated the need for more robust HIPAA guidelines on sharing sensitive information. Part of following those guidelines is identifying and understanding the PHI that sits within an organization's systems. Questions organizations must ask (e.g., during a risk assessment) include:

Where and how is PHI stored?
What type of PHI do we work with?
Who has access to the PHI, and when?
Who needs access to the PHI, and for what?

These identity-focused questions enable organizations to put security processes in place. These processes, such as de-identification, ensure that only a minimum amount of information is used, shared, and disclosed.

De-identification, described in the Privacy Rule, is the process of removing or modifying identifiers (i.e., PII) in data. When data is shared in a way that the Privacy Rule does not otherwise permit, all identifiers must be de-identified first.

Data must be de-identified so that it can't readily be used to identify patients. Once the PHI is removed, the data is no longer subject to HIPAA. De-identification doesn't guarantee that the data cannot be linked to an individual, but it does reduce the risk.

HIPAA methods of de-identification

On its web page about de-identification, the U.S. Department of Health and Human Services (HHS) focuses on two methods: the safe harbor method and expert determination.

The safe harbor method focuses on removing the 18 identifiers from data. Removing this information substantially reduces the risk of re-identification. Safe harbor leaves a pared-down data set that may still carry some inherent privacy risks. The method is restrictive and less context-sensitive, and it can be hard to rematch data afterward if needed.
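
As a rough illustration of the safe harbor approach for structured records, the Python sketch below drops direct-identifier fields and generalizes two others. The field names, the `strip_identifiers` helper, and the `low_population_zip3` parameter are all hypothetical, and the sketch covers only part of the identifier list (for example, it ignores free-text fields and ages over 89), so treat it as a starting point rather than a compliant implementation.

```python
# Hypothetical field names for illustration; a real schema will differ.
DIRECT_IDENTIFIER_FIELDS = {
    "name", "phone", "fax", "email", "ssn", "mrn", "insurance_id",
    "account_number", "license_number", "vehicle_id", "device_serial",
    "url", "ip_address", "biometric_id", "photo",
}


def strip_identifiers(record, low_population_zip3=frozenset()):
    """Return a copy of `record` with direct identifiers removed.

    Assumes `birth_date` is an ISO string (YYYY-MM-DD) and `zip` is a
    five-digit string. `low_population_zip3` is a caller-supplied set of
    three-digit ZIP prefixes covering 20,000 or fewer people.
    """
    # Drop every field that is a direct identifier.
    cleaned = {
        key: value
        for key, value in record.items()
        if key not in DIRECT_IDENTIFIER_FIELDS
    }

    # Dates: keep only the year.
    if "birth_date" in cleaned:
        cleaned["birth_year"] = cleaned.pop("birth_date")[:4]

    # ZIP codes: keep the first three digits, or "000" for small areas.
    if "zip" in cleaned:
        prefix = cleaned.pop("zip")[:3]
        cleaned["zip3"] = "000" if prefix in low_population_zip3 else prefix

    return cleaned


record = {
    "name": "Jane Doe",
    "ssn": "123-45-6789",
    "birth_date": "1980-04-12",
    "zip": "96813",
    "diagnosis": "asthma",
}
print(strip_identifiers(record))
```

The original record is left untouched, which makes it easier to keep the identified and de-identified copies in separate systems.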

Expert determination is what the name suggests: an expert knowledgeable in statistics alters the data so that individuals cannot be identified from the PHI/PII. The expert evaluates the data to determine that the risk of re-identification is minimal. This approach assesses various factors and applies rigorous techniques so that the de-identified data is very unlikely to be linked back to individuals.
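
The article does not name the techniques an expert would apply. One commonly cited measure, used here purely as an assumed example, is k-anonymity: counting how many records share each combination of indirect identifiers. The Python sketch below reports the size of the smallest such group; the `smallest_group_size` helper and the field names are hypothetical.

```python
from collections import Counter


def smallest_group_size(records, quasi_identifiers):
    """Return the size of the smallest group of records that share the
    same values for every field in `quasi_identifiers` (0 if no records)."""
    groups = Counter(
        tuple(record[field] for field in quasi_identifiers)
        for record in records
    )
    return min(groups.values()) if groups else 0


rows = [
    {"birth_year": "1980", "zip3": "968", "diagnosis": "asthma"},
    {"birth_year": "1980", "zip3": "968", "diagnosis": "diabetes"},
    {"birth_year": "1975", "zip3": "967", "diagnosis": "asthma"},
]
print(smallest_group_size(rows, ["birth_year", "zip3"]))
```

A result of 1 means at least one person is unique on those fields, which suggests the data needs further generalization. What counts as an acceptable group size is a judgment for the expert, not something this sketch decides.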

Why PHI (de-)identification is important

The de-identification of PHI ensures that patients' information remains safeguarded, maintaining HIPAA privacy and trust standards. But protecting patients' information is only one, albeit important, reason to de-identify PHI.

Another reason has to do with HIPAA allowing for the release of limited data sets. For example, think about the pandemic and how health practitioners were able to release statistics without any patient markers. The de-identification of PHI allows healthcare organizations to share this data internally and with third-party organizations for research, marketing, and public health purposes.

Furthermore, de-identifying PHI before further processing eliminates the need for entities to get authorization from each patient. This, in turn, reduces organizations' administrative burden while ensuring that patients' data remains private.

How can software help with de-identifying?

Traditionally, healthcare organizations used manual methods to de-identify PHI. But human error is inevitable. Professionals are likely to miss one or more PHI identifiers, or even to mistake non-PHI data for PHI.

This creates privacy risks for individuals and increases the risk of noncompliance with HIPAA. Additionally, online medical records (and patient data itself) have become more common, widespread, and accessible. That is why the need for an automated de-identification process of medical data is more acute.

The need is great enough that the U.S. National Institute of Standards and Technology (NIST) provides a list of de-identification tools (i.e., software). Automation makes it possible to:

Scrub data at scale
Minimize over-scrubbing
Reduce compliance costs
Reduce reliance on old data
Speed up the process of de-identification
In some cases, produce more accurate results

De-identification software makes it possible to derive value from data sets without risking PHI. Software can help keep an organization HIPAA compliant by identifying PHI and de-identifying the data that contains it.
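
To give a sense of what automated scrubbing of free text can look like, here is a minimal Python sketch that replaces a few pattern-based identifiers with placeholder tags. The patterns, the tag names, and the `scrub` function are assumptions made for illustration. They cover only U.S.-style phone numbers, Social Security numbers, and email addresses, and they are far simpler than what a production tool would use.

```python
import re

# Illustrative patterns only; real tools use much broader rule sets.
PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "PHONE": re.compile(r"\b\d{3}[-. ]\d{3}[-. ]\d{4}\b"),
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
}


def scrub(text):
    """Replace each match of a known pattern with a bracketed tag."""
    for tag, pattern in PATTERNS.items():
        text = pattern.sub(f"[{tag}]", text)
    return text


note = "Call Jane at 808-555-0123 or jane@example.org. SSN 123-45-6789."
print(scrub(note))
```

The patient's first name survives in the output, because names do not follow a fixed pattern. That gap is why pattern matching alone is not enough and why dedicated de-identification tools combine several detection methods.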

Protect PHI

HIPAA compliance requires keeping PHI protected, and understanding and utilizing de-identification techniques is essential to this compliance. PHI de-identification software can be a powerful tool to minimize identity risks and ensure healthcare data privacy.