Artificial intelligence promises to revolutionize healthcare by improving diagnostics, streamlining administrative processes, and personalizing treatment plans. A 2024 Harvard Medical School article, "The Benefits of the Latest AI Technologies for Patients and Clinicians," highlighted several advantages of AI in healthcare: its ability to help clinicians better interpret imaging results, its potential to assist healthcare organizations in improving quality and safety, and its capacity to aid in the diagnosis and treatment of rare diseases. However, beneath this promise lies the possibility that AI can create new healthcare disparities and worsen existing ones through algorithmic bias.
According to Boston University research titled "AI In Healthcare: Counteracting Algorithmic Bias," algorithmic bias can be defined as "inequality of algorithmic outcomes between two groups of different morally relevant reference classes such as gender, race, or ethnicity. Algorithmic bias occurs when the outcome of the algorithm's decision-making treats one group better or worse without good cause." This definition captures the central ethical concern with deploying potentially biased AI in healthcare.
When AI systems make biased recommendations, they can directly impact patient care, leading to misdiagnosis, inappropriate treatments, or denied access to necessary interventions for marginalized populations. What makes this troubling is that these biases often operate invisibly, masked by the perceived objectivity of technology.
As healthcare increasingly relies on AI to inform critical decisions, understanding and addressing algorithmic bias becomes not just a technical challenge but an ethical imperative. Today, we explore real-world examples of healthcare AI bias and outline approaches to creating more equitable AI systems that serve all patients fairly.
A widely used commercial algorithm affecting millions of patients demonstrated racial bias in identifying patients for high-risk care management programs. The algorithm, developed by Optum and used by U.S. insurers and hospitals, was designed to predict healthcare costs rather than actual illness severity. Since less money has historically been spent on Black patients with similar conditions, the algorithm systematically underestimated their care needs.
As reported in a 2019 Science study led by Dr. Ziad Obermeyer of the University of California, Berkeley, the software regularly recommended healthier white patients for healthcare risk management programs ahead of sicker Black patients simply because those white patients were projected to be more costly. At the hospital studied, Black patients cost $1,800 less per year than white patients with the same number of chronic illnesses—a pattern observed across the United States.
The Boston University research highlighted this case study: "Obermeyer studied the accuracy of a healthcare algorithm that excluded social category classifiers... [They] found that since social category information, such as race, was excluded from the dataset the algorithm was trained and deployed on, the algorithm unintentionally used healthcare cost as a proxy variable for race. Black patients systemically have lower healthcare costs because there is unequal access to care for black and white patients and less money is spent on treatment of black patients."
When researchers recalibrated the algorithm using direct measures of health instead of costs, "The racial bias nearly disappeared, and the percentage of Black patients identified for additional care increased from 17.7% to 46.5%." This correction shows how directly the choice of prediction target shapes who receives additional care.
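To make the proxy-label issue concrete, here is a minimal sketch in Python using synthetic data. It is not the Optum model and uses no real coefficients; it simply shows how ranking patients by predicted cost rather than by a direct measure of health need changes who gets flagged for care management when spending is systematically lower for one group at the same level of illness.

```python
# Minimal sketch (not the Optum model): how the choice of prediction target
# changes who gets flagged for care management. All numbers are synthetic
# and illustrative only.
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Synthetic population: two groups with identical distributions of chronic
# conditions (true health need).
race = rng.choice(["Black", "white"], size=n)
conditions = rng.poisson(lam=2.0, size=n)

# Illustrative assumption echoing the study's finding: at the same number of
# chronic conditions, less money is spent on Black patients.
base_cost = 3_000 * conditions + rng.normal(0, 1_000, size=n)
observed_cost = np.where(race == "Black", base_cost - 1_800, base_cost)

def share_black_in_top(score, top_frac=0.03):
    """Share of Black patients among the top `top_frac` highest-scoring patients."""
    cutoff = np.quantile(score, 1 - top_frac)
    flagged = score >= cutoff
    return (race[flagged] == "Black").mean()

# "Cost as label" versus "health need as label" (condition count plus noise,
# standing in for an imperfect but unbiased predictor).
print("Flagged via predicted cost:", round(share_black_in_top(observed_cost), 3))
print("Flagged via health need:   ", round(share_black_in_top(conditions + rng.normal(0, 0.5, n)), 3))
```

Even in this toy setup, the cost-based ranking flags fewer Black patients than the need-based ranking, despite both groups being equally sick by construction.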
While Optum called the findings "misleading," arguing that hospitals should supplement their cost algorithm with socioeconomic data and physician expertise, the study reveals how algorithms may cause disparities when cost is used as a proxy for medical need. As Princeton researcher Ruha Benjamin noted, such systems risk creating a "New Jim Code" that can "hide, speed and deepen racial discrimination behind a veneer of technical neutrality."
Research published in The Lancet Digital Health revealed a serious data problem in AI systems being developed for skin cancer diagnosis. Dr. David Wen and colleagues from the University of Oxford examined 21 open-access datasets used to train AI algorithms for skin cancer detection and found severe underrepresentation of darker skin tones.
Of 106,950 total images across these datasets, only 2,436 had skin type recorded. Among these, just 10 images were from people with brown skin and only one was from an individual with dark brown or black skin. Even more concerning, of the 1,585 images that contained ethnicity-related data, "No images were from individuals with an African, African-Caribbean or South Asian background," the researchers reported.
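A metadata audit of this kind can be scripted. The sketch below is a hypothetical illustration, assuming each image record carries an optional Fitzpatrick skin-type field (the field names and sample records are invented, not taken from the Oxford datasets); it reports how many images have skin type recorded and flags missing coverage of darker skin types.

```python
# Minimal sketch of a training-set audit, assuming each image record carries
# optional Fitzpatrick skin-type metadata (field names are hypothetical).
from collections import Counter

records = [
    {"image_id": "img_001", "fitzpatrick_type": "II"},
    {"image_id": "img_002", "fitzpatrick_type": "II"},
    {"image_id": "img_003", "fitzpatrick_type": None},   # skin type not recorded
    {"image_id": "img_004", "fitzpatrick_type": "V"},
    # ... in practice, every record in the dataset
]

def audit_skin_type_coverage(records):
    """Report how many images record a skin type and how those labels are distributed."""
    labelled = [r["fitzpatrick_type"] for r in records if r["fitzpatrick_type"]]
    counts = Counter(labelled)
    print(f"{len(labelled)} of {len(records)} images have skin type recorded")
    for skin_type in ["I", "II", "III", "IV", "V", "VI"]:
        print(f"  Fitzpatrick {skin_type}: {counts.get(skin_type, 0)}")
    missing = [t for t in ["V", "VI"] if counts.get(t, 0) == 0]
    if missing:
        print(f"  WARNING: no images for darker skin types {missing}")

audit_skin_type_coverage(records)
```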
This lack of diversity in training data creates a risk that AI diagnostic tools will be less accurate for people with darker skin. As Dr. Wen explained: "You could have a situation where the regulatory authorities say that because this algorithm has only been trained on images in fair-skinned people, you're only allowed to use it for fair-skinned individuals, and therefore that could lead to certain populations being excluded from algorithms that are approved for clinical use."
The alternative scenario is equally problematic: if these biased algorithms are approved for use across all populations, they "may not perform as accurately on populations who don't have that many images involved in training." This could result in misdiagnosis leading to "avoidable surgery, missing treatable cancers and causing unnecessary anxiety," particularly for patients with darker skin.
Professor Charlotte Proby, a dermatology expert at the University of Dundee, emphasized that the "failure to train AI tools using images from darker skin types may impact their reliability for assessment of skin lesions in skin of colour," with potentially wide-ranging implications for healthcare equity.
The challenge with these systems reflects a concern raised in "Bias in medical AI," which states that "AI models trained on potentially biased labels may perpetuate and amplify not only differential misclassifications and substandard care practices based on these social factors, but also the original cognitive biases in its own predictions and recommendations."
Pharmacogenetic algorithms for warfarin dosing provide another example of healthcare AI bias with direct clinical implications. A study titled "Poor Warfarin Dose Prediction with Pharmacogenetic Algorithms that Exclude Genotypes Important for African Americans" showed that widely used pharmacogenetic dosing algorithms perform poorly in African Americans because they fail to account for critical genetic variations.
The researchers found that the algorithms used in clinical trials such as the Clarification of Optimal Anticoagulation through Genetics (COAG) trial miscalculated appropriate doses for African American patients.
The consequences of this algorithmic bias were evident in clinical trials. While the European Pharmacogenetics of Anticoagulation Therapy (EU-PACT) trial showed benefits from genetic dosing in its homogeneous European population, the COAG trial with its more diverse population found that "African Americans, who comprised approximately one-third of the COAG trial population, did worse with pharmacogenetic dosing, with a higher likelihood of supratherapeutic INR values with pharmacogenetic versus clinically based dosing."
When the researchers adjusted the algorithms to account for genetic variants that are common in, or specifically influence warfarin response in, people of African ancestry, dose predictions for African American patients improved substantially. This demonstrates how important it is to include diverse populations in algorithm development, particularly for potentially life-saving treatments like warfarin.
The researchers concluded, "Our data indicates that, when dosing warfarin based on genotype, it is important to account for variants that are either common or specifically influence warfarin response in African Americans and that not doing so can lead to significant overdosing in a large portion of the African American population."
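The mechanism is easy to see in a toy model. The sketch below is purely illustrative: the coefficients are invented, and it is not the COAG, IWPC, or any clinical dosing algorithm. It shows how a linear pharmacogenetic model on the square root of the weekly dose overestimates the dose for a carrier of a reduced-function allele (variants reported as important in African Americans include CYP2C9*5, *6, *8, *11 and rs12777823) whenever that term is left out of the model.

```python
# Illustrative sketch only: the coefficients below are made up for demonstration
# and are NOT a clinical dosing algorithm. It shows how omitting variants common
# in people of African ancestry inflates the predicted dose for carriers.

def weekly_dose_mg(age_decades, vkorc1_a_alleles, cyp2c9_star2_star3,
                   african_specific_variants=0, include_african_variants=True):
    """Toy linear model on sqrt(weekly dose), in the style of pharmacogenetic algorithms."""
    sqrt_dose = 8.0                      # baseline (illustrative)
    sqrt_dose -= 0.3 * age_decades       # older patients need less warfarin
    sqrt_dose -= 0.9 * vkorc1_a_alleles  # VKORC1 -1639 A alleles reduce dose
    sqrt_dose -= 1.0 * cyp2c9_star2_star3
    if include_african_variants:
        # Reduced-function alleles common in African-ancestry populations
        sqrt_dose -= 0.8 * african_specific_variants
    return round(sqrt_dose ** 2, 1)

# Same hypothetical patient, carrying one copy of a reduced-function allele:
patient = dict(age_decades=6, vkorc1_a_alleles=1, cyp2c9_star2_star3=0,
               african_specific_variants=1)
print("Dose ignoring the variant: ",
      weekly_dose_mg(**patient, include_african_variants=False), "mg/week")
print("Dose accounting for it:    ",
      weekly_dose_mg(**patient, include_african_variants=True), "mg/week")
```

Leaving the variant term out produces a higher predicted dose for the same patient, which is exactly the overdosing pattern the researchers describe.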
A 2024 UK government-commissioned review titled "Equity in Medical Devices: Independent Review" found that minority ethnic people, women, and people from deprived communities are at risk of poorer healthcare because of biases within medical tools and devices.
The review confirmed concerns that pulse oximeters overestimate the amount of oxygen in the blood of people with dark skin. While there was no evidence of this affecting care in the NHS, studies in the US have shown such biases leading to delayed diagnosis and treatment, worse organ function, and higher mortality in Black patients.
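Auditing a device for this kind of bias is conceptually straightforward when paired readings are available. The sketch below is a hypothetical illustration, not an NHS or regulatory protocol: it assumes paired pulse-oximeter (SpO2) and arterial blood gas (SaO2) readings and applies one commonly used definition of occult hypoxemia, an SaO2 below 88% despite an SpO2 reading of 92% or higher.

```python
# Minimal sketch of a device-bias audit, assuming paired readings of pulse
# oximetry (SpO2) and arterial blood gas saturation (SaO2) are available.
from dataclasses import dataclass

@dataclass
class PairedReading:
    group: str   # self-reported race/ethnicity or skin-tone category
    spo2: float  # pulse oximeter reading (%)
    sao2: float  # arterial blood gas saturation (%)

def occult_hypoxemia_rate(readings, group):
    """Rate at which the oximeter reads 'safe' (>= 92%) while true SaO2 is < 88%."""
    subset = [r for r in readings if r.group == group]
    missed = [r for r in subset if r.spo2 >= 92 and r.sao2 < 88]
    return len(missed) / len(subset) if subset else float("nan")

# Hypothetical paired readings; a real audit would use thousands of them.
readings = [
    PairedReading("Black", 94, 86), PairedReading("Black", 95, 91),
    PairedReading("white", 94, 93), PairedReading("white", 93, 92),
]
for g in ("Black", "white"):
    print(g, "occult hypoxemia rate:", occult_hypoxemia_rate(readings, g))
```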
The UK report also highlighted concerns about AI-based medical devices:
The report noted problems with polygenic risk scores used to assess individual disease risk based on genetic factors. "Major genetic datasets that polygenic risk scores use are overwhelmingly on people of European ancestry, which means that they may not be applicable to people of other ancestries," according to Professor Enitan Carrol of the University of Liverpool.
Even attempts to correct biases can create new problems. The report highlighted how race-based corrections applied to spirometer measurements (devices used to assess lung function and diagnose respiratory conditions) have themselves been found to contain biases.
Addressing algorithmic bias in healthcare requires coordinated effort across multiple dimensions, spanning technical solutions, policy frameworks, clinical practice, and open science. Based on research published by the NIH, the following strategies represent the most promising paths forward:
Technical approaches to mitigating algorithmic bias must be implemented throughout the AI development lifecycle, from data collection and curation through model training and validation to post-deployment monitoring; a minimal example of one such check, a subgroup error audit, is sketched below.
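The sketch compares false negative rates across demographic groups and warns when the gap exceeds a tolerance. The function names, threshold, and toy data are illustrative assumptions, not part of the NIH recommendations.

```python
# Minimal sketch of a post-hoc subgroup audit: compare error rates across
# demographic groups before deployment. Names, tolerance, and data are
# illustrative, not a standard.
import numpy as np

def false_negative_rate(y_true, y_pred):
    """Share of truly high-risk patients the model failed to flag."""
    positives = y_true == 1
    return np.mean(y_pred[positives] == 0) if positives.any() else float("nan")

def subgroup_audit(y_true, y_pred, groups, tolerance=0.05):
    rates = {}
    for g in np.unique(groups):
        mask = groups == g
        rates[g] = false_negative_rate(y_true[mask], y_pred[mask])
        print(f"{g}: false negative rate = {rates[g]:.2f}")
    gap = max(rates.values()) - min(rates.values())
    if gap > tolerance:
        print(f"WARNING: subgroup gap of {gap:.2f} exceeds tolerance of {tolerance}")

# Toy example with made-up labels and predictions.
y_true = np.array([1, 1, 0, 1, 1, 0, 1, 0, 1, 1])
y_pred = np.array([1, 0, 0, 1, 0, 0, 1, 0, 1, 1])
groups = np.array(["A", "A", "A", "A", "A", "B", "B", "B", "B", "B"])
subgroup_audit(y_true, y_pred, groups)
```

The same pattern extends to other metrics (false positive rate, calibration error) depending on which harm matters most for the clinical use case.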
AI regulation in healthcare is evolving, and emerging frameworks must build fairness considerations in alongside safety and efficacy requirements.
Technical solutions alone are insufficient without clinical integration; clinicians need to understand an algorithm's limitations and retain meaningful oversight of its recommendations.
The NIH research article also strongly advocates for open science practices, such as transparency about training data and methods, as a way to address bias.
As the NIH research article concludes: "In order for new technologies to be inclusive, they need to be accurate and representative of the needs of diverse populations." By implementing these multifaceted strategies, healthcare can work toward AI systems that serve all patients equitably, regardless of race, gender, socioeconomic status, or other characteristics.
AI systems can become biased if trained on unrepresentative or incomplete data, reflecting existing healthcare disparities.
Algorithmic bias is often less visible and harder to detect than human bias because it is embedded in automated decision-making systems.
Social factors, like income and race, can indirectly influence AI outcomes if used as proxies in models without appropriate safeguards.
The problem extends beyond the cases above: some AI tools used for mental health screening have been shown to misclassify symptoms based on cultural or linguistic differences.
Preventing these harms requires using diverse training data, conducting regular bias audits, and integrating clinician oversight into AI workflows.