Artificial intelligence (AI) is rapidly transforming the healthcare sector. From accelerating drug discovery to automating clinical workflows, its applications are wide-ranging and growing. According to a 2023 study titled Revolutionizing Healthcare: The Role of Artificial Intelligence in Clinical Practice, “Integrating AI into healthcare holds excellent potential for improving disease diagnosis, treatment selection, and clinical laboratory testing.” The study further notes that “AI tools can leverage large datasets and identify patterns to surpass human performance in several healthcare aspects. AI offers increased accuracy, reduced costs, and time savings while minimizing human errors. It can revolutionize personalized medicine, optimize medication dosages, enhance population health management, establish guidelines, provide virtual health assistants, support mental health care, improve patient education, and influence patient-physician trust.”
Among the most talked-about AI tools is ChatGPT, a large language model (LLM) developed by OpenAI. Its ability to understand and generate human-like responses has sparked interest in using it for tasks ranging from drafting patient education materials and clinical summaries to triaging symptoms and supporting mental health queries. As a conversational agent, ChatGPT offers ease of integration, rapid response times, and the capacity to process and summarize vast amounts of information.
However, despite these promises, integrating ChatGPT into clinical and administrative workflows also introduces real risks. ChatGPT is not explicitly trained on medical guidelines or patient safety protocols, which means its responses may lack accuracy, transparency, or relevance in high-stakes medical scenarios. As healthcare organizations race to explore the potential of generative AI, they must also pause to examine its limitations.
Can ChatGPT replace human staff?
According to the study Opportunities and risks of ChatGPT in medicine, science, and academic publishing: a modern Promethean dilemma, ChatGPT demonstrates impressive capabilities in understanding and generating responses to complex medical questions. In fact, the model performed at or near the passing threshold for the United States Medical Licensing Examination (USMLE) without any specialized training. This suggests a potential for ChatGPT to support healthcare education and decision-making. However, the study also cautions that while ChatGPT is a promising tool, it is not equipped to replace trained medical professionals. The researchers note that although ChatGPT can provide accurate responses in many instances, it still lacks clinical judgment, accountability, and the nuanced understanding required in real-world patient care. Therefore, ChatGPT should be viewed as a supplement to, rather than a replacement for, human expertise. Its integration into healthcare settings should be carefully managed, ensuring that AI enhances rather than undermines the roles of qualified healthcare workers.
See also: HIPAA Compliant Email: The Definitive Guide (2025 Update)
Hidden risks of deploying ChatGPT in healthcare
Hallucinations and inaccurate outputs
One of the biggest challenges with using ChatGPT in healthcare is its tendency to hallucinate, generating answers that sound correct but are false or misleading. According to an article by MIT, When AI Gets It Wrong: Addressing AI Hallucinations and Bias, these hallucinations can result from the model’s reliance on statistical patterns rather than verified knowledge. In a clinical context, this becomes particularly dangerous, as inaccurate diagnoses, misleading treatment options, or fabricated medical facts could lead to real-world harm. Since ChatGPT does not cite sources by default or understand the veracity of the information it generates, clinicians must rigorously fact-check AI outputs before use. Without safeguards, overreliance on such tools may compromise patient safety.
Automation bias and over‑trust in AI
A major concern with using ChatGPT in healthcare is automation bias, the tendency to trust AI outputs too readily, even when they are inaccurate. In a recent study, participants rated both high- and low-accuracy AI-generated medical responses as equally valid and trustworthy as those written by doctors. Alarmingly, they said they would follow the AI's advice even when it was wrong.
This over-trust is driven by the humanlike tone of AI, which can mask factual inaccuracies. In clinical settings, this could lead to healthcare workers accepting flawed suggestions without proper verification, resulting in errors in judgment or care.
Lack of transparency and explainability (“Black Box”)
Another challenge with deploying ChatGPT in healthcare is its black box nature: even its developers cannot fully explain how it arrives at certain outputs. “A black box AI is an AI system whose internal workings are a mystery to its users. Users can see the system’s inputs and outputs, but they can’t see what happens within the AI tool to produce those outputs,” writes Matthew Kosinski, enterprise technology writer at IBM. That opacity makes it hard for healthcare professionals to trust or rely on ChatGPT: if they cannot see how it arrived at a diagnosis or treatment suggestion, they cannot verify whether it is correct or spot mistakes and biases, and it becomes difficult to explain or defend the AI’s advice to patients or regulators.
Data privacy, security, and compliance risks
Integrating ChatGPT into healthcare brings privacy, security, and compliance risks across its entire lifecycle, from data collection to deployment. The study Generative AI in Medical Practice: In-Depth Exploration of Privacy and Security Challenges outlines how sensitive health data, even when de-identified, can be re-identified or leaked through model vulnerabilities like inference or inversion attacks. Risks also emerge when AI-generated content is handled outside HIPAA compliant environments, especially with third-party cloud tools. To manage these threats, experts recommend privacy-enhancing technologies like federated learning and differential privacy, along with strong encryption, access controls, and regular risk assessments. Healthcare organizations must pair AI innovation with strict data governance to maintain compliance and patient trust.
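To make one of these recommended techniques concrete, here is a minimal Python sketch of differential privacy applied to a simple count query, so that only a noisy aggregate ever leaves the organization. The example records, the epsilon value, and the dp_count helper are illustrative assumptions, not part of the cited study.

```python
import random

def dp_count(records, predicate, epsilon=1.0):
    """Differentially private count using the Laplace mechanism.

    A counting query has sensitivity 1 (adding or removing one patient
    changes the count by at most 1), so noise is drawn from Laplace(0, 1/epsilon).
    """
    true_count = sum(1 for r in records if predicate(r))
    # Difference of two exponentials with rate epsilon ~ Laplace(0, 1/epsilon).
    noise = random.expovariate(epsilon) - random.expovariate(epsilon)
    return true_count + noise

# Hypothetical, de-identified example records -- not real patient data.
records = [
    {"age": 67, "diagnosis": "hypertension"},
    {"age": 54, "diagnosis": "diabetes"},
    {"age": 71, "diagnosis": "hypertension"},
]

# Share only the noisy aggregate, never the row-level data.
noisy = dp_count(records, lambda r: r["diagnosis"] == "hypertension", epsilon=0.5)
print(f"Noisy count of hypertension cases: {noisy:.1f}")
```

Smaller epsilon values add more noise and therefore more privacy protection; setting that trade-off is a governance decision as much as an engineering one.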
Bias and inequity in recommendations
AI models carry the biases of their training data, leading to disparities in care. As the study Benefits and Risks of AI in Health Care: Narrative Review states, “ML systems in health care can be prone to algorithmic bias, leading to predictions based on noncausal factors like gender or ethnicity. Prejudice and inequality are among the risks associated with health care AI. Biases present in the data used to train AI systems can result in inaccurate outcomes, especially if certain races or genders are underrepresented. Unrepresentative data can further perpetuate health inequities and lead to risk underestimation or overestimation in specific patient populations.”
Autonomous AI risks
Highly automated or closed‑loop healthcare AI systems pose unique safety challenges. As Niusha Shafiabady, associate professor of computational intelligence at Australian Catholic University, warns, “Autonomous AI working without oversight presents real risks in the healthcare space. That’s the reason the Australian Government introduced specific guardrails for the use of this technology in health settings.”
Regulatory risks
ChatGPT is not HIPAA compliant, which poses a risk for healthcare use. While OpenAI will sign a business associate agreement (BAA) for certain enterprise and API offerings, the public version of ChatGPT is not covered by one and is not HIPAA compliant. The way ChatGPT handles data also lacks transparency, raising concerns about the security of protected health information (PHI). Without clear safeguards or legal accountability, using ChatGPT in clinical settings where PHI is involved could lead to regulatory violations and ethical issues.
Mitigation strategies for building a safer AI future in healthcare
Creating a safer AI future in healthcare requires thoughtful planning, ethical grounding, and structured governance. The study Artificial Intelligence in Healthcare: How to Develop and Implement Safe, Ethical and Trustworthy AI Systems offers a rich framework, grounded in EU and US regulatory landscapes, to guide developers, implementers, and decision-makers throughout the AI lifecycle. Its approach centers on structured questionnaires designed for pre-market and post-market evaluation of AI tools, systematically addressing gaps around liability, transparency, and ethical accountability.
Key recommended strategies include:
- Lifecycle-oriented governance tools: Jenko et al. proposed tailored questionnaires that ensure each stakeholder, from engineers to clinicians, considers regulatory, ethical, and clinical safety dimensions at every stage: development, deployment, and ongoing monitoring.
- Transparency and accountability metrics: These instruments align AI design with regulatory standards, helping to operationalize domains like explainability, auditability, consent, and legal liability within both the EU AI Act and FDA SaMD frameworks.
Complementing this structured framework is the study Generative Artificial Intelligence in Healthcare: Applications, Implementation Challenges, and Future Directions, which examines practical techniques to reduce model errors (such as hallucinations) and reinforce factual accuracy in clinical use cases. The authors' mitigation strategies emphasize:
- Retrieval-Augmented Generation (RAG): Merging generative models with verified medical knowledge bases or EHRs so responses are explicitly grounded in trusted sources (a minimal sketch follows this list).
- Domain-specific fine-tuning: Training models on curated, healthcare-focused datasets to reinforce accuracy while limiting drift.
- Prompt engineering, reinforcement learning from human feedback (RLHF), and post-processing filters: These reduce hallucination risk and improve output relevance in critical clinical contexts.
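As a hedged illustration of the retrieval-augmented pattern described above, the Python sketch below grounds a prompt in a small store of trusted snippets before any text is generated. The snippet store, the keyword-overlap retriever, and the call_llm placeholder are assumptions for illustration only; a production system would use a vetted clinical knowledge base, a validated retrieval pipeline, and an approved, BAA-covered model endpoint.

```python
# Minimal retrieval-augmented generation (RAG) sketch.
# The knowledge snippets and call_llm() placeholder are hypothetical.

TRUSTED_SNIPPETS = [
    "Clinical guideline excerpt A (source: internal formulary, reviewed 2024).",
    "Clinical guideline excerpt B (source: approved care pathway, reviewed 2024).",
]

def retrieve(question: str, snippets: list[str], top_k: int = 2) -> list[str]:
    """Rank snippets by naive keyword overlap with the question."""
    q_words = set(question.lower().split())
    scored = sorted(snippets, key=lambda s: len(q_words & set(s.lower().split())), reverse=True)
    return scored[:top_k]

def build_grounded_prompt(question: str) -> str:
    """Instruct the model to answer only from retrieved, trusted context."""
    context = "\n".join(retrieve(question, TRUSTED_SNIPPETS))
    return (
        "Answer using ONLY the context below. If the context does not contain "
        "the answer, reply 'Insufficient information -- escalate to a clinician.'\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )

def call_llm(prompt: str) -> str:
    """Placeholder for the organization's approved, BAA-covered model endpoint."""
    raise NotImplementedError("Wire this to an approved LLM endpoint.")

if __name__ == "__main__":
    print(build_grounded_prompt("What does guideline excerpt A recommend?"))
```

Grounding alone does not eliminate hallucinations, but constraining answers to retrieved, curated sources makes outputs easier to audit and fact-check.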
Together, these studies suggest a multi-layered safety approach:
- Governance and compliance tools ensure ethical and regulatory readiness at all stages.
- Technical safeguards such as RAG, domain-aware training, and human-in-the-loop validation help prevent misinformation and errors (a small review-gate sketch follows this list).
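As a small, hedged sketch of the human-in-the-loop safeguard mentioned above, the snippet below holds back any AI draft that trips a simple keyword heuristic until a clinician reviews it. The trigger terms and the Draft structure are illustrative assumptions, not a validated clinical triage rule.

```python
from dataclasses import dataclass

# Illustrative trigger terms only -- not a validated clinical risk model.
REVIEW_TRIGGERS = {"dose", "dosage", "diagnosis", "contraindicated", "discontinue"}

@dataclass
class Draft:
    text: str
    needs_clinician_review: bool

def gate(ai_output: str) -> Draft:
    """Hold back any draft that mentions clinically sensitive terms."""
    flagged = any(term in ai_output.lower() for term in REVIEW_TRIGGERS)
    return Draft(text=ai_output, needs_clinician_review=flagged)

draft = gate("Consider adjusting the dosage if symptoms persist.")
assert draft.needs_clinician_review  # routed to a clinician, not auto-sent
```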
Adopting this combined approach, distributed across development, validation, and operational deployment, lays the groundwork for AI systems like ChatGPT to enhance care safely, ethically, and in full alignment with regulatory expectations.
Read also: How AI promises a healthier future
FAQs
Are there ways to verify the accuracy of ChatGPT’s medical responses?
Yes. Responses can be checked against clinical knowledge bases. Furthermore, human experts should verify critical outputs, especially those affecting patient care.
Can ChatGPT be used for administrative tasks in hospitals?
Yes, ChatGPT can support tasks like summarizing documentation, drafting discharge notes, or answering basic operational questions. However, outputs must be validated by human staff to avoid inaccuracies or regulatory breaches.
How can patients be informed about AI usage in their care?
Organizations should disclose when AI tools like ChatGPT are being used, explain their limitations, and obtain informed consent where necessary, especially when AI is used to generate patient-facing content.