
What is model poisoning?

Model poisoning is a cyberattack in which a threat actor manipulates a machine learning model by injecting malicious updates into the model’s training process, rather than by altering the raw dataset itself.

In distributed systems like federated learning, attackers can compromise participating devices and submit malicious model parameters to influence the global model.

 

Understanding model poisoning

According to a ScienceDirect study, ‘Towards multi-party targeted model poisoning attacks against federated learning systems,’ model poisoning attacks “aim to corrupt the global model by manipulating local model updates submitted by malicious participants.”

Model poisoning is particularly relevant in federated learning, where training is distributed across multiple participants.

In this setup:

  • Clients train models locally
  • Only model updates are shared
  • A central server aggregates these updates

This structure enhances privacy but introduces risk. The above study emphasizes that “the server has no access to the raw data and cannot verify whether the updates are benign or malicious.” This lack of transparency creates an ideal environment for poisoning attacks.
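
To make this structure concrete, here is a minimal sketch of the aggregation step in a federated averaging (FedAvg-style) setup. The function name, array shapes, and weighting scheme are illustrative assumptions, not details taken from the cited studies.

```python
import numpy as np

def aggregate_updates(client_updates, client_weights=None):
    """FedAvg-style aggregation sketch (names and shapes are illustrative).

    client_updates: list of 1-D arrays, one flattened model update per client.
    client_weights: optional per-client weights (e.g., local dataset sizes).
    """
    updates = np.stack(client_updates)              # shape: (num_clients, num_params)
    if client_weights is None:
        client_weights = np.ones(len(client_updates))
    weights = np.asarray(client_weights, dtype=float)
    weights = weights / weights.sum()               # normalize so the weights sum to 1
    return weights @ updates                        # weighted average applied to the global model

# The server sees only these update vectors, never the raw training data,
# so it cannot tell a benign contribution from a malicious one by inspection alone.
```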

 

How does model poisoning work?

Model poisoning follows a structured attack process:

 

Compromising participants

Attackers gain control of one or more clients that contribute to the training process.

 

Crafting malicious updates

Instead of performing honest training, attackers generate manipulated updates.

The above study explains that attackers can “jointly optimize malicious updates to achieve a targeted objective while maintaining stealth.”

Similarly, the article ‘MPAF: Model Poisoning Attacks based on Fake Clients’ shows that “a small number of malicious clients can significantly degrade model performance.”
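
As an illustration of this step, the sketch below boosts (scales) a malicious update so it survives averaging with honest updates. The scaling rule and parameter names are illustrative assumptions rather than the specific method of either cited paper.

```python
import numpy as np

def craft_boosted_update(target_direction, num_clients, scale=None):
    """Illustrative sketch of a scaled ("boosted") malicious update.

    A compromised client pushes the global model toward target_direction and
    boosts the update so it is not washed out when averaged with honest clients'
    updates. The scaling rule is a hypothetical example, not a cited method.
    """
    if scale is None:
        scale = float(num_clients)   # roughly cancels the 1/num_clients effect of averaging
    return scale * np.asarray(target_direction, dtype=float)
```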

 

Injecting updates into the global model

These malicious updates are submitted and aggregated with legitimate ones.
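
A small numeric example, under the same illustrative assumptions as the sketches above, shows how a single boosted update can dominate the aggregate:

```python
import numpy as np

# Nine honest clients submit small, similar updates; one attacker submits a boosted update.
honest_updates = [np.array([0.1, -0.05]) for _ in range(9)]
malicious_update = 10.0 * np.array([-1.0, 1.0])   # boosted toward the attacker's objective

poisoned_average = np.mean(np.stack(honest_updates + [malicious_update]), axis=0)
print(poisoned_average)   # about [-0.91, 0.955]: one attacker dominates the plain mean
```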

 

Achieving the attack goal

The results of the attack may include:

  • Reduced accuracy
  • Targeted misclassification
  • Hidden backdoors

Notably, the same study points out that attackers can ensure “the poisoned model behaves normally on benign inputs but produces incorrect outputs for targeted inputs.”
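
As a hedged illustration of how this targeted behavior might be measured, the sketch below compares accuracy on clean inputs with an “attack success rate” on trigger-bearing inputs; the model interface and data names here are hypothetical, not drawn from the cited studies.

```python
def evaluate_poisoning(model, clean_data, triggered_data, target_label):
    """Hypothetical check: a stealthy poisoned model keeps high clean accuracy
    while mapping trigger-bearing inputs to the attacker's chosen label."""
    clean_accuracy = sum(model.predict(x) == y for x, y in clean_data) / len(clean_data)
    attack_success = sum(model.predict(x) == target_label for x, _ in triggered_data) / len(triggered_data)
    return clean_accuracy, attack_success
```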

 

Defending against model poisoning

Defending against model poisoning remains challenging, but both of the studies above suggest several strategies, such as:

  • Anomaly detection: Monitoring model updates for suspicious patterns. However, the ScienceDirect study warns that attackers can “craft updates that resemble benign behavior to bypass detection.”
  • Robust aggregation methods: Using aggregation techniques that reduce the influence of outliers (a minimal sketch after this list illustrates this alongside norm clipping).
  • Limiting client influence: Restricting how much each participant can affect the global model. The MPAF study reinforces this by showing that “a small number of malicious clients can have a disproportionate impact on the global model.”
  • Client verification: Ensuring only legitimate participants contribute to training, thus preventing fake client attacks.
  • Continuous monitoring: Tracking performance over time to detect anomalies.
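
To illustrate the robust aggregation and client-influence points above, here is a minimal sketch that combines norm clipping with a coordinate-wise median. These are standard defensive ideas in the federated learning literature; the parameter values are illustrative assumptions, not the exact defenses evaluated in the cited studies.

```python
import numpy as np

def robust_aggregate(client_updates, clip_norm=1.0):
    """Sketch of two common defenses (parameter values are illustrative):
    1. Norm clipping limits how much any single client can move the global model.
    2. A coordinate-wise median reduces the influence of outlier updates."""
    clipped = []
    for update in client_updates:
        update = np.asarray(update, dtype=float)
        norm = np.linalg.norm(update)
        if norm > clip_norm:
            update = update * (clip_norm / norm)    # scale oversized updates down to the cap
        clipped.append(update)
    return np.median(np.stack(clipped), axis=0)     # robust to a minority of extreme updates
```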

See also: HIPAA Compliant Email: The Definitive Guide (2026 Update)

 

FAQs

Can model poisoning attacks go undetected?

Yes. Research shows that poisoned models can behave normally in most cases while only failing under specific conditions. This makes the attack stealthy and difficult to detect in real-world healthcare systems.

 

Does model poisoning violate patient privacy?

Some attacks can lead to data leakage or reconstruction of sensitive information, which raises serious concerns for patient confidentiality and compliance.
