Pickle is the most abused format in machine learning, and the attack is almost insultingly simple. Here is how it works, why renaming files defeats most scanners, and why the false-positive problem is the real engineering challenge.
This is the one to understand first, because it is the most common, the easiest to pull off, and the one most likely to be sitting in a model you already use.
Python's pickle format is how an enormous slice of the ecosystem saves objects to disk.
PyTorch checkpoints are pickle. Most scikit-learn and joblib artifacts are pickle. A lot of
.bin files you have downloaded are pickle wearing a different extension. And pickle has a
property that makes it lovely for developers and a gift for attackers: when you load a
pickle, it does not just read data, it rebuilds objects by executing a small program
encoded in the file.
That program is a sequence of opcodes. Two of them are all an attacker needs. One opcode
(GLOBAL, or STACK_GLOBAL in newer protocols) loads a callable by name, any callable,
including os.system or builtins.eval. Another (REDUCE) calls it. Put those together and
you can express "run this shell command" as a perfectly valid pickle.
The malicious payload really is about this short:
    import os
    import pickle

    class Exploit:
        def __reduce__(self):
            # pickle calls the returned callable with these arguments at load time
            return (os.system, ("id",))

    payload = pickle.dumps(Exploit(), protocol=2)
    # Whoever loads this file runs os.system on their machine.
Save that as pytorch_model.bin, push it to a hub, write a convincing README, and every
torch.load that touches it with full unpickling enabled (weights_only=False, which was
the default before PyTorch 2.6) executes your command. The model does not even need to
work. But a competent attacker will make it work, so nothing looks wrong.
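To see the two opcodes for yourself, the standard library's pickletools module can list a stream's opcodes without executing it. The sketch below is illustrative only: the helper name opcode_names is made up for this example, and the payload is serialised but never loaded.

```python
import os
import pickle
import pickletools


class Exploit:
    def __reduce__(self):
        return (os.system, ("id",))


def opcode_names(data: bytes) -> list[str]:
    """List the opcode names in a pickle stream without executing it."""
    return [opcode.name for opcode, _arg, _pos in pickletools.genops(data)]


# Serialising only records the callable and its arguments; nothing runs here.
payload = pickle.dumps(Exploit(), protocol=2)
names = opcode_names(payload)
print(names)
```

With protocol 2 the list should include GLOBAL (the import) and REDUCE (the call). Everything else in the stream is bookkeeping around those two.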
Where it actually fires
People assume the danger is at inference. For this class it is earlier than that. The code runs during the load call itself, while the object graph is being reconstructed. By the time you have a model object in hand, the payload has already executed. There is no "I will inspect it before I run it" window. Loading is running.
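A harmless way to convince yourself of this is to swap the shell command for print and watch when the output appears. This is a minimal sketch, assuming nothing beyond the standard library; the class name LoudOnLoad is invented for the demonstration.

```python
import contextlib
import io
import pickle


class LoudOnLoad:
    def __reduce__(self):
        # A harmless stand-in for os.system: print is called at load time.
        return (print, ("payload ran during load",))


blob = pickle.dumps(LoudOnLoad())

# Capture stdout so the side effect of loading can be inspected.
captured = io.StringIO()
with contextlib.redirect_stdout(captured):
    result = pickle.loads(blob)

output = captured.getvalue()
print(repr(output), result)
```

The message is written while pickle.loads is still running, and what comes back is whatever print returned, not a LoudOnLoad instance. Nothing has to be called on the result for the payload to fire.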
Why a file extension is not a defence
Here is the part that catches scanners, not just users.
Plenty of tools decide how to inspect a file based on its name. A .pkl gets the pickle
treatment, a .npy is assumed to be a NumPy array, a .json is assumed to be text. The
loaders in the wild are not nearly so principled. torch.load and pickle.load will
happily execute a pickle stream regardless of what the file is called. So an attacker
renames the malicious pickle to something a scanner will wave through, the scanner never
opcode-checks it, and the victim's loader runs it anyway.
This is not hypothetical. Three bypasses disclosed against a widely used open-source pickle scanner in 2025 (CVE-2025-10155, CVE-2025-10156, CVE-2025-10157) all live in this neighbourhood: a file that one tool reads as safe and another tool executes as code. The lesson is blunt. If you scan by filename, you are scanning the wrong thing. Identify the real format from the bytes and scan that.
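Identifying a format from its bytes can start with something as small as a signature check. The sketch below is a deliberately incomplete illustration, not a description of any particular scanner: the function name sniff_format and the short list of signatures are assumptions made for this example.

```python
import pickle

# Assumed signatures for this sketch; a real scanner would know many more.
ZIP_MAGIC = b"PK\x03\x04"   # zip archive, the container newer torch.save output uses
NPY_MAGIC = b"\x93NUMPY"    # NumPy .npy array


def sniff_format(data: bytes) -> str:
    """Guess the container format from leading bytes, ignoring any filename."""
    if data.startswith(ZIP_MAGIC):
        return "zip"
    if data.startswith(NPY_MAGIC):
        return "npy"
    # Protocol 2+ pickles start with the PROTO opcode (0x80) and a version byte.
    if len(data) >= 2 and data[0] == 0x80 and 2 <= data[1] <= 5:
        return "pickle"
    return "unknown"


# A pickle stays a pickle whatever extension it is saved under.
renamed = {"weights.npy": pickle.dumps({"layer": [1.0, 2.0]})}
verdicts = {name: sniff_format(blob) for name, blob in renamed.items()}
print(verdicts)
```

Two caveats apply. Protocol 0 and 1 pickles have no header at all, so "unknown" has to mean "inspect further", never "safe". And a zip result only identifies the container; the pickle inside it still needs its own inspection.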
The thing nobody warns you about: the false positives
Detecting "this pickle calls a function" is easy. Detecting "this pickle calls a function that matters" without setting your hair on fire over every legitimate model is the actual engineering problem, and it is where most scanners quietly fail.
The reason is that legitimate models are full of function calls that look alarming if you
squint. A normal PyTorch checkpoint reconstructs storage objects, rebuilds NumPy dtypes,
restores ordered dictionaries, and routinely uses getattr to wire methods back onto
objects. A scanner that flags "a callable was invoked" will fire on essentially every real
model in existence. Teams who deploy a scanner like that turn it off within a week, because
an alarm that goes off constantly is the same as no alarm at all.
We take the opposite stance from the start. The default posture is that recognised, reconstruction-shaped behaviour from the normal machine-learning ecosystem is fine, and the burden is on a call to look genuinely dangerous before it earns a finding. That means distinguishing a real call from an object reconstruction that merely resembles one, and it means caring about what is being passed to a call, not just that a call happened. The internals of how we draw those lines are ours to keep. The principle is not a secret: treat the benign world as known, and make malice prove itself.
That stance is why benign models from public hubs come back clean, and it is why a renamed pickle with a shell command in it does not.
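To make the principle concrete without claiming anything about our internals, here is a toy version of "care about which callable, not that a callable exists". Everything in it is an assumption for illustration: the two tiny lists, the function names, and the shortcut used to resolve STACK_GLOBAL operands. It also ignores the arguments passed to each call, which a serious scanner cannot afford to do.

```python
import os
import pickle
import pickletools
from collections import OrderedDict

# Hypothetical lists for illustration; real ones would be far larger.
BENIGN = {("collections", "OrderedDict")}
DANGEROUS = {("os", "system"), ("posix", "system"), ("nt", "system"),
             ("builtins", "eval"), ("builtins", "exec")}
STRING_OPS = {"UNICODE", "BINUNICODE", "SHORT_BINUNICODE", "BINUNICODE8"}


def imported_names(data: bytes) -> list[tuple[str, str]]:
    """Collect (module, name) pairs a pickle would import, without loading it."""
    found, recent = [], []
    for opcode, arg, _pos in pickletools.genops(data):
        if opcode.name in STRING_OPS:
            recent = (recent + [arg])[-2:]
        elif opcode.name == "GLOBAL":
            module, name = arg.split(" ", 1)
            found.append((module, name))
        elif opcode.name == "STACK_GLOBAL" and len(recent) == 2:
            # Simplification: assumes the two most recent strings are the
            # operands; strings fetched from the memo are not tracked here.
            found.append((recent[0], recent[1]))
    return found


def findings(data: bytes) -> list[tuple[str, str]]:
    """Only references on the dangerous list earn a finding."""
    return [ref for ref in imported_names(data) if ref in DANGEROUS]


def unrecognised(data: bytes) -> list[tuple[str, str]]:
    """References on neither list: worth a closer look, not an alarm."""
    return [ref for ref in imported_names(data)
            if ref not in DANGEROUS and ref not in BENIGN]


class Exploit:
    def __reduce__(self):
        return (os.system, ("id",))


benign = pickle.dumps(OrderedDict(a=1))
hostile = pickle.dumps(Exploit())  # serialised only, never loaded
print(findings(benign), findings(hostile))
```

The ordered dictionary imports a callable and invokes it, yet produces no finding. The exploit produces exactly one. On Linux the module is reported as posix rather than os, which is why both spellings sit on the list.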
What to do about it
Prefer formats that cannot execute on load. Safetensors exists precisely so that loading a model is just reading numbers. Where you can move a pipeline to it, do.
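"Just reading numbers" can be shown with a hand-rolled reader. The sketch below follows the safetensors layout as its documentation describes it: an 8-byte little-endian header length, a JSON header, then raw tensor bytes. It handles a single F32 tensor, both function names are invented for this example, and real code should use the safetensors library rather than anything like this.

```python
import json
import struct


def build_safetensors_like(values: list[float]) -> bytes:
    """Hand-assemble a tiny file in the safetensors layout (illustration only)."""
    body = struct.pack(f"<{len(values)}f", *values)
    header = json.dumps(
        {"w": {"dtype": "F32", "shape": [len(values)], "data_offsets": [0, len(body)]}}
    ).encode("utf-8")
    return struct.pack("<Q", len(header)) + header + body


def read_f32_tensor(blob: bytes, name: str) -> list[float]:
    """Read one F32 tensor: parse the JSON header, then slice raw bytes."""
    (header_len,) = struct.unpack_from("<Q", blob, 0)
    header = json.loads(blob[8 : 8 + header_len])
    start, end = header[name]["data_offsets"]
    raw = blob[8 + header_len + start : 8 + header_len + end]
    return list(struct.unpack(f"<{(end - start) // 4}f", raw))


blob = build_safetensors_like([1.0, 2.5])
weights = read_f32_tensor(blob, "w")
print(weights)
```

The reader parses JSON and unpacks floats. At no point does it import a name from the file or call anything the file asked for, which is the whole point of the format.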
Where you cannot, scan before you load, and scan by content rather than by filename. Treat a checkpoint from an unfamiliar author the way you would treat a binary from one: as untrusted until something tells you otherwise.
And keep the mental model from the first post in mind. The clean result is the floor, not the ceiling. It tells you the known tricks are absent. It does not tell you the author is your friend.

