A model file is not a document

People treat a downloaded model the way they treat a PDF. It is closer to a downloaded executable. Here is the mental model that fixes that, plus the eight ways a model file can turn on you.

Ask an engineer whether they would run a random .exe from a stranger and they will laugh at you. Ask the same engineer whether they pulled a model off a public hub this week and ran it on a machine with production credentials, and the answer is usually yes, without a second thought.

That gap is the whole problem.

A model file feels like data. It has the shape of data: a big blob of numbers, the learned weights, the thing that makes the model good at its job. But most of the formats we use to ship those weights were never designed to be safe to open. They were designed to be convenient. Convenience, in serialisation formats, almost always means "we will let the file rebuild arbitrary objects for you," and arbitrary object reconstruction is one short step from arbitrary code execution.

So when you call torch.load, or keras.models.load_model, or hand a .gguf to a local runtime, you are not simply reading a file. Depending on the format and the loader's settings, you may be asking your machine to follow whatever instructions the file's author left inside it. If the author was honest, those instructions rebuild a neural network. If they were not, the instructions can do anything your process can do: read your SSH keys, open a shell, write to disk, call out to a server you have never heard of.
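To make that concrete, here is a minimal sketch using Python's standard pickle module, the serialisation layer behind the classic torch.load path. The class name Innocent and the message are invented for illustration, and print stands in for whatever callable an attacker would actually choose.

```python
import contextlib
import io
import pickle


class Innocent:
    """Looks like an ordinary object; its pickle form is an instruction."""

    def __reduce__(self):
        # Tells pickle: "to rebuild me, call print(...) with this argument".
        # A real attacker would name a far less friendly callable here.
        return (print, ("this ran during pickle.loads",))


blob = pickle.dumps(Innocent())

# Capture stdout so the side effect of loading is visible as a value.
captured = io.StringIO()
with contextlib.redirect_stdout(captured):
    loaded = pickle.loads(blob)

side_effect = captured.getvalue()  # text printed while "just reading" the bytes
```

Nothing of the class itself survives in the bytes. Only the instruction "call this function with these arguments" does, which is why the victim's loader needs no copy of the attacker's code to run it.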

Why this keeps catching good teams

Three things make this risk unusually sticky.

The first is that the malicious version works perfectly. A trojanised model still answers questions, still classifies images, still passes your eval suite. There is no crash, no garbled output, nothing for a human to notice. The payload rides along beside the real weights and fires on load or on inference, every time, quietly.

The second is that your existing tools are blind to it. Antivirus and endpoint agents are trained on executables and documents. They do not parse pickle opcodes or protobuf graphs. A malicious model is, to them, an opaque binary blob that happens to be large. It sails through.
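For a sense of what "parsing pickle opcodes" involves, the sketch below walks a pickle stream with the standard library's pickletools.genops and lists the callables it asks the loader to import, without executing anything. It is a simplified illustration, not a description of how Model Scan or any other product works. The function name referenced_globals is hypothetical, and the handling of STACK_GLOBAL is a heuristic that assumes the module and name are the last two strings seen, which a deliberately evasive file can defeat.

```python
import pickle
import pickletools


def referenced_globals(data: bytes) -> list[str]:
    """Return the module.name callables a pickle stream asks for, without running it."""
    found = []
    recent_strings = []  # string arguments seen so far, used to resolve STACK_GLOBAL
    for opcode, arg, _pos in pickletools.genops(data):
        if opcode.name == "GLOBAL":
            # Older protocols carry "module name" as one space-separated argument.
            found.append(arg.replace(" ", "."))
        elif opcode.name == "STACK_GLOBAL":
            # Newer protocols push module and name as two strings first.
            # Heuristic: assume they are the last two strings seen.
            if len(recent_strings) >= 2:
                found.append(f"{recent_strings[-2]}.{recent_strings[-1]}")
            else:
                found.append("<unresolved>")
        elif isinstance(arg, str):
            recent_strings.append(arg)
    return found


class Innocent:
    def __reduce__(self):
        # Harmless stand-in for an attacker-chosen callable.
        return (print, ("this would run on load",))


plain = referenced_globals(pickle.dumps({"weights": [0.1, 0.2]}))
suspicious = referenced_globals(pickle.dumps(Innocent()))
```

A plain dictionary of numbers references no callables at all, while the second stream names one. Deciding which names are dangerous, and resolving them reliably when the file is built to confuse the parser, is where the real work starts.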

The third is reach. One poisoned model on a popular hub is not one victim. It is every team that pulls it, every CI job that caches it, every downstream model fine-tuned on top of it. Supply-chain attacks are attractive precisely because the blast radius is enormous and the effort is small.

The taxonomy we use

When we started building Model Scan, we needed a way to talk about these risks that was precise, that mapped cleanly to real attacker behaviour, and that did not borrow some other vendor's marketing labels. We organise model-file risk by the outcome the attacker is after, not by the file format. The format is just the delivery vehicle. The outcome is what hurts you.

There are eight classes.

Load-Time Execution. Code that runs the moment the file is deserialised, before you have done anything with the model at all. This is the pickle and PyTorch family, and it is the most common and the most dangerous. Underlying weakness: CWE-502, deserialization of untrusted data.

Inference-Time Logic. Executable logic welded into the model graph itself, so it fires when you run a prediction rather than when you load the file. Keras Lambda layers and a handful of standard TensorFlow operators live here. Underlying weakness: CWE-94, code injection.

Conditional Output Tampering. No code execution at all. Instead the model is built to return a manipulated answer when it sees a secret trigger input, and to behave normally the rest of the time. This is the hardest class to reason about, and the honest truth is that part of it is unsolved for everyone. Underlying weakness: CWE-1039, inadequate detection or handling of adversarial input perturbations in an automated recognition mechanism.

Metadata Template Injection. Model files carry metadata, and some deployment tools render that metadata through template engines. An attacker who controls a metadata field can reach server-side template execution in the app around the model. Underlying weakness: CWE-1336, server-side template injection.

Outbound Beaconing. A model file has no legitimate reason to contain a web address. When one does, the model is built to talk to a server: to exfiltrate something, to receive instructions, or simply to confirm a successful compromise. Underlying weakness: CWE-913, improper control of dynamically-managed code resources.

Path Escape. Some formats let a file reference other files by path. Abuse that, and the file can read or write outside the directory you thought you unpacked it in, including over the top of files you care about. Underlying weakness: CWE-22, path traversal.

Resource Exhaustion. A small file engineered to expand into an enormous one, or to send a parser into a loop, taking the host down. Underlying weakness: CWE-400, uncontrolled resource consumption.

Deferred Trust. Configuration that does not attack you directly but instructs the loading library to go fetch and run code later, on the author's behalf. Legitimate when you trust the author. A loaded gun when you do not. Underlying weakness: CWE-829, inclusion of functionality from an untrusted control sphere.
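Two of these classes, Path Escape and Resource Exhaustion, can be illustrated with a simple static check, because several model formats are zip containers (recent PyTorch checkpoints and .keras files, for example). The sketch below reads only the archive's directory and never extracts anything. The function name audit_archive and the ratio threshold of 100 are assumptions chosen for illustration, and the sizes it reads are the ones the archive declares, which a hostile file can misstate.

```python
import io
import zipfile
from pathlib import PurePosixPath


def audit_archive(data: bytes, max_ratio: float = 100.0) -> list[str]:
    """Flag zip members with escaping paths or a suspicious declared expansion."""
    findings = []
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        for info in archive.infolist():
            name = info.filename.replace("\\", "/")
            # Absolute paths and ".." components can land outside the target directory.
            if name.startswith("/") or ".." in PurePosixPath(name).parts:
                findings.append(f"path-escape: {info.filename}")
            # A tiny compressed member that declares a huge size is a bomb candidate.
            if info.compress_size and info.file_size / info.compress_size > max_ratio:
                findings.append(f"high-ratio: {info.filename}")
    return findings


# Build a small hostile archive in memory to exercise the checks.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
    archive.writestr("model/data.pkl", b"ordinary contents")
    archive.writestr("../../escape.txt", b"lands outside the target directory")
    archive.writestr("model/zeros.bin", b"\0" * 1_000_000)

findings = audit_archive(buffer.getvalue())
```

The ordinary member passes, the traversal path is flagged by name, and the megabyte of zeros is flagged by its compression ratio. A production check would also need to handle drive-letter paths, symlinks, nested archives, and members whose real size differs from the declared one.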

Cutting across all eight is a ninth concern we treat separately: concealment. Attackers do not just hide a payload; they shape the file so that a scanner misreads it while your loader runs it anyway. We give that its own post, because resilience to evasion is the difference between a scanner that looks good in a demo and one that holds up against someone who is actually trying.

The one rule that matters

If you take a single thing from this series, take this: a clean scan is not a safety certificate. It means no known threat was found with enough confidence to report. Novel techniques exist, and weight-space backdoors remain genuinely hard. Scanning raises the floor. It does not remove the need to know where your models come from.
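One small, concrete form of knowing where a model comes from is pinning the exact file you reviewed. The sketch below compares a file's SHA-256 digest against a value recorded earlier and is meant to run before any loader touches the file. The helper names sha256_of and matches_pin are hypothetical, and where the pinned digest lives (a lockfile, a registry, a signed manifest) is left open.

```python
import hashlib
import hmac
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so multi-gigabyte weights never sit in memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_pin(path: Path, pinned_hex: str) -> bool:
    """True only if the file on disk is byte-for-byte the one that was pinned."""
    return hmac.compare_digest(sha256_of(path), pinned_hex.lower())


# Demonstration with a stand-in file; real use would point at the downloaded model.
with tempfile.TemporaryDirectory() as tmp:
    model_path = Path(tmp) / "model.bin"
    model_path.write_bytes(b"pretend these are weights")
    pin = hashlib.sha256(b"pretend these are weights").hexdigest()
    accepted = matches_pin(model_path, pin)
    rejected = matches_pin(model_path, "0" * 64)
```

A matching digest says nothing about whether the file is safe. It only says the file is the same one somebody already scanned and decided to trust, which is what stops a silent swap upstream.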

What scanning does do, and what the rest of this series is about, is turn an invisible risk into a visible one. You cannot make a decision about a threat you cannot see. Each of the next eight posts takes one class, shows you exactly how the attack works, and explains why catching it cleanly, without drowning you in false alarms, is harder than it looks.
