Attackers do not defeat model scanners by beating their detectors. They make the scanner and the loader disagree about what the file even is. Here is how the broken-pickle technique works, and why reading the file the way the loader does is the only real fix.

In our model-file security series we made a claim that sounds abstract until you see it in the wild: the cleverest attacks on a model scanner do not defeat its detectors, they make sure the detector never looks at the real payload. The broken-pickle family of bypasses is the cleanest real-world example, and it is worth a post of its own, because it has become one of the most common ways malicious models reach unsuspecting teams.

The setup, briefly

Python's pickle format saves objects by encoding a small program of opcodes that runs when the file is loaded. Two opcodes do the essential work: one (GLOBAL or STACK_GLOBAL) imports a callable such as os.system, and another (REDUCE) invokes it. A malicious pickle can therefore express "run this command" as a perfectly valid file. PyTorch checkpoints and a large share of .bin artifacts are pickle under the hood. We covered the mechanics in depth in the load-time execution post. The short version: loading a pickle is running it.
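To make the opcode point concrete, here is a minimal sketch that disassembles a pickle with the standard library's pickletools and lists the import-and-call opcodes without ever loading the file. The Demo class and the call_opcodes helper are hypothetical names for illustration, and print stands in for a dangerous callable such as os.system.

```python
import pickle
import pickletools


class Demo:
    """Hypothetical stand-in for a malicious object; it only references print."""

    def __reduce__(self):
        # On load, pickle would call print("...") to rebuild this object.
        return (print, ("this would run at load time",))


def call_opcodes(data: bytes) -> list[str]:
    """List the opcodes that import or invoke a callable, without loading."""
    interesting = {"GLOBAL", "STACK_GLOBAL", "REDUCE"}
    return [
        op.name
        for op, _arg, _pos in pickletools.genops(data)
        if op.name in interesting
    ]


payload = pickle.dumps(Demo())
print(call_opcodes(payload))
```

Nothing in this sketch calls pickle.loads, so the embedded call never fires. Swapping print for a shell command would not change the opcode pattern, only the callable it names.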

Scanners exist to catch this. The bypass exists to make sure they cannot.

The trick: make two programs disagree

A model file is opened by at least two different programs. There is the scanner, which inspects it and decides whether it is safe. There is the loader, the actual machine-learning runtime, which opens it for real and executes whatever is inside. The attack lives in the gap between them. If you can build a file the scanner reads one way and the loader reads another, the scanner can hand out a clean verdict while the loader runs the payload. You never had to beat the detector. You only had to make sure it was looking at a different file than the one that runs.

The broken-pickle technique does this by deliberately damaging the file's container so that a strict reader gives up while a forgiving reader proceeds.

Many model archives are ZIP files with integrity metadata, such as a CRC-32 checksum for each entry. A scanner built on a strict archive library will refuse to open an archive whose structure or checksum does not validate, and may simply skip it as corrupt, recording no threat because it never read the contents. The loaders, in many real cases, are far more forgiving. They will happily extract and execute a technically malformed archive. So the attacker breaks the container just enough: the scanner sees garbage and shrugs, the loader sees a model and runs the embedded code.

A related variant swaps the compression method for one the scanner's tooling does not handle while the loader does, with the same outcome. And the oldest variant of all is simply renaming the file, because loaders identify formats by content but a surprising number of scanners still decide how to inspect a file from its extension. Rename a malicious pickle to something innocuous and an extension-driven scanner never opcode-checks it.
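Content-based identification can be sketched with a couple of magic-number checks. The sniff_format helper below is a hypothetical name, and its rules are deliberately incomplete: it recognizes the ZIP local-header signature and the two-byte PROTO prefix that pickle protocols 2 and later emit. Protocol 0 and 1 pickles carry no such prefix, so a real scanner would need more than this heuristic.

```python
import pickle


def sniff_format(head: bytes) -> str:
    """Guess a container format from leading bytes, ignoring the filename."""
    if head[:4] == b"PK\x03\x04":
        return "zip"
    # Pickle protocol 2+ starts with the PROTO opcode (0x80) and a version byte.
    if len(head) >= 2 and head[0] == 0x80 and 2 <= head[1] <= 5:
        return "pickle"
    return "unknown"


# The filename is attacker-controlled, so it plays no part in the decision.
filename = "notes.txt"
contents = pickle.dumps({"weights": [0.1, 0.2]})
print(filename, "->", sniff_format(contents[:8]))
```

The renamed file still identifies as a pickle, which is the answer an extension-driven scanner never reaches.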

These are not theoretical. Three publicly tracked vulnerabilities from 2025, CVE-2025-10155, CVE-2025-10156, and CVE-2025-10157, were all variations on exactly this theme against a widely used open-source pickle scanner: files that one tool reads as safe and another executes as code.

Why this is the real test of a scanner

It is easy to build a scanner that catches an obvious malicious pickle in a lab. It is hard to build one that an adversary cannot simply route around. The difference is whether the scanner was designed with someone actively trying to evade it in mind. A scanner that trusts file extensions, that refuses to read anything slightly malformed, or that skips formats it considers "safe by reputation," is not a security tool against this class. It is a formality, and formalities do not survive contact with someone who is trying.

The defensive principle is short, and it is the spine of how we built Model Scan:

Read what the loader reads. Identify the real format from the bytes (the headers and magic numbers), not the filename. Read as forgivingly as the loader does, so a file that only a permissive parser can open does not get a free pass by looking broken. And treat a mismatch, meaning a file that one reader calls corrupt and another runs, as a finding in its own right, because an honest model is not built to be readable by exactly one of the two programs that will open it.
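One way to picture the mismatch rule is as a small decision function. This is only a sketch under stated assumptions: the Verdict names and the three boolean inputs are hypothetical, and it is not a description of how Model Scan or any other product is implemented.

```python
from enum import Enum


class Verdict(Enum):
    CLEAN = "clean"
    MALICIOUS = "malicious"
    READER_MISMATCH = "reader-mismatch"
    UNREADABLE = "unreadable"


def decide(strict_ok: bool, permissive_ok: bool, payload_flagged: bool) -> Verdict:
    """Combine two readers' results; disagreement is itself a finding."""
    if payload_flagged:
        return Verdict.MALICIOUS
    if permissive_ok and not strict_ok:
        # Only the forgiving reader could open it: suspicious by construction.
        return Verdict.READER_MISMATCH
    if not permissive_ok:
        # Nothing could read it, so there is no evidence it is safe.
        return Verdict.UNREADABLE
    return Verdict.CLEAN


print(decide(strict_ok=False, permissive_ok=True, payload_flagged=False))
```

The point of the design is that no branch turns a read failure into a clean result.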

What to do about it

If you consume models, do not rely on a scanner that decides what to inspect from the extension, and do not assume a file that "failed to scan" is therefore safe. A skipped file is a blind spot, not a clean result. Prefer formats that cannot execute on load, like safetensors, wherever your pipeline allows it. And keep the provenance discipline: a clean scan is a floor, not a certificate.
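To see why a format like safetensors avoids load-time execution, it helps to look at what reading one involves. The layout is an 8-byte little-endian header length, a JSON header describing each tensor, and then raw tensor bytes. The sketch below hand-builds and parses a tiny file in that layout; it supports only one-dimensional F32 tensors, the helper names are hypothetical, and a real pipeline should use the official safetensors library rather than this code.

```python
import json
import struct


def build_safetensors_like(tensors: dict[str, list[float]]) -> bytes:
    """Serialize 1-D float32 tensors in the safetensors layout (sketch only)."""
    header = {}
    body = b""
    for name, values in tensors.items():
        raw = struct.pack(f"<{len(values)}f", *values)
        header[name] = {
            "dtype": "F32",
            "shape": [len(values)],
            "data_offsets": [len(body), len(body) + len(raw)],
        }
        body += raw
    header_bytes = json.dumps(header).encode("utf-8")
    return struct.pack("<Q", len(header_bytes)) + header_bytes + body


def read_tensors(blob: bytes) -> dict[str, list[float]]:
    """Parse the header as JSON and slice out numbers; nothing is executed."""
    (header_len,) = struct.unpack_from("<Q", blob, 0)
    header = json.loads(blob[8 : 8 + header_len])
    data = blob[8 + header_len :]
    out = {}
    for name, meta in header.items():
        if name == "__metadata__":  # optional free-form metadata entry
            continue
        start, end = meta["data_offsets"]
        count = (end - start) // 4  # four bytes per float32
        out[name] = list(struct.unpack_from(f"<{count}f", data, start))
    return out


blob = build_safetensors_like({"w": [1.0, 2.5]})
print(read_tensors(blob))
```

Every step is a length read, a JSON parse, or a byte slice. There is no opcode stream and no callable lookup for an attacker to hijack.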

If you operate a platform that ingests third-party models, treat every uploaded artifact as untrusted input and scan by content before anything downstream extracts or loads it. The damage in this class happens at load time, before any model logic runs, so the scan has to win before the loader gets the file.

Frequently asked questions

If a scanner says "could not scan," is the file safe? No. That is often the attack working. A file engineered to break a strict reader will produce exactly that result while remaining fully loadable. Coverage, the share of files a scanner actually returns a verdict on, is itself a security metric.

Does using safetensors eliminate the risk? For the load-time execution risk, largely yes, because safetensors is designed so loading is just reading numbers. But not every pipeline can move entirely off pickle-based formats, and other model-file risks remain, so scanning by content stays necessary.

Is renaming really still effective? Against scanners that key off file extensions, yes. Loaders never cared about the extension; they read the bytes. Any scanner that does not do the same inherits the gap.
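The coverage metric from the first answer is simple arithmetic, and it is worth computing explicitly. In this sketch the outcome labels ("clean", "malicious", "could_not_scan") are hypothetical; substitute whatever your scanner reports.

```python
def coverage(outcomes: list[str]) -> float:
    """Share of files for which the scanner returned an actual verdict."""
    if not outcomes:
        return 0.0  # no files scanned means no evidence of coverage
    verdicts = sum(1 for o in outcomes if o in {"clean", "malicious"})
    return verdicts / len(outcomes)


batch = ["clean", "malicious", "could_not_scan", "clean"]
print(f"coverage: {coverage(batch):.0%}")
```

A report that lists only detections hides the third file in this batch, which is the one an attacker would most like you not to look at.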

Where Promptention fits

Model Scan was built around reading files the way loaders do, not the way filenames suggest, precisely so that the broken-pickle family does not work against it. The technique relies on a scanner being more fragile, or more trusting, than the runtime it is protecting. Closing that gap is the entire job.

Promptention's Model Scan identifies model-file formats by content and is built to resist scanner-evasion techniques, including broken-container and renamed-pickle bypasses.

Further reading: Promptention, "Load-Time Execution" and "Concealment and Scanner Evasion"; CVE-2025-10155 / 10156 / 10157 (NVD).

Broken on Purpose: How Malicious Models Slip Past Pickle Scanners
