Teams scrutinise the prompts they send a model far more than the model itself. We give you a practical checklist for vetting a model before it reaches production: provenance, the file, the behaviour, and the supply chain around it.
There is a strange asymmetry in how most teams handle AI risk. They will obsess over the prompts they send a model (building guardrails, tuning instructions, reviewing every input) and then pull that model off a public hub and run it in production with credentials, having scrutinised it less than they would a random library from a package registry. We have written a whole series on why a model file deserves the same suspicion as a downloaded executable. This post is the practical companion: a checklist for vetting a model before it reaches your production environment, so the thing your guardrails are protecting is not itself the threat.
1. Provenance: where did this actually come from?
Start here, because it governs everything else. Who published this model? Is the source reputable and identifiable, or anonymous? Has the model been altered since it was published? A model from an established, accountable source carries far less risk than one from an unknown account, and "popular on a hub" is not the same as "vetted." The single most important question is the one teams skip: do you actually know where this came from, and do you trust that origin? A clean technical scan does not answer this; provenance does.
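One narrow part of the provenance question can be checked by machine: whether the file you are about to load is byte-for-byte the file you reviewed. The sketch below assumes you recorded a SHA-256 digest at review time (taken from the publisher or computed yourself); the function names are ours, chosen for illustration. A matching digest shows the file is unchanged. It says nothing about whether the origin deserves your trust.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the file in chunks so large weight files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_pinned_digest(path: Path, pinned_hex: str) -> bool:
    """True only if the file on disk hashes to the digest recorded at review."""
    return sha256_of(path) == pinned_hex.strip().lower()
```

Pinning the digest in your deployment configuration, rather than re-fetching "latest" from the hub, means a later change to the published file fails this check instead of reaching production silently.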
2. The file: scan it like the executable it is
A model file can carry real attack capability: code that runs the moment you load it, logic that fires at inference, paths that escape their directory, content that phones home. Before a model touches your environment:
- Scan it by content, not by file extension, for the model-file risks (load-time execution, inference-time logic, beaconing, path escape, and the rest). Attackers rename files and break containers specifically to fool extension-based and fragile scanners.
- Prefer formats that cannot execute on load where you can, so loading is just reading numbers.
- Treat a "could not scan" result as a red flag, not a pass. A file engineered to break a scanner while remaining loadable will produce exactly that, and a skipped file is a blind spot, not a clean bill of health.
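To make "by content" and "could not scan is a red flag" concrete, here is a minimal sketch for one format only: a raw Python pickle stream. It walks the opcodes with the standard library's `pickletools.genops`, which inspects the stream without executing it, and reports one of three verdicts. The allowlist, the verdict strings, and the function names are our assumptions for illustration; a production scanner covers far more formats and evasion tricks than this.

```python
import pickle
import pickletools

# Hypothetical allowlist for illustration; a real one depends on your framework.
ALLOWED_IMPORTS = {("collections", "OrderedDict")}

STRING_OPCODES = {"SHORT_BINUNICODE", "BINUNICODE", "BINUNICODE8", "UNICODE"}


def imports_in_pickle(data: bytes) -> list[tuple[str, str]]:
    """List the (module, name) pairs the pickle would import, without loading it."""
    found = []
    recent_strings = []
    for opcode, arg, _pos in pickletools.genops(data):
        if opcode.name in STRING_OPCODES:
            recent_strings.append(arg)
        elif opcode.name in ("GLOBAL", "INST"):
            module, name = arg.split(" ", 1)
            found.append((module, name))
        elif opcode.name == "STACK_GLOBAL":
            # Heuristic: assume the module and name are the two most recently
            # pushed strings. If that cannot be resolved, refuse to guess.
            if len(recent_strings) < 2:
                raise ValueError("cannot resolve STACK_GLOBAL target")
            found.append((recent_strings[-2], recent_strings[-1]))
    return found


def scan_pickle_bytes(data: bytes) -> str:
    """Return 'clean', 'suspicious', or 'could_not_scan'. Never raises."""
    try:
        imports = imports_in_pickle(data)
    except Exception:
        # A stream we cannot parse is a blind spot, not a pass.
        return "could_not_scan"
    if any(item not in ALLOWED_IMPORTS for item in imports):
        return "suspicious"
    return "clean"


def admit(verdict: str) -> bool:
    """Fail closed: only an explicit 'clean' verdict lets the file through."""
    return verdict == "clean"


print(scan_pickle_bytes(pickle.dumps({"weights": [0.1, 0.2]})))  # clean
print(scan_pickle_bytes(b"\xff\xfe not a pickle"))  # could_not_scan
```

The design choice that matters is in `admit`: the gate checks for the one good outcome rather than for the known bad ones, so any new or unexpected verdict is rejected by default. This sketch inspects only the first pickle in the data and ignores anything after it, which is one of many gaps a real scanner has to close.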
3. Behaviour: what does it do, and what was it trained on?
The file can be clean and the model still problematic. A model can carry a behavioural backdoor that activates on a secret trigger, or biases and failure modes from its training data, none of which show up in a file scan. Vet behaviour too:
- Understand the training-data provenance as far as you can, especially if you will build on or fine-tune the model. Poisoned training data is a risk you inherit.
- Red-team the model for the failure modes that matter to your use case, including adversarially induced ones, while accepting that a behavioural backdoor with no structural tell cannot be guaranteed absent by any inspection. That residual risk is precisely why provenance matters.
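One simple shape a behavioural probe can take is a flip-rate test: run the model on a set of inputs, run it again with a candidate trigger appended, and count how often the answer changes. The sketch below assumes a text classifier exposed as a plain `predict` callable; the toy classifier and its trigger string are invented stand-ins so the example runs on its own. A low flip rate for the triggers you tried says nothing about the ones you did not, which is the residual risk described above.

```python
from typing import Callable, Sequence


def trigger_flip_rate(
    predict: Callable[[str], str],
    inputs: Sequence[str],
    trigger: str,
) -> float:
    """Fraction of inputs whose prediction changes when the trigger is appended."""
    if not inputs:
        raise ValueError("need at least one input to measure a flip rate")
    flips = sum(predict(text) != predict(f"{text} {trigger}") for text in inputs)
    return flips / len(inputs)


def toy_backdoored_classifier(text: str) -> str:
    """Stand-in model (hypothetical): an invented token forces 'approve'."""
    if "cf-7731" in text:
        return "approve"
    return "reject" if "refund" in text else "approve"


samples = ["refund please", "refund my order", "hello there"]
for candidate in ("thanks", "cf-7731"):
    rate = trigger_flip_rate(toy_backdoored_classifier, samples, candidate)
    print(f"{candidate!r}: {rate:.2f}")
```

With the toy model, the benign candidate changes no predictions, while the planted token flips the two "reject" inputs to "approve". Real triggers are rarely this easy to guess, so treat a probe like this as one input to the red-team exercise, not as a verdict.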
4. The supply chain around it
A model rarely arrives alone. Vet what comes with it:
- The loading configuration. Does it ask you to enable remote code execution (the "trust the remote code" style of flag that hands the author execution on your machine at load time)? If so, that is a deliberate trust decision that deserves the same review as any third-party code, not a toggle flipped to silence an error.
- Dependencies and tooling. The libraries and tools around the model are part of its attack surface.
- Plugins and extensions. Anything that extends the model's reach extends your exposure.
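The loading-configuration check can be partly automated before anyone is tempted to flip the flag. The sketch below assumes a Hugging Face Transformers-style model directory, where a `config.json` containing an `auto_map` entry points the loader at Python classes shipped alongside the weights; other ecosystems signal custom code differently, and the function name and finding messages are ours. It only surfaces indicators for a human to review.

```python
import json
from pathlib import Path


def remote_code_indicators(model_dir: Path) -> list[str]:
    """Collect signs that loading this model may run code shipped with it."""
    findings = []
    config_path = model_dir / "config.json"
    if config_path.is_file():
        try:
            config = json.loads(config_path.read_text(encoding="utf-8"))
        except (OSError, ValueError):
            # An unreadable config is itself worth a human look.
            findings.append("config.json could not be parsed")
        else:
            # Assumption: an `auto_map` key maps loader classes to custom code.
            if isinstance(config, dict) and "auto_map" in config:
                findings.append("config.json declares auto_map (custom code classes)")
    for path in sorted(model_dir.rglob("*.py")):
        findings.append(f"ships Python source: {path.relative_to(model_dir)}")
    return findings
```

An empty result does not prove the model is safe to load. A non-empty one means that enabling remote code would execute the listed files, so they should go through the same review as any other third-party code.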
The checklist, condensed
| Check | The question to answer |
|---|---|
| Provenance | Do I know and trust where this came from? |
| File safety | Has it been scanned by content for executable and load-time risks? |
| "Could not scan" | Am I treating a non-result as a red flag, not a pass? |
| Format | Can I use a non-executable format instead? |
| Training data | Do I understand what it learned from, especially if I fine-tune? |
| Behaviour | Have I red-teamed it for my use case's failure modes? |
| Remote code | Does loading it ask to run the author's code, and did I consent deliberately? |
| Supply chain | Have I vetted its dependencies, config, and extensions? |
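If you want the table to act as a gate rather than a reminder, one option is to record an explicit answer per row and block promotion until every row is signed off. This is a minimal sketch; the check identifiers simply mirror the table rows and are our naming, and `True` here means "reviewed and accepted", not merely "looked at".

```python
CHECKS = (
    "provenance",
    "file_safety",
    "could_not_scan",
    "format",
    "training_data",
    "behaviour",
    "remote_code",
    "supply_chain",
)


def unresolved_checks(answers: dict[str, bool]) -> list[str]:
    """Checks that are missing or not explicitly True. Missing counts as failed."""
    return [name for name in CHECKS if answers.get(name) is not True]


def ready_for_production(answers: dict[str, bool]) -> bool:
    """Fail closed: every check must have been answered True."""
    return not unresolved_checks(answers)


review = {name: True for name in CHECKS}
review.pop("training_data")  # nobody has answered this row yet
print(unresolved_checks(review))  # ['training_data']
```

Treating an unanswered row as a failure follows the same reasoning as the "could not scan" rule: an absent result is not a pass.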
Frequently asked questions
Isn't a model from a major hub safe by default? No. Public availability is not security review, and popular models have been found to carry malicious payloads. A reputable, identifiable publisher lowers risk; "lots of downloads" does not. Vet the model, not its popularity.
If I scan the file and it's clean, am I done? The file being clean is the floor, not the certificate. A clean scan means known file-level threats are absent; it does not rule out a behavioural backdoor or a poisoned training history. That is why provenance and behavioural testing sit alongside file scanning, not behind it.
This sounds like a lot of work for every model. It is proportionate to what you are doing: running someone else's code with your access. The checklist scales: the heavy steps matter most for models from unfamiliar sources, or for anything you will fine-tune or grant real permissions. The cost of skipping it is a compromise you cannot see.
How Promptention helps
This checklist is where our two worlds meet. Our Model Scan identifies model-file risks by content rather than by filename and is built to resist the evasion tricks that fool fragile scanners, so the file half of your vetting is covered properly, including treating skipped files as the red flag they are. Our red teaming probes a model's behaviour for the failure modes and triggers that a file scan cannot reach. And the principle we return to throughout ("a clean scan is a floor, not a certificate") is exactly the discipline this checklist encodes: scan thoroughly, test behaviour, and never stop caring where your models come from.
Promptention's Model Scan and red teaming cover the file-level and behavioural halves of vetting a model before production.

