AI now writes a large share of new code, and most of it ships with less scrutiny than a copied Stack Overflow snippet. Generated code is untrusted input. Here is the threat model for vibe-coded software and how to keep velocity without inheriting a backdoor.
AI assistants now write a meaningful fraction of the code entering production. That is a genuine productivity win, and it is also a quiet shift in your software supply chain that most security programs have not accounted for. The mental model that keeps you safe is blunt: code an AI generates is untrusted input until you have reviewed it, exactly like code from any third party. The fact that it appeared in your editor, in your style, on demand, does not change its provenance. It came from a model trained on the open internet and steered by whatever was in its context.
Three distinct risks, often conflated
"AI code might be insecure" is true but too vague to act on. There are three separate problems, and they need different controls.
1. Honest mistakes. A model can produce code that is functional but subtly insecure: missing input validation, a weak default, mishandled errors, an injection flaw, or a race condition. This is the most common case and the most familiar. It is the same class of bug a human writes, just generated faster and with more confidence. The danger is that the code looks authoritative, so it gets less scrutiny than a junior developer's pull request would.
2. Insecure dependencies and hallucinated packages. Generated code pulls in libraries. Sometimes those libraries are outdated or vulnerable. Sometimes the model invents a package name that does not exist. An attacker who anticipates this can register that name with a malicious payload, turning a model's hallucination into a supply-chain attack. Either way, the import line is a risk the model introduced and your build will execute.
3. Provider-side manipulation. This is the one teams underestimate. If you generate code through an untrusted or compromised model service, the provider controls the output. Through a hidden system prompt or fine-tuning, a provider can ship code that works correctly while quietly doing something else: a silent network call that exfiltrates credentials, or a backdoor that triggers under specific conditions, all dressed in comments describing the function as secure. We demonstrated this directly: authentication code, generated on request, that worked as intended and also sent a hidden request to an attacker-controlled endpoint, with no sign that anything was wrong. The compromise can sit in production indefinitely, because the code does its visible job perfectly.
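To make the first category concrete, here is a minimal Python sketch of code that is functional yet subtly insecure. The table, function names, and data are hypothetical and chosen only for illustration; they are not taken from any real incident or generated output.

```python
import sqlite3


def find_user_unsafe(conn, username):
    # Vulnerable: the username is spliced directly into the SQL text,
    # so a crafted value can change the meaning of the query.
    query = f"SELECT id, username FROM users WHERE username = '{username}'"
    return conn.execute(query).fetchall()


def find_user_safe(conn, username):
    # Safer: a bound parameter is always treated as data, never as SQL.
    query = "SELECT id, username FROM users WHERE username = ?"
    return conn.execute(query, (username,)).fetchall()


def make_demo_db():
    # Hypothetical in-memory table used only for this illustration.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT)")
    conn.executemany(
        "INSERT INTO users (username) VALUES (?)", [("alice",), ("bob",)]
    )
    return conn
```

Both functions return the same row for an ordinary username, so a happy-path test passes for either one. Only a hostile input such as `' OR '1'='1` separates them: the unsafe version returns every row, the safe version returns none. That is the kind of input a reviewer skimming confident-looking code is unlikely to try.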
Why the usual reflexes are not enough
The instinct is "we have code review and a scanner, we are fine." Two gaps remain.
Review attention is miscalibrated for generated code. People scrutinize unfamiliar human code and skim confident-looking machine code. The exact lines most likely to hide a problem, such as a plausible network call or an odd dependency, are the ones that read as boilerplate and get waved through. The bias runs the wrong way.
And output-side scanning can miss intent. A hidden exfiltration call is syntactically ordinary; it is a POST to a URL, which is something legitimate code does constantly. Catching it means looking for unexpected network destinations, obfuscated logic, and hardcoded endpoints specifically in the context of "this was machine-generated and I did not ask for a network call here," not just running a generic linter.
A practical posture
You do not solve this by banning AI assistants; you solve it by treating their output with appropriate suspicion and putting that suspicion in the pipeline rather than in people's good intentions.
- Default to skepticism on anything with side effects. Network calls, filesystem access, subprocess execution, and credential handling in generated code get reviewed as if a stranger wrote them, because one did.
- Pin and verify dependencies. Check that every imported package actually exists, is the one you meant, and is a maintained, reputable version. Treat a never-before-seen package name in generated code as a red flag, not a convenience.
- Scan generated code for unexpected behavior, not just style. Look specifically for outbound calls, hardcoded endpoints, and obfuscation that the task did not call for.
- Know your model's provenance. Generating production code through an unvetted service is functionally running an executable from an untrusted source. Use audited, reputable models and understand the provider's controls.
- Put the check before the merge. The leverage is at the boundary, when generated code tries to enter the repository, not three releases later in an incident review.
Frequently asked questions
Is AI-generated code more dangerous than human-written code? Not inherently. But it is generated faster, at larger volume, and with a confident presentation that suppresses scrutiny, so insecure code reaches production more easily. The risk is less about the model's competence and more about the review attention its output fails to attract.
What about hallucinated packages ("slopsquatting")? It is real. Models sometimes invent plausible package names, and attackers register those names with malicious code, anticipating that someone will run the generated import. Always verify that a dependency actually exists and is the intended one before it enters your build.
Can I trust code from a major, reputable model provider? Reputable provenance lowers the provider-manipulation risk substantially, which is exactly why provenance matters. It does not remove the honest-mistake and dependency risks, so review and scanning still apply regardless of source.
Where Promptention fits
This is the supply-chain edge of everything we do. Our output-moderation and red-teaming capabilities are built to treat LLM output as untrusted, scanning generated code for the embedded network calls, hidden endpoints, and obfuscated logic that distinguish a quiet backdoor from an honest function, before it reaches your codebase. The model will keep writing convincing code. Our job is to make sure "convincing" and "safe" are verified separately.
Promptention's output moderation and red teaming help teams catch insecure patterns and embedded threats in AI-generated code before it ships.
Further reading: Promptention, "Sabotage via Hidden API Exfiltration"; OWASP Top 10 for LLM Applications (Improper Output Handling, Supply Chain).
