How to Threat Model an LLM Application (Without Boiling the Ocean)

You cannot defend what you have not mapped. We walk through a practical, repeatable way to threat model an LLM or agentic application: the trust boundaries, the data flows, and the questions that surface the risks that matter.

Every secure system we have ever helped a team build started, whether they called it that or not, with a threat model: a clear picture of what could go wrong and where. And almost every insecure one started without it, with controls bolted on reactively after something broke. The good news is that threat modeling an LLM application is not mystical, and it does not require a week-long workshop. It requires asking a focused set of questions about how data and trust flow through your system. We want to give you a practical, repeatable way to do it, scoped so you actually finish rather than drowning in completeness.

Start with the flows, not the threats

A mistake we often see is teams starting from a giant list of attacks and trying to check each one. That boils the ocean. Start instead from your own system: draw how data and instructions move through it, from the user, through your application, into the model, through any tools and retrieval, and back out. Most LLM risks become obvious the moment you can see the flow, because they cluster at the same few places every time.

For an LLM or agentic app, the flow almost always touches: untrusted input entering the context, the model's instructions (your system prompt), retrieved or tool-provided content, the model's output, and any actions the system takes as a result. Map those, and you have mapped where the risks live.
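One lightweight way to "draw" the flow is to write it down as a list of hops, each naming what crosses from one component to the next. The sketch below is a minimal illustration, assuming a generic app with retrieval and one tool; the component names and the `Hop` and `inputs_to` helpers are hypothetical, so replace them with your own system's parts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hop:
    source: str   # where the data or instruction comes from
    target: str   # where it goes next
    carries: str  # what actually crosses on this hop


# Hypothetical flow for a retrieval-plus-tools assistant; substitute your own components.
FLOW = [
    Hop("user", "app", "chat message"),
    Hop("app", "model_context", "user message + system prompt"),
    Hop("retriever", "model_context", "retrieved documents"),
    Hop("tool", "model_context", "tool output"),
    Hop("model", "app", "model response"),
    Hop("app", "browser", "rendered response"),
    Hop("model", "tool", "tool call arguments"),
]


def inputs_to(component: str, flow: list[Hop]) -> list[str]:
    """List everything that flows into one component, so no input is forgotten."""
    return [hop.carries for hop in flow if hop.target == component]


if __name__ == "__main__":
    for item in inputs_to("model_context", FLOW):
        print(item)
```

Listing the inputs to `model_context` makes it hard to overlook that retrieved documents and tool output enter the same context as the user's message.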

The trust boundaries that matter

Threats concentrate at trust boundaries, the points where data crosses from a less-trusted zone into a more-trusted one. In an LLM application, the boundaries you must mark are:

  • User input into the model's context. Everything the user sends is untrusted. This is the prompt-injection and jailbreak boundary.
  • External content into the context. Retrieved documents, tool outputs, web content, and messages from other agents are all untrusted, all indirect-injection vectors, and all easy to forget because they feel like "your" data.
  • Model output into downstream systems. The model's response is untrusted data; the point where it flows into a browser, query, shell, or another component is the improper-output-handling boundary.
  • The agent's actions into the world. Where the system can act, the boundary is defined by what it is permitted to do; this is the excessive-agency and identity surface.

If you can name what crosses each boundary and what is trusted on the other side, you have found most of your exposure.
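If you give each component a rough trust level, the boundaries fall out mechanically: they are the edges where data moves from a lower level to a higher one. The sketch below assumes an illustrative integer trust scale (higher means more trusted) and hypothetical component names; note that model output is scored like any other untrusted input, in line with the boundaries above.

```python
from dataclasses import dataclass

# Assumed trust levels, higher = more trusted. The scale and names are illustrative only.
TRUST = {
    "user": 0,
    "web_page": 0,
    "retrieved_doc": 0,
    "tool_output": 0,
    "model_output": 0,   # model output is untrusted data, like any other input
    "model_context": 1,
    "app_backend": 2,
    "sql_database": 2,
    "shell": 3,
}


@dataclass(frozen=True)
class Edge:
    source: str
    target: str
    carries: str


def trust_boundaries(edges: list[Edge]) -> list[Edge]:
    """Return the edges where data moves from a less-trusted to a more-trusted component."""
    return [e for e in edges if TRUST[e.source] < TRUST[e.target]]


EDGES = [
    Edge("user", "model_context", "chat message"),           # injection / jailbreak
    Edge("retrieved_doc", "model_context", "document text"),  # indirect injection
    Edge("model_output", "sql_database", "generated query"),  # improper output handling
    Edge("model_output", "shell", "tool command"),            # excessive agency
    Edge("app_backend", "sql_database", "parameterised query"),
]

if __name__ == "__main__":
    for e in trust_boundaries(EDGES):
        print(f"{e.source} -> {e.target}: {e.carries}")
```

The backend-to-database edge is not flagged because both sides sit at the same assumed trust level; the four flagged edges correspond to the four boundaries in the list.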

The questions that surface the real risks

For each boundary, ask:

  1. What untrusted thing crosses here, and what would an attacker want to do with it? This surfaces injection, poisoning, and manipulation.
  2. What is the worst this component could be made to do? This surfaces the impact (leakage, unauthorised action, exfiltration) and tells you where to concentrate your effort.
  3. What is it allowed to reach? This surfaces over-scoping, the multiplier on every agentic incident.
  4. How would we even know? This surfaces the gaps in logging and monitoring, which are often where the real blind spots lie.
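To keep the exercise honest, you can expand every boundary into one row per question and track which rows still lack an answer. This is a minimal worksheet sketch; the `worksheet` and `unanswered` helpers and the boundary labels are hypothetical, and in practice the rows might live in a spreadsheet or ticket tracker instead.

```python
# The four questions above, asked at every boundary.
QUESTIONS = (
    "What untrusted thing crosses here, and what would an attacker want to do with it?",
    "What is the worst this component could be made to do?",
    "What is it allowed to reach?",
    "How would we even know?",
)


def worksheet(boundaries: list[str]) -> list[dict]:
    """Expand each boundary into one row per question, with an empty answer to fill in."""
    return [{"boundary": b, "question": q, "answer": ""} for b in boundaries for q in QUESTIONS]


def unanswered(rows: list[dict]) -> list[dict]:
    """Rows nobody has answered yet; these are open items, not 'no risk'."""
    return [row for row in rows if not row["answer"].strip()]


if __name__ == "__main__":
    # Hypothetical boundary labels taken from a flow map.
    rows = worksheet(["user input -> context", "model output -> browser"])
    rows[3]["answer"] = "No alerting on blocked prompts yet; add it."
    print(f"{len(unanswered(rows))} of {len(rows)} questions still open")
```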

Anchoring these questions against current threat catalogs, such as the OWASP Top 10 for LLM and Agentic Applications and MITRE ATLAS, keeps you grounded in real attacks rather than imagined ones, without forcing you to start from the list.

Keep it living and proportionate

A threat model is not a document you finish; it is a picture you update as the system changes. Every new tool, integration, data source, or permission shifts the boundaries and deserves a revisit. And scope it to impact: spend your effort where the worst case is worst, not uniformly across every component. A proportionate model you actually maintain beats an exhaustive one you do once and abandon.
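Both habits, revisiting on change and scoping to impact, can be sketched as a comparison between two snapshots of your boundaries. This example assumes each snapshot maps a boundary (source, target) to a worst-case severity you assign on a hypothetical 1 to 5 scale; the helper names are illustrative, not from any tool.

```python
# Hypothetical snapshot: (source, target) boundary -> worst-case severity, 1 (minor) to 5 (severe).
Snapshot = dict[tuple[str, str], int]


def changed_boundaries(old: Snapshot, new: Snapshot) -> set[tuple[str, str]]:
    """Boundaries that were added, removed, or re-scored since the last review."""
    keys = set(old) | set(new)
    return {k for k in keys if old.get(k) != new.get(k)}


def review_order(new: Snapshot, changed: set[tuple[str, str]]) -> list[tuple[str, str]]:
    """Changed boundaries that still exist, worst case first, so effort follows impact."""
    still_present = [k for k in changed if k in new]
    return sorted(still_present, key=lambda k: new[k], reverse=True)


if __name__ == "__main__":
    old = {("user", "model_context"): 3, ("model_output", "browser"): 2}
    new = {
        ("user", "model_context"): 3,
        ("model_output", "browser"): 2,
        ("model_output", "email_tool"): 5,      # new tool with send permission
        ("retrieved_doc", "model_context"): 3,  # new data source
    }
    for boundary in review_order(new, changed_boundaries(old, new)):
        print(boundary, new[boundary])
```

Only the two boundaries introduced by the new tool and data source come up for review, with the higher-impact one first; unchanged boundaries are left alone until something touches them.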

Frequently asked questions

Do we need special tools or training for this? No. The core skill is drawing your data and trust flows honestly and asking the worst-case questions at each boundary. Frameworks help you not miss categories, but the practice itself is accessible to any team willing to map their own system.

How is LLM threat modeling different from regular threat modeling? The method is the same; the boundaries are new. The novel parts are treating model input and model output as untrusted, treating retrieved content as an injection vector, and treating the model's permissions as a blast-radius control. Traditional threat modeling did not have a component that follows instructions hidden in its data.
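Treating model output as untrusted looks, in code, much like handling any other untrusted input at the point it reaches a browser or a query. The sketch below uses only the Python standard library (`html.escape` and `sqlite3` bound parameters); the function names and the `orders` table are hypothetical examples.

```python
import html
import sqlite3


def render_reply(model_text: str) -> str:
    """Escape model output before it reaches a browser, so any markup is displayed, not executed."""
    return f"<div class='reply'>{html.escape(model_text)}</div>"


def lookup_order(conn: sqlite3.Connection, model_supplied_id: str):
    """Pass model output as a bound parameter instead of formatting it into the SQL string."""
    cur = conn.execute("SELECT status FROM orders WHERE id = ?", (model_supplied_id,))
    return cur.fetchone()


if __name__ == "__main__":
    print(render_reply("<script>alert(1)</script>"))
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id TEXT, status TEXT)")  # hypothetical table
    conn.execute("INSERT INTO orders VALUES (?, ?)", ("42", "shipped"))
    print(lookup_order(conn, "42"))
    print(lookup_order(conn, "42' OR '1'='1"))  # injected text matches no row
```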

When should we do it? As early as possible, when you can still change the design, and then continuously. Threat modeling after launch still helps, but its highest value is before you have hard-coded the trust boundaries into production.

How Promptention helps

A threat model tells you where your controls need to be; we are frequently those controls. The boundaries it surfaces (untrusted input, indirect content, risky output, over-scoped action) map directly onto what Guard and our red team address: detection at the input and retrieval boundaries, output moderation at the downstream boundary, policy enforcement at the action boundary, and monitoring for the "how would we know" gaps. And our Red Team Services are, in effect, your threat model tested by an adversary, turning "we think this is covered" into "we proved it is." Map the flows; we will help you defend the boundaries they reveal.

Promptention's detection, output moderation, policy enforcement, and red teaming map directly onto the trust boundaries an LLM threat model surfaces.