Improper Output Handling: When the LLM's Answer Is the Attack

Teams scrutinise what goes into an LLM and trust what comes out. That is backwards. Model output flowing unchecked into a browser, shell, or database is a classic injection waiting to happen. We explain the risk and the fix.

There is a blind spot we run into constantly. Teams invest real effort in defending the input to their LLM, and then take whatever the model produces and feed it straight into a browser, a shell, a database query, or another system, as if model output were inherently trustworthy. It is not. Model output is untrusted data, and when you let untrusted data flow unchecked into a system that will act on it, you have rebuilt one of the oldest vulnerability classes in software, only now the attacker's payload is laundered through your AI.

OWASP calls this Improper Output Handling (LLM05 in its Top 10 for LLM Applications), and it is one of the most preventable risks on that list, precisely because the fix is well understood in traditional security. The trap is forgetting that it applies to LLM output too.

Why model output cannot be trusted

Two reasons. First, an attacker can influence what the model produces, through the prompt, through injected content in retrieved data, through any of the channels we have written about. If they can shape the output, they can shape the payload your downstream system receives. Second, even without an attacker, models generate freely and unpredictably; a model asked to produce some HTML or a SQL fragment may produce something that breaks or subverts the system that consumes it.

So the question is never "is this output correct?" It is "what will happen when this output reaches the system that uses it?"

What the attack looks like

The pattern is always the same: the model emits content, that content is inserted into a sensitive context without validation, and the context interprets it as code or commands.

  • Into a web page: model output containing a script tag is rendered in a browser, executing as cross-site scripting in your users' sessions.
  • Into a database: model output used to build a query becomes SQL injection.
  • Into a shell or system call: model output passed to a command interpreter becomes command injection and, often, remote code execution.
  • Into another component: model output that includes markup, a URL, or a control sequence can manipulate whatever parses it next, including another part of an agent pipeline.

None of these requires a clever exploit against the model. Each requires only that you trust its output and that your downstream system does the parsing.
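To make the database case concrete, here is a minimal sketch in Python using the standard-library sqlite3 module. The orders table, the lookup_unsafe function, and the model_output string are all hypothetical, invented for illustration. The point is only that a string the model emitted can close the quote and rewrite the query.

```python
import sqlite3

# Hypothetical in-memory table standing in for a real application database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, customer TEXT)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", [(1, "alice"), (2, "bob")])


def lookup_unsafe(conn: sqlite3.Connection, name: str) -> list:
    # VULNERABLE on purpose: the model's text is concatenated into the SQL
    # string, so the database parses part of it as SQL rather than as data.
    query = f"SELECT id, customer FROM orders WHERE customer = '{name}'"
    return conn.execute(query).fetchall()


# Assumed scenario: the model was asked to extract a customer name, but
# injected content steered it into emitting this string instead.
model_output = "alice' OR '1'='1"

rows = lookup_unsafe(conn, model_output)
print(len(rows))  # every row in the table comes back, not just alice's
```

Nothing here attacks the model itself. The query builder simply treated the model's text as part of the query.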

Why it slips through

The honest reason this risk persists is psychological as much as technical. The output looks like an answer, a helpful, fluent, authoritative answer, so it does not trigger the suspicion that a raw user input would. Your developers would never concatenate unsanitised user input into a SQL string, but they will do exactly that with model output, because it does not feel like user input. It is, though. It is user input that has passed through a model, and the model may have been steered.

What to do about it

The controls are the familiar ones from secure software development, applied with discipline to LLM output:

  • Treat every model output as untrusted. Validate, encode, and sanitise it for the specific context that will consume it, exactly as you would raw user input.
  • Encode for the context. Output destined for HTML gets HTML-encoded; output used in a query is bound as a parameter rather than concatenated into the query string; output should not reach a shell at all, and where it must, it is strictly constrained.
  • Constrain the output surface. Where you can, restrict the model to structured, validated formats rather than free text that downstream systems must interpret.
  • Scan output for dangerous patterns. An independent check can flag output that contains script, markup, or command-shaped content it should not, before it reaches the system that would act on it.
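The first two controls can be sketched in a few lines of Python using only the standard library. This is an illustration under assumptions, not a complete defence: render_html, lookup_safe, build_argv, the orders table, and the ALLOWED_COMMANDS mapping are hypothetical names chosen for the example.

```python
import html
import sqlite3

# Hypothetical allowlist: the model may only pick a key. It never supplies
# the command text itself.
ALLOWED_COMMANDS = {
    "disk_usage": ["df", "-h"],
    "uptime": ["uptime"],
}


def render_html(model_output: str) -> str:
    # HTML context: escape <, >, &, and quotes so markup is shown, not run.
    return f"<p>{html.escape(model_output)}</p>"


def lookup_safe(conn: sqlite3.Connection, name: str) -> list:
    # SQL context: bind the value as a parameter so it is never parsed as SQL.
    return conn.execute(
        "SELECT id, customer FROM orders WHERE customer = ?", (name,)
    ).fetchall()


def build_argv(model_choice: str) -> list[str]:
    # Shell context: map the model's choice onto a fixed argument list.
    # Anything that is not an exact allowlisted key is rejected.
    try:
        return list(ALLOWED_COMMANDS[model_choice.strip()])
    except KeyError:
        raise ValueError(f"command not allowed: {model_choice!r}") from None


if __name__ == "__main__":
    print(render_html("<script>alert(1)</script>"))
    print(build_argv("uptime"))
```

The argument list returned by build_argv is meant to be passed to subprocess.run without shell=True, so no shell ever interprets text the model wrote.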

Frequently asked questions

Doesn't structured output solve this? It helps, but it is not a guarantee. As we have shown elsewhere, structured-output features can be subverted by prompt injection, so the schema is not a security boundary on its own. Validate the content, not just the shape.
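As a rough illustration of validating content rather than shape, the sketch below parses a JSON response and then checks each value against what the application actually permits. The field names, the ALLOWED_ACTIONS set, and the MAX_NOTE_LENGTH limit are assumptions made up for this example.

```python
import json

# Hypothetical policy for a support-bot action.
ALLOWED_ACTIONS = {"refund", "cancel", "status"}
MAX_NOTE_LENGTH = 200


def parse_action(raw: str) -> dict:
    """Parse model output and validate its values, not only its structure."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError("model output is not valid JSON") from exc

    # Shape check: exactly the expected keys, nothing extra.
    if not isinstance(data, dict) or set(data) != {"action", "order_id", "note"}:
        raise ValueError("unexpected fields in model output")

    action, order_id, note = data["action"], data["order_id"], data["note"]

    # Content checks: each value must be one the application allows.
    if not isinstance(action, str) or action not in ALLOWED_ACTIONS:
        raise ValueError("action is not on the allowlist")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(order_id, bool) or not isinstance(order_id, int) or order_id <= 0:
        raise ValueError("order_id must be a positive integer")
    if not isinstance(note, str) or len(note) > MAX_NOTE_LENGTH or not note.isprintable():
        raise ValueError("note is too long or contains control characters")

    return {"action": action, "order_id": order_id, "note": note}
```

Even after these checks, a free-text field such as note is still untrusted and still needs context-appropriate encoding wherever it is eventually displayed or stored.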

Is this really an AI problem or just a web-security problem? Both, and that is the point. The vulnerability class is classic; what is new is the false sense of trust that model output carries. The fix is old; remembering to apply it is the hard part.

Where should the check live? At the boundary between the model and whatever consumes its output, before the dangerous context parses it. That is the last point at which you can stop a laundered payload from becoming an executed one.
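A boundary check of this kind might look like the following sketch. It is a deliberately simple regular-expression heuristic, not a description of how any particular product works, and it will produce both false positives and misses. The pattern set and the names flag_output and release_or_block are hypothetical. A check like this complements context-appropriate encoding and does not replace it.

```python
import re

# Hypothetical, non-exhaustive patterns for output that looks like code
# or commands rather than an answer.
SUSPICIOUS_PATTERNS = {
    "script_tag": re.compile(r"<\s*script\b", re.IGNORECASE),
    "event_handler": re.compile(r"<[^>]*\son\w+\s*=", re.IGNORECASE),
    "javascript_url": re.compile(r"javascript\s*:", re.IGNORECASE),
    "command_substitution": re.compile(r"\$\("),
    "shell_chaining": re.compile(r"(?:;|&&|\|\|?)\s*(?:rm|curl|wget|sh|bash)\b"),
}


def flag_output(text: str) -> list[str]:
    # Return the label of every pattern that matches the model output.
    return [
        label
        for label, pattern in SUSPICIOUS_PATTERNS.items()
        if pattern.search(text)
    ]


def release_or_block(text: str) -> str:
    # Sits between the model and the consumer: pass clean output through,
    # refuse anything flagged before a downstream parser ever sees it.
    findings = flag_output(text)
    if findings:
        raise ValueError(f"model output blocked: {', '.join(findings)}")
    return text
```

Calling release_or_block on every response, immediately before it is handed to the renderer, query builder, or tool runner, keeps the check at the boundary described above.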

How Promptention helps

Our output moderation treats model responses as what they are, untrusted content, and inspects them before they flow downstream, flagging the script-shaped, command-shaped, and otherwise dangerous output that turns a helpful answer into an injection. Combined with the secure-handling guidance we give teams, that closes the gap between "the model answered" and "the system acted." You defend your inputs already. We help you stop trusting your outputs.

Promptention's output moderation inspects LLM responses before they reach downstream systems, aligned to OWASP LLM05: Improper Output Handling.