Letting Agents Run Code Without Letting Them Run Wild

Giving an agent the ability to execute code is one of the most powerful and dangerous capabilities you can grant. We explain why code execution is the highest-stakes agent permission and how sandboxing contains it.

Of all the capabilities you can give an AI agent, the ability to execute code is the most powerful and the most dangerous. An agent that can run code can analyse data, automate tasks, and solve problems no fixed toolset could, which is exactly why teams want it. It can also, if manipulated, do anything code can do on the system it runs on: read secrets, reach the network, modify files, escalate privileges. The OWASP agentic guidance calls out unexpected code execution as a top risk for good reason. If you are giving an agent a code interpreter, the question is not whether it is useful (it is) but whether the code runs somewhere a mistake or a manipulation cannot hurt you.

Why code execution is the highest-stakes capability

Every other agent capability is bounded by what a specific tool can do. A search tool searches; an email tool emails. Code execution has no such boundary: it is general-purpose by definition, so its blast radius is the whole environment it runs in. Combine that with a reality we keep returning to, namely that an agent can be manipulated through instructions injected into the content it processes, and the danger is clear. A prompt injection against an agent that can run code is not limited to producing a bad answer; it can result in attacker-supplied code being executed. The agent becomes a remote code execution path, with the attacker's instructions arriving through ordinary-looking content.

And the code does not have to be malicious by intent to be dangerous. An agent generating and running its own code can make mistakes that damage the environment: deleting the wrong thing, exhausting resources, reaching somewhere it should not. Both the manipulated case and the honest-error case point to the same conclusion.

The conclusion: assume the code is hostile, and contain it

The defensive posture that follows is simple to state and essential to apply: treat any code an agent runs as potentially hostile, and run it somewhere it cannot do harm. You do not try to verify that each piece of generated code is safe before running it, because you cannot reliably do that for arbitrary code, especially code that may have been steered by an injection. Instead, you make the environment safe, so that even hostile code is contained. This is the same philosophy as the rest of agentic security (do not rely on the agent behaving; constrain what its behaviour can reach), applied to the most powerful capability of all.

What containment looks like

  • Sandbox the execution. Run agent code in an isolated environment (a container, a restricted sandbox, an ephemeral instance) separated from your real systems, so that whatever the code does stays inside the box.
  • Least privilege for the sandbox. The execution environment should have minimal access: no production credentials, no sensitive filesystem, no unnecessary network. If the code cannot reach something, it cannot harm it.
  • Constrain the network. Restrict outbound connectivity from the sandbox, so executed code cannot exfiltrate data or call out to an attacker. Egress control here cuts the same leg of the trifecta we discuss for agents generally.
  • Bound the resources. Limit CPU, memory, time, and disk so runaway or malicious code cannot exhaust the host. This is the resource-exhaustion concern applied to execution.
  • Make it ephemeral. Tear the environment down after use so nothing persists between runs and a compromise cannot establish a foothold.
  • Catch the injection upstream too. Containment is the backstop; detecting the manipulation that would steer the code in the first place reduces how often the backstop is tested.
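
As a rough illustration of how the controls listed above might map onto a single container launch, here is a minimal Python sketch that builds a `docker run` command line. It assumes Docker is the sandbox, which is only one of the options named above. The image name, the specific limits, the `/sandbox/script.py` mount path, and the helper names `build_sandbox_command` and `run_untrusted` are all illustrative assumptions, not recommendations; the in-container `timeout` command is assumed to be present in the chosen image.

```python
import subprocess
from pathlib import Path


def build_sandbox_command(script: Path,
                          image: str = "python:3.12-slim",  # assumed image
                          timeout_s: int = 10) -> list[str]:
    """Build a `docker run` argv that applies the containment controls.

    All limits below are placeholder values chosen for illustration.
    """
    return [
        "docker", "run",
        "--rm",                                 # ephemeral: container removed on exit
        "--network", "none",                    # constrain the network: no egress at all
        "--read-only",                          # least privilege: immutable root filesystem
        "--tmpfs", "/tmp:rw,size=64m",          # bounded scratch space (disk limit)
        "--cap-drop", "ALL",                    # drop all Linux capabilities
        "--security-opt", "no-new-privileges",  # block privilege escalation via setuid
        "--user", "65534:65534",                # run as an unprivileged uid:gid
        "--memory", "256m",                     # bound memory
        "--cpus", "0.5",                        # bound CPU
        "--pids-limit", "64",                   # bound process count (fork bombs)
        # Mount only the script, read-only. It must be readable by the
        # unprivileged user above. No credentials or host env are passed in.
        "--volume", f"{script.resolve()}:/sandbox/script.py:ro",
        image,
        # Wall-clock limit inside the container (assumes coreutils `timeout`).
        "timeout", str(timeout_s), "python", "/sandbox/script.py",
    ]


def run_untrusted(script: Path) -> subprocess.CompletedProcess:
    """Run the script in the sandbox and capture its output (requires Docker)."""
    return subprocess.run(
        build_sandbox_command(script),
        capture_output=True,
        text=True,
        timeout=60,  # outer guard on the docker client itself
    )


if __name__ == "__main__":
    # Print the command rather than running it, so this works without Docker.
    print(" ".join(build_sandbox_command(Path("agent_job.py"))))
```

A container shares the host kernel, so depending on your threat model a stronger isolation boundary may be warranted; the point of the sketch is only that each bullet corresponds to a concrete, checkable setting rather than a judgement about the code itself.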

Frequently asked questions

Can't we just check the generated code before running it? Reliably determining that arbitrary code is safe is not something you can do in general, especially when the code may have been influenced by an injection. Static review helps at the margins, but the robust control is to assume the code could be hostile and run it where that does not matter, rather than betting on catching every dangerous case.
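
To make the limits of pre-execution checking concrete, here is a small Python sketch of a deliberately naive static check and a snippet that slips past it. The checker, its blocklist, and the two sample snippets are hypothetical and exist only for illustration; the samples are parsed, never executed.

```python
import ast

# Hypothetical blocklist for illustration only.
BLOCKED_MODULES = {"os", "subprocess", "socket"}


def naive_static_check(source: str) -> bool:
    """Return True if the source contains no plain import of a blocked module."""
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            if any(alias.name.split(".")[0] in BLOCKED_MODULES
                   for alias in node.names):
                return False
        elif isinstance(node, ast.ImportFrom):
            if (node.module or "").split(".")[0] in BLOCKED_MODULES:
                return False
    return True


# An obvious dangerous snippet: caught by the check.
obvious = "import os\nos.remove('data.csv')"

# The same behaviour, lightly disguised: no import statement to find,
# and the module and function names are assembled at runtime.
disguised = "m = __import__('o' + 's')\ngetattr(m, 'rem' + 'ove')('data.csv')"

print(naive_static_check(obvious))    # False: rejected
print(naive_static_check(disguised))  # True: wrongly accepted
```

A more thorough checker would catch this particular disguise, but each added rule invites another workaround, which is why the check is best treated as a supplement to containment rather than a substitute for it.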

Isn't a sandbox a lot of overhead? It is real engineering, but it is the price of safely granting the most powerful capability an agent can have. The alternative, running agent-generated code with real access, is accepting that any manipulation or mistake becomes a system compromise. The overhead buys you containment of your highest-stakes risk.

How does this relate to AI-generated code in development? They are cousins. There we worry about insecure or backdoored code entering your codebase; here we worry about an agent executing code live at runtime. Both reflect the same rule (treat code from a model as untrusted) applied at different points in the lifecycle.

How Promptention helps

Sandboxing the execution environment is infrastructure you own, and we will not pretend a detection layer replaces it: containment of the code is yours to build, and it is essential. Where we help is reducing how often that containment gets tested and giving you visibility when it is. Our detection evaluates the untrusted content reaching a code-running agent for the injections that would steer its code toward harm, and our policy and monitoring controls help constrain and observe what the agent does. Assume the code is hostile and box it in; let us help make sure the instruction that would make it hostile rarely gets through in the first place.

Promptention's injection detection and policy controls complement the sandboxing that safely contains agent code execution, aligned to OWASP ASI05.