Memory Poisoning: The Backdoor That Lives in Your Agent's Context

An agent that remembers is an agent that can be taught something false. Memory poisoning plants a belief now and collects on it later, after the live injection is long gone. Here is how persistent context attacks work and why memory writes are a security boundary.

Most prompt-injection defenses assume the attack and the damage happen in the same breath: malicious text arrives, the model misbehaves, you catch it in the moment. Memory poisoning breaks that assumption, and that is what makes it dangerous. The attacker plants something in the agent's memory today and collects on it days or weeks later, long after the original injection has scrolled out of view. By the time the harmful action happens, there is no live attack to detect. There is only the agent acting on something it "knows."

This attack class shows up in the OWASP Top 10 for Agentic Applications as Memory and Context Poisoning, and it deserves its own attention because it defeats the instinct to look only at the current request.

Why agents have memory, and why that is the vulnerability

Stateless agents are limited. To be useful across a long task or many sessions, an agent keeps memory: conversation history, summaries of past interactions, retrieved facts, learned preferences, notes it wrote to itself. This memory is read back into the context on later turns and treated as trusted, because it is supposed to be the agent's own accumulated knowledge.

That trust is the hole. If an attacker can write into that memory, they are not injecting an instruction the agent will weigh against the user's intent. They are inserting a fact the agent will later accept as its own. The model has no way to tell a genuine memory from a planted one. They arrive through the same channel and carry the same authority.

How the attack runs

The shape is consistent. First, the attacker gets poisoned content into something the agent will commit to memory, a message, a document it summarises and stores, a record in a data source it draws on. The content is crafted to be remembered: a false policy, a fake relationship between systems, a fabricated instruction framed as an established rule. Second, the agent stores it, often as a tidy summary that strips away whatever context might have made it look suspicious. Third, on a later turn, the agent retrieves that memory and acts on it as settled knowledge.

A poisoned memory might tell an agent that a particular external domain is an approved internal service, so that weeks later it sends data there without hesitation. It might install a false belief about a security policy, so the agent stops applying a check it used to apply. It might fabricate a vendor relationship that authorises an action no one ever approved. The damage is downstream and delayed, which is exactly why it is hard to attribute.

Why this is harder to catch than live injection

A live injection is at least present at the scene. You can scan the incoming request, evaluate the instruction, and block it before it acts. A poisoned memory is different on two counts.

The malice is time-shifted. The harmful action and the original injection are separated by an arbitrary gap, so a control that only inspects the current turn sees an agent calmly acting on its own memory and finds nothing wrong.

The malice is laundered. By the time content becomes a stored memory, it has usually been summarised, reformatted, and stripped of provenance. The agent no longer remembers that this "fact" came from an untrusted web page. It just remembers the fact. The audit trail that would have flagged it is often the first casualty of the summarisation step.

Treat memory writes as a security boundary

The defensive shift is to stop thinking of memory as a convenience feature and start thinking of it as a privileged store. Concretely:

  • Validate at write time, not just read time. The cheapest place to stop a poisoned memory is before it is committed. Content destined for long-term memory should be scanned for injected instructions and manipulation just as carefully as a live request, arguably more so, because it gets a long shelf life and a trust upgrade.
  • Preserve provenance into memory. When a memory is stored, keep where it came from. A "fact" derived from untrusted external content should never be indistinguishable from one the user stated directly.
  • Scope what memory can authorise. A planted belief is only dangerous if acting on it has reach. The same least-privilege discipline that limits live attacks limits poisoned-memory attacks: if the agent cannot send data to arbitrary destinations, a memory that says "this destination is trusted" cannot complete the exfiltration.
  • Re-evaluate, do not blindly trust. High-stakes actions triggered by a memory deserve the same scrutiny as actions triggered by a fresh request. "The agent already believed this" is not a security justification.

Frequently asked questions

Isn't this just prompt injection with extra steps? The entry technique is injection, yes, but the defining feature is persistence. A control that only inspects the current turn will catch the injection if it acts immediately and miss it entirely if it is stored and acted on later. Memory poisoning is specifically the version that waits.

Does a bigger context window or better model fix it? No. The problem is not capacity or capability; it is that the model cannot distinguish a genuine memory from a planted one, because both reach it as trusted context. More memory means more surface, not more safety.

Where is the cheapest place to defend? At the write. Stopping poisoned content before it becomes a trusted memory is far easier than detecting, days later, that the agent's beliefs have been quietly edited.

Where Promptention fits

Our defense-in-depth architecture treats memory integrity as a distinct layer, not an afterthought. Scanning content for injection before it is committed to memory, and helping enforce policy on what a memory-driven action is allowed to do, are exactly the controls this attack class requires. The agent will keep trusting its own memory. The work is making sure nothing hostile ever got written there.

Promptention Guard supports validation of content entering an agent's memory and policy enforcement on memory-driven actions, aligned to OWASP ASI06.

Further reading: OWASP Top 10 for Agentic Applications (2026), "Memory & Context Poisoning."