The most effective attacks rarely ask for the harmful thing directly. They build to it across a conversation, each step harmless on its own. We explain why single-turn defenses miss crescendo attacks and what catches them.

The mental model most teams have of an attack is a single hostile message: one prompt, clearly malicious, that a filter either catches or misses. That model is comforting and increasingly wrong. The attacks we find most effective in our red-team work do not arrive in one message at all. They build gradually across a conversation, each turn asking for just slightly more than the last, none of them alarming on its own. By the time the genuinely harmful request appears, the model has already been cooperating for ten turns and treats the final ask as the natural next step. This is the crescendo, and it is specifically designed to defeat defenses that look at one message at a time.

How a crescendo works

The attacker starts somewhere completely benign, adjacent to the target but obviously harmless. Then they escalate in small increments. Each request is a reasonable continuation of the last, so the model keeps saying yes, and each yes makes the next yes easier, because the model is now completing a conversation in which it has been helpful and compliant throughout. The harmful content is reached not by breaking through a wall but by walking up a ramp the attacker built one step at a time.

The power of the technique is that no single turn looks like an attack. Pull any one message out of the sequence and it is innocuous. The malice lives in the trajectory, in the direction and accumulation, not in any individual prompt. A reviewer reading one message would see nothing wrong, and so would a filter that only ever sees one message.

The same shape applies beyond classic jailbreaks. Multi-turn manipulation is how attackers coax out sensitive data piece by piece, gradually steer an agent's behaviour, or walk a customer-service bot past its policies, each step a small concession that adds up to a breach.
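
To make the piece-by-piece case concrete, here is a minimal sketch of tracking what a session has already been given, so that the decision depends on the running total and not on the single request in front of you. The field names, the limit of two, and the `DisclosureLedger` class are all hypothetical, invented for illustration; a real policy would be considerably richer.

```python
from dataclasses import dataclass, field

# Hypothetical limit, chosen only for illustration.
MAX_FIELDS_PER_SESSION = 2


@dataclass
class DisclosureLedger:
    """Remembers which sensitive fields this session has already received."""

    disclosed: set = field(default_factory=set)

    def allow(self, requested_field: str) -> bool:
        """Allow a disclosure only if the session total stays within the limit."""
        if requested_field in self.disclosed:
            return True  # already disclosed; repeating it reveals nothing new
        if len(self.disclosed) >= MAX_FIELDS_PER_SESSION:
            return False  # each request is small, but the sum would be a breach
        self.disclosed.add(requested_field)
        return True


ledger = DisclosureLedger()
# Hypothetical field names: each request looks like a small concession.
requests = ["billing_city", "card_last4", "date_of_birth"]
decisions = [ledger.allow(name) for name in requests]
print(decisions)  # [True, True, False]
```

The third request is no more alarming than the first two. It is refused only because the ledger remembers them.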

Why single-turn defenses are blind to it

Here is the structural problem, and we want to state it plainly because it explains a real and common gap. A defense that evaluates each message independently has, by construction, no memory of the trajectory. It sees the final harmful request in isolation, stripped of the ten turns of escalation that led there, and on its own that request may be ambiguous enough to pass, or the attacker has framed it so that, given everything before it, refusing would seem inconsistent. The context that makes the request dangerous is exactly the context a single-turn defense throws away. You cannot catch an attack defined by accumulation if you refuse to accumulate.

This is why a system can pass a battery of single-prompt jailbreak tests and still fall to a patient attacker. The test set and the defense share the same blind spot.
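
The shared blind spot can be shown with a toy example. The sketch below assumes some upstream classifier assigns each message a risk score between 0 and 1. The scores, the threshold, and the function name are hypothetical placeholders, not measurements from any real system.

```python
# Hypothetical blocking threshold for a stateless, per-message filter.
BLOCK_THRESHOLD = 0.8


def single_turn_filter(score: float) -> bool:
    """Stateless check: block only if this one message looks bad enough."""
    return score >= BLOCK_THRESHOLD


# A single-shot test suite: overtly hostile prompts (placeholder scores).
single_shot_suite = [0.95, 0.90, 0.85]

# A crescendo: every turn stays under the threshold while drifting upward.
crescendo_turns = [0.05, 0.15, 0.25, 0.35, 0.45, 0.55, 0.65, 0.75]

suite_caught = all(single_turn_filter(s) for s in single_shot_suite)
crescendo_blocked = any(single_turn_filter(s) for s in crescendo_turns)
print(suite_caught, crescendo_blocked)  # True False
```

The filter scores perfectly on the single-shot suite and never fires on the ramp, because the function has no argument through which the earlier turns could reach it.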

What catches it

Evaluate the conversation, not just the message. Detection has to consider the trajectory, the direction and escalation across turns, not only the latest input in isolation.
Watch for escalation patterns. A steady drift from benign toward sensitive territory is itself a signal, even when no single turn crosses a line.
Maintain context across the session. The defense needs enough memory of the conversation to recognise where it is heading. That memory is exactly what the attacker is counting on you to lack.
Red-team multi-turn, not just single-shot. If your testing is all one-prompt attacks, your coverage has the same gap the attacker is counting on. Test the ramp, not just the wall.
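
As a rough illustration of the points above, the following sketch keeps a rolling window of per-turn risk scores and flags a session when the scores have been climbing steadily, even though no single score is high. It is a deliberately simple heuristic under stated assumptions, not a description of any production detector: the `TrajectoryMonitor` class, its parameters, and the sample scores are all hypothetical, and the per-turn scores are assumed to come from some upstream classifier.

```python
from collections import deque


class TrajectoryMonitor:
    """Rolling window of per-turn scores; flags sustained upward drift."""

    def __init__(self, window=6, min_rising_steps=4, min_total_rise=0.3):
        self.scores = deque(maxlen=window)  # session memory, bounded
        self.min_rising_steps = min_rising_steps
        self.min_total_rise = min_total_rise

    def observe(self, score: float) -> bool:
        """Record one turn; return True if the window looks like an escalation."""
        self.scores.append(score)
        window = list(self.scores)
        # Count turn-to-turn increases inside the window.
        rising = sum(1 for prev, cur in zip(window, window[1:]) if cur > prev)
        total_rise = window[-1] - window[0]
        return rising >= self.min_rising_steps and total_rise >= self.min_total_rise


def first_flagged_turn(scores):
    """Replay a whole conversation, as a multi-turn red-team test would."""
    monitor = TrajectoryMonitor()
    for turn, score in enumerate(scores, start=1):
        if monitor.observe(score):
            return turn
    return None


# Hypothetical scores: a slow ramp, and a benign session that wanders.
ramp = [0.05, 0.15, 0.25, 0.35, 0.45, 0.55, 0.65, 0.75]
wander = [0.1, 0.3, 0.1, 0.2, 0.1, 0.3, 0.1]

ramp_flag = first_flagged_turn(ramp)
wander_flag = first_flagged_turn(wander)
print(ramp_flag, wander_flag)  # 5 None
```

With these placeholder numbers the ramp is flagged at turn 5, when the latest score is only 0.45, while the wandering session is never flagged. Replaying whole conversations through `first_flagged_turn` is also the shape a multi-turn test case takes: the unit under test is the sequence, not the message.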

Frequently asked questions

Why not just block the final harmful request? Because in isolation it often is not obviously harmful, or the attacker has shaped it so the surrounding conversation makes compliance feel consistent. The danger is in the buildup, which a final-message check never sees. You have to evaluate the path, not just the destination.

Does a longer context window make this worse? It can. More context gives the attacker more room to build a gradual ramp and more prior "compliance" for the model to feel it should continue. Capability that helps legitimate users also helps the patient attacker, which is why the defense has to be trajectory-aware rather than relying on the model's restraint.

Is this the same as many-shot jailbreaking? They are cousins. Many-shot floods the context with examples of compliance; crescendo escalates a real conversation step by step. Both exploit the model following an established pattern, and both evade single-turn defenses, so the same multi-turn awareness helps against each.

How Promptention helps

We built our detection to be multi-turn aware, because we kept finding that the attacks that mattered most were the ones a single-message filter could never see. Rather than judging each input in a vacuum, we consider the trajectory of a conversation, so a slow, deliberate escalation toward harmful content or sensitive data is recognised for what it is, an attack in progress, not a series of harmless requests. And our red team runs multi-turn and crescendo techniques against your system specifically, so you find this gap with us rather than with a real adversary who has all the patience in the world.

Promptention Guard provides multi-turn-aware detection, and our Red Team Services test for crescendo and multi-turn manipulation.

Crescendo: The Jailbreak That Builds One Turn at a Time
