Jailbreaks have evolved from "pretend you have no rules" into a discipline. We map the families we see most in 2026 (roleplay, many-shot, crescendo, and obfuscation among them) and explain why blocking one does not block the others.
When people picture a jailbreak, they still picture the early classics: "pretend you are an AI with no restrictions." Those worked once and mostly do not anymore. What replaced them is not a single trick but a whole discipline: a steadily expanding set of techniques that share one goal, which is to get a model to do what its safety training says it should not. We red-team these for a living, and we want to give you a working map of the families we encounter most in 2026, because the single most important thing to understand about jailbreaks is that they are not one problem with one fix.
A quick distinction first: a jailbreak aims to bypass the model's own safety alignment, its refusal to produce harmful content. Prompt injection aims to override the application's instructions. They overlap and often combine, but the taxonomy below is about the bypass techniques themselves.
The families we see
Roleplay and persona framing. The oldest surviving family. The attacker wraps the request in a fiction, a character, a hypothetical, a story, a "you are a different AI" frame, so that producing the harmful content feels in-character rather than against the rules. Modern versions are far more sophisticated than "pretend," using elaborate nested scenarios that make refusal seem like breaking the fiction.
Many-shot / context flooding. This family exploits long context windows. The attacker fills the context with many examples of the model complying with escalating requests, so that by the time the real ask arrives, the pattern the model is completing is "comply." It turns the model's own in-context learning against its safety training.
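One cheap signal for this family is a single message that contains many speaker-labelled lines, as if a whole conversation had been pasted in. The sketch below is a heuristic under stated assumptions, not a detector: the label list, the function names, and the `max_turns` threshold are all illustrative, and flooding that uses a different transcript format will not trip it.

```python
import re

# Lines that start with a speaker label such as "User:" or "Assistant:".
# The label list is an assumption; real transcripts use many formats.
_TURN_LABEL = re.compile(
    r"^\s*(user|human|assistant|ai|system)\s*:",
    re.IGNORECASE | re.MULTILINE,
)


def embedded_turn_count(message: str) -> int:
    """Count speaker-labelled lines found inside one message."""
    return len(_TURN_LABEL.findall(message))


def looks_like_context_flooding(message: str, max_turns: int = 8) -> bool:
    """Flag a message that embeds more labelled turns than max_turns.

    max_turns is an illustrative threshold, not a recommended value.
    """
    return embedded_turn_count(message) > max_turns
```

A signal like this is best treated as one input to a broader decision, since legitimate users also paste transcripts.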
Crescendo and multi-turn escalation. Rather than asking for the harmful thing directly, the attacker builds toward it across a conversation, each turn slightly more than the last, none individually alarming. By the time the request is harmful, the model is following the momentum of a conversation it has already been cooperating with. We cover this one in depth separately, because single-turn defenses are blind to it.
Obfuscation and encoding. The attacker hides the request from filters using character tricks, encodings, alternate scripts, invisible characters, base64, or spacing, so that a keyword-based defense never sees the trigger while the model still understands the intent. We treat this as its own deep topic too, because it specifically targets the gap between what a filter reads and what a model comprehends.
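Part of the gap between filter and model can be narrowed by normalising text before any check reads it. The following is a minimal sketch using only the Python standard library; the function names, the list of invisible code points, and the base64 length cutoff are our assumptions for illustration, and the result still needs an intent-level check afterwards.

```python
import base64
import binascii
import re
import unicodedata
from typing import Optional

# Invisible code points often used to split a word (illustrative, not exhaustive).
_INVISIBLE = dict.fromkeys([0x200B, 0x200C, 0x200D, 0x2060, 0xFEFF, 0x00AD])

# Runs of 16+ base64-alphabet characters; the cutoff is an arbitrary choice.
_B64_TOKEN = re.compile(r"[A-Za-z0-9+/]{16,}={0,2}")

# Three or more single characters separated by single spaces: "i g n o r e".
_SPACED = re.compile(r"\b(?:\w ){2,}\w\b")


def _try_base64(token: str) -> Optional[str]:
    """Return decoded text if token is strict base64 for printable UTF-8."""
    try:
        text = base64.b64decode(token, validate=True).decode("utf-8")
    except (binascii.Error, UnicodeDecodeError, ValueError):
        return None
    return text if text.isprintable() else None


def normalize_for_inspection(text: str) -> str:
    """Produce a canonical form for downstream checks to read."""
    # NFKC folds compatibility forms (e.g. fullwidth letters) to plain ones.
    text = unicodedata.normalize("NFKC", text)
    # Delete zero-width and soft-hyphen characters.
    text = text.translate(_INVISIBLE)
    # Rejoin letters that were spaced out one by one.
    text = _SPACED.sub(lambda m: m.group(0).replace(" ", ""), text)
    # Append any decodable base64 payloads so they are inspected too.
    decoded = [d for t in _B64_TOKEN.findall(text) if (d := _try_base64(t))]
    if decoded:
        text += " " + " ".join(decoded)
    return text.casefold()
```

Normalisation only closes the encoding tricks it knows about. It does not judge intent, so it belongs in front of a context-aware check rather than in place of one.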
Instruction-hierarchy and policy attacks. These exploit how models weigh competing instructions, asserting a higher authority, claiming a policy exception, or framing the harmful output as required by a rule the model should obey. The attacker is not breaking the rules so much as convincing the model that a different rule applies.
Indirect and tool-mediated jailbreaks. In agentic systems, the bypass can arrive through retrieved content, a tool's output, or another agent, rather than the user's prompt, combining jailbreak intent with indirect prompt injection.
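The practical implication is that everything entering the model's context deserves inspection, not only the user's message. A minimal sketch of that idea follows; `Source`, `ContextItem`, `screen_context`, and the `is_suspicious` callback are hypothetical names introduced for illustration, and the callback stands in for whatever detection you actually run.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Iterable, List


class Source(Enum):
    USER = "user"
    RETRIEVAL = "retrieval"
    TOOL = "tool"
    AGENT = "agent"


@dataclass(frozen=True)
class ContextItem:
    """A piece of text bound for the model's context, tagged with its origin."""
    source: Source
    text: str


def screen_context(
    items: Iterable[ContextItem],
    is_suspicious: Callable[[str], bool],
) -> List[ContextItem]:
    """Return every item the check flags, whatever its source.

    is_suspicious is a placeholder for a real detector (an assumption here).
    """
    return [item for item in items if is_suspicious(item.text)]
```

Keeping the source tag on each item lets you log where a flagged payload came from, which matters when the entry point is a retrieved page or a tool rather than the user.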
Why blocking one does not block the rest
Here is the lesson we most want you to take from this map. These families exploit different properties of the model: persona-following, in-context learning, conversational momentum, the gap between filter and comprehension, and instruction weighting. A defense tuned to catch roleplay framing does little against a crescendo attack. A keyword filter that catches obvious triggers is blind to obfuscation. This is why blocklists and single-technique defenses fail: the attacker simply switches families. And because new variants appear continuously, a static defense ages out almost immediately.
The honest consequence is that jailbreak defense is not a one-time configuration. It is an adaptive, ongoing effort against a moving target, which is exactly why we built our detection to evolve rather than to enumerate.
What actually holds
- Context-aware detection over keyword matching. Evaluate the intent of an input, across families, not the presence of specific trigger words that any obfuscation defeats.
- Multi-turn awareness. Look at the trajectory of a conversation, not just the latest message, so crescendo attacks cannot hide in the gaps between turns.
- Detection that is not itself an LLM with the same blind spots. A defense that shares the target's architecture often shares its weaknesses; effective detection has to operate differently.
- Continuous adaptation. Test against current jailbreak techniques regularly, because last quarter's coverage does not cover this quarter's variants.
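Multi-turn awareness can be illustrated with a decayed running score: each turn is scored on its own, and the conversation is flagged when the accumulated score crosses a threshold even though no single turn would. This is a sketch under stated assumptions, not a description of any product's internals. `ConversationMonitor` and `score_turn` are hypothetical, `score_turn` stands in for a real per-message risk scorer returning a value from 0.0 to 1.0, and the `decay` and `threshold` values are illustrative.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class ConversationMonitor:
    # Hypothetical per-message scorer: returns a risk estimate in [0.0, 1.0].
    score_turn: Callable[[str], float]
    decay: float = 0.8       # weight kept from earlier turns (illustrative)
    threshold: float = 1.5   # trajectory-level alert level (illustrative)
    history: List[float] = field(default_factory=list)
    cumulative: float = 0.0

    def observe(self, message: str) -> bool:
        """Score one turn and report whether the trajectory is now flagged."""
        score = self.score_turn(message)
        self.history.append(score)
        # Older turns fade but still contribute, so slow escalation adds up.
        self.cumulative = self.cumulative * self.decay + score
        return self.cumulative >= self.threshold
```

With per-turn scores of 0.3, 0.4, 0.5, 0.6, and 0.7, no single turn would cross a per-message cutoff of, say, 0.8, yet the running total crosses 1.5 on the fifth turn. The decay keeps a long, benign conversation from accumulating its way into a false alarm.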
Frequently asked questions
Can a strong system prompt prevent jailbreaks? It raises the bar a little but is still reliably defeated by the families above, because the jailbreak operates on the same model your prompt is instructing. Prompt hardening is a baseline, not a control.
Are jailbreaks the same as prompt injection? Related but distinct. A jailbreak targets the model's safety alignment; injection targets the application's instructions. They frequently combine, and the input-layer detection that catches one tends to help with the other.
Why not just keep adding blocked phrases? Because the techniques are about intent and structure, not specific words. Obfuscation and reframing route around any phrase list, and maintaining one is an endless, losing chase. Recognising intent scales; enumerating evil does not.
How Promptention helps
We treat jailbreaks as the moving, multi-family problem they are. Our detection layer evaluates the intent behind inputs across these techniques, is built to be multi-turn aware, and is updated continuously against a growing adversarial dataset, so it keeps pace as the families evolve rather than aging out the week after you deploy it. And our red team will run the current generation of these techniques against your specific system, so you learn where you are exposed from us rather than from an incident. Jailbreaks are not one problem. We do not defend against them as if they were.
Promptention Guard provides context-aware, continuously updated jailbreak detection, and our Red Team Services test your system against current techniques.

