The Injection You Cannot See: Prompt Attacks in Images

If your model accepts images, documents, or audio but your guardrails only read text, you have an open door. We explain how prompt injection hides in pixels and files, and why text-only defenses miss it.

Multimodal models opened a door most security programs never closed. The moment your application accepts an image, a PDF, a screenshot, or audio, you added attack surface, and if your guardrails only inspect text, you are watching one door while another stands open. We see this gap constantly: teams that built solid text-based input filtering, then added image upload as a feature, and never extended their defenses to match. The attacker only has to use the channel you are not watching.

Visual and multimodal prompt injection is exactly that, an injection that arrives through a non-text input, and it works because the model reads the instruction even when your filter cannot.

How an instruction hides in a picture

A multimodal model does not just "see" an image as decoration; it interprets its content, including any text in it. That is the opening. An attacker can place instructions in an image in ways a human glancing at it might not register and a text filter cannot read at all:

  • Text rendered into the image itself, instructions written as part of the picture, which the model reads and may follow as though they were typed in the prompt.
  • Low-contrast or visually subtle text, faint, small, or blended into the background, present enough for the model to parse, easy for a person to miss.
  • Instructions embedded in documents, a PDF or screenshot the model processes, where the malicious text is just another part of the content it ingests.
  • Audio and other modalities, the same principle applied to speech or other inputs your model accepts.

In every case the mechanism is the one we keep returning to: the model treats the content of what it is given as instructions, and it cannot tell your user's intent from an attacker's planted one. The only thing different here is the delivery channel.

Why text-only guardrails miss it entirely

This is the part we want to be blunt about, because the failure is so clean. A text-based filter inspects the text of the prompt. An instruction painted into an image is not in the prompt text; it is in the pixels. The filter sees an innocuous message and an image it does not analyse, passes both, and the model reads the hidden instruction off the image and acts on it. There was no bypass of your filter. Your filter was simply not looking at the channel the attack used. A guardrail has to cover every modality the application actually accepts, or it is covering none of them in any meaningful sense.

This gap is especially common because multimodal support is frequently added after the initial security controls were built. The defenses were scoped to text, the product grew, and nobody re-scoped the defenses.

What to do about it

  • Match your guardrails to your inputs. If the application accepts images, documents, or audio, your detection has to analyse images, documents, and audio. Text-only coverage on a multimodal app is a known gap, not a partial defense.
  • Analyse content within files, not just the file's safety. A document can be malware-free and still carry an injection in its text. You need to read what the model will read.
  • Treat every uploaded artifact as untrusted input. The same posture you apply to a user's typed message applies to the image they attach.
  • Re-audit when you add a modality. Every new input type your product accepts is a new surface; revisit your controls when the product grows.

Frequently asked questions

Is this a real attack or a theoretical one? It is real and practical. Any model that interprets the content of images or documents can be instructed through them, and the technique requires no special access, just an upload field. As multimodal features spread, so does this surface.

Won't the model's own safety training catch it? No more reliably than it catches text-based injection, which is to say not reliably at all. The model reads the instruction in the image the same way it reads one in text, and is subject to the same inability to distinguish trusted from untrusted. Native safety is not the layer that closes this.

We scan uploads for malware. Isn't that enough? Malware scanning checks whether a file is dangerous to your systems. It does not check whether the file's content is an injection aimed at your model. Those are different inspections; you need both.

How Promptention helps

Covering every channel your application accepts is a principle we build to, not an add-on. Our defense against prompt injection includes multimodal analysis, evaluating images and other non-text inputs for the hidden instructions that text-only filters never see, so the door you opened when you added image upload does not stay unwatched. If your model can read it, your guardrail should be able to read it too. We make sure it can.

Promptention Guard provides multimodal injection detection, covering visual and text prompt attacks across the inputs your application accepts.