The Voice on the Call Might Not Be Real: Deepfakes in the Enterprise

AI now clones voices and faces convincingly enough to get fraudulent transfers authorised over phone and video calls. We cover how deepfake-driven social engineering works and the human and process controls that actually stop it.

Not every AI risk lives inside your application. Some of it calls your finance team. Generative AI has made it cheap to clone a person's voice, and even their face, convincingly and in real time, and attackers have noticed. The result is a sharp evolution of an old attack: social engineering, now wearing the voice of your CEO on a phone call or their face in a video meeting, asking an employee to do something urgent. The technology that makes AI useful is the same technology that makes this fraud work, and defending against it is less about your model and more about your people and your processes.

How the attack works now

The classic version of this fraud relied on a convincing email or a spoofed phone number. The new version adds synthetic media that defeats the instincts people relied on to tell real from fake.

  • Voice cloning. With a short sample of someone's speech, attackers can generate that person saying anything, in real time, on a call. The employee hears a familiar voice and trusts it.
  • Video deepfakes. More elaborate operations put a synthetic face on a video call, so that an employee believes they are speaking with a senior leader, or several, who are instructing them to act.
  • Personalised, AI-written pretexts. The social engineering around the media is sharper too, drawing on public information to make the request specific and believable.

The mechanics of the fraud are the familiar ones: urgency, authority and secrecy. "I need this transfer done now, and keep it between us." What is new is that the channel itself (the voice, the face) can no longer be trusted as proof of identity. There are well-documented cases of employees authorising very large transfers after a video call with what turned out to be entirely synthetic participants.

Why detection technology is not the whole answer

It is tempting to hope for a tool that flags deepfakes, and detection research is real and improving. But we want to be honest: relying on automated deepfake detection as your primary control is fragile, because generation technology keeps improving and detection is always chasing it. The durable defence does not depend on spotting the fake. It depends on a process that does not treat the channel as proof in the first place. If a synthetic voice cannot, by itself, authorise a transfer, then the fraud fails whether you detected the fake or not.

This is a place where the right answer is partly human, and we say that plainly rather than pretending technology covers everything.

The controls that actually stop it

  • Verify through a different channel. The single most effective habit: if a request arrives by call, confirm it through a separate, known channel before acting. Message the person through your internal system, or call them back on a known number. The attacker controls one channel; they rarely control two. If a video call asks for something consequential, hang up and reach the person yourself.
  • Require multi-party approval for consequential actions. No single person, on a single channel, should be able to authorise a large transfer. A second, independent approver breaks the urgency-and-secrecy play that these attacks depend on.
  • Treat urgency and secrecy as red flags, not reasons. "Do this now and tell no one" is the signature of the attack. Train people to slow down precisely when they are being pushed to speed up.
  • Build awareness. People defend better when they know real-time voice and video can be faked. The instinct "I heard their voice, so it was them" is exactly the assumption the attack exploits, and naming it disarms it.
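
For teams that encode approval rules in a payment workflow, the first two controls can be sketched as a simple policy check. This is a minimal illustration rather than a definitive implementation: the TransferRequest fields, the threshold, the approver count and the channel names are all hypothetical, and a real system would take them from your own finance policy.

```python
from dataclasses import dataclass

# Hypothetical policy values; set these from your own finance policy.
LARGE_TRANSFER_THRESHOLD = 10_000
REQUIRED_APPROVERS = 2


@dataclass(frozen=True)
class TransferRequest:
    claimed_sender: str    # who the request appears to come from
    amount: float
    inbound_channel: str   # channel the request arrived on, e.g. "video_call"
    # Channels on which the claimed sender has confirmed the request.
    confirmed_channels: frozenset[str] = frozenset()
    # People who have approved the transfer.
    approvers: frozenset[str] = frozenset()


def blocking_reasons(req: TransferRequest) -> list[str]:
    """Return the reasons a transfer must not proceed (empty list = allowed)."""
    reasons = []

    # Control 1: the inbound channel never counts as proof of itself.
    # At least one confirmation must come from a different channel.
    if not (req.confirmed_channels - {req.inbound_channel}):
        reasons.append("not confirmed on a second, known channel")

    # Control 2: large transfers need independent approvers.
    # The claimed sender does not count as their own approver.
    if req.amount >= LARGE_TRANSFER_THRESHOLD:
        independent = req.approvers - {req.claimed_sender}
        if len(independent) < REQUIRED_APPROVERS:
            reasons.append("not enough independent approvers")

    return reasons


# A request made on a video call, with nothing else behind it, is blocked.
urgent = TransferRequest("ceo", 250_000, "video_call")
print(blocking_reasons(urgent))

# The same request after a callback and two independent approvals is allowed.
verified = TransferRequest(
    "ceo", 250_000, "video_call",
    confirmed_channels=frozenset({"callback_known_number"}),
    approvers=frozenset({"controller", "treasurer"}),
)
print(blocking_reasons(verified))
```

Note that the check never asks whether the call looked or sounded genuine. A perfect deepfake on the inbound channel changes nothing, because that channel is excluded from verification by construction.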

Frequently asked questions

Can't software just detect the deepfake on the call? Detection exists and helps, but it is a moving target: generation improves and detection follows, so building your defence on spotting the fake is fragile. A process that does not accept a single channel as authorisation works regardless of whether the fake was caught, which is why we put the weight there.

Is this really an AI security issue? It is AI-enabled fraud, and it belongs in your security awareness and process controls. The AI made the attack convincing; the defence is verification, multi-party approval, and a culture that questions urgent, secret, single-channel requests. The threat is new; the discipline that beats it is sound and learnable.

What is the one habit to teach first? Switch the channel. If a request to move money or data arrives on one channel, verify it on another before acting. That single reflex defeats the large majority of these attacks, because the attacker almost never controls both.

How Promptention helps

This particular threat lives mostly in human process, and we will not pretend a product replaces verification habits and multi-party approval; those are controls your organisation has to own. Where we help is upstream and adjacent. The discipline that beats deepfake fraud (do not trust a single channel, verify before acting) mirrors the principle behind everything we build: treat untrusted input as untrusted and confirm before consequential action. For the AI systems in your stack, we provide the detection, monitoring, and policy controls that enforce that discipline. For your people, the most important defence is awareness and process, and we would rather say so directly than oversell a tool.

Promptention secures the AI systems in your stack; for deepfake-driven social engineering, pair that with verification habits and multi-party approval. The strongest control here is human.