Banks hear the eerie echoes of AI-generated voices

Fraudsters are putting AI-generated voices on the phone with banks, impersonating customers and deceiving financial institutions. But the industry is fighting back with detection systems.
Carter Pape/American Banker

Fans of Jordan Peele, half of the duo behind Comedy Central's popular "Key & Peele" series, are likely familiar with his impersonations of former President Barack Obama, including a video Peele made with BuzzFeed in 2018 — the one where Obama appears to use an expletive to describe then-President Donald Trump.

The video employs a technology known as a deepfake — media doctored with artificial intelligence to achieve realism. The technology has gained traction across the internet as a comedic gag. In the world of fraud, it is increasingly helping criminals trick companies into parting with their money.

As AI tools advance, the threat is extending to banks. One company, Pindrop, has reviewed more than 5 billion calls to financial firms' call centers and says it began detecting AI-generated voices within the last year.

Pindrop CEO Vijay Balasubramaniyan said that while the threat deepfakes pose to call centers is real, it has not been severe so far; scammers by and large still prefer to use their own voices, rather than a computer's, to try to steal money from companies.

But that could change as AI tools become more sophisticated and accessible. Machine-learning techniques such as generative adversarial networks have yielded faster and more accurate voice simulations, making it easier to, for example, generate convincing fakes in real time while on the phone with a call center.

"We anticipate that deepfake attacks will become more sophisticated and abundant in the near future because of the recent increase in good-quality commercial TTS (text to speech) tools," Balasubramaniyan said.

These tools work by taking samples of people talking to create a model of their voice that captures the various characteristics that make them sound how they do. The user can then provide the software with text that gets turned into an audio file of the voice saying whatever the user wants.

Fraudsters deploy voice deepfakes alongside many of the same methods used in other fraud schemes, according to Baptiste Collot, co-founder and CEO of the payment-fraud-prevention platform Trustpair. He described the calls fraudsters make to banks' call centers.

"The scam hinges on putting pressure on the target with time-sensitive language to create urgency and offering specific, legitimate company or employee information to gain trust," Collot said. "Often, the fraudster will impersonate a bank representative — someone with authority over the target or someone a bank regularly works with. By appearing as a reputable banking representative, the fraudsters pressure [targets] to initiate seemingly real payments."

How real is the deepfake threat?

Pindrop and companies offering related services, including Nuance, IngenID and Veridas, have methods of detecting whether a voice is real or fake, even when they are hearing it for the first time. This is because text-to-speech software often leaves artifacts in the audio — traces of data that clue in the astute observer that the voice is computer-generated.

In a video earlier this year, Pindrop demonstrated this capability using remarks that Sen. Richard Blumenthal, a Democrat representing Connecticut, made during a hearing on regulating artificial intelligence, alongside a synthesized version of them. The senator had used text-to-speech software to replicate his own voice making a statement about the risks of AI. Pindrop's video shows its software rating Blumenthal's real voice as real and the computer-generated voice as fake.

A potent tool companies can use to fight back against voice deepfakes is the voiceprint — a fingerprint for the voice. Like deepfake technology itself, voiceprints quantify characteristics of voices that can be hard for the human ear to discern. Artificial intelligence models trained on human speech can compare a sample of speech to a voiceprint and score how similar the two are — in other words, how authentic the voice sounds.
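As a rough illustration of that scoring idea — not Pindrop's or any vendor's actual method — a voiceprint can be represented as a vector of numbers, and similarity between two voices measured as the cosine of the angle between their vectors. The sketch below uses made-up four-dimensional embeddings and an arbitrary threshold; real systems produce embeddings with hundreds of dimensions from a neural speaker-encoder model.

```python
import math

def cosine_similarity(a, b):
    # Compare two fixed-length voice embeddings ("voiceprints").
    # 1.0 means the vectors point the same way; near 0 means unrelated.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Hypothetical embeddings, for illustration only.
enrolled = [0.90, 0.10, 0.40, 0.30]   # voiceprint captured at enrollment
caller   = [0.88, 0.12, 0.41, 0.30]   # embedding of the live caller
impostor = [0.10, 0.90, 0.20, 0.70]   # embedding of a different speaker

THRESHOLD = 0.95  # illustrative decision cutoff, not a vendor value
print(cosine_similarity(enrolled, caller) > THRESHOLD)    # True
print(cosine_similarity(enrolled, impostor) > THRESHOLD)  # False
```

In practice the threshold trades off false accepts against false rejects, which is why vendors tune it per deployment rather than publishing a single value.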

Voiceprints are the same technology that allows devices from Apple, Amazon, Google and others to differentiate who is speaking. On newer iPhones, the only voice that can activate Siri is the owner's, and certain Alexa devices can distinguish the voices of household members to enable personalized commands (such as "Call mom" or "Play my favorite music").

These voiceprints are also a tool for fraudsters, though. A criminal can take clips of audio from videos of a potential victim, turn that audio into a voiceprint, then train voice-generation AI to mimic it, mixing in cadences and intonations that give the voice a sense of life.

Despite the eroding trust that companies may have in voice authentication with the advent of deepfakes, biometrics still offer a layer of security from which many banks can benefit, according to Eduardo Azanza, CEO at the identity-verification company Veridas.

"The convenience of biometrics outweighs the risk — customers no longer want to remember and manage dozens of passwords," Azanza said. "Because biometrics are so unique to an individual, they are less likely to be forgotten, stolen or replicated, ultimately making them the more secure option."

When a bank takes a call in its call center, it is well advised to rely on multiple layers of authentication, Azanza said. Fraudsters can spoof voices, steal passwords and answer security questions, but it's harder to do all of these at once than to do just one.
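The arithmetic behind that advice is simple: if the layers are roughly independent, a fraudster's chance of beating all of them is the product of the chances of beating each one alone. The per-layer odds below are hypothetical, chosen purely to illustrate how quickly stacked checks shrink the attacker's odds.

```python
def breach_probability(per_layer_success):
    # per_layer_success: chance the fraudster beats each layer on its own,
    # assuming the layers fail independently of one another.
    p = 1.0
    for chance in per_layer_success:
        p *= chance
    return p

# Hypothetical odds: spoofed voiceprint, stolen password,
# researched security-question answers.
layers = [0.10, 0.20, 0.30]
print(breach_probability(layers))  # ~0.006, i.e. 0.6%, vs 10-30% per layer
```

The independence assumption is the weak point: if one data breach yields both the password and the security answers, those two layers fall together, which is why mixing biometric and knowledge-based checks helps.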
