Proactive Detection of Voice Cloning with Localized Watermarking

Abstract

In the rapidly evolving field of speech generative models, there is a pressing need to ensure audio authenticity against the risks of voice cloning. We present AudioSeal, the first audio watermarking technique designed specifically for localized detection of AI-generated speech. AudioSeal employs a generator/detector architecture trained jointly with a localization loss, which enables localized watermark detection up to the sample level, and a novel perceptual loss inspired by auditory masking, which improves imperceptibility. AudioSeal achieves state-of-the-art performance in terms of robustness to real-life audio manipulations and imperceptibility, based on automatic and human evaluation metrics. Additionally, AudioSeal is designed with a fast, single-pass detector that significantly surpasses existing models in speed, achieving detection up to two orders of magnitude faster, making it ideal for large-scale and real-time applications. Code is available at https://github.com/facebookresearch/audioseal
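
To make "localized detection up to the sample level" concrete, the sketch below shows one way per-sample detector scores could be turned into time spans. It is an illustration under stated assumptions, not AudioSeal's implementation: the function name `watermarked_segments`, the `probs` input (one watermark probability per audio sample), and the 0.5 threshold are all hypothetical.

```python
from typing import List, Tuple


def watermarked_segments(
    probs: List[float],
    sample_rate: int,
    threshold: float = 0.5,
) -> List[Tuple[float, float]]:
    """Convert per-sample watermark probabilities into (start_s, end_s) spans.

    `probs` is an assumed detector output with one probability per audio
    sample; the threshold value is illustrative, not taken from the paper.
    """
    segments: List[Tuple[float, float]] = []
    start = None  # index of the first sample in the current flagged run
    for i, p in enumerate(probs):
        flagged = p > threshold
        if flagged and start is None:
            start = i  # a flagged run begins here
        elif not flagged and start is not None:
            # the run ended at sample i (exclusive); convert indices to seconds
            segments.append((start / sample_rate, i / sample_rate))
            start = None
    if start is not None:
        # the audio ended while a run was still open
        segments.append((start / sample_rate, len(probs) / sample_rate))
    return segments
```

For example, with a 16 kHz signal whose first second is unflagged and whose next half second is flagged, the function returns a single span from 1.0 s to 1.5 s. Because the decision is made per sample, span boundaries are limited only by the sample period rather than by a fixed analysis window.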

Cite

Text

San Roman et al. "Proactive Detection of Voice Cloning with Localized Watermarking." International Conference on Machine Learning, 2024.

Markdown

[San Roman et al. "Proactive Detection of Voice Cloning with Localized Watermarking." International Conference on Machine Learning, 2024.](https://mlanthology.org/icml/2024/sanroman2024icml-proactive/)

BibTeX

@inproceedings{sanroman2024icml-proactive,
  title     = {{Proactive Detection of Voice Cloning with Localized Watermarking}},
  author    = {San Roman, Robin and Fernandez, Pierre and Elsahar, Hady and Défossez, Alexandre and Furon, Teddy and Tran, Tuan},
  booktitle = {International Conference on Machine Learning},
  year      = {2024},
  pages     = {43180--43196},
  volume    = {235},
  url       = {https://mlanthology.org/icml/2024/sanroman2024icml-proactive/}
}