Prompt-Guided Precise Audio Editing with Diffusion Models

Abstract

Audio editing is the task of manipulating arbitrary audio content with precise control. Although text-guided diffusion models have made significant advancements in text-to-audio generation, they still lack a flexible and precise way to modify target events within an audio track. We present a novel approach, referred to as PPAE, which serves as a general module for diffusion models and enables precise audio editing. The editing is based on the input textual prompt only and is entirely training-free. We exploit the cross-attention maps of diffusion models to facilitate accurate local editing and employ a hierarchical local-global pipeline to ensure a smoother editing process. Experimental results highlight the effectiveness of our method in various editing tasks.
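The abstract's key mechanism, reusing cross-attention maps so that only the tokens mentioned in the edited prompt affect the output, can be illustrated with a minimal sketch. This is not the authors' code: all shapes, function names, and the injection scheme are assumptions, loosely following attention-map injection as popularized by Prompt-to-Prompt-style image editing.

```python
# Illustrative sketch (assumed, not PPAE's implementation): cross-attention
# between latent positions and text tokens, plus map injection for local edits.
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(q, k, v):
    """Return (output, attention map) for latent queries over text-token keys."""
    d = q.shape[-1]
    attn = softmax(q @ k.T / np.sqrt(d))   # shape: (latent_positions, text_tokens)
    return attn @ v, attn

def edit_with_injected_map(q_edit, k_edit, v_edit, attn_source, edited_tokens):
    """Keep the source attention map for unedited tokens (so untouched audio
    regions stay fixed); use the newly computed attention only for the columns
    of edited tokens, then renormalize each row."""
    _, attn_edit = cross_attention(q_edit, k_edit, v_edit)
    attn = attn_source.copy()
    attn[:, edited_tokens] = attn_edit[:, edited_tokens]
    attn /= attn.sum(axis=-1, keepdims=True)
    return attn @ v_edit

rng = np.random.default_rng(0)
T, L, d = 6, 8, 16          # text tokens, latent positions, head dimension
q = rng.normal(size=(L, d))
k = rng.normal(size=(T, d))
v = rng.normal(size=(T, d))
_, attn_src = cross_attention(q, k, v)
out = edit_with_injected_map(q, k, v, attn_src, edited_tokens=[2])
print(out.shape)  # (8, 16)
```

In a real diffusion editor this injection would happen inside the U-Net's cross-attention layers at each denoising step; here it is collapsed into a single matrix operation purely to show the map-swapping idea.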

Cite

Text

Xu et al. "Prompt-Guided Precise Audio Editing with Diffusion Models." International Conference on Machine Learning, 2024.

Markdown

[Xu et al. "Prompt-Guided Precise Audio Editing with Diffusion Models." International Conference on Machine Learning, 2024.](https://mlanthology.org/icml/2024/xu2024icml-promptguided/)

BibTeX

@inproceedings{xu2024icml-promptguided,
  title     = {{Prompt-Guided Precise Audio Editing with Diffusion Models}},
  author    = {Xu, Manjie and Li, Chenxing and Zhang, Duzhen and Su, Dan and Liang, Wei and Yu, Dong},
  booktitle = {International Conference on Machine Learning},
  year      = {2024},
  pages     = {55126--55143},
  volume    = {235},
  url       = {https://mlanthology.org/icml/2024/xu2024icml-promptguided/}
}