WavCraft: Audio Editing and Generation with Large Language Models

Abstract

We introduce WavCraft, a collective system that leverages large language models (LLMs) to connect diverse task-specific models for audio content creation and editing. Specifically, WavCraft describes the content of raw sound materials in natural language and prompts the LLM conditioned on audio descriptions and users' requests. WavCraft leverages the in-context learning ability of the LLM to decompose users' instructions into several tasks and tackle each task collaboratively with the corresponding module. Through task decomposition along with a set of task-specific models, WavCraft follows the input instruction to create or edit audio content with more details and rationales, facilitating users' control. In addition, WavCraft is able to cooperate with users via dialogue interaction and even produce audio content without explicit user commands. Experiments demonstrate that WavCraft yields better performance than existing methods, especially when adjusting the local regions of audio clips. Moreover, WavCraft can follow complex instructions to edit and even create audio content on top of input recordings, facilitating audio producers in a broader range of applications.
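
The flow the abstract describes (describe the audio in text, prompt a planner with that description and the user's request, then dispatch each resulting task to a task-specific module) can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and is not taken from WavCraft itself: the `Task` type, the module names, the prompt layout, and `stub_planner`, which stands in for a real LLM call and returns a fixed plan.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Task:
    module: str    # hypothetical module name, e.g. "generate" or "mix"
    argument: str  # free-text argument passed to that module


def build_prompt(audio_description: str, request: str) -> str:
    """Condition the planner on the audio description and the user's request."""
    return f"Audio: {audio_description}\nRequest: {request}\nPlan:"


def stub_planner(prompt: str) -> List[Task]:
    """Stand-in for an LLM call; returns a fixed plan for illustration only."""
    return [Task("extract", "dog barking"), Task("generate", "rain"), Task("mix", "all")]


def run_plan(tasks: List[Task], modules: Dict[str, Callable[[str], str]]) -> List[str]:
    """Dispatch each task to its module and collect the results in order."""
    return [modules[task.module](task.argument) for task in tasks]


# Hypothetical modules that return strings instead of audio.
MODULES: Dict[str, Callable[[str], str]] = {
    "extract": lambda arg: f"extracted({arg})",
    "generate": lambda arg: f"generated({arg})",
    "mix": lambda arg: f"mixed({arg})",
}
```

In a real system the planner would be an LLM whose output is parsed into tasks, and each module would operate on audio rather than on strings; the sketch shows only the decompose-then-dispatch structure.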

Cite

Text

Liang et al. "WavCraft: Audio Editing and Generation with Large Language Models." ICLR 2024 Workshops: LLMAgents, 2024.

Markdown

[Liang et al. "WavCraft: Audio Editing and Generation with Large Language Models." ICLR 2024 Workshops: LLMAgents, 2024.](https://mlanthology.org/iclrw/2024/liang2024iclrw-wavcraft/)

BibTeX

@inproceedings{liang2024iclrw-wavcraft,
  title     = {{WavCraft: Audio Editing and Generation with Large Language Models}},
  author    = {Liang, Jinhua and Zhang, Huan and Liu, Haohe and Cao, Yin and Kong, Qiuqiang and Liu, Xubo and Wang, Wenwu and Plumbley, Mark D and Phan, Huy and Benetos, Emmanouil},
  booktitle = {ICLR 2024 Workshops: LLMAgents},
  year      = {2024},
  url       = {https://mlanthology.org/iclrw/2024/liang2024iclrw-wavcraft/}
}