Improving Source Extraction with Diffusion and Consistency Models
Abstract
In this work, we integrate a score-matching diffusion model into a standard deterministic architecture for time-domain musical source extraction. To address the typically slow iterative sampling of diffusion models, we apply consistency distillation, reducing sampling to a single step: the distilled model matches the diffusion model's performance with one step and surpasses it with two or more. Trained on the Slakh2100 dataset for four instruments (bass, drums, guitar, and piano), our model significantly improves on baseline methods across objective metrics. Sound examples are available at https://consistency-separation.github.io/.
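The core idea of the abstract, replacing the iterative diffusion sampler with a consistency model that maps noise directly to the clean source in one (or a few) steps conditioned on the mixture, can be illustrated with a minimal sketch. Everything below is a hypothetical illustration: `ConsistencyNet`, the noise scale `sigma_max`, and the sampling schedule are placeholders, not the authors' actual architecture or training setup.

```python
import torch

class ConsistencyNet(torch.nn.Module):
    """Toy stand-in: predicts the clean source from a noisy input and the mixture."""
    def __init__(self, channels: int = 1):
        super().__init__()
        # Takes the noisy source and the mixture stacked on the channel axis.
        self.net = torch.nn.Conv1d(2 * channels, channels, kernel_size=3, padding=1)

    def forward(self, x_noisy, mixture, sigma):
        # A real model would also embed the noise level sigma; omitted in this toy.
        return self.net(torch.cat([x_noisy, mixture], dim=1))

@torch.no_grad()
def extract_one_step(model, mixture, sigma_max=80.0):
    # Single-step consistency sampling: start from pure noise at sigma_max
    # and map it directly to an estimate of the clean source.
    x_T = sigma_max * torch.randn_like(mixture)
    return model(x_T, mixture, sigma_max)

@torch.no_grad()
def extract_multi_step(model, mixture, sigmas=(80.0, 2.0)):
    # Multi-step consistency sampling in the style of Song et al. (2023):
    # re-noise the current estimate at a smaller sigma and denoise again.
    # The paper reports that two or more steps surpass the diffusion model.
    x = sigmas[0] * torch.randn_like(mixture)
    x = model(x, mixture, sigmas[0])
    for sigma in sigmas[1:]:
        x = x + sigma * torch.randn_like(mixture)  # perturb the estimate
        x = model(x, mixture, sigma)               # map back toward the data
    return x

model = ConsistencyNet()
mixture = torch.randn(4, 1, 16000)  # batch of 1-second mono mixtures at 16 kHz
source_hat = extract_one_step(model, mixture)
print(source_hat.shape)  # torch.Size([4, 1, 16000])
```

The conditioning choice here (concatenating the mixture as an extra input channel) is one common way to turn a generative consistency model into an extractor; the paper's exact conditioning mechanism may differ.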
Cite
Text
Karchkhadze et al. "Improving Source Extraction with Diffusion and Consistency Models." NeurIPS 2024 Workshops: Audio_Imagination, 2024.
Markdown
[Karchkhadze et al. "Improving Source Extraction with Diffusion and Consistency Models." NeurIPS 2024 Workshops: Audio_Imagination, 2024.](https://mlanthology.org/neuripsw/2024/karchkhadze2024neuripsw-improving/)
BibTeX
@inproceedings{karchkhadze2024neuripsw-improving,
title = {{Improving Source Extraction with Diffusion and Consistency Models}},
author = {Karchkhadze, Tornike and Izadi, Mohammad Rasool and Zhang, Shuo},
booktitle = {NeurIPS 2024 Workshops: Audio_Imagination},
year = {2024},
url = {https://mlanthology.org/neuripsw/2024/karchkhadze2024neuripsw-improving/}
}