Disentangling Multi-Instrument Music Audio for Source-Level Pitch and Timbre Manipulation
Abstract
Disentangling pitch and timbre from the audio of a musical instrument involves encoding these two attributes as separate latent representations, allowing the synthesis of instrument sounds with novel attribute combinations by manipulating one representation independently of the other. Existing solutions have mostly focused on single-instrument audio, excluding cases where multiple instruments are present. To fill this gap, we aim to disentangle multi-instrument mixtures by extracting a per-instrument representation that combines the pitch and timbre latent variables. These latent variables form a set of modular building blocks that are used to condition a decoder to compose new mixtures. We first present a simple implementation to verify the framework using structured and isolated chords. We then scale up to a complex dataset of four-part chorales with a model that jointly learns the latents and a diffusion transformer. Our evaluation identifies the key components for successful disentanglement and demonstrates mixture transformation based on source-level attribute manipulation. Audio samples are available at https://yjlolo.github.io/dismix-audio-samples.
Cite
Text
Luo et al. "Disentangling Multi-Instrument Music Audio for Source-Level Pitch and Timbre Manipulation." NeurIPS 2024 Workshops: Audio_Imagination, 2024.
Markdown
[Luo et al. "Disentangling Multi-Instrument Music Audio for Source-Level Pitch and Timbre Manipulation." NeurIPS 2024 Workshops: Audio_Imagination, 2024.](https://mlanthology.org/neuripsw/2024/luo2024neuripsw-disentangling/)
BibTeX
@inproceedings{luo2024neuripsw-disentangling,
title = {{Disentangling Multi-Instrument Music Audio for Source-Level Pitch and Timbre Manipulation}},
author = {Luo, Yin-Jyun and Cheuk, Kin Wai and Choi, Woosung and Liao, Wei-Hsiang and Toyama, Keisuke and Uesaka, Toshimitsu and Saito, Koichi and Lai, Chieh-Hsin and Takida, Yuhta and Dixon, Simon and Mitsufuji, Yuki},
booktitle = {NeurIPS 2024 Workshops: Audio_Imagination},
year = {2024},
url = {https://mlanthology.org/neuripsw/2024/luo2024neuripsw-disentangling/}
}