iQuery: Instruments as Queries for Audio-Visual Sound Separation
Abstract
Current audio-visual separation methods share a standard architecture design where an audio encoder-decoder network is fused with visual encoding features at the encoder bottleneck. This design confounds the learning of multi-modal feature encoding with robust sound decoding for audio separation. To generalize to a new instrument, one must fine-tune the entire visual and audio network for all musical instruments. We re-formulate the visual-sound separation task and propose Instruments as Queries (iQuery) with a flexible query expansion mechanism. Our approach ensures cross-modal consistency and cross-instrument disentanglement. We utilize "visually named" queries to initiate the learning of audio queries and use cross-modal attention to remove potential sound source interference at the estimated waveforms. To generalize to a new instrument or event class, drawing inspiration from the text-prompt design, we insert additional queries as audio prompts while freezing the attention mechanism. Experimental results on three benchmarks demonstrate that our iQuery improves audio-visual sound source separation performance. Code is available at https://github.com/JiabenChen/iQuery.
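The query expansion idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a single cross-attention step with no learned projections and no self-attention among queries, and the instrument names, feature dimension, and random features are hypothetical stand-ins. Under those assumptions, appending a query for a new class leaves the outputs of the existing queries untouched, which is the property that makes freezing the attention mechanism plausible.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, features):
    """Scaled dot-product cross-attention.

    Each query attends over the feature vectors independently.
    Assumption: no query self-attention and no learned projections.
    """
    d = len(features[0])
    outputs = []
    for q in queries:
        scores = [sum(qi * fi for qi, fi in zip(q, f)) / math.sqrt(d) for f in features]
        weights = softmax(scores)
        outputs.append([sum(w * f[j] for w, f in zip(weights, features)) for j in range(d)])
    return outputs


rng = random.Random(0)
dim = 4  # hypothetical feature dimension

# Stand-in for fused audio-visual features (6 positions, `dim` channels each).
features = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(6)]

# One query per known instrument class (hypothetical names and values).
queries = {name: [rng.gauss(0, 1) for _ in range(dim)] for name in ("violin", "guitar")}
old_out = cross_attention(list(queries.values()), features)

# Query expansion: add one query for a new class. The attention function
# itself is unchanged, which plays the role of the frozen attention mechanism.
expanded = dict(queries)
expanded["flute"] = [rng.gauss(0, 1) for _ in range(dim)]
new_out = cross_attention(list(expanded.values()), features)
```

In this sketch `new_out` has one extra row for the new class, and its first two rows equal `old_out`. In a trained model only the added query would be optimized; the released code at the URL above should be treated as the reference for the actual architecture.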
Cite
Text
Chen et al. "iQuery: Instruments as Queries for Audio-Visual Sound Separation." Conference on Computer Vision and Pattern Recognition, 2023. doi:10.1109/CVPR52729.2023.01410
Markdown
[Chen et al. "iQuery: Instruments as Queries for Audio-Visual Sound Separation." Conference on Computer Vision and Pattern Recognition, 2023.](https://mlanthology.org/cvpr/2023/chen2023cvpr-iquery/) doi:10.1109/CVPR52729.2023.01410
BibTeX
@inproceedings{chen2023cvpr-iquery,
title = {{iQuery: Instruments as Queries for Audio-Visual Sound Separation}},
author = {Chen, Jiaben and Zhang, Renrui and Lian, Dongze and Yang, Jiaqi and Zeng, Ziyao and Shi, Jianbo},
booktitle = {Conference on Computer Vision and Pattern Recognition},
year = {2023},
pages = {14675-14686},
doi = {10.1109/CVPR52729.2023.01410},
url = {https://mlanthology.org/cvpr/2023/chen2023cvpr-iquery/}
}