Independency Adversarial Learning for Cross-Modal Sound Separation

Abstract

Sound mixture separation remains challenging due to heavy overlap between sounds and interference from noise, and the unsupervised setting increases the difficulty further. Since overlap hinders accurate separation, we propose an Independency Adversarial Learning based Cross-Modal Sound Separation (IAL-CMS) approach: IAL employs adversarial learning to minimize the correlation among separated sound elements, encouraging high independence between them; CMS performs cross-modal sound separation, incorporating audio-visual consistent feature learning and interactive cross-attention learning to emphasize the semantic consistency among cross-modal features. Both audio-visual consistency and audio consistency are preserved to guarantee accurate separation. Together, the consistency constraints and sound independence ensure that overlapping mixtures are decomposed into unrelated and distinguishable sound elements. The proposed approach is evaluated on MUSIC, VGGSound, and AudioSet; extensive experiments confirm that it outperforms existing approaches in both supervised and unsupervised scenarios.
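The abstract's central idea, adversarially penalizing statistical dependence between separated sources, can be sketched as follows. This is a minimal illustration of a generic adversarial-independence objective under our own assumptions (PyTorch, a hypothetical pairwise discriminator over separated-source embeddings), not the authors' implementation: the discriminator learns to tell genuine pairs of co-separated elements from shuffled (independent) pairs, and the separator is trained to fool it, which pushes its outputs toward independence.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class IndependenceDiscriminator(nn.Module):
    """Hypothetical critic: scores whether two separated-source embeddings
    are statistically dependent (co-separated from one mixture) or
    independent (a shuffled pair). Architecture and dims are illustrative."""
    def __init__(self, dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * dim, dim), nn.ReLU(),
            nn.Linear(dim, 1),
        )

    def forward(self, a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([a, b], dim=-1))  # (B, 1) logits

def independence_adversarial_losses(disc, z1, z2):
    """z1, z2: (B, dim) embeddings of two elements separated from the
    same mixtures. Returns (discriminator loss, separator loss); in
    practice each is stepped by its own optimizer, GAN-style."""
    z2_shuf = z2[torch.randperm(z2.size(0))]    # break pairing -> independent
    joint = disc(z1.detach(), z2.detach())      # dependent pairs
    marg = disc(z1.detach(), z2_shuf.detach())  # independent pairs
    d_loss = (F.binary_cross_entropy_with_logits(joint, torch.ones_like(joint))
              + F.binary_cross_entropy_with_logits(marg, torch.zeros_like(marg)))
    # Separator loss: make genuine pairs look independent to the critic.
    g_loss = F.binary_cross_entropy_with_logits(disc(z1, z2),
                                                torch.zeros_like(joint))
    return d_loss, g_loss
```

At the separator's optimum the critic cannot distinguish co-separated pairs from shuffled ones, i.e. the separated elements carry no information about each other, matching the independence goal described in the abstract.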

Cite

Text

Lin et al. "Independency Adversarial Learning for Cross-Modal Sound Separation." AAAI Conference on Artificial Intelligence, 2024. doi:10.1609/AAAI.V38I4.28140

Markdown

[Lin et al. "Independency Adversarial Learning for Cross-Modal Sound Separation." AAAI Conference on Artificial Intelligence, 2024.](https://mlanthology.org/aaai/2024/lin2024aaai-independency/) doi:10.1609/AAAI.V38I4.28140

BibTeX

@inproceedings{lin2024aaai-independency,
  title     = {{Independency Adversarial Learning for Cross-Modal Sound Separation}},
  author    = {Lin, Zhenkai and Ji, Yanli and Yang, Yang},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2024},
  pages     = {3522--3530},
  doi       = {10.1609/AAAI.V38I4.28140},
  url       = {https://mlanthology.org/aaai/2024/lin2024aaai-independency/}
}