Principled Probing of Foundation Models in the Auditory Modality

Abstract

We leverage ecological theories of human sound perception and a carefully designed dataset of perceptually calibrated sounds to develop and carry out principled, fine-grained probing of foundation models in the auditory modality. We show that internal activations of the state-of-the-art audio foundation model BEATs correlate better with perceptual dimensions than those of a supervised audio classification model and of a text-audio multimodal model, and that all models fail to represent at least one perceptual dimension. We also report preliminary evidence suggesting that directions aligning invariantly with a perceptual dimension can be identified within the representation space at inner layers of the BEATs model. We briefly discuss future work and potential applications.
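The abstract describes probing internal activations for linear correlates of perceptual dimensions. Below is a minimal sketch of one common way to do this, not the authors' actual pipeline: a cross-validated linear probe per layer, scored by the Pearson correlation between held-out predictions and per-sound perceptual ratings. The inputs `activations` (pooled per-layer features) and `ratings` (one perceptual dimension per sound) are assumed to be precomputed; `get_layer_activations` is hypothetical.

```python
# Minimal sketch of layer-wise perceptual probing (illustrative only).
# Assumes `activations`: (n_sounds, n_features) pooled activations from
# one layer, and `ratings`: (n_sounds,) perceptual-dimension values.
import numpy as np
from scipy.stats import pearsonr
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_predict

def probe_layer(activations: np.ndarray, ratings: np.ndarray) -> float:
    """Correlate held-out linear-probe predictions with perceptual ratings."""
    probe = Ridge(alpha=1.0)
    # 5-fold cross-validated predictions avoid scoring on training data.
    preds = cross_val_predict(probe, activations, ratings, cv=5)
    r, _ = pearsonr(preds, ratings)
    return r

# Usage sketch: score every layer to see where a perceptual dimension
# is most linearly decodable.
# layer_activations = get_layer_activations(model, sounds)  # hypothetical
# scores = [probe_layer(acts, ratings) for acts in layer_activations]
```

Under this framing, the fitted probe's weight vector defines one candidate "direction" in a layer's representation space for that perceptual dimension, which connects to the paper's reported evidence of invariantly aligned directions.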

Cite

Text

Bost et al. "Principled Probing of Foundation Models in the Auditory Modality." NeurIPS 2024 Workshops: Behavioral_ML, 2024.

Markdown

[Bost et al. "Principled Probing of Foundation Models in the Auditory Modality." NeurIPS 2024 Workshops: Behavioral_ML, 2024.](https://mlanthology.org/neuripsw/2024/bost2024neuripsw-principled/)

BibTeX

@inproceedings{bost2024neuripsw-principled,
  title     = {{Principled Probing of Foundation Models in the Auditory Modality}},
  author    = {Bost, Etienne and Aramaki, Mitsuko and Kronland-Martinet, Richard and Ystad, Sølvi and Artières, Thierry and Schatz, Thomas},
  booktitle = {NeurIPS 2024 Workshops: Behavioral_ML},
  year      = {2024},
  url       = {https://mlanthology.org/neuripsw/2024/bost2024neuripsw-principled/}
}