Physics-Informed Audio-Geometry-Grid Representation Learning for Universal Sound Source Localization

Abstract

Sound source localization (SSL) is a fundamental task in spatial audio understanding, yet most deep neural network-based methods are constrained by fixed array geometries and predefined directional grids, which limits their generalizability and scalability. To address these issues, we propose _audio-geometry-grid representation learning_ (AGG-RL), a novel framework that jointly learns audio-geometry and grid representations in a shared latent space, enabling both geometry-invariant and grid-flexible SSL. Moreover, to enhance generalizability and interpretability, we introduce two physics-informed components: a _learnable non-uniform discrete Fourier transform_ (LNuDFT), which learns a non-uniform allocation of frequency bins that places them densely in informative phase regions, and a _relative microphone positional encoding_ (rMPE), which encodes relative microphone coordinates in accordance with the nature of inter-channel time differences. Experiments on synthetic and real datasets demonstrate that AGG-RL achieves superior performance, particularly under unseen conditions. The results highlight the potential of representation learning with physics-informed design towards a universal solution for spatial acoustic scene understanding across diverse scenarios.
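
As a rough illustration of the two physics-informed ideas named in the abstract, the sketch below evaluates a discrete Fourier transform at arbitrary (non-uniform) frequencies and computes pairwise relative microphone coordinates. It is not the authors' implementation: the function names `nudft` and `relative_positions`, the quadratic frequency spacing, and the random array geometry are all assumptions made for illustration. In a learnable variant the frequency vector would presumably be a trainable parameter, which is not shown here.

```python
import numpy as np

def nudft(x, freqs):
    """DFT of the last axis of x at arbitrary normalized frequencies (cycles/sample).

    Hypothetical helper: a plain non-uniform DFT, with no learning involved.
    """
    n = np.arange(x.shape[-1])
    # Basis matrix of shape (num_freqs, num_samples).
    basis = np.exp(-2j * np.pi * np.outer(freqs, n))
    return x @ basis.T

def relative_positions(mic_xyz):
    """Pairwise coordinate differences p_i - p_j, shape (M, M, 3).

    Hypothetical helper: only relative geometry is kept, because inter-channel
    time differences depend on microphone offsets rather than absolute position.
    """
    return mic_xyz[:, None, :] - mic_xyz[None, :, :]

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 256))  # 4 channels, 256 samples (made-up data)

# With uniformly spaced frequencies k/N the result matches the standard real FFT.
uniform = np.arange(129) / 256
assert np.allclose(nudft(x, uniform), np.fft.rfft(x, axis=-1))

# Assumed non-uniform allocation: bins packed more densely at low frequencies.
dense_low = 0.5 * np.linspace(0.0, 1.0, 129) ** 2
spec = nudft(x, dense_low)

# Relative coordinates are unchanged when the whole array is translated.
mics = rng.standard_normal((4, 3))
rel = relative_positions(mics)
shifted = relative_positions(mics + np.array([1.0, -2.0, 0.5]))
assert np.allclose(rel, shifted)
```

The first check confirms that the non-uniform transform reduces to the ordinary DFT on a uniform grid, and the second shows why relative coordinates are a natural input when the array may be placed anywhere in the room.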

Cite

Text

Baek et al. "Physics-Informed Audio-Geometry-Grid Representation Learning for Universal Sound Source Localization." International Conference on Learning Representations, 2026.

Markdown

[Baek et al. "Physics-Informed Audio-Geometry-Grid Representation Learning for Universal Sound Source Localization." International Conference on Learning Representations, 2026.](https://mlanthology.org/iclr/2026/baek2026iclr-physicsinformed/)

BibTeX

@inproceedings{baek2026iclr-physicsinformed,
  title     = {{Physics-Informed Audio-Geometry-Grid Representation Learning for Universal Sound Source Localization}},
  author    = {Baek, Min-Sang and Kim, Gyeong-Su and Kim, Donghyun and Chang, Joon-Hyuk},
  booktitle = {International Conference on Learning Representations},
  year      = {2026},
  url       = {https://mlanthology.org/iclr/2026/baek2026iclr-physicsinformed/}
}