Semantic Object Prediction and Spatial Sound Super-Resolution with Binaural Sounds

Abstract

Humans can robustly recognize and localize objects by integrating visual and auditory cues. While machines can now do the same with images, less work has been done with sounds. This work develops an approach for dense semantic labelling of sound-making objects, based purely on binaural sounds. We propose a novel sensor setup and record a new audio-visual dataset of street scenes with eight professional binaural microphones and a 360° camera.

Cite

Text

Vasudevan et al. "Semantic Object Prediction and Spatial Sound Super-Resolution with Binaural Sounds." Proceedings of the European Conference on Computer Vision (ECCV), 2020. doi:10.1007/978-3-030-58548-8_37

Markdown

[Vasudevan et al. "Semantic Object Prediction and Spatial Sound Super-Resolution with Binaural Sounds." Proceedings of the European Conference on Computer Vision (ECCV), 2020.](https://mlanthology.org/eccv/2020/vasudevan2020eccv-semantic/) doi:10.1007/978-3-030-58548-8_37

BibTeX

@inproceedings{vasudevan2020eccv-semantic,
  title     = {{Semantic Object Prediction and Spatial Sound Super-Resolution with Binaural Sounds}},
  author    = {Vasudevan, Arun Balajee and Dai, Dengxin and Van Gool, Luc},
  booktitle = {Proceedings of the European Conference on Computer Vision (ECCV)},
  year      = {2020},
  doi       = {10.1007/978-3-030-58548-8_37},
  url       = {https://mlanthology.org/eccv/2020/vasudevan2020eccv-semantic/}
}