Semantic Object Prediction and Spatial Sound Super-Resolution with Binaural Sounds
Abstract
Humans can robustly recognize and localize objects by integrating visual and auditory cues. While machines are able to do the same now with images, less work has been done with sounds. This work develops an approach for dense semantic labelling of sound-making objects, purely based on binaural sounds. We propose a novel sensor setup and record a new audio-visual dataset of street scenes with eight professional binaural microphones and a $360^{
Cite
Text
Vasudevan et al. "Semantic Object Prediction and Spatial Sound Super-Resolution with Binaural Sounds." Proceedings of the European Conference on Computer Vision (ECCV), 2020. doi:10.1007/978-3-030-58548-8_37
Markdown
[Vasudevan et al. "Semantic Object Prediction and Spatial Sound Super-Resolution with Binaural Sounds." Proceedings of the European Conference on Computer Vision (ECCV), 2020.](https://mlanthology.org/eccv/2020/vasudevan2020eccv-semantic/) doi:10.1007/978-3-030-58548-8_37
BibTeX
@inproceedings{vasudevan2020eccv-semantic,
title = {{Semantic Object Prediction and Spatial Sound Super-Resolution with Binaural Sounds}},
author = {Vasudevan, Arun Balajee and Dai, Dengxin and Van Gool, Luc},
booktitle = {Proceedings of the European Conference on Computer Vision (ECCV)},
year = {2020},
doi = {10.1007/978-3-030-58548-8_37},
url = {https://mlanthology.org/eccv/2020/vasudevan2020eccv-semantic/}
}