Seeing with Sound: Long-Range Acoustic Beamforming for Multimodal Scene Understanding

Abstract

Existing autonomous vehicles primarily use sensors that rely on electromagnetic waves, which are undisturbed in good environmental conditions but can suffer in adverse scenarios, such as low light or objects with low reflectance. Moreover, these existing methods typically detect only objects in direct line-of-sight. Acoustic pressure waves emanating from road users do not share these limitations. However, such signals are typically ignored in automotive perception because they suffer from low spatial resolution and lack directional information. In this work, we introduce long-range acoustic beamforming of pressure waves from noise produced directly by automotive vehicles in the wild as a complementary sensing modality to traditional optical sensors for detecting objects in dynamic traffic environments. To this end, we introduce the first multimodal long-range acoustic beamforming dataset. We propose a neural aperture expansion method for beamforming and validate its utility for multimodal automotive object detection. We validate the benefit of adding sound detections to existing RGB cameras in challenging automotive scenarios, where camera-only approaches fail or do not deliver the ultra-fast rates of pressure sensors.
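Beamforming is what recovers the directional information that a single microphone lacks. As a point of reference, here is a minimal sketch of classical frequency-domain delay-and-sum beamforming. It is not the paper's neural aperture expansion method. It assumes a uniform linear microphone array, a far-field (plane-wave) source, a fixed speed of sound of 343 m/s, and synthetic white noise standing in for vehicle noise. All function names (`far_field_delays`, `delay_and_sum_power`, `simulate_plane_wave`) are hypothetical.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s in air at roughly 20 degrees C (assumed constant)


def far_field_delays(mic_x, angles_deg):
    """Arrival delay (s) of a far-field plane wave at each microphone.

    Convention (an assumption of this sketch): microphones lie on the x-axis,
    angles are measured from broadside, and a positive angle delays the
    microphones with larger x.
    """
    theta = np.deg2rad(np.atleast_1d(angles_deg))
    return np.outer(np.sin(theta), mic_x) / SPEED_OF_SOUND  # (angles, mics)


def delay_and_sum_power(signals, mic_x, fs, angles_deg):
    """Steered response power of a frequency-domain delay-and-sum beamformer.

    signals: (mics, samples) recordings; mic_x: (mics,) positions in metres.
    Returns one power value per candidate angle.
    """
    n = signals.shape[1]
    spectra = np.fft.rfft(signals, axis=1)                       # (mics, freqs)
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)                       # (freqs,)
    delays = far_field_delays(mic_x, angles_deg)                 # (angles, mics)
    # Undo each channel's delay with a phase advance so the wavefronts line up.
    steering = np.exp(2j * np.pi * freqs * delays[:, :, None])   # (angles, mics, freqs)
    beams = (steering * spectra).mean(axis=1)                    # (angles, freqs)
    return np.sum(np.abs(beams) ** 2, axis=1)


def simulate_plane_wave(source, mic_x, fs, angle_deg):
    """Delay a mono source onto each microphone (circular fractional delay)."""
    n = source.size
    spectrum = np.fft.rfft(source)
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    delays = far_field_delays(mic_x, angle_deg)[0]               # (mics,)
    shifted = spectrum * np.exp(-2j * np.pi * freqs * delays[:, None])
    return np.fft.irfft(shifted, n=n, axis=1)


if __name__ == "__main__":
    fs = 16_000
    mic_x = np.arange(8) * 0.02  # 8 mics, 2 cm apart: below half a wavelength at 8 kHz
    source = np.random.default_rng(0).standard_normal(2048)  # broadband noise stand-in
    recordings = simulate_plane_wave(source, mic_x, fs, angle_deg=30.0)
    angles = np.arange(-90, 91)
    power = delay_and_sum_power(recordings, mic_x, fs, angles)
    print("estimated direction:", angles[np.argmax(power)], "degrees")
```

The printed angle should land at or next to the simulated 30 degrees. The sketch also shows the resolution limit the abstract mentions. Beam width scales roughly with wavelength divided by array aperture, so a small array yields broad, ambiguous peaks at the low frequencies where much vehicle noise lies.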

Cite

Text

Chakravarthula et al. "Seeing with Sound: Long-Range Acoustic Beamforming for Multimodal Scene Understanding." Conference on Computer Vision and Pattern Recognition, 2023. doi:10.1109/CVPR52729.2023.00101

Markdown

[Chakravarthula et al. "Seeing with Sound: Long-Range Acoustic Beamforming for Multimodal Scene Understanding." Conference on Computer Vision and Pattern Recognition, 2023.](https://mlanthology.org/cvpr/2023/chakravarthula2023cvpr-seeing/) doi:10.1109/CVPR52729.2023.00101

BibTeX

@inproceedings{chakravarthula2023cvpr-seeing,
  title     = {{Seeing with Sound: Long-Range Acoustic Beamforming for Multimodal Scene Understanding}},
  author    = {Chakravarthula, Praneeth and D’Souza, Jim Aldon and Tseng, Ethan and Bartusek, Joe and Heide, Felix},
  booktitle = {Conference on Computer Vision and Pattern Recognition},
  year      = {2023},
  pages     = {982--991},
  doi       = {10.1109/CVPR52729.2023.00101},
  url       = {https://mlanthology.org/cvpr/2023/chakravarthula2023cvpr-seeing/}
}