Grounding Acoustic Echoes in Single View Geometry Estimation

Abstract

Extracting the 3D geometry of a scene plays an important part in scene understanding. Recently, robust visual descriptors have been proposed for extracting the indoor scene layout from a passive agent's perspective, specifically from a single image. Their robustness is mainly due to modelling the physical interaction of the underlying room geometry with the objects and the humans present in the room. In this work we add the physical constraints coming from acoustic echoes, generated by an audio source, to this visual model. As we show in our experiments, our audio-visual 3D geometry descriptor improves over the state of the art in passive perception models.
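To make the idea of acoustic constraints concrete, below is a minimal sketch (not the paper's implementation) of a first-order image-source model for an axis-aligned shoebox room. It illustrates how echo arrival times measured at a microphone depend on the assumed room dimensions, which is why echoes can constrain single-view geometry estimates. All function names, room dimensions, and positions are illustrative assumptions.

```python
# Minimal sketch, assuming a shoebox room and first-order reflections only.
# Not the method from the paper; it only illustrates how echo delays encode
# room geometry via the standard image-source construction.
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s at roughly 20 degrees Celsius


def first_order_echo_delays(room_dims, source, mic):
    """Return sorted arrival delays (seconds) of the direct path and the six
    first-order wall reflections for a room of size (Lx, Ly, Lz), with the
    source and microphone given in room coordinates."""
    room_dims = np.asarray(room_dims, dtype=float)
    source = np.asarray(source, dtype=float)
    mic = np.asarray(mic, dtype=float)

    delays = [np.linalg.norm(source - mic) / SPEED_OF_SOUND]  # direct path
    for axis in range(3):
        for wall in (0.0, room_dims[axis]):
            # Mirror the source across the wall to obtain its image source.
            image = source.copy()
            image[axis] = 2.0 * wall - source[axis]
            delays.append(np.linalg.norm(image - mic) / SPEED_OF_SOUND)
    return sorted(delays)


if __name__ == "__main__":
    # Hypothetical 6 m x 4 m x 3 m room: the predicted echo delays shift as
    # the assumed geometry changes, so measured echo peaks in an impulse
    # response act as a constraint on the room layout.
    print(first_order_echo_delays((6.0, 4.0, 3.0), (2.0, 1.5, 1.2), (4.0, 2.5, 1.2)))
```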

Cite

Text

Hussain et al. "Grounding Acoustic Echoes in Single View Geometry Estimation." AAAI Conference on Artificial Intelligence, 2014. doi:10.1609/AAAI.V28I1.9140

Markdown

[Hussain et al. "Grounding Acoustic Echoes in Single View Geometry Estimation." AAAI Conference on Artificial Intelligence, 2014.](https://mlanthology.org/aaai/2014/hussain2014aaai-grounding/) doi:10.1609/AAAI.V28I1.9140

BibTeX

@inproceedings{hussain2014aaai-grounding,
  title     = {{Grounding Acoustic Echoes in Single View Geometry Estimation}},
  author    = {Hussain, Muhammad Wajahat and Civera, Javier and Montano, Luis},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2014},
  pages     = {2760--2766},
  doi       = {10.1609/AAAI.V28I1.9140},
  url       = {https://mlanthology.org/aaai/2014/hussain2014aaai-grounding/}
}