Audio Vision: Using Audio-Visual Synchrony to Locate Sounds

Abstract

Psychophysical and physiological evidence shows that sound localization of acoustic signals is strongly influenced by their synchrony with visual signals. This effect, known as ventriloquism, is at work when sound coming from the side of a TV set feels as if it were coming from the mouth of the actors. The ventriloquism effect suggests that there is important information about sound location encoded in the synchrony between the audio and video signals. In spite of this evidence, audiovisual synchrony is rarely used as a source of information in computer vision tasks. In this paper we explore the use of audio-visual synchrony to locate sound sources. We developed a system that searches for regions of the visual landscape that correlate highly with the acoustic signals and tags them as likely to contain an acoustic source. We discuss our experience implementing the system, present results on a speaker localization task and discuss potential applications of the approach.
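
The abstract describes the core idea at a high level: tag image regions whose changes are synchronized with the audio track. Below is a minimal sketch of that idea, assuming short-term audio energy is correlated with per-pixel intensity change over a sliding window; the function name, window length, and feature choices are illustrative assumptions, not the paper's exact estimator, which the authors develop in the full text.

import numpy as np

def audio_visual_correlation(frames, audio_energy, window=16):
    """Correlate per-pixel intensity change with short-term audio energy.

    frames:       (T, H, W) grayscale video frames, one per audio frame
                  (alignment of the two streams is assumed here).
    audio_energy: (T,) short-term audio energy aligned with the frames.
    window:       frames in the sliding correlation window (illustrative).

    Returns an (H, W) map for the most recent window; higher values mark
    pixels whose changes are most synchronized with the audio.
    """
    # Per-pixel absolute intensity change between consecutive frames.
    motion = np.abs(np.diff(frames.astype(np.float64), axis=0))  # (T-1, H, W)
    energy = audio_energy[1:]                                     # align with diffs

    m = motion[-window:]          # (window, H, W)
    a = energy[-window:]          # (window,)

    # Pearson correlation of each pixel's motion trace with the audio energy.
    m_c = m - m.mean(axis=0)
    a_c = a - a.mean()
    num = (m_c * a_c[:, None, None]).sum(axis=0)
    den = np.sqrt((m_c ** 2).sum(axis=0) * (a_c ** 2).sum()) + 1e-12
    return num / den

# Usage sketch: the highest-scoring pixel is tagged as the likely source.
# corr_map = audio_visual_correlation(frames, audio_energy)
# y, x = np.unravel_index(np.argmax(corr_map), corr_map.shape)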

Cite

Text

Hershey and Movellan. "Audio Vision: Using Audio-Visual Synchrony to Locate Sounds." Neural Information Processing Systems, 1999.

Markdown

[Hershey and Movellan. "Audio Vision: Using Audio-Visual Synchrony to Locate Sounds." Neural Information Processing Systems, 1999.](https://mlanthology.org/neurips/1999/hershey1999neurips-audio/)

BibTeX

@inproceedings{hershey1999neurips-audio,
  title     = {{Audio Vision: Using Audio-Visual Synchrony to Locate Sounds}},
  author    = {Hershey, John R. and Movellan, Javier R.},
  booktitle = {Neural Information Processing Systems},
  year      = {1999},
  pages     = {813--819},
  url       = {https://mlanthology.org/neurips/1999/hershey1999neurips-audio/}
}