Suggesting Sounds for Images from Video Collections
Abstract
Given a still image, humans can easily think of a sound associated with this image. For instance, people might associate the picture of a car with the sound of a car engine. In this paper we aim to retrieve sounds corresponding to a query image. To solve this challenging task, our approach exploits the correlation between the audio and visual modalities in video collections. A major difficulty is the high amount of uncorrelated audio in the videos, i.e., audio that does not correspond to the main image content, such as voice-over, background music, added sound effects, or sounds originating off-screen. We present an unsupervised, clustering-based solution that is able to automatically separate correlated sounds from uncorrelated ones. The core algorithm is based on a joint audio-visual feature space, in which we perform iterated mutual kNN clustering in order to effectively filter out uncorrelated sounds. To this end we also introduce a new dataset of correlated audio-visual data, on which we evaluate our approach and compare it to alternative solutions. Experiments show that our approach can successfully deal with a high amount of uncorrelated audio.
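The abstract names iterated mutual kNN clustering in a joint audio-visual feature space but gives no details, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes that the joint space is a plain concatenation of per-sample visual and audio feature vectors, that distances are Euclidean, and that a sample is discarded when it has too few mutual nearest neighbours. The function names `knn_indices` and `mutual_knn_filter` and the parameters `k`, `min_mutual`, and `max_iters` are hypothetical.

```python
import numpy as np


def knn_indices(features, k):
    """Return, for each row, the indices of its k nearest other rows (Euclidean)."""
    diffs = features[:, None, :] - features[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    np.fill_diagonal(dists, np.inf)  # a sample is never its own neighbour
    return np.argsort(dists, axis=1)[:, :k]


def mutual_knn_filter(visual, audio, k=2, min_mutual=1, max_iters=10):
    """Hypothetical filter: iteratively drop samples with too few mutual kNN links.

    Assumption: the joint space is the concatenation of visual and audio features.
    Returns the indices (into the input arrays) of the samples that survive.
    """
    joint = np.hstack([visual, audio])
    keep = np.arange(len(joint))
    for _ in range(max_iters):
        if len(keep) <= k:  # too few samples left to define k neighbours
            break
        n = len(keep)
        nn = knn_indices(joint[keep], k)
        # adj[i, j] is True when j is among the k nearest neighbours of i.
        adj = np.zeros((n, n), dtype=bool)
        adj[np.repeat(np.arange(n), k), nn.ravel()] = True
        # A link is mutual only if each sample lists the other as a neighbour.
        mutual = adj & adj.T
        survivors = mutual.sum(axis=1) >= min_mutual
        if survivors.all():  # nothing removed, so the selection is stable
            break
        keep = keep[survivors]
    return keep


# Toy data: six samples with consistent audio, two with unrelated audio.
visual = np.array([[0.0], [1.0], [2.0], [3.0], [4.0], [5.0], [2.0], [3.0]])
audio = np.array([[0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [100.0], [-100.0]])
kept = mutual_knn_filter(visual, audio, k=2)
print(kept)
```

In this toy setup the last two samples look like the others visually but sit far away in the audio dimension, so no other sample lists them among its nearest neighbours and they have no mutual links. Repeating the step matters on real data, where removing one batch of outliers can change the neighbourhoods of the samples that remain.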
Cite
Text
Solèr et al. "Suggesting Sounds for Images from Video Collections." European Conference on Computer Vision Workshops, 2016. doi:10.1007/978-3-319-48881-3_59
Markdown
[Solèr et al. "Suggesting Sounds for Images from Video Collections." European Conference on Computer Vision Workshops, 2016.](https://mlanthology.org/eccvw/2016/soler2016eccvw-suggesting/) doi:10.1007/978-3-319-48881-3_59
BibTeX
@inproceedings{soler2016eccvw-suggesting,
title = {{Suggesting Sounds for Images from Video Collections}},
author = {Solèr, Matthias and Bazin, Jean-Charles and Wang, Oliver and Krause, Andreas and Sorkine-Hornung, Alexander},
booktitle = {European Conference on Computer Vision Workshops},
year = {2016},
pages = {900--917},
doi = {10.1007/978-3-319-48881-3_59},
url = {https://mlanthology.org/eccvw/2016/soler2016eccvw-suggesting/}
}