Realtime Generation of Audible Textures Inspired by a Video Stream

Abstract

We showcase a model that generates a soundscape from a camera stream in real time. The approach relies on a training video with an associated meaningful audio track: a granular synthesizer generates a novel sound by randomly sampling and mixing audio data from that video, favoring timestamps whose frame is similar to the current camera frame; the semantic similarity between frames is computed by a pretrained neural network. The demo is interactive: a user points a mobile phone at different objects and hears how the generated sound changes.
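
A minimal sketch of the core idea described in the abstract (not the authors' implementation): precompute one embedding per training-video frame with a pretrained network, embed the live camera frame, turn the cosine similarities into sampling probabilities, and mix audio grains cut at the sampled timestamps. The sample rate, grain length, frame rate, softmax weighting, and the use of random data in place of the real video, audio track, and network embeddings are all illustrative assumptions.

import numpy as np

SR = 22050          # audio sample rate (assumed)
GRAIN_LEN = 2048    # grain length in samples (assumed)
FPS = 25            # frame rate of the training video (assumed)

def similarity_weights(cam_embedding, frame_embeddings, temperature=0.1):
    """Softmax over cosine similarities between the live camera frame and
    every frame of the training video (hypothetical weighting scheme)."""
    a = cam_embedding / np.linalg.norm(cam_embedding)
    b = frame_embeddings / np.linalg.norm(frame_embeddings, axis=1, keepdims=True)
    logits = (b @ a) / temperature
    logits -= logits.max()
    w = np.exp(logits)
    return w / w.sum()

def generate_grain_mix(audio, weights, n_grains=16, rng=None):
    """Randomly pick video timestamps (favoring frames similar to the camera
    view), cut windowed audio grains at those timestamps, and mix them."""
    rng = rng if rng is not None else np.random.default_rng()
    out = np.zeros(GRAIN_LEN)
    window = np.hanning(GRAIN_LEN)
    frame_indices = rng.choice(len(weights), size=n_grains, p=weights)
    for f in frame_indices:
        start = int(f / FPS * SR)          # map frame index to audio sample
        grain = audio[start:start + GRAIN_LEN]
        if len(grain) < GRAIN_LEN:
            continue                        # skip grains past the end of the track
        out += window * grain
    return out / n_grains

# Toy usage with random stand-ins for the video embeddings and audio track:
rng = np.random.default_rng(0)
frame_embeddings = rng.normal(size=(500, 128))          # one embedding per video frame
audio_track = rng.normal(size=500 * SR // FPS + GRAIN_LEN)
cam_embedding = rng.normal(size=128)                     # embedding of the camera frame

w = similarity_weights(cam_embedding, frame_embeddings)
chunk = generate_grain_mix(audio_track, w, rng=rng)
print(chunk.shape)

In a real-time loop, the camera frame would be re-embedded periodically and successive grain mixes overlap-added into the output audio stream.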

Cite

Text

Mellace et al. "Realtime Generation of Audible Textures Inspired by a Video Stream." AAAI Conference on Artificial Intelligence, 2019. doi:10.1609/AAAI.V33I01.33019865

Markdown

[Mellace et al. "Realtime Generation of Audible Textures Inspired by a Video Stream." AAAI Conference on Artificial Intelligence, 2019.](https://mlanthology.org/aaai/2019/mellace2019aaai-realtime/) doi:10.1609/AAAI.V33I01.33019865

BibTeX

@inproceedings{mellace2019aaai-realtime,
  title     = {{Realtime Generation of Audible Textures Inspired by a Video Stream}},
  author    = {Mellace, Simone and Guzzi, Jérôme and Giusti, Alessandro and Gambardella, Luca Maria},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2019},
  pages     = {9865--9866},
  doi       = {10.1609/AAAI.V33I01.33019865},
  url       = {https://mlanthology.org/aaai/2019/mellace2019aaai-realtime/}
}