Translating Visual Art into Music

Abstract

The Synesthetic Variational Autoencoder (SynVAE) introduced in this research learns a consistent mapping between the visual and auditory sensory modalities in the absence of paired datasets. A quantitative evaluation on MNIST as well as the Behance Artistic Media (BAM) dataset shows that SynVAE retains sufficient information content during translation while maintaining cross-modal latent space consistency. In a qualitative evaluation, human evaluators were furthermore able to match musical samples to the images from which they were generated with accuracies of up to 73%.
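At a high level, the abstract describes a visual encoder and a music decoder coupled through a shared latent space, trained without paired image–music data. The sketch below is a minimal, hypothetical PyTorch rendition of that idea, not the authors' implementation: an image is encoded into a latent code, decoded into a note sequence, and re-encoded by a music encoder (pre-trained in the real setting, randomly initialized here), with disagreement between the two latent codes serving as a cross-modal consistency loss. All module names, dimensions, and the exact loss terms are illustrative assumptions.

```python
# Minimal, illustrative sketch of a cross-modal VAE in the spirit of SynVAE.
# All names, sizes, and loss terms are assumptions, not the published model.
import torch
import torch.nn as nn
import torch.nn.functional as F

LATENT_DIM = 32              # assumed size of the shared latent space
SEQ_LEN, N_PITCHES = 16, 64  # assumed sequence length and pitch vocabulary


class VisualEncoder(nn.Module):
    """Maps a 1x28x28 image (e.g. MNIST) to a latent mean and log-variance."""

    def __init__(self):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Flatten(),
        )
        self.mu = nn.Linear(64 * 7 * 7, LATENT_DIM)
        self.logvar = nn.Linear(64 * 7 * 7, LATENT_DIM)

    def forward(self, x):
        h = self.conv(x)
        return self.mu(h), self.logvar(h)


class MusicDecoder(nn.Module):
    """Decodes a latent vector into per-step pitch logits."""

    def __init__(self):
        super().__init__()
        self.init_h = nn.Linear(LATENT_DIM, 128)
        self.rnn = nn.GRU(LATENT_DIM, 128, batch_first=True)
        self.out = nn.Linear(128, N_PITCHES)

    def forward(self, z):
        h0 = torch.tanh(self.init_h(z)).unsqueeze(0)    # (1, batch, 128)
        steps = z.unsqueeze(1).expand(-1, SEQ_LEN, -1)  # condition each step on z
        h, _ = self.rnn(steps, h0)
        return self.out(h)                              # (batch, SEQ_LEN, N_PITCHES)


class MusicEncoder(nn.Module):
    """Stands in for a pre-trained music encoder; randomly initialized here."""

    def __init__(self):
        super().__init__()
        self.rnn = nn.GRU(N_PITCHES, 128, batch_first=True)
        self.mu = nn.Linear(128, LATENT_DIM)

    def forward(self, notes):
        _, h = self.rnn(notes)
        return self.mu(h.squeeze(0))


def reparameterize(mu, logvar):
    """Standard VAE reparameterization: z = mu + sigma * eps."""
    return mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)


def training_step(image, visual_enc, music_dec, music_enc, beta=1.0):
    """Image -> latent -> music -> latent again; penalize latent disagreement."""
    mu, logvar = visual_enc(image)
    z = reparameterize(mu, logvar)
    logits = music_dec(z)
    notes = F.gumbel_softmax(logits, hard=True)    # differentiable discrete notes
    consistency = F.mse_loss(music_enc(notes), z)  # cross-modal latent consistency
    kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    return consistency + beta * kl


if __name__ == "__main__":
    enc, dec, menc = VisualEncoder(), MusicDecoder(), MusicEncoder()
    loss = training_step(torch.randn(8, 1, 28, 28), enc, dec, menc)
    loss.backward()
    print(f"loss: {loss.item():.3f}")
```

Because the objective only compares latent codes before and after the image-to-music round trip (plus a KL regularizer), no paired image–music data is needed, which mirrors the unpaired setting the abstract describes.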

Cite

Text

Müller-Eberstein and van Noord. "Translating Visual Art into Music." IEEE/CVF International Conference on Computer Vision Workshops, 2019. doi:10.1109/ICCVW.2019.00378

Markdown

[Müller-Eberstein and van Noord. "Translating Visual Art into Music." IEEE/CVF International Conference on Computer Vision Workshops, 2019.](https://mlanthology.org/iccvw/2019/mullereberstein2019iccvw-translating/) doi:10.1109/ICCVW.2019.00378

BibTeX

@inproceedings{mullereberstein2019iccvw-translating,
  title     = {{Translating Visual Art into Music}},
  author    = {Müller-Eberstein, Maximilian and van Noord, Nanne},
  booktitle = {IEEE/CVF International Conference on Computer Vision Workshops},
  year      = {2019},
  pages     = {3117--3120},
  doi       = {10.1109/ICCVW.2019.00378},
  url       = {https://mlanthology.org/iccvw/2019/mullereberstein2019iccvw-translating/}
}