Art2Mus: Bridging Visual Arts and Music Through Cross-Modal Generation

Abstract

Artificial Intelligence and generative models have revolutionized music creation, with many models leveraging textual or visual prompts for guidance. However, existing image-to-music models handle only simple images and cannot generate music from complex digitized artworks. To address this gap, we introduce $\mathcal{A}\textit{rt2}\mathcal{M}\textit{us}$, a novel model designed to create music from digitized artworks or text inputs. $\mathcal{A}\textit{rt2}\mathcal{M}\textit{us}$ extends the AudioLDM 2 text-to-audio architecture and is trained on our newly curated datasets, built with ImageBind, that pair digitized artworks with music. Experimental results demonstrate that $\mathcal{A}\textit{rt2}\mathcal{M}\textit{us}$ can generate music that resonates with the input stimuli. These findings suggest promising applications in multimedia art, interactive installations, and AI-driven creative tools.

Cite

Text

Rinaldi et al. "Art2Mus: Bridging Visual Arts and Music Through Cross-Modal Generation." European Conference on Computer Vision Workshops, 2024. doi:10.1007/978-3-031-92808-6_11

Markdown

[Rinaldi et al. "Art2Mus: Bridging Visual Arts and Music Through Cross-Modal Generation." European Conference on Computer Vision Workshops, 2024.](https://mlanthology.org/eccvw/2024/rinaldi2024eccvw-art2mus/) doi:10.1007/978-3-031-92808-6_11

BibTeX

@inproceedings{rinaldi2024eccvw-art2mus,
  title     = {{Art2Mus: Bridging Visual Arts and Music Through Cross-Modal Generation}},
  author    = {Rinaldi, Ivan and Fanelli, Nicola and Castellano, Giovanna and Vessio, Gennaro},
  booktitle = {European Conference on Computer Vision Workshops},
  year      = {2024},
  pages     = {173--186},
  doi       = {10.1007/978-3-031-92808-6_11},
  url       = {https://mlanthology.org/eccvw/2024/rinaldi2024eccvw-art2mus/}
}