V2A-Mapper: A Lightweight Solution for Vision-to-Audio Generation by Connecting Foundation Models

Cite

Text

Wang et al. "V2A-Mapper: A Lightweight Solution for Vision-to-Audio Generation by Connecting Foundation Models." AAAI Conference on Artificial Intelligence, 2024. doi:10.1609/AAAI.V38I14.29475

Markdown

[Wang et al. "V2A-Mapper: A Lightweight Solution for Vision-to-Audio Generation by Connecting Foundation Models." AAAI Conference on Artificial Intelligence, 2024.](https://mlanthology.org/aaai/2024/wang2024aaai-v/) doi:10.1609/AAAI.V38I14.29475

BibTeX

@inproceedings{wang2024aaai-v,
  title     = {{V2A-Mapper: A Lightweight Solution for Vision-to-Audio Generation by Connecting Foundation Models}},
  author    = {Wang, Heng and Ma, Jianbo and Pascual, Santiago and Cartwright, Richard and Cai, Weidong},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2024},
  pages     = {15492--15501},
  doi       = {10.1609/AAAI.V38I14.29475},
  url       = {https://mlanthology.org/aaai/2024/wang2024aaai-v/}
}