VMCML: Video and Music Matching via Cross-Modality Lifting

Abstract

We propose a content-based system for matching video and background music. The system addresses the challenge of recommending music for short-form videos when the users or the music tracks are new. To this end, we propose a cross-modal framework, VMCML (Video and Music Matching via Cross-Modality Lifting), that finds a shared embedding space between video and music representations. To ensure that the embedding space can be effectively shared by both representations, we use the CosFace loss, a margin-based cosine similarity loss. Furthermore, because previous datasets have limitations, we collect videos and music from a well-known multimedia platform under two rules: the music is not the original sound of the video, and more than one video is matched to the same music. The resulting large-scale dataset, called MSV, provides 390 individual music tracks and the corresponding 150,000 matched videos. We conduct extensive experiments on the YouTube-8M and MSV datasets. Our quantitative and qualitative results demonstrate the effectiveness of the proposed framework, which achieves state-of-the-art video and music matching performance.
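For readers unfamiliar with the margin-based cosine loss mentioned above, the following is a minimal sketch of the general CosFace formulation in plain Python. It is not the authors' implementation. The function names, the `scale` and `margin` defaults, and the idea of treating each music track as a class whose weight vector is shared by video and music embeddings are all assumptions made for illustration; the abstract does not specify these details.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length so dot products become cosines."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine(a, b):
    """Cosine similarity between two vectors."""
    a, b = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a, b))


def cosface_loss(embedding, class_weights, label, scale=30.0, margin=0.35):
    """CosFace loss for one embedding.

    The cosine to the true class is reduced by `margin` before scaling,
    then a standard softmax cross-entropy is applied. `scale` and
    `margin` are illustrative defaults, not values from the paper.
    """
    logits = []
    for j, w in enumerate(class_weights):
        cos = cosine(embedding, w)
        if j == label:
            cos -= margin  # penalize the target class only
        logits.append(scale * cos)
    # Numerically stable log-sum-exp for the softmax denominator.
    top = max(logits)
    log_sum = top + math.log(sum(math.exp(z - top) for z in logits))
    return log_sum - logits[label]


# Hypothetical setup: two music tracks act as classes, and a video
# embedding and a music embedding are both scored against the same
# class weights (an assumption about how the space is shared).
track_weights = [[1.0, 0.0], [0.0, 1.0]]
video_embedding = [0.9, 0.1]
music_embedding = [0.8, 0.2]

video_loss = cosface_loss(video_embedding, track_weights, label=0)
music_loss = cosface_loss(music_embedding, track_weights, label=0)
```

Because the margin is subtracted only from the target cosine, an embedding must be closer to its own class than a plain softmax would require, which pushes embeddings of the same track together regardless of modality.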

Cite

Text

Lee et al. "VMCML: Video and Music Matching via Cross-Modality Lifting." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2024. doi:10.1109/CVPRW63382.2024.00211

Markdown

[Lee et al. "VMCML: Video and Music Matching via Cross-Modality Lifting." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2024.](https://mlanthology.org/cvprw/2024/lee2024cvprw-vmcml/) doi:10.1109/CVPRW63382.2024.00211

BibTeX

@inproceedings{lee2024cvprw-vmcml,
  title     = {{VMCML: Video and Music Matching via Cross-Modality Lifting}},
  author    = {Lee, Yi-Shan and Tseng, Wei-Cheng and Wang, Fu-En and Sun, Min},
  booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops},
  year      = {2024},
  pages     = {2060-2069},
  doi       = {10.1109/CVPRW63382.2024.00211},
  url       = {https://mlanthology.org/cvprw/2024/lee2024cvprw-vmcml/}
}