Contrastive Lyrics Alignment with a Timestamp-Informed Loss

Abstract

Recent multimodal methods for lyrics alignment rely on large datasets. In contrast, we introduce a box loss that incorporates timestamp information directly into the loss function, enabling precise alignment and competitive results even with limited training data. We also address the noise in the public DALI dataset by cleaning it thoroughly to improve the quality of the training data. Finally, we propose JamendoLyrics++, a substantial extension of the common JamendoLyrics evaluation dataset that offers greater genre diversity for evaluating lyrics alignment systems.
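The abstract does not define the box loss, so the following is only an illustrative sketch of one way timestamps could enter a contrastive-style loss, not the paper's formulation. It assumes a frames-by-tokens similarity matrix and turns each lyric token's (start, end) timestamps into a rectangular "box" of positive targets, then scores the similarities with binary cross-entropy. The names `box_targets` and `timestamp_informed_loss`, the half-open span convention, and the choice of binary cross-entropy are all hypothetical.

```python
import math

def box_targets(frame_times, token_spans):
    """Build a frames x tokens 0/1 matrix (assumed target layout).

    An entry is 1.0 when the frame time lies inside the token's
    half-open [start, end) span, and 0.0 otherwise.
    """
    return [
        [1.0 if start <= t < end else 0.0 for (start, end) in token_spans]
        for t in frame_times
    ]

def timestamp_informed_loss(similarity, targets):
    """Mean binary cross-entropy between sigmoid(similarity) and the targets.

    This is an assumed stand-in for a timestamp-aware objective; it is not
    taken from the paper.
    """
    total, count = 0.0, 0
    for sim_row, tgt_row in zip(similarity, targets):
        for s, y in zip(sim_row, tgt_row):
            # Numerically stable BCE with logits:
            # max(s, 0) - s * y + log(1 + exp(-|s|))
            total += max(s, 0.0) - s * y + math.log1p(math.exp(-abs(s)))
            count += 1
    return total / count

# Toy example: four audio frames and two lyric tokens.
frame_times = [0.0, 0.5, 1.0, 1.5]        # seconds
token_spans = [(0.0, 1.0), (1.0, 2.0)]    # (start, end) per token
targets = box_targets(frame_times, token_spans)

aligned = [[5.0, -5.0], [5.0, -5.0], [-5.0, 5.0], [-5.0, 5.0]]
misaligned = [[-5.0, 5.0], [-5.0, 5.0], [5.0, -5.0], [5.0, -5.0]]

loss_aligned = timestamp_informed_loss(aligned, targets)
loss_misaligned = timestamp_informed_loss(misaligned, targets)
```

In this toy setup, similarities that agree with the timestamp boxes give a lower loss than similarities that contradict them, which is the behavior a timestamp-informed objective is meant to reward.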

Cite

Text

Kick et al. "Contrastive Lyrics Alignment with a Timestamp-Informed Loss." NeurIPS 2024 Workshops: Audio_Imagination, 2024.

Markdown

[Kick et al. "Contrastive Lyrics Alignment with a Timestamp-Informed Loss." NeurIPS 2024 Workshops: Audio_Imagination, 2024.](https://mlanthology.org/neuripsw/2024/kick2024neuripsw-contrastive/)

BibTeX

@inproceedings{kick2024neuripsw-contrastive,
  title     = {{Contrastive Lyrics Alignment with a Timestamp-Informed Loss}},
  author    = {Kick, Timon and Grötschla, Florian and Lanzendörfer, Luca A and Wattenhofer, Roger},
  booktitle = {NeurIPS 2024 Workshops: Audio_Imagination},
  year      = {2024},
  url       = {https://mlanthology.org/neuripsw/2024/kick2024neuripsw-contrastive/}
}