Retrieval Guided Music Captioning via Multimodal Prefixes

Srivatsan, Nikita; Chen, Ke; Dubnov, Shlomo; Berg-Kirkpatrick, Taylor

doi:10.24963/ijcai.2024/859

IJCAI 2024 pp. 7762-7770

Abstract

Crystal structures can be simplified as a periodic point set that repeats across three-dimensional space along an underlying lattice. Traditionally, crystal representation methods rely on descriptors such as lattice parameters, symmetry, and space groups to characterize the structure. However, in reality, atoms in materials always vibrate above absolute zero, causing their positions to fluctuate continuously. This dynamic behavior disrupts the fundamental periodicity of the lattice, making crystal graphs based on static lattice parameters and conventional descriptors discontinuous under slight perturbations. Chemists proposed the pairwise distance distribution (PDD) method to address this. However, the completeness of PDD requires defining a large number of neighboring atoms, leading to high computational costs. Additionally, PDD does not account for atomic information, making it challenging to apply it directly to crystal material property prediction tasks. To tackle these challenges, we introduce the atom-weighted Pairwise Distance Distribution (WPDD) and Unit cell Pairwise Distance Distribution (UPDD) for the first time, applying them to the construction of multi-edge crystal graphs. We demonstrate the continuity and general completeness of crystal graphs under slight atomic position perturbations. Moreover, by modeling PDD as global information and integrating it into matrix-based message passing, we significantly reduce computational costs. Comprehensive evaluation results show that WPDDFormer achieves state-of-the-art predictive accuracy across tasks on benchmark datasets such as the Materials Project and JARVIS-DFT.

Cite

Text

Srivatsan et al. "Retrieval Guided Music Captioning via Multimodal Prefixes." International Joint Conference on Artificial Intelligence, 2024. doi:10.24963/ijcai.2024/859

Markdown

[Srivatsan et al. "Retrieval Guided Music Captioning via Multimodal Prefixes." International Joint Conference on Artificial Intelligence, 2024.](https://mlanthology.org/ijcai/2024/srivatsan2024ijcai-retrieval/) doi:10.24963/ijcai.2024/859

BibTeX

@inproceedings{srivatsan2024ijcai-retrieval,
  title     = {{Retrieval Guided Music Captioning via Multimodal Prefixes}},
  author    = {Srivatsan, Nikita and Chen, Ke and Dubnov, Shlomo and Berg-Kirkpatrick, Taylor},
  booktitle = {International Joint Conference on Artificial Intelligence},
  year      = {2024},
  pages     = {7762-7770},
  doi       = {10.24963/ijcai.2024/859},
  url       = {https://mlanthology.org/ijcai/2024/srivatsan2024ijcai-retrieval/}
}