A Practitioner's Guide to Continual Multimodal Pretraining
Abstract
Multimodal foundation models, despite being extensively pretrained, become outdated over time. Research into continual pretraining mainly explores (1) infrequent, indiscriminate updates on large-scale new data, or (2) frequent, sample-level updates. However, practical model deployment often operates in the gap between these limiting cases, as real-world applications require continual adaptation to specific subdomains or tasks. In this work, we complement current approaches through a new continual multimodal pretraining test bed with realistic compute constraints and practical deployment requirements (FoMo-in-Flux), and provide comprehensive practical guidance for effective continual model updates, investigating different method choices, pipeline designs, and data-centric deployment scenarios.
Cite
Text
Roth et al. "A Practitioner's Guide to Continual Multimodal Pretraining." NeurIPS 2024 Workshops: Continual_FoMo, 2024.Markdown
[Roth et al. "A Practitioner's Guide to Continual Multimodal Pretraining." NeurIPS 2024 Workshops: Continual_FoMo, 2024.](https://mlanthology.org/neuripsw/2024/roth2024neuripsw-practitioner/)BibTeX
@inproceedings{roth2024neuripsw-practitioner,
  title     = {{A Practitioner's Guide to Continual Multimodal Pretraining}},
  author    = {Roth, Karsten and Udandarao, Vishaal and Dziadzio, Sebastian and Prabhu, Ameya and Cherti, Mehdi and Vinyals, Oriol and Henaff, Olivier J and Albanie, Samuel and Bethge, Matthias and Akata, Zeynep},
  booktitle = {NeurIPS 2024 Workshops: Continual_FoMo},
  year      = {2024},
  url       = {https://mlanthology.org/neuripsw/2024/roth2024neuripsw-practitioner/}
}