Hyperbolic Learning with Multimodal Large Language Models
Abstract
Hyperbolic embeddings have demonstrated their effectiveness in capturing measures of uncertainty and hierarchical relationships across various deep-learning tasks, including image segmentation and active learning. However, their application in modern vision-language models (VLMs) has been limited. A notable exception is MERU, which leverages the hierarchical properties of hyperbolic space in the CLIP ViT-large model, consisting of hundreds of millions of parameters. In our work, we address the challenges of scaling multi-modal hyperbolic models by orders of magnitude in terms of parameters (billions) and training complexity using the BLIP-2 architecture. Although hyperbolic embeddings offer potential insights into uncertainty not present in Euclidean embeddings, our analysis reveals that scaling these models is particularly difficult. We propose a novel training strategy for a hyperbolic version of BLIP-2, which achieves performance comparable to its Euclidean counterpart while maintaining stability throughout the training process and providing a meaningful indication of uncertainty with each embedding.
Cite
Text
Mandica et al. "Hyperbolic Learning with Multimodal Large Language Models." European Conference on Computer Vision Workshops, 2024. doi:10.1007/978-3-031-91585-7_23
Markdown
[Mandica et al. "Hyperbolic Learning with Multimodal Large Language Models." European Conference on Computer Vision Workshops, 2024.](https://mlanthology.org/eccvw/2024/mandica2024eccvw-hyperbolic/) doi:10.1007/978-3-031-91585-7_23
BibTeX
@inproceedings{mandica2024eccvw-hyperbolic,
title = {{Hyperbolic Learning with Multimodal Large Language Models}},
author = {Mandica, Paolo and Franco, Luca and Kallidromitis, Konstantinos and Petryk, Suzanne and Galasso, Fabio},
booktitle = {European Conference on Computer Vision Workshops},
year = {2024},
pages = {382--398},
doi = {10.1007/978-3-031-91585-7_23},
url = {https://mlanthology.org/eccvw/2024/mandica2024eccvw-hyperbolic/}
}