Galaxy Walker: Geometry-Aware VLMs for Galaxy-Scale Understanding

Abstract

Modern vision-language models (VLMs) build their patch embeddings and convolutional backbones in vector spaces, chiefly Euclidean ones, from the ground up. Extending VLMs to galaxy scale for understanding astronomical phenomena requires integrating spherical space for planetary orbits and hyperbolic space for black holes, which raises two formidable challenges: (a) current pre-trained models are confined to Euclidean space rather than a comprehensive geometric embedding, and (b) the predominant architectures lack suitable backbones for anisotropic physical geometries. In this paper, we introduce Galaxy-Walker, a geometry-aware VLM for universe-level vision understanding tasks. We propose a geometry prompt that generates geometry tokens by random walks across diverse spaces on a multi-scale physical graph, along with a geometry adapter that compresses and reshapes the space anisotropy in a mixture-of-experts manner. Extensive experiments demonstrate the effectiveness of our approach: Galaxy-Walker achieves state-of-the-art performance in both galaxy property estimation (R² scores up to 0.91) and morphology classification (up to +0.17 F1 improvement on challenging features), significantly outperforming both domain-specific models and general-purpose VLMs.
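
To make the abstract's geometric vocabulary concrete, here is a minimal Python sketch of the three ingredients it names: distances in Euclidean, spherical, and hyperbolic space, a random walk on a graph, and a softmax gate that mixes per-geometry "experts". It is not the authors' implementation. Every function name, the Poincaré-ball model of hyperbolic space, the adjacency-list graph, and the gating scheme are assumptions chosen for illustration only.

```python
import math
import random


def euclidean_distance(u, v):
    """Straight-line distance in flat (Euclidean) space."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def spherical_distance(u, v):
    """Great-circle distance after projecting both points onto the unit sphere."""
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    cosine = sum(a * b for a, b in zip(u, v)) / (norm_u * norm_v)
    # Clamp to [-1, 1] to guard against floating-point drift.
    return math.acos(max(-1.0, min(1.0, cosine)))


def poincare_distance(u, v):
    """Hyperbolic distance in the Poincare ball; both points need norm < 1."""
    sq_u = sum(a * a for a in u)
    sq_v = sum(b * b for b in v)
    sq_diff = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.acosh(1.0 + 2.0 * sq_diff / ((1.0 - sq_u) * (1.0 - sq_v)))


def random_walk(graph, start, length, rng):
    """Uniform random walk over an adjacency-list graph (hypothetical format)."""
    walk = [start]
    for _ in range(length):
        neighbors = graph[walk[-1]]
        if not neighbors:
            break  # dead end: stop early
        walk.append(rng.choice(neighbors))
    return walk


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mixed_distance(u, v, gate_logits):
    """Gate-weighted blend of the three distances (illustrative 'experts')."""
    weights = softmax(gate_logits)
    experts = [
        euclidean_distance(u, v),
        spherical_distance(u, v),
        poincare_distance(u, v),
    ]
    return sum(w * d for w, d in zip(weights, experts))
```

For example, `random_walk({"a": ["b"], "b": ["a", "c"], "c": ["b"]}, "a", 4, random.Random(0))` returns a path of at most five nodes in which each consecutive pair is joined by an edge, and `mixed_distance(u, v, [0.0, 0.0, 0.0])` reduces to the plain average of the three distances. In a trained model the gate logits would be learned rather than fixed.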

Cite

Text

Chen et al. "Galaxy Walker: Geometry-Aware VLMs for Galaxy-Scale Understanding." Conference on Computer Vision and Pattern Recognition, 2025. doi:10.1109/CVPR52734.2025.00389

Markdown

[Chen et al. "Galaxy Walker: Geometry-Aware VLMs for Galaxy-Scale Understanding." Conference on Computer Vision and Pattern Recognition, 2025.](https://mlanthology.org/cvpr/2025/chen2025cvpr-galaxy/) doi:10.1109/CVPR52734.2025.00389

BibTeX

@inproceedings{chen2025cvpr-galaxy,
  title     = {{Galaxy Walker: Geometry-Aware VLMs for Galaxy-Scale Understanding}},
  author    = {Chen, Tianyu and Fu, Xingcheng and Gao, Yisen and Qian, Haodong and Wei, Yuecen and Yan, Kun and Zhou, Haoyi and Li, Jianxin},
  booktitle = {Conference on Computer Vision and Pattern Recognition},
  year      = {2025},
  pages     = {4112--4121},
  doi       = {10.1109/CVPR52734.2025.00389},
  url       = {https://mlanthology.org/cvpr/2025/chen2025cvpr-galaxy/}
}