TerraMind: Large-Scale Generative Multimodality for Earth Observation
Abstract
We present TerraMind, the first any-to-any generative, multimodal foundation model for Earth observation (EO). Unlike other multimodal models, TerraMind is pretrained on dual-scale representations that combine token-level and pixel-level data across modalities. At the token level, TerraMind encodes high-level contextual information to learn cross-modal relationships; at the pixel level, it leverages fine-grained representations to capture critical spatial nuances. We pretrained TerraMind on nine geospatial modalities from a global, large-scale dataset. In this paper, we demonstrate that (i) TerraMind's dual-scale early fusion approach unlocks a range of zero-shot and few-shot applications for Earth observation, (ii) TerraMind introduces "thinking in modalities" (TiM), the capability of generating additional artificial data during finetuning and inference to improve the model output, and (iii) TerraMind outperforms the state of the art on community-standard EO benchmarks such as PANGAEA. All models and code have been open-sourced under a permissive license at https://huggingface.co/ibm-esa-geospatial and https://github.com/ibm/terramind.
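To make the two central ideas of the abstract more concrete, here is a minimal Python sketch under loudly stated assumptions. It is not TerraMind's implementation or API: every name (`DualScaleFusion`, `think_in_modalities`, the modality labels "optical", "radar", "land_cover") is hypothetical, the weights are random, and the transformer itself is omitted. The sketch reads "dual-scale early fusion" as placing pixel-level patch embeddings and token-level embeddings of different modalities into a single input sequence, and reads TiM as generating tokens for an additional modality and appending them to that sequence before the final prediction.

```python
# Hypothetical illustration only: not TerraMind's code or API. Weights are random,
# modality names are placeholders, and no transformer layers are included.
import random
from typing import Callable, Dict, List, Sequence

Vector = List[float]


def linear(x: Sequence[float], weight: Sequence[Sequence[float]]) -> Vector:
    """Project x (length d_in) with a weight matrix of shape (d_out, d_in)."""
    return [sum(w * v for w, v in zip(row, x)) for row in weight]


def add(a: Sequence[float], b: Sequence[float]) -> Vector:
    """Element-wise sum of two equal-length vectors."""
    return [x + y for x, y in zip(a, b)]


def patchify(image: List[List[float]], patch: int) -> List[Vector]:
    """Split a single-band H x W image into flattened, non-overlapping patches."""
    h, w = len(image), len(image[0])
    return [
        [image[r + i][c + j] for i in range(patch) for j in range(patch)]
        for r in range(0, h - h % patch, patch)
        for c in range(0, w - w % patch, patch)
    ]


class DualScaleFusion:
    """Maps pixel-level patches and discrete tokens into one shared embedding space."""

    def __init__(self, dim: int, patch: int, vocab_size: int,
                 modalities: Sequence[str], seed: int = 0) -> None:
        rng = random.Random(seed)
        self.patch = patch
        # Pixel level: linear projection of each flattened patch.
        self.pixel_proj = [[rng.gauss(0, 0.02) for _ in range(patch * patch)]
                           for _ in range(dim)]
        # Token level: lookup table for discrete token IDs.
        self.token_table = [[rng.gauss(0, 0.02) for _ in range(dim)]
                            for _ in range(vocab_size)]
        # One vector per modality so a single sequence model can tell inputs apart.
        self.modality_emb = {m: [rng.gauss(0, 0.02) for _ in range(dim)]
                             for m in modalities}

    def embed_pixels(self, modality: str, image: List[List[float]]) -> List[Vector]:
        tag = self.modality_emb[modality]
        return [add(linear(p, self.pixel_proj), tag)
                for p in patchify(image, self.patch)]

    def embed_tokens(self, modality: str, token_ids: Sequence[int]) -> List[Vector]:
        tag = self.modality_emb[modality]
        return [add(self.token_table[t], tag) for t in token_ids]


def think_in_modalities(
    fusion: DualScaleFusion,
    pixel_inputs: Dict[str, List[List[float]]],
    token_inputs: Dict[str, Sequence[int]],
    generate: Callable[[List[Vector]], List[int]],
    extra_modality: str,
) -> List[Vector]:
    """Early fusion, then append tokens of an extra, generated modality (TiM-style)."""
    seq: List[Vector] = []
    for modality, image in pixel_inputs.items():
        seq += fusion.embed_pixels(modality, image)
    for modality, ids in token_inputs.items():
        seq += fusion.embed_tokens(modality, ids)
    # `generate` stands in for the model's own generative decoding step.
    seq += fusion.embed_tokens(extra_modality, generate(seq))
    return seq


# Usage: one 8x8 optical band (pixel level), three radar tokens (token level),
# and a dummy generator that "imagines" two land-cover tokens.
fusion = DualScaleFusion(dim=16, patch=4, vocab_size=32,
                         modalities=["optical", "radar", "land_cover"])
image = [[float(8 * r + c) for c in range(8)] for r in range(8)]
seq = think_in_modalities(fusion, {"optical": image}, {"radar": [1, 2, 3]},
                          generate=lambda s: [0, 5], extra_modality="land_cover")
print(len(seq), len(seq[0]))  # prints: 9 16 (4 patches + 3 tokens + 2 generated)
```

Adding a per-modality vector to every embedding is one simple way, assumed here, to let a single sequence model distinguish its inputs; in an actual model, `seq` would then pass through transformer layers that attend across both scales and all modalities.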
Cite
Text
Jakubik et al. "TerraMind: Large-Scale Generative Multimodality for Earth Observation." International Conference on Computer Vision, 2025.
Markdown
[Jakubik et al. "TerraMind: Large-Scale Generative Multimodality for Earth Observation." International Conference on Computer Vision, 2025.](https://mlanthology.org/iccv/2025/jakubik2025iccv-terramind/)
BibTeX
@inproceedings{jakubik2025iccv-terramind,
title = {{TerraMind: Large-Scale Generative Multimodality for Earth Observation}},
author = {Jakubik, Johannes and Yang, Felix and Blumenstiel, Benedikt and Scheurer, Erik and Sedona, Rocco and Maurogiovanni, Stefano and Bosmans, Jente and Dionelis, Nikolaos and Marsocci, Valerio and Kopp, Niklas and Ramachandran, Rahul and Fraccaro, Paolo and Brunschwiler, Thomas and Cavallaro, Gabriele and Bernabe-Moreno, Juan and Longépé, Nicolas},
booktitle = {International Conference on Computer Vision},
year = {2025},
pages = {7383-7394},
url = {https://mlanthology.org/iccv/2025/jakubik2025iccv-terramind/}
}