COP-GEN-Beta: Unified Generative Modelling of COPernicus Imagery Thumbnails

Abstract

In remote sensing, multi-modal data from various sensors capturing the same scene offers rich opportunities, but learning a unified representation across these modalities remains a significant challenge. Traditional methods have often been limited to single or dual-modality approaches. In this paper, we introduce COP-GEN-Beta, a generative diffusion model trained on optical, radar, and elevation data from the Major TOM dataset. What sets COP-GEN-Beta apart is its ability to map any subset of modalities to any other, enabling zero-shot modality translation after training. This is achieved through a sequence-based diffusion transformer, where each modality is controlled by its own timestep embedding. We extensively evaluate COP-GEN-Beta on thumbnail images from the Major TOM dataset, demonstrating its effectiveness in generating high-quality samples. Qualitative and quantitative evaluations validate the model's performance, highlighting its potential as a powerful pre-trained model for future remote sensing tasks.
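The core idea of controlling each modality with its own timestep embedding can be illustrated with a minimal sketch. Nothing below is taken from the paper's actual implementation: the function names, the choice of t=0 to mark a clean conditioning modality, and the embedding dimension are all illustrative assumptions; only the standard sinusoidal timestep embedding is conventional diffusion-model machinery.

```python
import numpy as np

def sinusoidal_embedding(t, dim):
    """Standard sinusoidal timestep embedding used in diffusion models."""
    half = dim // 2
    freqs = np.exp(-np.log(10000.0) * np.arange(half) / (half - 1))
    args = np.asarray(t, dtype=np.float64)[:, None] * freqs[None, :]
    return np.concatenate([np.sin(args), np.cos(args)], axis=-1)

def modality_timesteps(num_modalities, condition_on, t_noise):
    """Hypothetical per-modality timestep vector: conditioning modalities
    are marked as clean (t=0), modalities to be generated carry the
    current denoising timestep. Varying this split at sampling time is
    what would allow any-subset-to-any-subset translation."""
    t = np.full(num_modalities, t_noise, dtype=np.int64)
    for m in condition_on:
        t[m] = 0
    return t

# Example: four modalities (say optical, radar, elevation, and a second
# optical product), conditioning on modality 0 and generating the rest.
t = modality_timesteps(4, condition_on=[0], t_noise=999)
emb = sinusoidal_embedding(t, dim=8)  # one embedding row per modality
```

In a sequence-based diffusion transformer, each row of `emb` would be added to the tokens of the corresponding modality, so the network sees which modalities are clean inputs and which are being denoised.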

Cite

Text

Espinosa et al. "COP-GEN-Beta: Unified Generative Modelling of COPernicus Imagery Thumbnails." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2025.

Markdown

[Espinosa et al. "COP-GEN-Beta: Unified Generative Modelling of COPernicus Imagery Thumbnails." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2025.](https://mlanthology.org/cvprw/2025/espinosa2025cvprw-copgenbeta/)

BibTeX

@inproceedings{espinosa2025cvprw-copgenbeta,
  title     = {{COP-GEN-Beta: Unified Generative Modelling of COPernicus Imagery Thumbnails}},
  author    = {Espinosa, Miguel and Marsocci, Valerio and Jia, Yuru and Crowley, Elliot and Czerkawski, Mikolaj},
  booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops},
  year      = {2025},
  pages     = {3085--3095},
  url       = {https://mlanthology.org/cvprw/2025/espinosa2025cvprw-copgenbeta/}
}