Is Visual In-Context Learning for Compositional Medical Tasks Within Reach?

Abstract

In this paper, we explore the potential of visual in-context learning to enable a single model to handle multiple tasks and adapt to new tasks at test time without re-training. Unlike previous approaches, we focus on training in-context learners to adapt to sequences of tasks rather than individual tasks. Our goal is to solve complex tasks that involve multiple intermediate steps using a single model, allowing users to define entire vision pipelines flexibly at test time. To achieve this, we first examine the properties and limitations of visual in-context learning architectures, with a particular focus on the role of codebooks. We then introduce a novel method for training in-context learners using a synthetic compositional task generation engine. This engine bootstraps task sequences from arbitrary segmentation datasets, enabling the training of visual in-context learners for compositional tasks. Additionally, we investigate different masking-based training objectives to gain insights into how to better train models for solving complex, compositional tasks. Our exploration not only provides important insights, especially for multi-modal medical task sequences, but also highlights challenges that need to be addressed.

Cite

Text

Reiß et al. "Is Visual In-Context Learning for Compositional Medical Tasks Within Reach?" International Conference on Computer Vision, 2025.

Markdown

[Reiß et al. "Is Visual In-Context Learning for Compositional Medical Tasks Within Reach?" International Conference on Computer Vision, 2025.](https://mlanthology.org/iccv/2025/rei2025iccv-visual/)

BibTeX

@inproceedings{rei2025iccv-visual,
  title     = {{Is Visual In-Context Learning for Compositional Medical Tasks Within Reach?}},
  author    = {Reiß, Simon and Marinov, Zdravko and Jaus, Alexander and Seibold, Constantin and Sarfraz, M. Saquib and Rodner, Erik and Stiefelhagen, Rainer},
  booktitle = {International Conference on Computer Vision},
  year      = {2025},
  pages     = {2642--2652},
  url       = {https://mlanthology.org/iccv/2025/rei2025iccv-visual/}
}