ConText: Driving In-Context Learning for Text Removal and Segmentation

Abstract

This paper presents the first study on adapting the visual in-context learning (V-ICL) paradigm to optical character recognition tasks, specifically text removal and segmentation. Most existing V-ICL generalists employ a reasoning-as-reconstruction approach: they use a straightforward image-label compositor for both the prompt and the query input, then mask the query label so that the model generates the desired output. This direct prompt confines the model to a challenging single-step reasoning process. To address this, we propose a task-chaining compositor in the form of image-removal-segmentation, providing an enhanced prompt that elicits reasoning with enriched intermediates. Additionally, we introduce context-aware aggregation, which integrates the chained prompt pattern into the latent query representation, thereby strengthening the model’s in-context reasoning. We also consider the issue of visual heterogeneity, which complicates the selection of homogeneous demonstrations in text recognition. We address this with a simple self-prompting strategy that prevents the model’s in-context learnability from devolving into specialist-like, context-free inference. Collectively, these insights culminate in our ConText model, which achieves new state-of-the-art performance across both in- and out-of-domain benchmarks. The code is available at https://github.com/Ferenas/ConText.
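To make the compositor idea concrete, the following is a minimal sketch of how a chained image-removal-segmentation prompt and a masked query might be laid out on one canvas. It is not taken from the ConText code base: the 2x3 grid layout, the zero-filled placeholder cells, and the names `compose_row` and `build_chained_input` are all assumptions made for illustration.

```python
import numpy as np

def compose_row(image, removal, segmentation):
    """Place image, text-removed image, and segmentation mask side by side.

    Assumed shapes: image and removal are (H, W, 3); segmentation is (H, W).
    """
    seg_rgb = np.repeat(segmentation[..., None], 3, axis=-1)  # (H, W) -> (H, W, 3)
    return np.concatenate([image, removal, seg_rgb], axis=1)

def build_chained_input(prompt, query_image):
    """Stack a full prompt row over a query row whose labels are hidden.

    `prompt` is an (image, removal, segmentation) triple. The returned boolean
    mask marks the query's removal and segmentation cells, i.e. the region a
    reconstruction-style model would be asked to fill in (an assumed layout).
    """
    h, w, _ = query_image.shape
    prompt_row = compose_row(*prompt)
    blank = np.zeros((h, w, 3), dtype=query_image.dtype)  # placeholder for unknown labels
    query_row = np.concatenate([query_image, blank, blank], axis=1)
    canvas = np.concatenate([prompt_row, query_row], axis=0)
    mask = np.zeros(canvas.shape[:2], dtype=bool)
    mask[h:, w:] = True  # hide both query label cells
    return canvas, mask
```

Under these assumptions, a single-task image-label compositor would correspond to dropping one of the two label columns, which leaves the model a single masked cell to reconstruct instead of a chained pair.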

Cite

Text

Zhang et al. "ConText: Driving In-Context Learning for Text Removal and Segmentation." Proceedings of the 42nd International Conference on Machine Learning, 2025.

Markdown

[Zhang et al. "ConText: Driving In-Context Learning for Text Removal and Segmentation." Proceedings of the 42nd International Conference on Machine Learning, 2025.](https://mlanthology.org/icml/2025/zhang2025icml-context/)

BibTeX

@inproceedings{zhang2025icml-context,
  title     = {{ConText: Driving In-Context Learning for Text Removal and Segmentation}},
  author    = {Zhang, Fei and Zhang, Pei and Yang, Baosong and Huang, Fei and Wang, Yanfeng and Zhang, Ya},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  year      = {2025},
  pages     = {76998-77016},
  volume    = {267},
  url       = {https://mlanthology.org/icml/2025/zhang2025icml-context/}
}