Leveraging Data to Say No: Memory Augmented Plug-and-Play Selective Prediction

Abstract

Selective prediction aims to endow predictors with a reject option, to avoid low confidence predictions. However, existing literature has primarily focused on closed-set tasks, such as visual question answering with predefined options or fixed-category classification. This paper considers selective prediction for visual language foundation models, addressing a taxonomy of tasks ranging from closed to open set and from finite to unbounded vocabularies, as in image captioning. We seek training-free approaches of low-complexity, applicable to any foundation model and consider methods based on external vision-language model (VLM) embeddings, like CLIP. This is denoted as $\textit{Plug-and-Play Selective Prediction} (\textbf{\texttt{PaPSP}})$. We identify two key challenges: (1) $\textit{instability of the visual-language representations}$, leading to high variance in image-text embeddings, and (2) $\textit{poor calibration of similarity scores}$. To address these issues, we propose a $\textit{memory augmented}$ $\textbf{\texttt{PaPSP}}$ ($\textbf{\texttt{MA-PaPSP}}$) model, which augments $\textbf{\texttt{PaPSP}}$ with a retrieval dataset of image-text pairs. This is leveraged to reduce embedding variance by averaging retrieved nearest-neighbor pairs and is complemented by the use of contrastive normalization to improve score calibration. Through extensive experiments on multiple datasets, we show that $\textbf{\texttt{MA-PaPSP}}$ outperforms $\textbf{\texttt{PaPSP}}$ and other selective prediction baselines for selective captioning, image-text matching, and fine-grained classification. Source code will be made public.

Cite

Text

Sarkar et al. "Leveraging Data to Say No: Memory Augmented Plug-and-Play Selective Prediction." International Conference on Learning Representations, 2026.

Markdown

[Sarkar et al. "Leveraging Data to Say No: Memory Augmented Plug-and-Play Selective Prediction." International Conference on Learning Representations, 2026.](https://mlanthology.org/iclr/2026/sarkar2026iclr-leveraging/)

BibTeX

@inproceedings{sarkar2026iclr-leveraging,
  title     = {{Leveraging Data to Say No: Memory Augmented Plug-and-Play Selective Prediction}},
  author    = {Sarkar, Aditya and Li, Yi and Cheng, Jiacheng and Mishra, Shlok Kumar and Vasconcelos, Nuno},
  booktitle = {International Conference on Learning Representations},
  year      = {2026},
  url       = {https://mlanthology.org/iclr/2026/sarkar2026iclr-leveraging/}
}