VSCoDe: Visual-Augmentation Selection for Contrastive Decoding
Abstract
Despite the impressive performance of recent Large Vision-Language Models (LVLMs), these models often produce inaccurate responses. To address this issue, previous studies have reduced hallucinations with contrastive decoding (CD), which contrasts the model's output for the original image with its output for a modified image, for example one in which objects related to the query are cropped or noise is added. However, these methods have several limitations. First, fixed visual augmentations such as adding noise are simple but too rigid to provide useful contrast across diverse queries. Second, methods that exploit the semantics of the query or image through external models can generate contrastive images adaptively, but they incur significant additional cost. To address these shortcomings, we explore using pre-defined visual augmentations that can be adapted flexibly to each query without relying on external models. We observe that different visual augmentations yield different degrees of contrast for different queries. Based on this observation, we propose VSCoDe (Visual-Augmentation Selection for Contrastive Decoding), a novel method that adaptively selects augmentations by using a proposed distance metric to identify those with higher contrast. Our empirical evaluations demonstrate that VSCoDe outperforms previous methods and improves response quality on various vision-language tasks without additional training or reliance on external models.
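The select-then-contrast idea can be illustrated with a minimal sketch of a single greedy decoding step. It assumes that next-token logits have already been computed by the LVLM once for the original image and once for each augmented image, given the same query. The abstract does not define VSCoDe's distance metric, so KL divergence between next-token distributions is used here purely as a stand-in. The contrastive score (1 + alpha) * logit_orig - alpha * logit_aug follows a form commonly used in visual contrastive decoding and is likewise an assumption. The helper names (`decode_step`, `select_augmentation`, and so on) and the augmentation labels are hypothetical, and refinements such as a plausibility constraint on candidate tokens are omitted.

```python
import math
from typing import Dict, List, Sequence, Tuple


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over a vector of logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def kl_divergence(p_logits: Sequence[float], q_logits: Sequence[float]) -> float:
    """KL(P || Q) between the softmax distributions of two logit vectors.

    Stand-in distance only (assumption): the abstract does not specify VSCoDe's metric.
    """
    log_p = log_softmax(p_logits)
    log_q = log_softmax(q_logits)
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))


def select_augmentation(orig_logits: Sequence[float],
                        aug_logits: Dict[str, Sequence[float]]) -> str:
    """Hypothetical helper: pick the augmentation whose distribution is farthest from the original."""
    return max(aug_logits, key=lambda name: kl_divergence(orig_logits, aug_logits[name]))


def contrastive_logits(orig_logits: Sequence[float],
                       contrast_logits: Sequence[float],
                       alpha: float = 1.0) -> List[float]:
    """Assumed CD score: amplify the original and subtract the contrastive view."""
    return [(1.0 + alpha) * o - alpha * c for o, c in zip(orig_logits, contrast_logits)]


def decode_step(orig_logits: Sequence[float],
                aug_logits: Dict[str, Sequence[float]],
                alpha: float = 1.0) -> Tuple[str, int]:
    """One greedy step: select an augmentation, then contrast against it."""
    name = select_augmentation(orig_logits, aug_logits)
    scores = contrastive_logits(orig_logits, aug_logits[name], alpha)
    token = max(range(len(scores)), key=scores.__getitem__)
    return name, token


if __name__ == "__main__":
    # Toy logits over a 3-token vocabulary; "noise" and "flip" are hypothetical augmentations.
    orig = [2.0, 1.0, 0.1]
    augs = {"noise": [1.9, 1.0, 0.2], "flip": [0.5, 2.0, 0.1]}
    print(decode_step(orig, augs))  # ('flip', 0)
```

In this toy case the "noise" view barely changes the distribution, so it offers little contrast, while the "flip" view shifts probability mass away from token 0 and is selected. This mirrors the abstract's observation that the most useful augmentation varies by query. Selecting per step is one possible design; selecting once per query is an equally plausible reading of the abstract.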
Cite
Kim et al. "VSCoDe: Visual-Augmentation Selection for Contrastive Decoding." Transactions on Machine Learning Research, 2025.
@article{kim2025tmlr-vscode,
title = {{VSCoDe: Visual-Augmentation Selection for Contrastive Decoding}},
author = {Kim, Sihyeon and Cho, Boryeong and Bae, Sangmin and Ahn, Sumyeong and Yun, Se-Young},
journal = {Transactions on Machine Learning Research},
year = {2025},
url = {https://mlanthology.org/tmlr/2025/kim2025tmlr-vscode/}
}