VSCoDe: Visual-Augmentation Selection for Contrastive Decoding

Kim, Sihyeon; Cho, Boryeong; Bae, Sangmin; Ahn, Sumyeong; Yun, Se-Young

TMLR 2025

Abstract

Despite the impressive performance of recent Large Vision-Language Models (LVLMs), these models often produce inaccurate responses. To address this issue, previous studies have aimed to reduce hallucinations with contrastive decoding (CD), which contrasts the model's output on the original image with its output on a modified image, for example one produced by cropping objects related to the query or by adding noise. However, these methods have several limitations. First, a fixed visual augmentation, such as adding noise, is simple but too rigid to provide useful contrast across diverse queries. Conversely, leveraging external models to exploit the semantics of the query or image can generate contrastive images adaptively, but it incurs significant additional cost. To address these shortcomings, we explore using a set of pre-defined visual augmentations to adapt flexibly to each query without relying on external models. We observe that different queries achieve their strongest contrast under different visual augmentations. Based on this observation, we propose VSCoDe (Visual-Augmentation Selection for Contrastive Decoding), a method that adaptively selects augmentations using a proposed distance metric to identify those that yield higher contrast. Our empirical evaluations demonstrate that VSCoDe outperforms previous methods and improves output quality on various vision-language tasks without additional training or reliance on external models.
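
The selection idea in the abstract can be illustrated with a minimal sketch of a single decoding step. It is not the authors' implementation: the abstract does not specify the distance metric or the contrastive formula, so this sketch assumes Jensen-Shannon divergence as a stand-in distance and the logit combination `(1 + alpha) * original - alpha * augmented` that is common in prior contrastive decoding work. The function names, the `alpha` parameter, and the toy logits are all hypothetical.

```python
import math

def softmax(logits):
    """Convert a list of logits into a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl(p, q):
    """KL divergence KL(p || q); terms with p_i == 0 contribute nothing."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js_divergence(p, q):
    """Jensen-Shannon divergence (ASSUMED stand-in for the paper's metric)."""
    mid = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, mid) + 0.5 * kl(q, mid)

def select_augmentation(orig_logits, aug_logits_by_name):
    """Pick the augmentation whose output distribution is farthest from
    the distribution obtained with the original image."""
    p = softmax(orig_logits)
    return max(
        aug_logits_by_name,
        key=lambda name: js_divergence(p, softmax(aug_logits_by_name[name])),
    )

def contrastive_logits(orig_logits, aug_logits, alpha=1.0):
    """ASSUMED contrastive form: amplify the original, subtract the augmented."""
    return [(1 + alpha) * o - alpha * a for o, a in zip(orig_logits, aug_logits)]

def decode_step(orig_logits, aug_logits_by_name, alpha=1.0):
    """Select an augmentation, contrast against it, and pick a token greedily."""
    name = select_augmentation(orig_logits, aug_logits_by_name)
    scores = contrastive_logits(orig_logits, aug_logits_by_name[name], alpha)
    token = max(range(len(scores)), key=scores.__getitem__)
    return name, token

# Toy next-token logits over a 3-token vocabulary (made-up numbers).
orig = [2.0, 1.0, 0.0]
augs = {"noise": [2.0, 1.0, 0.0], "crop": [0.0, 1.0, 2.0]}
chosen, token = decode_step(orig, augs)
```

In this toy case the "noise" logits are identical to the original, so they provide no contrast, and the sketch selects "crop" instead. In a real LVLM the logits would come from forward passes on the original and on each augmented image, and the selection could be repeated per query or per decoding step.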

Cite

Text

Kim et al. "VSCoDe: Visual-Augmentation Selection for Contrastive Decoding." Transactions on Machine Learning Research, 2025.

Markdown

[Kim et al. "VSCoDe: Visual-Augmentation Selection for Contrastive Decoding." Transactions on Machine Learning Research, 2025.](https://mlanthology.org/tmlr/2025/kim2025tmlr-vscode/)

BibTeX

@article{kim2025tmlr-vscode,
  title     = {{VSCoDe: Visual-Augmentation Selection for Contrastive Decoding}},
  author    = {Kim, Sihyeon and Cho, Boryeong and Bae, Sangmin and Ahn, Sumyeong and Yun, Se-Young},
  journal   = {Transactions on Machine Learning Research},
  year      = {2025},
  url       = {https://mlanthology.org/tmlr/2025/kim2025tmlr-vscode/}
}