SpurLens: Finding Spurious Correlations in Multimodal LLMs

Abstract

While multimodal large language models (MLLMs) exhibit remarkable capabilities in visual and textual understanding, they remain highly susceptible to spurious correlations. We propose SpurLens, a novel pipeline that leverages LLMs and open-set object detectors to identify spurious cues and measure their effect on MLLMs in an object detection scenario. We also tested several prompting strategies to mitigate this issue, but none proved effective. These findings highlight the urgent need for robust solutions to spurious correlations in MLLMs.

Cite

Text

Hosseini et al. "SpurLens: Finding Spurious Correlations in Multimodal LLMs." ICLR 2025 Workshops: SCSL, 2025.

Markdown

[Hosseini et al. "SpurLens: Finding Spurious Correlations in Multimodal LLMs." ICLR 2025 Workshops: SCSL, 2025.](https://mlanthology.org/iclrw/2025/hosseini2025iclrw-spurlens/)

BibTeX

@inproceedings{hosseini2025iclrw-spurlens,
  title     = {{SpurLens: Finding Spurious Correlations in Multimodal LLMs}},
  author    = {Hosseini, Parsa and Nawathe, Sumit and Moayeri, Mazda and Balasubramanian, Sriram and Feizi, Soheil},
  booktitle = {ICLR 2025 Workshops: SCSL},
  year      = {2025},
  url       = {https://mlanthology.org/iclrw/2025/hosseini2025iclrw-spurlens/}
}