Automated Detection of Visual Attribute Reliance with a Self-Reflective Agent

Abstract

When a vision model performs image recognition, which visual attributes drive its predictions? Detecting unintended reliance on specific visual features is critical for ensuring model robustness, preventing overfitting, and avoiding spurious correlations. We introduce an automated framework for detecting such dependencies in trained vision models. At the core of our method is a self-reflective agent that systematically generates and tests hypotheses about visual attributes that a model may rely on. This process is iterative: the agent refines its hypotheses based on experimental outcomes and uses a self-evaluation protocol to assess whether its findings accurately explain model behavior. When inconsistencies arise, the agent self-reflects over its findings and triggers a new cycle of experimentation. We evaluate our approach on a novel benchmark of 130 models designed to exhibit diverse visual attribute dependencies across 18 categories. Our results show that the agent's performance consistently improves with self-reflection, with a significant performance increase over non-reflective baselines. We further demonstrate that the agent identifies real-world visual attribute dependencies in state-of-the-art models, including CLIP's vision encoder and the YOLOv8 object detector.
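The hypothesize, test, self-evaluate, and reflect cycle described in the abstract can be illustrated with a minimal toy sketch. This is not the authors' implementation: the abstract gives no interface, so every name here (`toy_model`, `run_experiment`, `self_evaluate`, `detect_reliance`, the `ATTRIBUTES` list) is hypothetical, an "image" is reduced to a dictionary of attribute flags, and the "agent" is a fixed loop rather than a language-model agent.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

# Toy stand-in (assumption): an "image" is just a set of attribute flags.
Image = Dict[str, bool]
Model = Callable[[Image], float]

# Hypothetical candidate attributes the agent may hypothesize about.
ATTRIBUTES: List[str] = ["background_snow", "red_color", "text_overlay"]


def toy_model(image: Image) -> float:
    """Hypothetical model under test; it secretly relies on 'red_color'."""
    return 0.9 if image.get("red_color") else 0.2


def run_experiment(model: Model, attribute: str) -> float:
    """Toggle one attribute while holding the others fixed; return the mean score gap."""
    gaps = []
    for others in (False, True):
        base = {a: others for a in ATTRIBUTES}
        with_attr = dict(base, **{attribute: True})
        without_attr = dict(base, **{attribute: False})
        gaps.append(model(with_attr) - model(without_attr))
    return sum(gaps) / len(gaps)


def self_evaluate(model: Model, hypothesis: str, threshold: float = 0.5) -> bool:
    """Check that 'score is high iff the attribute is present' holds on probe inputs."""
    for others in (False, True):
        for present in (False, True):
            image = {a: others for a in ATTRIBUTES}
            image[hypothesis] = present
            if (model(image) > threshold) != present:
                return False  # the hypothesis fails to explain this input
    return True


@dataclass
class Finding:
    attribute: str
    effect: float
    rounds: int


def detect_reliance(model: Model, max_rounds: int = 5) -> Optional[Finding]:
    """Iterate: propose a hypothesis, test it, self-evaluate, reflect on failure."""
    rejected: List[str] = []
    for round_idx in range(1, max_rounds + 1):
        candidates = [a for a in ATTRIBUTES if a not in rejected]
        if not candidates:
            return None
        hypothesis = candidates[0]                  # propose a hypothesis
        effect = run_experiment(model, hypothesis)  # test it experimentally
        if self_evaluate(model, hypothesis):        # does it explain the behavior?
            return Finding(hypothesis, effect, round_idx)
        rejected.append(hypothesis)                 # reflect: discard and try again
    return None


result = detect_reliance(toy_model)
print(result)
```

In this sketch the first hypothesis (`background_snow`) fails self-evaluation, which triggers a second round in which `red_color` is tested and accepted. The fixed candidate order is a simplification chosen only to make the reflection step visible.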

Cite

Text

Li et al. "Automated Detection of Visual Attribute Reliance with a Self-Reflective Agent." Advances in Neural Information Processing Systems, 2025.

Markdown

[Li et al. "Automated Detection of Visual Attribute Reliance with a Self-Reflective Agent." Advances in Neural Information Processing Systems, 2025.](https://mlanthology.org/neurips/2025/li2025neurips-automated/)

BibTeX

@inproceedings{li2025neurips-automated,
  title     = {{Automated Detection of Visual Attribute Reliance with a Self-Reflective Agent}},
  author    = {Li, Christy and Camuñas, Josep Lopez and Touchet, Jake Thomas and Andreas, Jacob and Lapedriza, Agata and Torralba, Antonio and Shaham, Tamar Rott},
  booktitle = {Advances in Neural Information Processing Systems},
  year      = {2025},
  url       = {https://mlanthology.org/neurips/2025/li2025neurips-automated/}
}