Decomposition of Concept-Level Rules in Visual Scenes

Abstract

Human cognition is compositional: people can parse a visual scene into independent concepts and the rules that govern how those concepts change. By contrast, many vision-language systems process images holistically and offer limited support for explicit decomposition. Previous methods for decomposing concepts and rules often rely on hand-crafted inductive biases or human-designed priors. We introduce a Concept-Rule Decomposition (CRD) framework that uses Large Vision-Language Models (LVLMs) to decompose concept-level rules, explaining visual input in terms of LVLM-extracted concepts and the rules governing their variation. The method operates in two stages: (1) a pretrained LVLM proposes visual concepts and their values, which are used to instantiate a space of concept rule functions that model concept changes and spatial distributions; (2) an iterative process selects a concise set of concepts that best accounts for the input under these rule functions. We evaluate CRD on an abstract visual reasoning benchmark, a spatial reasoning benchmark, and a real-world image captioning dataset. Across all three, our approach outperforms baseline models while improving interpretability by explicitly revealing the underlying concepts and compositional rules, advancing explainable and generalizable visual reasoning.
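The two-stage idea can be illustrated with a toy sketch. The abstract does not specify CRD's rule space, scoring function, or selection criterion, so everything below is an assumption for illustration: concept values are hard-coded in place of LVLM output, the rule space has just two hypothetical rules ("constant" and "progression" over panel index, each scored by squared residual), and the names `fit_best_rule` and `select_concepts` as well as the `tol` and `max_concepts` parameters are hypothetical. Selection is a simple greedy pass that keeps the best-explained concepts; it is not the paper's iterative procedure.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical rule: maps a concept's values across panels to a residual error.
Rule = Callable[[List[float]], float]


def constant_residual(values: List[float]) -> float:
    """Squared error if every panel shares a single value (the mean)."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def progression_residual(values: List[float]) -> float:
    """Squared error of a least-squares arithmetic progression over panel index."""
    n = len(values)
    x_mean = (n - 1) / 2
    y_mean = sum(values) / n
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    if sxx == 0:
        return 0.0  # a single panel is trivially a progression
    slope = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(values)) / sxx
    intercept = y_mean - slope * x_mean
    return sum((y - (intercept + slope * x)) ** 2 for x, y in enumerate(values))


# Assumed toy rule space; dict order doubles as a simplicity preference on ties.
RULES: Dict[str, Rule] = {
    "constant": constant_residual,
    "progression": progression_residual,
}


def fit_best_rule(values: List[float]) -> Tuple[str, float]:
    """Return (rule_name, residual) for the best-fitting rule; ties go to the earlier rule."""
    return min(((name, rule(values)) for name, rule in RULES.items()), key=lambda p: p[1])


def select_concepts(
    panels: List[Dict[str, float]], tol: float = 1e-6, max_concepts: int = 3
) -> List[Tuple[str, str]]:
    """Greedily keep concepts that some rule explains within tol, best-explained first."""
    fits = {c: fit_best_rule([p[c] for p in panels]) for c in panels[0]}
    selected: List[Tuple[str, str]] = []
    # sorted() is stable, so concepts with equal residuals keep their input order.
    for concept, (rule, residual) in sorted(fits.items(), key=lambda kv: kv[1][1]):
        if residual > tol or len(selected) >= max_concepts:
            break  # remaining concepts are not explained by any candidate rule
        selected.append((concept, rule))
    return selected


# Stand-in for LVLM-proposed concepts and values across three panels.
panels = [
    {"shape": 3, "size": 1, "color": 5},
    {"shape": 3, "size": 2, "color": 1},
    {"shape": 3, "size": 3, "color": 4},
]
print(select_concepts(panels))  # [('shape', 'constant'), ('size', 'progression')]
```

In this toy case, "color" varies irregularly, so neither assumed rule explains it and it is left out of the selected set; a concise explanation keeps only the concepts whose variation a rule accounts for.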

Cite

Text

Shi et al. "Decomposition of Concept-Level Rules in Visual Scenes." International Conference on Learning Representations, 2026.

BibTeX

@inproceedings{shi2026iclr-decomposition,
  title     = {{Decomposition of Concept-Level Rules in Visual Scenes}},
  author    = {Shi, Fan and Liang, Yuxuan and Chen, Xiaolei and Yu, Haiyang and Li, Xu and Zheng, Yi and Zhu, Rui and Xue, Xiangyang and Li, Bin},
  booktitle = {International Conference on Learning Representations},
  year      = {2026},
  url       = {https://mlanthology.org/iclr/2026/shi2026iclr-decomposition/}
}