Escaping Low-Rank Traps: Interpretable Visual Concept Learning via Implicit Vector Quantization

Gao, Shujian; Wang, Yuan; Ma, Chenglong; Gao, Xin; Yan, Jiangtao; Ning, Junzhi; Tang, Cheng; Ji, Changkai; Xu, Huihui; Li, Wei; Huang, Ziyan; Lin, Jiashi; Hu, Ming; Liu, Jiyao; Tang, Wenhao; Du, Ye; Li, Tianbin; Ye, Jin; He, Junjun

ICLR 2026

Abstract

Concept Bottleneck Models (CBMs) achieve interpretability by interposing a human-understandable concept layer between perception and label prediction. We first identify that a many-to-many mapping between visual features and concepts is necessary for robust CBMs, a prerequisite that has been largely overlooked in previous approaches. While several recent methods have attempted to establish this relationship, we observe that they suffer from representation collapse: visual patch features degenerate into a low-rank subspace during training, which severely degrades the quality of the learned concept activation vectors and hinders both model interpretability and downstream performance. To address these issues, we propose Implicit Vector Quantization (IVQ), a lightweight regularizer that maintains high-rank, diverse representations throughout training. Rather than imposing a hard bottleneck via direct quantization, IVQ learns a codebook prior that anchors semantic information in visual features, allowing it to act as a proxy objective. To further exploit these high-rank, concept-aware features, we propose Magnet Attention, which dynamically aggregates patch-level features into visual concept prototypes, explicitly modeling the many-to-many vision–concept correspondence. Extensive experiments show that our approach effectively prevents representation collapse and achieves state-of-the-art performance on diverse benchmarks. Our experiments further probe the low-rank phenomenon behind representation collapse, finding that IVQ mitigates the information bottleneck and yields cross-modal representations with clearer, more interpretable consistency. Code is available at https://github.com/Daryl-GSJ/IVQ-CBM.
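To make the low-rank collapse described in the abstract concrete, the sketch below computes an entropy-based effective rank for a matrix of patch features. It is an illustrative diagnostic only, not the authors' method or evaluation protocol: the function name `effective_rank`, the feature shapes, and the synthetic "diverse" and "collapsed" matrices are all assumptions introduced here.

```python
import numpy as np


def effective_rank(features: np.ndarray, eps: float = 1e-12) -> float:
    """Entropy-based effective rank of an (n_patches, dim) feature matrix.

    Hypothetical helper for illustration; not taken from the paper's code.
    """
    # Center the features so the mean direction does not dominate the spectrum.
    centered = features - features.mean(axis=0, keepdims=True)
    # Singular values describe how variance spreads across directions.
    singular_values = np.linalg.svd(centered, compute_uv=False)
    # Normalize to a probability distribution over directions.
    p = singular_values / (singular_values.sum() + eps)
    # exp(Shannon entropy) is large when many directions carry weight.
    entropy = -(p * np.log(p + eps)).sum()
    return float(np.exp(entropy))


rng = np.random.default_rng(0)
n_patches, dim = 196, 64  # assumed sizes, e.g. a 14x14 patch grid

# Diverse features: every dimension carries independent variation.
diverse = rng.standard_normal((n_patches, dim))

# Collapsed features: all patches lie in a 2-dimensional subspace.
basis = rng.standard_normal((2, dim))
collapsed = rng.standard_normal((n_patches, 2)) @ basis

print("diverse  :", effective_rank(diverse))
print("collapsed:", effective_rank(collapsed))
```

Under these assumptions, the collapsed matrix scores an effective rank of at most about 2, while the diverse matrix scores far higher. Tracking such a quantity over training is one plausible way to check whether a regularizer keeps patch features high-rank.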

Cite

Text

Gao et al. "Escaping Low-Rank Traps: Interpretable Visual Concept Learning via Implicit Vector Quantization." International Conference on Learning Representations, 2026.

Markdown

[Gao et al. "Escaping Low-Rank Traps: Interpretable Visual Concept Learning via Implicit Vector Quantization." International Conference on Learning Representations, 2026.](https://mlanthology.org/iclr/2026/gao2026iclr-escaping/)

BibTeX

@inproceedings{gao2026iclr-escaping,
  title     = {{Escaping Low-Rank Traps: Interpretable Visual Concept Learning via Implicit Vector Quantization}},
  author    = {Gao, Shujian and Wang, Yuan and Ma, Chenglong and Gao, Xin and Yan, Jiangtao and Ning, Junzhi and Tang, Cheng and Ji, Changkai and Xu, Huihui and Li, Wei and Huang, Ziyan and Lin, Jiashi and Hu, Ming and Liu, Jiyao and Tang, Wenhao and Du, Ye and Li, Tianbin and Ye, Jin and He, Junjun},
  booktitle = {International Conference on Learning Representations},
  year      = {2026},
  url       = {https://mlanthology.org/iclr/2026/gao2026iclr-escaping/}
}