ICAR: Image-Based Complementary Auto Reasoning

Abstract

Scene-aware Complementary Item Retrieval (CIR) is a challenging task that requires generating a set of compatible items across domains. Because compatibility is subjective, it is difficult to set up a rigorous standard for either data collection or learning objectives. To address this task, we propose a visual compatibility concept composed of similarity (resemblance in color, geometry, texture, etc.) and complementarity (different items, such as a table and a chair, completing a group). Based on this notion, we propose a compatibility learning framework, a category-aware Flexible Bidirectional Transformer (FBT), for visual "scene-based set compatibility reasoning" with cross-domain visual similarity input and auto-regressive complementary item generation. The FBT consists of an encoder with flexible masking, a category prediction arm, and an auto-regressive visual embedding prediction arm. The inputs to the FBT are cross-domain visual similarity invariant embeddings, which makes the framework quite generalizable. Furthermore, the proposed FBT model learns inter-object compatibility from a large set of scene images in a self-supervised way. Compared with state-of-the-art (SOTA) methods, this approach achieves improvements of up to 5.3% and 9.6% in FITB score and 22.3% and 31.8% in SFID on fashion and furniture, respectively.
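
For readers unfamiliar with the FITB (fill-in-the-blank) score reported above, the following is a minimal sketch of how such an accuracy is typically computed: each question provides a partial set of items, several candidates, and the index of the correct candidate. The scoring rule used here (mean cosine similarity to the context items) is only a placeholder assumption and is not the FBT model; the function names and toy embeddings are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def pick_candidate(context, candidates):
    """Index of the candidate with the highest mean cosine similarity
    to the context items (placeholder compatibility score, an assumption)."""
    scores = [
        sum(cosine(cand, item) for item in context) / len(context)
        for cand in candidates
    ]
    return max(range(len(candidates)), key=scores.__getitem__)

def fitb_accuracy(questions):
    """Fraction of questions answered correctly.

    questions: iterable of (context, candidates, answer_index) tuples.
    """
    questions = list(questions)
    correct = sum(
        pick_candidate(context, candidates) == answer
        for context, candidates, answer in questions
    )
    return correct / len(questions)

# Toy 2-D embeddings, invented purely for illustration.
toy_questions = [
    ([[1.0, 0.0], [0.9, 0.1]], [[0.0, 1.0], [1.0, 0.1]], 1),
    ([[0.0, 1.0]], [[0.1, 1.0], [1.0, 0.0]], 0),
]
print(fitb_accuracy(toy_questions))  # prints 1.0
```

In a real evaluation, `pick_candidate` would be replaced by the learned model's compatibility score, while the accuracy computation would stay the same.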

Cite

Text

Wang et al. "ICAR: Image-Based Complementary Auto Reasoning." AAAI Conference on Artificial Intelligence, 2024. doi:10.1609/AAAI.V38I6.28374

Markdown

[Wang et al. "ICAR: Image-Based Complementary Auto Reasoning." AAAI Conference on Artificial Intelligence, 2024.](https://mlanthology.org/aaai/2024/wang2024aaai-icar/) doi:10.1609/AAAI.V38I6.28374

BibTeX

@inproceedings{wang2024aaai-icar,
  title     = {{ICAR: Image-Based Complementary Auto Reasoning}},
  author    = {Wang, Xijun and Liang, Anqi and Liang, Junbang and Lin, Ming C. and Lou, Yu and Yang, Shan},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2024},
  pages     = {5633-5641},
  doi       = {10.1609/AAAI.V38I6.28374},
  url       = {https://mlanthology.org/aaai/2024/wang2024aaai-icar/}
}