Towards a Unified Framework for Visual Compatibility Prediction

Abstract

Visual compatibility prediction refers to the task of determining whether a set of items go well together. Existing techniques for compatibility prediction prioritize sensitivity to type or context in item representations and evaluate using a fill-in-the-blank (FITB) task. We scale the FITB task to stress-test existing methods; the results highlight the need for a compatibility prediction framework that is sensitive to multiple modalities of item relationships. In this work, we introduce a unified framework for compatibility learning that is jointly conditioned on type, context, and style. The framework is composed of TC-GAE, a graph-based network that models type and context; SAE, an autoencoder that models style; and a reinforcement-learning-based search technique that incorporates these modalities to learn a unified compatibility measure. We conduct experiments on two standard datasets and significantly outperform existing state-of-the-art methods. We also present qualitative analysis and discussions to study the impact of the components of the proposed framework.
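The abstract describes fusing modality-specific signals (type/context from TC-GAE, style from SAE) into one compatibility measure that is then evaluated on FITB. Below is a minimal sketch of that idea, not the authors' implementation: the embedding functions, dimensions, and fixed fusion weights are all illustrative assumptions (the paper learns the combination via an RL-based search rather than fixing weights).

```python
# Minimal sketch (assumptions, not the paper's code): fuse per-modality
# compatibility scores and use the unified score to answer a
# fill-in-the-blank (FITB) query.
import numpy as np

rng = np.random.default_rng(0)

def embed_tc(item):
    # Stand-in for a TC-GAE type/context embedding lookup.
    return item["tc"]

def embed_style(item):
    # Stand-in for an SAE style embedding lookup.
    return item["style"]

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-8))

def pairwise_score(a, b, w_tc=0.6, w_style=0.4):
    # Unified score as a convex combination of modality similarities.
    # The weights here are fixed for illustration; the paper instead
    # searches for the combination with a reinforcement-learning policy.
    return (w_tc * cosine(embed_tc(a), embed_tc(b))
            + w_style * cosine(embed_style(a), embed_style(b)))

def fitb_answer(partial_outfit, candidates):
    # FITB: choose the candidate with the highest mean pairwise
    # compatibility with the items already in the outfit.
    def outfit_score(c):
        return np.mean([pairwise_score(c, item) for item in partial_outfit])
    return max(range(len(candidates)),
               key=lambda i: outfit_score(candidates[i]))

# Toy usage with random 8-d embeddings.
make_item = lambda: {"tc": rng.normal(size=8), "style": rng.normal(size=8)}
outfit = [make_item() for _ in range(3)]
candidates = [make_item() for _ in range(4)]
print("FITB choice:", fitb_answer(outfit, candidates))
```

Scaling this evaluation (more candidates per FITB question) is what the paper uses to stress-test single-modality methods: with few distractors a type-only or style-only score often suffices, but larger candidate pools expose the need for the combined measure.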

Cite

Text

Singhal et al. "Towards a Unified Framework for Visual Compatibility Prediction." Winter Conference on Applications of Computer Vision, 2020.

Markdown

[Singhal et al. "Towards a Unified Framework for Visual Compatibility Prediction." Winter Conference on Applications of Computer Vision, 2020.](https://mlanthology.org/wacv/2020/singhal2020wacv-unified/)

BibTeX

@inproceedings{singhal2020wacv-unified,
  title     = {{Towards a Unified Framework for Visual Compatibility Prediction}},
  author    = {Singhal, Anirudh and Chopra, Ayush and Ayush, Kumar and Govind, Utkarsh Patel and Krishnamurthy, Balaji},
  booktitle = {Winter Conference on Applications of Computer Vision},
  year      = {2020},
  url       = {https://mlanthology.org/wacv/2020/singhal2020wacv-unified/}
}