Towards a Unified Framework for Visual Compatibility Prediction
Abstract
Visual compatibility prediction refers to the task of determining whether a set of items go well together. Existing techniques for compatibility prediction prioritize sensitivity to type or context in item representations and evaluate on a fill-in-the-blank (FITB) task. We scale the FITB task to stress-test existing methods, which highlights the need for a compatibility prediction framework that is sensitive to multiple modalities of item relationships. In this work, we introduce a unified framework for compatibility learning that is jointly conditioned on type, context, and style. The framework is composed of TC-GAE, a graph-based network that models type and context; SAE, an autoencoder that models style; and a reinforcement-learning-based search technique that combines these modalities to learn a unified compatibility measure. We conduct experiments on two standard datasets and significantly outperform existing state-of-the-art methods. We also present qualitative analyses and discussion to study the impact of the components of the proposed framework.
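To make the pipeline described in the abstract concrete, below is a minimal, hypothetical PyTorch sketch of how a graph-based type/context encoder, a style autoencoder, and a weighted blend of the two similarity signals could produce an outfit compatibility score and drive FITB selection. All names here (`TCGAE`, `SAE`, `compatibility`, and the fixed blend weight `w`, which stands in for the paper's RL-based search) are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn


class TCGAE(nn.Module):
    """Toy graph encoder: one round of neighborhood aggregation over the
    outfit graph, so item embeddings become type/context aware."""

    def __init__(self, in_dim, hid_dim):
        super().__init__()
        self.lin = nn.Linear(in_dim, hid_dim)

    def forward(self, x, adj):
        # x: (items, in_dim) visual features; adj: (items, items) outfit graph.
        return torch.relu(adj @ self.lin(x))


class SAE(nn.Module):
    """Toy style autoencoder: the bottleneck code acts as a style embedding."""

    def __init__(self, in_dim, code_dim):
        super().__init__()
        self.enc = nn.Linear(in_dim, code_dim)
        self.dec = nn.Linear(code_dim, in_dim)

    def forward(self, x):
        code = torch.relu(self.enc(x))
        return self.dec(code), code


def compatibility(tc_emb, style_emb, w=0.5):
    """Blend mean pairwise type/context and style similarities; the fixed
    weight w is a stand-in for the paper's RL-searched combination."""
    tc_sim = torch.cosine_similarity(tc_emb[None], tc_emb[:, None], dim=-1)
    st_sim = torch.cosine_similarity(style_emb[None], style_emb[:, None], dim=-1)
    pair = w * tc_sim + (1.0 - w) * st_sim
    n = pair.size(0)
    # Mean over off-diagonal pairs (exclude each item's self-similarity).
    return (pair.sum() - pair.diagonal().sum()) / (n * (n - 1))


# FITB-style usage: pick the candidate that best completes a partial outfit.
n, d = 4, 16
outfit = torch.randn(n, d)                 # features of the partial outfit
candidates = torch.randn(3, d)             # three candidate completions
tc_gae, sae = TCGAE(d, 8), SAE(d, 8)
scores = []
for cand in candidates:
    items = torch.cat([outfit, cand[None]])
    adj = torch.ones(n + 1, n + 1)         # fully connected outfit graph
    _, style = sae(items)
    scores.append(compatibility(tc_gae(items, adj), style))
best = int(torch.stack(scores).argmax())   # index of the most compatible item
```

In this sketch the candidate whose addition maximizes the blended pairwise similarity wins the FITB query; in the paper, the relative contribution of the type/context and style modalities is learned via reinforcement-learning-based search rather than fixed by hand.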
Cite
Text
Singhal et al. "Towards a Unified Framework for Visual Compatibility Prediction." Winter Conference on Applications of Computer Vision, 2020.
Markdown
[Singhal et al. "Towards a Unified Framework for Visual Compatibility Prediction." Winter Conference on Applications of Computer Vision, 2020.](https://mlanthology.org/wacv/2020/singhal2020wacv-unified/)
BibTeX
@inproceedings{singhal2020wacv-unified,
  title = {{Towards a Unified Framework for Visual Compatibility Prediction}},
  author = {Singhal, Anirudh and Chopra, Ayush and Ayush, Kumar and Govind, Utkarsh Patel and Krishnamurthy, Balaji},
  booktitle = {Winter Conference on Applications of Computer Vision},
  year = {2020},
  url = {https://mlanthology.org/wacv/2020/singhal2020wacv-unified/}
}