Learning Feature Representations for Look-Alike Images

Abstract

Human perception of visual similarity relies on information ranging from low-level features such as texture and color to high-level features such as objects and their arrangement. While generic features learned for image or face recognition tasks somewhat correlate with perceived visual similarity, they prove inadequate for matching look-alike images. In this paper, we learn a 'look-alike feature' embedding, capable of representing perceived image similarity, by fusing low- and high-level features within a modified CNN encoder architecture. This encoder is trained with the triplet loss paradigm on look-alike image pairs. Our findings demonstrate that combining features from different layers across the network is beneficial for look-alike image matching, and clearly outperforms standard pretrained networks followed by fine-tuning. Furthermore, we show that the learned similarities are meaningful, capturing color, shape, facial, or holistic appearance patterns, depending on context and image modality.
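The training objective described above can be sketched in a few lines. This is a minimal NumPy illustration only: it assumes simple concatenation of L2-normalized feature vectors as the fusion step and squared Euclidean distance in the triplet loss, whereas the paper fuses features inside a modified CNN encoder; the function names and the margin value are hypothetical.

```python
import numpy as np

def fuse_features(low_level, high_level):
    """Fuse low- and high-level feature vectors by L2-normalizing each
    and concatenating them (a simple stand-in for the paper's in-network fusion)."""
    low = low_level / np.linalg.norm(low_level)
    high = high_level / np.linalg.norm(high_level)
    return np.concatenate([low, high])

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Standard triplet loss: pull the anchor embedding toward the positive
    (its look-alike) and push it away from the negative by at least `margin`."""
    d_pos = np.sum((anchor - positive) ** 2)  # squared distance to look-alike
    d_neg = np.sum((anchor - negative) ** 2)  # squared distance to non-match
    return max(0.0, d_pos - d_neg + margin)
```

During training, triplets are formed from look-alike image pairs (anchor, positive) plus a non-matching negative, and the encoder is updated to minimize this loss over the fused embeddings.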

Cite

Text

Takmaz et al. "Learning Feature Representations for Look-Alike Images." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2019.

Markdown

[Takmaz et al. "Learning Feature Representations for Look-Alike Images." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2019.](https://mlanthology.org/cvprw/2019/takmaz2019cvprw-learning/)

BibTeX

@inproceedings{takmaz2019cvprw-learning,
  title     = {{Learning Feature Representations for Look-Alike Images}},
  author    = {Takmaz, Ayça and Probst, Thomas and Paudel, Danda Pani and Van Gool, Luc},
  booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops},
  year      = {2019},
  pages     = {21--24},
  url       = {https://mlanthology.org/cvprw/2019/takmaz2019cvprw-learning/}
}