An End-to-End Vision Transformer Approach for Image Copy Detection

Abstract

Image copy detection is one of the pivotal tools for safeguarding online information integrity. The challenge lies in determining whether a query image is an edited copy, which necessitates identifying candidate source images through a retrieval process. The process requires discriminative features comprising both global descriptors that are designed to be augmentation-invariant and local descriptors that can capture salient foreground objects, in order to assess whether a query image is an edited copy of some source reference image. This work describes an end-to-end solution that leverages a Vision Transformer model to learn such discriminative features and perform implicit matching between the query image and the reference image. Experimental results on two benchmark datasets demonstrate that the proposed solution outperforms state-of-the-art methods. Case studies illustrate the effectiveness of our approach in matching reference images from which the query images have been copy-edited.

Cite

Text

Lee et al. "An End-to-End Vision Transformer Approach for Image Copy Detection." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2024. doi:10.1109/CVPRW63382.2024.00693

Markdown

[Lee et al. "An End-to-End Vision Transformer Approach for Image Copy Detection." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2024.](https://mlanthology.org/cvprw/2024/lee2024cvprw-endtoend/) doi:10.1109/CVPRW63382.2024.00693

BibTeX

@inproceedings{lee2024cvprw-endtoend,
  title     = {{An End-to-End Vision Transformer Approach for Image Copy Detection}},
  author    = {Lee, Jiahe Steven and Hsu, Wynne and Lee, Mong-Li},
  booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops},
  year      = {2024},
  pages     = {6997--7006},
  doi       = {10.1109/CVPRW63382.2024.00693},
  url       = {https://mlanthology.org/cvprw/2024/lee2024cvprw-endtoend/}
}