An End-to-End Vision Transformer Approach for Image Copy Detection
Abstract
Image copy detection is one of the pivotal tools to safeguard online information integrity. The challenge lies in determining whether a query image is an edited copy, which necessitates the identification of candidate source images through a retrieval process. The process requires discriminative features comprising both global descriptors that are designed to be augmentation-invariant and local descriptors that can capture salient foreground objects to assess whether a query image is an edited copy of some source reference image. This work describes an end-to-end solution that leverages a Vision Transformer model to learn such discriminative features and perform implicit matching between the query image and the reference image. Experimental results on two benchmark datasets demonstrate that the proposed solution outperforms state-of-the-art methods. Case studies illustrate the effectiveness of our approach in matching reference images from which the query images have been copy-edited.
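The global-plus-local matching idea in the abstract can be sketched with plain descriptor arithmetic. The sketch below is illustrative only: the descriptor dimensions, the max-over-patches local match, and the 50/50 score weighting are assumptions for exposition, not the paper's formulation.

```python
import numpy as np

def l2_normalize(x, axis=-1):
    # Scale descriptors to unit length so dot products become cosine similarities.
    return x / np.linalg.norm(x, axis=axis, keepdims=True)

def copy_score(query_global, query_local, ref_global, ref_local, alpha=0.5):
    """Combine a global-descriptor similarity with the best local-descriptor match.
    Shapes: global descriptors are (d,), local descriptors are (n_patches, d).
    The equal weighting `alpha` is a placeholder, not taken from the paper."""
    g_sim = float(l2_normalize(query_global) @ l2_normalize(ref_global))
    # Pairwise cosine similarities between patch-level (local) descriptors.
    local_sims = l2_normalize(query_local) @ l2_normalize(ref_local).T
    l_sim = float(local_sims.max())
    return alpha * g_sim + (1 - alpha) * l_sim

rng = np.random.default_rng(0)
d = 8
ref_g, ref_l = rng.normal(size=d), rng.normal(size=(4, d))
# An "edited copy" query: the reference descriptors perturbed by small noise,
# mimicking augmentation-invariant features of a lightly edited image.
qry_g = ref_g + 0.05 * rng.normal(size=d)
qry_l = ref_l + 0.05 * rng.normal(size=(4, d))
# Descriptors of an unrelated image.
oth_g, oth_l = rng.normal(size=d), rng.normal(size=(4, d))

copy_s = copy_score(qry_g, qry_l, ref_g, ref_l)
other_s = copy_score(qry_g, qry_l, oth_g, oth_l)
```

In a retrieval pipeline, such a score would rank candidate source images for a query; the paper's end-to-end model instead learns the descriptors and performs this matching implicitly inside the Vision Transformer.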
Cite
Text
Lee et al. "An End-to-End Vision Transformer Approach for Image Copy Detection." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2024. doi:10.1109/CVPRW63382.2024.00693
Markdown
[Lee et al. "An End-to-End Vision Transformer Approach for Image Copy Detection." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2024.](https://mlanthology.org/cvprw/2024/lee2024cvprw-endtoend/) doi:10.1109/CVPRW63382.2024.00693
BibTeX
@inproceedings{lee2024cvprw-endtoend,
title = {{An End-to-End Vision Transformer Approach for Image Copy Detection}},
author = {Lee, Jiahe Steven and Hsu, Wynne and Lee, Mong-Li},
booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops},
year = {2024},
pages = {6997--7006},
doi = {10.1109/CVPRW63382.2024.00693},
url = {https://mlanthology.org/cvprw/2024/lee2024cvprw-endtoend/}
}