UnionFormer: Unified-Learning Transformer with Multi-View Representation for Image Manipulation Detection and Localization

Abstract

We present UnionFormer, a novel framework that integrates tampering clues across three views through unified learning for image manipulation detection and localization. Specifically, we construct a BSFI-Net to extract tampering features from the RGB and noise views, achieving enhanced responsiveness to boundary artifacts while modulating spatial consistency at different scales. Additionally, to exploit the inconsistency between objects as a new view of clues, we combine object consistency modeling with tampering detection and localization into a three-task unified learning process, allowing the tasks to promote and improve one another. We thereby acquire a unified manipulation-discriminative representation under multi-scale supervision that consolidates information from the three views, enabling highly effective concurrent detection and localization of tampering. Extensive experiments on diverse datasets show that the proposed approach outperforms state-of-the-art methods in both tampering detection and localization.
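
The following is a minimal sketch, not the authors' released code, of the unified-learning idea described above: features from an RGB view and a noise view are fused by a shared backbone, and the resulting representation drives three tasks at once (image-level tampering detection, pixel-level localization, and object-consistency modeling) under a single joint loss. All module names, layer sizes, the choice of noise view, and loss weights are illustrative assumptions.

# Minimal sketch of two-view fusion with three-task unified learning (assumptions noted above).
import torch
import torch.nn as nn

class TwoViewBackbone(nn.Module):
    """Toy stand-in for BSFI-Net: one conv branch per view, fused by 1x1 convolution."""
    def __init__(self, channels=32):
        super().__init__()
        self.rgb_branch = nn.Sequential(nn.Conv2d(3, channels, 3, padding=1), nn.ReLU())
        self.noise_branch = nn.Sequential(nn.Conv2d(3, channels, 3, padding=1), nn.ReLU())
        self.fuse = nn.Conv2d(2 * channels, channels, 1)

    def forward(self, rgb, noise):
        return self.fuse(torch.cat([self.rgb_branch(rgb), self.noise_branch(noise)], dim=1))

class UnifiedThreeTaskModel(nn.Module):
    """Shared representation feeding three heads: detection, localization, object consistency."""
    def __init__(self, channels=32, embed_dim=16):
        super().__init__()
        self.backbone = TwoViewBackbone(channels)
        self.detect_head = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(channels, 1))
        self.localize_head = nn.Conv2d(channels, 1, 1)             # per-pixel tampering logits
        self.consistency_head = nn.Conv2d(channels, embed_dim, 1)  # per-pixel object embeddings

    def forward(self, rgb, noise):
        feat = self.backbone(rgb, noise)
        return self.detect_head(feat), self.localize_head(feat), self.consistency_head(feat)

def unified_loss(det_logit, loc_logit, embed, det_label, mask, w=(1.0, 1.0, 0.1)):
    """Joint objective: the three tasks share gradients through the common backbone."""
    bce = nn.functional.binary_cross_entropy_with_logits
    det_loss = bce(det_logit.squeeze(1), det_label)
    loc_loss = bce(loc_logit.squeeze(1), mask)
    # Consistency proxy (illustrative only): pull pixel embeddings inside / outside
    # the tampered region toward their own region means.
    m = mask.unsqueeze(1)
    fg_mean = (embed * m).sum(dim=(2, 3)) / (m.sum(dim=(2, 3)) + 1e-6)
    bg_mean = (embed * (1 - m)).sum(dim=(2, 3)) / ((1 - m).sum(dim=(2, 3)) + 1e-6)
    cons_loss = ((embed - fg_mean[..., None, None]) ** 2 * m).mean() + \
                ((embed - bg_mean[..., None, None]) ** 2 * (1 - m)).mean()
    return w[0] * det_loss + w[1] * loc_loss + w[2] * cons_loss

if __name__ == "__main__":
    model = UnifiedThreeTaskModel()
    rgb = torch.randn(2, 3, 64, 64)
    noise = torch.randn(2, 3, 64, 64)   # stand-in for a high-pass / noise-residual view
    det_label = torch.tensor([1.0, 0.0])
    mask = (torch.rand(2, 64, 64) > 0.9).float()
    det, loc, emb = model(rgb, noise)
    loss = unified_loss(det, loc, emb, det_label, mask)
    loss.backward()
    print(det.shape, loc.shape, emb.shape, loss.item())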

Cite

Text

Li et al. "UnionFormer: Unified-Learning Transformer with Multi-View Representation for Image Manipulation Detection and Localization." Conference on Computer Vision and Pattern Recognition, 2024. doi:10.1109/CVPR52733.2024.01190

Markdown

[Li et al. "UnionFormer: Unified-Learning Transformer with Multi-View Representation for Image Manipulation Detection and Localization." Conference on Computer Vision and Pattern Recognition, 2024.](https://mlanthology.org/cvpr/2024/li2024cvpr-unionformer/) doi:10.1109/CVPR52733.2024.01190

BibTeX

@inproceedings{li2024cvpr-unionformer,
  title     = {{UnionFormer: Unified-Learning Transformer with Multi-View Representation for Image Manipulation Detection and Localization}},
  author    = {Li, Shuaibo and Ma, Wei and Guo, Jianwei and Xu, Shibiao and Li, Benchong and Zhang, Xiaopeng},
  booktitle = {Conference on Computer Vision and Pattern Recognition},
  year      = {2024},
  pages     = {12523--12533},
  doi       = {10.1109/CVPR52733.2024.01190},
  url       = {https://mlanthology.org/cvpr/2024/li2024cvpr-unionformer/}
}