Learning Interpretable Forensic Representations via Local Window Modulation

Abstract

Most existing image forgeries involve manipulating a specific region of the source image, which leaves behind detectable artifacts and forensic traces. These distinguishing features are mostly found in and around the local neighborhood of the manipulated pixels. However, patch-based detection approaches quickly become impractical, as they are computationally inefficient and lack robustness. In this work, we investigate how to effectively learn these forensic representations using local window-based attention techniques. We propose the Forensic Modulation Network (ForMoNet), which uses focal modulation and gated attention layers to automatically identify the long- and short-range context for any query pixel. Furthermore, the network is more interpretable and computationally efficient than standard self-attention, which is critical for real-world applications. Our evaluation on various benchmarks shows that ForMoNet outperforms existing transformer-based forensic networks by 6% to 11% on different forgeries.
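To make the mechanism concrete, below is a minimal, self-contained sketch of the focal modulation idea the abstract refers to: each query pixel is modulated element-wise by a gated sum of context features aggregated over progressively larger windows plus a global level. This is an illustrative NumPy toy, not the authors' ForMoNet implementation; the box-average filter stands in for the learned depthwise convolutions of real focal modulation layers, and all weight names, shapes, and window sizes are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def box_filter(x, k):
    """Stand-in for a depthwise conv: average each channel over a k x k window."""
    H, W, _ = x.shape
    pad = k // 2
    xp = np.pad(x, ((pad, pad), (pad, pad), (0, 0)), mode="edge")
    out = np.zeros_like(x)
    for i in range(H):
        for j in range(W):
            out[i, j] = xp[i:i + k, j:j + k].mean(axis=(0, 1))
    return out

def focal_modulation(x, weights, focal_windows=(3, 5)):
    """Toy focal modulation: query * projected, gated sum of multi-scale context.

    x: (H, W, C) feature map; weights: (W_q, W_z, W_g, W_h) projections.
    """
    W_q, W_z, W_g, W_h = weights
    q = x @ W_q                        # per-pixel query
    z = np.tanh(x @ W_z)               # base context features
    L = len(focal_windows)
    logits = x @ W_g                   # (H, W, L + 1) gate logits
    gates = np.exp(logits) / np.exp(logits).sum(-1, keepdims=True)  # softmax
    # Hierarchical contextualization: growing windows widen the receptive field.
    ctx = np.zeros_like(z)
    for l, k in enumerate(focal_windows):
        z = box_filter(z, k)
        ctx = ctx + gates[..., l:l + 1] * z
    # Final gate attends to a single global (image-wide) context vector.
    ctx = ctx + gates[..., L:L + 1] * z.mean(axis=(0, 1))
    return q * (ctx @ W_h)             # element-wise modulation of the query

H, W, C = 8, 8, 4
x = rng.normal(size=(H, W, C))
weights = (rng.normal(size=(C, C)), rng.normal(size=(C, C)),
           rng.normal(size=(C, 3)), rng.normal(size=(C, C)))
y = focal_modulation(x, weights)
print(y.shape)
```

Because the gates are an explicit per-pixel softmax over focal levels, they can be visualized directly, which is one way such layers support the interpretability claim; and the cost is linear in the number of pixels rather than quadratic as in full self-attention.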

Cite

Text

Das and Amin. "Learning Interpretable Forensic Representations via Local Window Modulation." IEEE/CVF International Conference on Computer Vision Workshops, 2023. doi:10.1109/ICCVW60793.2023.00050

Markdown

[Das and Amin. "Learning Interpretable Forensic Representations via Local Window Modulation." IEEE/CVF International Conference on Computer Vision Workshops, 2023.](https://mlanthology.org/iccvw/2023/das2023iccvw-learning/) doi:10.1109/ICCVW60793.2023.00050

BibTeX

@inproceedings{das2023iccvw-learning,
  title     = {{Learning Interpretable Forensic Representations via Local Window Modulation}},
  author    = {Das, Sowmen and Amin, Md. Ruhul},
  booktitle = {IEEE/CVF International Conference on Computer Vision Workshops},
  year      = {2023},
  pages     = {436--447},
  doi       = {10.1109/ICCVW60793.2023.00050},
  url       = {https://mlanthology.org/iccvw/2023/das2023iccvw-learning/}
}