Spotting Audio-Visual Inconsistencies (SAVI) in Manipulated Video

Bolles, Robert C.; Burns, J. Brian; Graciarena, Martin; Kathol, Andreas; Lawson, Aaron; McLaren, Mitchell; Mensink, Thomas

doi:10.1109/CVPRW.2017.238

CVPRW 2017, pp. 1907-1914

Abstract

This paper is part of a larger effort to detect manipulations of video by searching for and combining the evidence of multiple types of inconsistencies between the audio and visual channels. Here, we focus on inconsistencies between the type of scenes detected in the audio and visual modalities (e.g., audio indoor, small room versus visual outdoor, urban), and inconsistencies in speaker identity tracking over a video given audio speaker features and visual face features (e.g., a voice change, but no talking face change). The scene inconsistency task was complicated by mismatches in the categories used in current visual scene and audio scene collections. To deal with this, we employed a novel semantic mapping method. The speaker identity inconsistency process was challenged by the complexity of comparing face tracks and audio speech clusters, requiring a novel method of fusing these two sources. Our progress on both tasks was demonstrated on two collections of tampered videos.

Cite

Text

Bolles et al. "Spotting Audio-Visual Inconsistencies (SAVI) in Manipulated Video." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2017. doi:10.1109/CVPRW.2017.238

Markdown

[Bolles et al. "Spotting Audio-Visual Inconsistencies (SAVI) in Manipulated Video." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2017.](https://mlanthology.org/cvprw/2017/bolles2017cvprw-spotting/) doi:10.1109/CVPRW.2017.238

BibTeX

@inproceedings{bolles2017cvprw-spotting,
  title     = {{Spotting Audio-Visual Inconsistencies (SAVI) in Manipulated Video}},
  author    = {Bolles, Robert C. and Burns, J. Brian and Graciarena, Martin and Kathol, Andreas and Lawson, Aaron and McLaren, Mitchell and Mensink, Thomas},
  booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops},
  year      = {2017},
  pages     = {1907-1914},
  doi       = {10.1109/CVPRW.2017.238},
  url       = {https://mlanthology.org/cvprw/2017/bolles2017cvprw-spotting/}
}