Activity Image-to-Video Retrieval by Disentangling Appearance and Motion

Abstract

With the rapid emergence of video data, image-to-video retrieval has attracted much attention. There are two types of image-to-video retrieval: instance-based and activity-based. The former aims to retrieve videos containing the same main objects as the query image, while the latter focuses on finding videos that show a similar activity. Since dynamic information plays a significant role in videos, we focus on the latter task and explore the motion relation between images and videos. In this paper, we propose a Motion-assisted Activity Proposal-based Image-to-Video Retrieval (MAP-IVR) approach, which disentangles video features into motion features and appearance features and obtains appearance features from images. We then perform image-to-video translation to improve the disentanglement quality. Retrieval is performed in both the appearance and video feature spaces. Extensive experiments demonstrate that our MAP-IVR approach remarkably outperforms state-of-the-art approaches on two benchmark activity-based video datasets.
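
As a rough illustration of retrieval performed in two feature spaces, the sketch below ranks gallery videos by a weighted sum of cosine similarities, one computed on appearance features and one on video features. It is not the MAP-IVR implementation: the function names, the use of cosine similarity, the fusion weight `alpha`, and the assumption that the query image already has a translated video-space feature (`img_vid`) are all hypothetical choices made for this example.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale vectors along the last axis to unit length."""
    norm = np.linalg.norm(x, axis=-1, keepdims=True)
    return x / np.maximum(norm, eps)


def rank_videos(img_app, img_vid, vid_app, vid_feat, alpha=0.5):
    """Rank gallery videos for one query image (hypothetical helper).

    img_app  : (d_a,)   appearance feature of the query image
    img_vid  : (d_v,)   query feature in the video space, assumed to come
                        from some image-to-video translation step
    vid_app  : (n, d_a) appearance features of the n gallery videos
    vid_feat : (n, d_v) video features of the n gallery videos
    alpha    : assumed fusion weight between the two similarity scores
    """
    # Cosine similarity in the appearance feature space.
    sim_app = l2_normalize(vid_app) @ l2_normalize(img_app)
    # Cosine similarity in the video feature space.
    sim_vid = l2_normalize(vid_feat) @ l2_normalize(img_vid)
    # Weighted fusion; this combination rule is an assumption.
    score = alpha * sim_app + (1.0 - alpha) * sim_vid
    # Indices of gallery videos, best match first.
    return np.argsort(-score), score
```

Calling `rank_videos` with a query whose features match one gallery video in both spaces places that video first; how the two scores are actually combined in the paper is not stated in the abstract.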

Cite

Text

Liu et al. "Activity Image-to-Video Retrieval by Disentangling Appearance and Motion." AAAI Conference on Artificial Intelligence, 2021. doi:10.1609/AAAI.V35I3.16312

Markdown

[Liu et al. "Activity Image-to-Video Retrieval by Disentangling Appearance and Motion." AAAI Conference on Artificial Intelligence, 2021.](https://mlanthology.org/aaai/2021/liu2021aaai-activity/) doi:10.1609/AAAI.V35I3.16312

BibTeX

@inproceedings{liu2021aaai-activity,
  title     = {{Activity Image-to-Video Retrieval by Disentangling Appearance and Motion}},
  author    = {Liu, Liu and Li, Jiangtong and Niu, Li and Xu, Ruicong and Zhang, Liqing},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2021},
  pages     = {2145-2153},
  doi       = {10.1609/AAAI.V35I3.16312},
  url       = {https://mlanthology.org/aaai/2021/liu2021aaai-activity/}
}