Exploiting Long-Term Dependencies for Generating Dynamic Scene Graphs
Abstract
Dynamic scene graph generation from a video is challenging due to the temporal dynamics of the scene and the inherent temporal fluctuations of predictions. We hypothesize that capturing long-term temporal dependencies is the key to effective generation of dynamic scene graphs. We propose to learn the long-term dependencies in a video by capturing the object-level consistency and inter-object relationship dynamics over object-level long-term tracklets using transformers. Experimental results demonstrate that our "Dynamic Scene Graph Detection Transformer" (DSG-DETR) outperforms state-of-the-art methods by a significant margin on the benchmark dataset Action Genome. Our ablation studies validate the effectiveness of each component of the proposed approach. The source code is available at https://github.com/Shengyu-Feng/DSG-DETR.
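The abstract's core idea, transformer-based temporal aggregation over object-level tracklets followed by per-frame relationship classification, could look roughly like the sketch below. This is a minimal illustration, not the authors' DSG-DETR implementation: the module names, feature dimension, number of relation classes, and the toy inputs are all assumptions made for this example.

```python
# Minimal sketch (assumptions only, not the DSG-DETR codebase):
# aggregate each object's features along its long-term tracklet with a
# transformer encoder, then classify pairwise relationships per frame.

import torch
import torch.nn as nn


class TrackletTemporalEncoder(nn.Module):
    """Transformer over one object's features across all frames of its tracklet."""

    def __init__(self, dim=256, heads=8, layers=2):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=layers)

    def forward(self, tracklet_feats):
        # tracklet_feats: (num_tracklets, num_frames, dim).
        # Self-attention lets every frame attend to the entire tracklet,
        # enforcing long-term object-level consistency.
        return self.encoder(tracklet_feats)


class RelationHead(nn.Module):
    """Classify the subject-object relationship in each frame."""

    def __init__(self, dim=256, num_relations=26):  # class count is an assumption
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(2 * dim, dim), nn.ReLU(), nn.Linear(dim, num_relations)
        )

    def forward(self, subj_feats, obj_feats):
        # subj_feats, obj_feats: (num_frames, dim) for one subject-object pair.
        return self.mlp(torch.cat([subj_feats, obj_feats], dim=-1))


if __name__ == "__main__":
    num_tracklets, num_frames, dim = 4, 30, 256
    feats = torch.randn(num_tracklets, num_frames, dim)  # toy RoI features

    temporal = TrackletTemporalEncoder(dim)
    smoothed = temporal(feats)  # temporally consistent object features

    rel_head = RelationHead(dim)
    # e.g., relationships between tracklet 0 (person) and tracklet 1 (object)
    logits = rel_head(smoothed[0], smoothed[1])
    print(logits.shape)  # (num_frames, num_relations)
```

The point of attending over the full tracklet, rather than a short sliding window, is that each frame's prediction can draw on the object's entire history, which is what the paper means by exploiting long-term dependencies.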
Cite
Text
Feng et al. "Exploiting Long-Term Dependencies for Generating Dynamic Scene Graphs." Winter Conference on Applications of Computer Vision, 2023.
Markdown
[Feng et al. "Exploiting Long-Term Dependencies for Generating Dynamic Scene Graphs." Winter Conference on Applications of Computer Vision, 2023.](https://mlanthology.org/wacv/2023/feng2023wacv-exploiting/)
BibTeX
@inproceedings{feng2023wacv-exploiting,
title = {{Exploiting Long-Term Dependencies for Generating Dynamic Scene Graphs}},
author = {Feng, Shengyu and Mostafa, Hesham and Nassar, Marcel and Majumdar, Somdeb and Tripathi, Subarna},
booktitle = {Winter Conference on Applications of Computer Vision},
year = {2023},
pages = {5130--5139},
url = {https://mlanthology.org/wacv/2023/feng2023wacv-exploiting/}
}