Unbiased Video Scene Graph Generation via Visual and Semantic Dual Debiasing

Abstract

Video Scene Graph Generation (VidSGG) aims to capture dynamic relationships among entities by sequentially analyzing video frames and integrating visual and semantic information. However, VidSGG is challenged by significant biases that skew predictions. To mitigate these biases, we propose a VIsual and Semantic Awareness (VISA) framework for unbiased VidSGG. VISA addresses visual bias through an innovative memory update mechanism that enhances object representations, and it concurrently reduces semantic bias by iteratively integrating object features with comprehensive semantic information derived from triplet relationships. This visual-semantic dual debiasing approach yields more unbiased representations of complex scene dynamics. Extensive experiments demonstrate the effectiveness of our method, where VISA outperforms existing unbiased VidSGG approaches by a substantial margin (e.g., +13.1% improvement in mR@20 and mR@50 for the SGCLS task under Semi Constraint).
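The abstract describes two debiasing stages: a memory update that refines object features across frames (visual debiasing) and an iterative fusion of object features with triplet-level semantics (semantic debiasing). The sketch below is purely illustrative and is not taken from the paper; the function names, the attention-style memory readout, and the convex-combination fusion are all assumptions about how such stages are commonly realized.

```python
import numpy as np

def memory_update(obj_feat, memory, alpha=0.5):
    """Hypothetical visual-debiasing step: read from a memory bank of
    past-frame object features via similarity weights, then blend the
    retrieved summary into the current feature."""
    # Cosine-similarity attention over memory rows (illustrative choice).
    sim = memory @ obj_feat / (
        np.linalg.norm(memory, axis=1) * np.linalg.norm(obj_feat) + 1e-8
    )
    weights = np.exp(sim) / np.exp(sim).sum()
    retrieved = weights @ memory
    return alpha * obj_feat + (1.0 - alpha) * retrieved

def semantic_fusion(obj_feat, triplet_embed, beta=0.3):
    """Hypothetical semantic-debiasing step: mix the object feature with
    an embedding of its (subject, predicate, object) triplet."""
    return (1.0 - beta) * obj_feat + beta * triplet_embed
```

In an actual VidSGG pipeline these updates would run per object per frame, with the memory bank populated from earlier frames and the triplet embedding produced by a relation encoder; both are omitted here for brevity.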

Cite

Text

Li et al. "Unbiased Video Scene Graph Generation via Visual and Semantic Dual Debiasing." Conference on Computer Vision and Pattern Recognition, 2025. doi:10.1109/CVPR52734.2025.01774

Markdown

[Li et al. "Unbiased Video Scene Graph Generation via Visual and Semantic Dual Debiasing." Conference on Computer Vision and Pattern Recognition, 2025.](https://mlanthology.org/cvpr/2025/li2025cvpr-unbiased/) doi:10.1109/CVPR52734.2025.01774

BibTeX

@inproceedings{li2025cvpr-unbiased,
  title     = {{Unbiased Video Scene Graph Generation via Visual and Semantic Dual Debiasing}},
  author    = {Li, Yanjun and Li, Zhaoyang and Chen, Honghui and Xu, Lizhi},
  booktitle = {Conference on Computer Vision and Pattern Recognition},
  year      = {2025},
  pages     = {19047--19056},
  doi       = {10.1109/CVPR52734.2025.01774},
  url       = {https://mlanthology.org/cvpr/2025/li2025cvpr-unbiased/}
}