Multimodal Video Understanding Using Graph Neural Network
Abstract
The majority of existing semantic video understanding methods process each video independently, without considering the underlying inter-video relationships. However, videos uploaded by individuals on social media platforms such as YouTube and Instagram exhibit inter-video relationships that reflect the individual's interests, geography, culture, etc. In this work, we explicitly model these inter-video relationships, which originate from the creators of the videos, using Graph Neural Networks (GNNs) in a multimodal setup. We perform video classification by leveraging the creators of the videos and the semantic similarity between videos to create edges between them, and observe an improvement of 4% in accuracy.
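To make the graph construction concrete, below is a minimal sketch, assuming PyTorch Geometric: videos become graph nodes carrying multimodal embeddings, edges connect videos that share a creator or whose embeddings are semantically similar, and a two-layer GCN classifies the nodes. The names (`build_video_graph`, `VideoGCN`), the cosine-similarity threshold, and the GCN backbone are illustrative assumptions, not the paper's exact implementation.

```python
import torch
import torch.nn.functional as F
from torch_geometric.data import Data
from torch_geometric.nn import GCNConv

def build_video_graph(features, creators, sim_threshold=0.8):
    """Connect videos that share a creator or whose multimodal
    embeddings are highly similar (hypothetical edge rules).

    features: (N, D) tensor of per-video multimodal embeddings.
    creators: list of N creator ids, one per video.
    """
    edges = set()

    # Creator edges: fully connect videos from the same creator.
    by_creator = {}
    for i, c in enumerate(creators):
        by_creator.setdefault(c, []).append(i)
    for ids in by_creator.values():
        for i in ids:
            for j in ids:
                if i != j:
                    edges.add((i, j))

    # Similarity edges: cosine similarity above a threshold
    # (the threshold value is an assumption, not from the paper).
    normed = F.normalize(features, dim=1)
    sim = normed @ normed.t()
    src, dst = (sim > sim_threshold).nonzero(as_tuple=True)
    for i, j in zip(src.tolist(), dst.tolist()):
        if i != j:
            edges.add((i, j))

    edge_index = torch.tensor(sorted(edges), dtype=torch.long).t()
    return Data(x=features, edge_index=edge_index)

class VideoGCN(torch.nn.Module):
    """Two-layer GCN that classifies each video node."""
    def __init__(self, in_dim, hidden_dim, num_classes):
        super().__init__()
        self.conv1 = GCNConv(in_dim, hidden_dim)
        self.conv2 = GCNConv(hidden_dim, num_classes)

    def forward(self, data):
        h = F.relu(self.conv1(data.x, data.edge_index))
        return self.conv2(h, data.edge_index)  # per-node class logits
```

In this sketch, message passing lets a video's prediction draw on other videos by the same creator and on semantically similar videos, which is the inter-video signal the abstract describes; the multimodal aspect enters through the node features (e.g., concatenated video and text embeddings).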
Cite
Text
Singh and Gupta. "Multimodal Video Understanding Using Graph Neural Network." NeurIPS 2022 Workshops: GLFrontiers, 2022.
Markdown
[Singh and Gupta. "Multimodal Video Understanding Using Graph Neural Network." NeurIPS 2022 Workshops: GLFrontiers, 2022.](https://mlanthology.org/neuripsw/2022/singh2022neuripsw-multimodal/)
BibTeX
@inproceedings{singh2022neuripsw-multimodal,
  title = {{Multimodal Video Understanding Using Graph Neural Network}},
  author = {Singh, Ayush and Gupta, Vikram},
  booktitle = {NeurIPS 2022 Workshops: GLFrontiers},
  year = {2022},
  url = {https://mlanthology.org/neuripsw/2022/singh2022neuripsw-multimodal/}
}