Just Ask Plus: Using Transcripts for VideoQA

Abstract

The Social-IQ 2.0 challenge is designed to benchmark the ability of recent AI technologies to reason about social interactions, a skill referred to as Artificial Social Intelligence, in the form of a VideoQA task. In this work, we use the Just Ask and SpeechT5 models as feature extractors and perform reasoning by adding one attention layer and two transformer encoders. Our best configuration reaches 53.35% accuracy on the validation set. The code is publicly available on GitHub.
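The abstract's reasoning head (one attention layer followed by two transformer encoders over pre-extracted features) could be sketched roughly as below. This is a minimal illustrative sketch, not the paper's implementation: all dimensions, the answer-scoring linear layer, and the cross-attention arrangement (candidate features attending over transcript features) are assumptions made here for concreteness.

```python
import torch
import torch.nn as nn

class ReasoningHead(nn.Module):
    """Hypothetical sketch of the head described in the abstract: one
    attention layer fusing pre-extracted video/question features (e.g.
    from Just Ask) with transcript features (e.g. from SpeechT5),
    followed by two transformer encoder layers. Dimensions and the
    scoring layer are illustrative assumptions, not from the paper."""

    def __init__(self, d_model=512, n_heads=8):
        super().__init__()
        # One attention layer: candidate-answer features attend over transcripts.
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        # Two transformer encoder layers for joint reasoning.
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.score = nn.Linear(d_model, 1)  # per-candidate answer score

    def forward(self, qv_feats, transcript_feats):
        # qv_feats: (batch, n_answers, d_model), one feature per candidate answer
        # transcript_feats: (batch, seq_len, d_model)
        fused, _ = self.attn(qv_feats, transcript_feats, transcript_feats)
        fused = self.encoder(fused)
        return self.score(fused).squeeze(-1)  # (batch, n_answers) logits

model = ReasoningHead()
logits = model(torch.randn(2, 4, 512), torch.randn(2, 30, 512))
print(tuple(logits.shape))  # (2, 4)
```

At inference, the candidate answer with the highest logit would be selected as the model's prediction.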

Cite

Text

Pirhadi et al. "Just Ask Plus: Using Transcripts for VideoQA." IEEE/CVF International Conference on Computer Vision Workshops, 2023. doi:10.1109/ICCVW60793.2023.00332

Markdown

[Pirhadi et al. "Just Ask Plus: Using Transcripts for VideoQA." IEEE/CVF International Conference on Computer Vision Workshops, 2023.](https://mlanthology.org/iccvw/2023/pirhadi2023iccvw-just/) doi:10.1109/ICCVW60793.2023.00332

BibTeX

@inproceedings{pirhadi2023iccvw-just,
  title     = {{Just Ask Plus: Using Transcripts for VideoQA}},
  author    = {Pirhadi, Mohammad Javad and Mirzaei, Motahhare and Eetemadi, Sauleh},
  booktitle = {IEEE/CVF International Conference on Computer Vision Workshops},
  year      = {2023},
  pages     = {3074--3077},
  doi       = {10.1109/ICCVW60793.2023.00332},
  url       = {https://mlanthology.org/iccvw/2023/pirhadi2023iccvw-just/}
}