Segment-Then-Rank: Non-Factoid Question Answering on Instructional Videos
Abstract
We study the problem of non-factoid QA on instructional videos. Existing work focuses on either the visual or the textual modality of video content to find answers matching the question, but neither is flexible enough for our setting of non-factoid answers with varying lengths. Motivated by this, we propose a two-stage model: (a) multimodal segmentation of the video into span candidates and (b) length-adaptive ranking of the candidates against the question. First, for segmentation, we propose a Segmenter that generates span candidates of diverse lengths, considering both the textual and visual modalities. Second, for ranking, we propose a Ranker that scores the candidates by dynamically combining two models with complementary strengths for short and long spans, respectively. Experimental results demonstrate that our model achieves state-of-the-art performance.
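To make the segment-then-rank pipeline described above concrete, here is a minimal, illustrative sketch of the two stages: enumerating variable-length span candidates over transcript segments, then scoring each candidate against the question and keeping the best one. This is not the authors' implementation; the `encode` callable, the simple cosine-similarity scoring, and all function names are assumptions standing in for the paper's multimodal Segmenter and length-adaptive Ranker.

```python
# Illustrative sketch only; not the paper's model. `encode` is a hypothetical
# placeholder for any question/span encoder (text-only here for simplicity).
from typing import Callable, List, Sequence, Tuple

import numpy as np


def generate_span_candidates(
    segments: Sequence[str], max_len: int = 5
) -> List[Tuple[int, int]]:
    """Stage (a): enumerate contiguous spans of transcript segments up to
    max_len segments long, yielding candidates of diverse lengths."""
    spans = []
    for start in range(len(segments)):
        for end in range(start + 1, min(start + max_len, len(segments)) + 1):
            spans.append((start, end))
    return spans


def rank_spans(
    question: str,
    segments: Sequence[str],
    spans: Sequence[Tuple[int, int]],
    encode: Callable[[str], np.ndarray],
) -> Tuple[int, int]:
    """Stage (b): score each candidate span against the question and return
    the best one. Cosine similarity replaces the paper's length-adaptive
    combination of two ranking models."""
    q = encode(question)
    best_span, best_score = spans[0], float("-inf")
    for start, end in spans:
        s = encode(" ".join(segments[start:end]))
        score = float(q @ s / (np.linalg.norm(q) * np.linalg.norm(s) + 1e-8))
        if score > best_score:
            best_span, best_score = (start, end), score
    return best_span
```

With any sentence encoder plugged in as `encode`, `rank_spans(question, segments, generate_span_candidates(segments), encode)` returns the start/end indices of the best-scoring answer span.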
Cite
Text
Lee et al. "Segment-Then-Rank: Non-Factoid Question Answering on Instructional Videos." AAAI Conference on Artificial Intelligence, 2020. doi:10.1609/AAAI.V34I05.6327

Markdown

[Lee et al. "Segment-Then-Rank: Non-Factoid Question Answering on Instructional Videos." AAAI Conference on Artificial Intelligence, 2020.](https://mlanthology.org/aaai/2020/lee2020aaai-segment/) doi:10.1609/AAAI.V34I05.6327

BibTeX
@inproceedings{lee2020aaai-segment,
title = {{Segment-Then-Rank: Non-Factoid Question Answering on Instructional Videos}},
author = {Lee, Kyungjae and Duan, Nan and Ji, Lei and Li, Jason and Hwang, Seung-won},
booktitle = {AAAI Conference on Artificial Intelligence},
year = {2020},
pages = {8147-8154},
doi = {10.1609/AAAI.V34I05.6327},
url = {https://mlanthology.org/aaai/2020/lee2020aaai-segment/}
}