Temporal Attention Mechanism with Conditional Inference for Large-Scale Multi-Label Video Classification

Kim, Eun-Sol; On, Kyoung-Woon; Kim, Jongseok; Heo, Yu-Jung; Choi, Seong-Ho; Lee, Hyun-Dong; Zhang, Byoung-Tak

doi:10.1007/978-3-030-11018-5_28

ECCVW 2018 pp. 306-316

Abstract

We present neural-network-based methods that effectively combine multimodal sequential inputs and classify them into multiple categories. The two key ideas are (1) selecting informative frames from a sequence with an attention mechanism and (2) exploiting correlations between labels to solve multi-label classification problems. The attention mechanism is applied along both the modality (spatial) and sequential (temporal) dimensions to ignore noisy and meaningless frames. Furthermore, to tackle the fundamental problems caused by predicting each label independently, as conventional multi-label classification methods do, the proposed method accounts for dependencies among the labels by decomposing the joint probability of the labels into conditional terms. Based on the experimental results (5th place in the Kaggle competition), we discuss how the proposed methods operate in the YouTube-8M Classification Task, what insights they offer, and why they succeed or fail.
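
As a rough illustration of the two ideas named in the abstract, the sketch below shows (a) softmax attention pooling over a sequence of frame features and (b) scoring a label assignment with the chain rule, p(y_1, ..., y_K) = p(y_1) p(y_2 | y_1) ... p(y_K | y_1, ..., y_{K-1}). It is not the authors' architecture: the function names, the fixed `query` vector (standing in for learned attention parameters), and `toy_conditional` are all hypothetical and chosen only to keep the example self-contained.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(frames, query):
    """Pool frame features into one vector using attention weights.

    Assumption: each frame is scored by a dot product with `query`,
    a stand-in for parameters that a real model would learn.
    """
    scores = [sum(f * q for f, q in zip(frame, query)) for frame in frames]
    weights = softmax(scores)
    dim = len(frames[0])
    pooled = [
        sum(w * frame[d] for w, frame in zip(weights, frames))
        for d in range(dim)
    ]
    return pooled, weights


def joint_probability(labels, conditional):
    """Chain rule over binary labels.

    `conditional(k, previous)` returns p(y_k = 1 | previous labels).
    """
    prob = 1.0
    for k, y in enumerate(labels):
        p_one = conditional(k, labels[:k])
        prob *= p_one if y == 1 else 1.0 - p_one
    return prob


def toy_conditional(k, previous):
    """Hypothetical dependency: a label is likelier if the previous one is on."""
    if previous and previous[-1] == 1:
        return 0.8
    return 0.3


# Three frames with two features each; the query favors the first feature.
frames = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
pooled, weights = attention_pool(frames, query=[2.0, 0.0])

# Probability that the first two labels are on and the third is off.
p = joint_probability([1, 1, 0], toy_conditional)
```

Frames whose scores are low receive small weights, so they contribute little to the pooled vector. In the chain-rule scorer, each factor depends on the labels already decided, which is what separates it from predicting every label independently.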

Cite

Text

Kim et al. "Temporal Attention Mechanism with Conditional Inference for Large-Scale Multi-Label Video Classification." European Conference on Computer Vision Workshops, 2018. doi:10.1007/978-3-030-11018-5_28

Markdown

[Kim et al. "Temporal Attention Mechanism with Conditional Inference for Large-Scale Multi-Label Video Classification." European Conference on Computer Vision Workshops, 2018.](https://mlanthology.org/eccvw/2018/kim2018eccvw-temporal/) doi:10.1007/978-3-030-11018-5_28

BibTeX

@inproceedings{kim2018eccvw-temporal,
  title     = {{Temporal Attention Mechanism with Conditional Inference for Large-Scale Multi-Label Video Classification}},
  author    = {Kim, Eun-Sol and On, Kyoung-Woon and Kim, Jongseok and Heo, Yu-Jung and Choi, Seong-Ho and Lee, Hyun-Dong and Zhang, Byoung-Tak},
  booktitle = {European Conference on Computer Vision Workshops},
  year      = {2018},
  pages     = {306-316},
  doi       = {10.1007/978-3-030-11018-5_28},
  url       = {https://mlanthology.org/eccvw/2018/kim2018eccvw-temporal/}
}