Joint Event Detection and Description in Continuous Video Streams
Abstract
Dense video captioning involves first localizing events in a video and then generating captions for the identified events. We present the Joint Event Detection and Description Network (JEDDi-Net), which solves this task in an end-to-end fashion: it encodes the input video stream with three-dimensional convolutional layers, proposes variable-length temporal events from pooled features, and transcribes the event proposals into captions with a two-level hierarchical LSTM module that models context. We show the effectiveness of the proposed JEDDi-Net on the large-scale ActivityNet Captions dataset.
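The abstract describes a three-stage pipeline: encode the video stream, propose variable-length temporal segments, then caption each proposal. The sketch below illustrates that control flow only; every function body, shape, and threshold is a hypothetical placeholder, not the paper's actual architecture or implementation.

```python
import numpy as np

def encode_video(frames):
    """Stand-in for the 3D-convolutional encoder: pool each group of 8
    frame features into one step (feature dim and stride are assumed)."""
    t, d = frames.shape
    return frames[: t - t % 8].reshape(-1, 8, d).mean(axis=1)

def propose_events(features, threshold=0.5):
    """Stand-in for the proposal module: score each encoded step and merge
    consecutive high-scoring steps into variable-length segments."""
    scores = 1.0 / (1.0 + np.exp(-features.mean(axis=1)))  # dummy scores
    segments, start = [], None
    for i, s in enumerate(scores):
        if s > threshold and start is None:
            start = i
        elif s <= threshold and start is not None:
            segments.append((start, i))
            start = None
    if start is not None:
        segments.append((start, len(scores)))
    return segments

def caption_segment(segment):
    """Stand-in for the hierarchical LSTM captioner: emits a dummy caption
    in place of generated text."""
    start, end = segment
    return f"<event spanning steps {start}-{end}>"

rng = np.random.default_rng(0)
frames = rng.normal(size=(64, 512))  # 64 per-frame features, dim 512
feats = encode_video(frames)         # temporal encoding: (8, 512)
events = propose_events(feats)       # variable-length proposals
captions = [caption_segment(e) for e in events]
```

In the actual model these stages are trained jointly end to end, so proposal scores and captions share the same convolutional features; the sketch only mirrors the data flow.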
Cite
Text
Xu et al. "Joint Event Detection and Description in Continuous Video Streams." IEEE/CVF Winter Conference on Applications of Computer Vision Workshops, 2019. doi:10.1109/WACVW.2019.00011
Markdown
[Xu et al. "Joint Event Detection and Description in Continuous Video Streams." IEEE/CVF Winter Conference on Applications of Computer Vision Workshops, 2019.](https://mlanthology.org/wacvw/2019/xu2019wacvw-joint/) doi:10.1109/WACVW.2019.00011
BibTeX
@inproceedings{xu2019wacvw-joint,
title = {{Joint Event Detection and Description in Continuous Video Streams}},
author = {Xu, Huijuan and Li, Boyang and Ramanishka, Vasili and Sigal, Leonid and Saenko, Kate},
booktitle = {IEEE/CVF Winter Conference on Applications of Computer Vision Workshops},
year = {2019},
pages = {25--26},
doi = {10.1109/WACVW.2019.00011},
url = {https://mlanthology.org/wacvw/2019/xu2019wacvw-joint/}
}