Temporal Context Network for Activity Localization in Videos

Abstract

We present a Temporal Context Network (TCN) for precise temporal localization of human activities. Similar to the Faster-RCNN architecture, proposals spanning multiple temporal scales are placed at equal intervals in a video. We propose a novel representation for ranking these proposals. Since pooling features only inside a segment is not sufficient to predict activity boundaries, we construct a representation that explicitly captures context around a proposal for ranking it. For each temporal segment inside a proposal, features are uniformly sampled at a pair of scales and fed to a temporal convolutional neural network for classification. After ranking the proposals, non-maximum suppression is applied and classification is performed to obtain the final detections. TCN outperforms state-of-the-art methods on the ActivityNet dataset and the THUMOS14 dataset.
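The sketch below illustrates the pipeline the abstract describes, using NumPy on precomputed frame-level features of shape (T, D). It is not the authors' implementation: the scale lengths, stride, number of sampled segments, and context ratio are illustrative assumptions, and the learned ranking network is replaced by placeholder scores.

# Minimal sketch of multi-scale proposal placement, pair-of-scales context
# sampling, and temporal NMS. Hyperparameters are illustrative, not from the paper.
import numpy as np

def place_proposals(num_frames, scales=(64, 128, 256), stride=32):
    """Place proposals of several temporal scales at equal intervals."""
    proposals = []
    for length in scales:
        for start in range(0, max(num_frames - length, 1), stride):
            proposals.append((start, min(start + length, num_frames)))
    return proposals

def sample_pair_of_scales(features, start, end, num_segments=8, context_ratio=2.0):
    """Uniformly sample features at two scales: the proposal itself and a wider
    window around it, so the representation captures surrounding context."""
    def uniform_sample(lo, hi):
        lo, hi = max(lo, 0), min(hi, len(features))
        idx = np.linspace(lo, hi - 1, num_segments).astype(int)
        return features[idx]                      # (num_segments, D)

    inner = uniform_sample(start, end)
    extra = int((end - start) * (context_ratio - 1.0) / 2)
    outer = uniform_sample(start - extra, end + extra)
    # A temporal conv-net would consume this two-scale representation for ranking.
    return np.concatenate([inner, outer], axis=0)  # (2 * num_segments, D)

def temporal_nms(proposals, scores, iou_thresh=0.7):
    """Greedy non-maximum suppression over 1-D temporal intervals."""
    order = np.argsort(scores)[::-1]
    keep = []
    while len(order) > 0:
        i = order[0]
        keep.append(i)
        s1, e1 = proposals[i]
        remaining = []
        for j in order[1:]:
            s2, e2 = proposals[j]
            inter = max(0, min(e1, e2) - max(s1, s2))
            union = (e1 - s1) + (e2 - s2) - inter
            if inter / union <= iou_thresh:
                remaining.append(j)
        order = np.array(remaining, dtype=int)
    return keep

if __name__ == "__main__":
    T, D = 1000, 512
    feats = np.random.randn(T, D).astype(np.float32)   # stand-in for video features
    props = place_proposals(T)
    reps = np.stack([sample_pair_of_scales(feats, s, e) for s, e in props])
    scores = np.random.rand(len(props))                 # stand-in for learned ranking scores
    kept = temporal_nms(props, scores)
    print(f"{len(props)} proposals -> {len(kept)} kept after NMS")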

Cite

Text

Dai et al. "Temporal Context Network for Activity Localization in Videos." International Conference on Computer Vision, 2017. doi:10.1109/ICCV.2017.610

Markdown

[Dai et al. "Temporal Context Network for Activity Localization in Videos." International Conference on Computer Vision, 2017.](https://mlanthology.org/iccv/2017/dai2017iccv-temporal/) doi:10.1109/ICCV.2017.610

BibTeX

@inproceedings{dai2017iccv-temporal,
  title     = {{Temporal Context Network for Activity Localization in Videos}},
  author    = {Dai, Xiyang and Singh, Bharat and Zhang, Guyue and Davis, Larry S. and Chen, Yan Qiu},
  booktitle = {International Conference on Computer Vision},
  year      = {2017},
  doi       = {10.1109/ICCV.2017.610},
  url       = {https://mlanthology.org/iccv/2017/dai2017iccv-temporal/}
}