Multimodal Multi-Stream Deep Learning for Egocentric Activity Recognition

Abstract

In this paper, we propose a multimodal multi-stream deep learning framework to tackle the egocentric activity recognition problem using both video and sensor data. First, we experiment with and extend a multi-stream Convolutional Neural Network to learn spatial and temporal features from egocentric videos. Second, we propose a multi-stream Long Short-Term Memory architecture to learn features from multiple sensor streams (accelerometer, gyroscope, etc.). Third, we propose a two-level fusion technique and experiment with different pooling techniques to compute the prediction results. Experimental results on a multimodal egocentric dataset show that our proposed method achieves very encouraging performance, despite the constraint that the scale of existing egocentric datasets is still quite limited.
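To make the sensor branch and the two-level fusion concrete, here is a minimal sketch, not the authors' code: it assumes PyTorch, illustrative layer sizes, and made-up inputs (a 3-axis accelerometer stream, a 3-axis gyroscope stream, and a stand-in video-branch score vector). One LSTM per sensor stream produces per-stream class scores, which are pooled at the first level; the pooled sensor scores are then fused with video scores at the second level by weighted average pooling.

```python
import torch
import torch.nn as nn


class MultiStreamLSTM(nn.Module):
    """One LSTM per sensor stream; per-stream class scores are
    average-pooled (level-1 fusion). Sizes are illustrative."""

    def __init__(self, stream_channels, hidden_size=64, num_classes=20):
        super().__init__()
        self.lstms = nn.ModuleList(
            nn.LSTM(c, hidden_size, batch_first=True) for c in stream_channels
        )
        self.heads = nn.ModuleList(
            nn.Linear(hidden_size, num_classes) for _ in stream_channels
        )

    def forward(self, streams):
        scores = []
        for x, lstm, head in zip(streams, self.lstms, self.heads):
            out, _ = lstm(x)                 # (batch, time, hidden)
            scores.append(head(out[:, -1]))  # last time step -> class scores
        # Level-1 fusion: average the per-stream class scores.
        return torch.stack(scores).mean(dim=0)


# Toy usage with two hypothetical sensor streams.
model = MultiStreamLSTM(stream_channels=[3, 3])
acc = torch.randn(8, 100, 3)    # (batch, time steps, channels)
gyro = torch.randn(8, 100, 3)
sensor_scores = model([acc, gyro])

# Level-2 fusion: combine with video-branch scores (random stand-in here,
# in place of the multi-stream CNN's output).
video_scores = torch.randn(8, 20)
final = 0.5 * sensor_scores + 0.5 * video_scores  # weighted average pooling
prediction = final.argmax(dim=1)
```

The per-stream LSTMs and the fusion weights here are placeholders; the paper itself compares several pooling choices at both fusion levels.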

Cite

Text

Song et al. "Multimodal Multi-Stream Deep Learning for Egocentric Activity Recognition." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2016. doi:10.1109/CVPRW.2016.54

Markdown

[Song et al. "Multimodal Multi-Stream Deep Learning for Egocentric Activity Recognition." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2016.](https://mlanthology.org/cvprw/2016/song2016cvprw-multimodal/) doi:10.1109/CVPRW.2016.54

BibTeX

@inproceedings{song2016cvprw-multimodal,
  title     = {{Multimodal Multi-Stream Deep Learning for Egocentric Activity Recognition}},
  author    = {Song, Sibo and Chandrasekhar, Vijay and Mandal, Bappaditya and Li, Liyuan and Lim, Joo-Hwee and Babu, Giduthuri Sateesh and San, Phyo Phyo and Cheung, Ngai-Man},
  booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops},
  year      = {2016},
  pages     = {378--385},
  doi       = {10.1109/CVPRW.2016.54},
  url       = {https://mlanthology.org/cvprw/2016/song2016cvprw-multimodal/}
}