A Long Short-Term Memory Convolutional Neural Network for First-Person Vision Activity Recognition
Abstract
Temporal information is the main source of discriminating characteristics for the recognition of proprioceptive activities in first-person vision (FPV). In this paper, we propose a motion representation that uses stacked spectrograms. These spectrograms are generated over temporal windows from mean grid-optical-flow vectors and the displacement vectors of the intensity centroid. The stacked representation enables us to use 2D convolutions to learn and extract global motion features. Moreover, we employ a long short-term memory (LSTM) network to encode the temporal dependency among consecutive samples recursively. Experimental results show that the proposed approach achieves state-of-the-art performance in the largest public dataset for FPV activity recognition.
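To make the motion representation concrete, the sketch below illustrates the general idea of stacking spectrograms computed from per-window motion signals (mean grid optical flow and centroid displacement) into a 2D-conv-ready tensor. All signal names, window lengths, and STFT parameters here are illustrative assumptions, not the authors' implementation.

```python
# Hedged sketch of "stacked spectrograms" from motion signals.
# The four 1D signals stand in for mean grid-optical-flow (x, y) and
# intensity-centroid displacement (x, y); random data is a placeholder.
import numpy as np
from scipy.signal import spectrogram

rng = np.random.default_rng(0)
T = 256  # frames in one temporal window (assumed)

signals = {
    "flow_x": rng.standard_normal(T),
    "flow_y": rng.standard_normal(T),
    "centroid_dx": rng.standard_normal(T),
    "centroid_dy": rng.standard_normal(T),
}

specs = []
for name, s in signals.items():
    # Short-time Fourier magnitude of each motion signal.
    f, t, Sxx = spectrogram(s, fs=30.0, nperseg=64, noverlap=32)
    specs.append(np.log1p(Sxx))  # log compression for dynamic range

# Stack the spectrograms along a channel axis, giving a tensor that
# a 2D convolutional network can consume directly.
stacked = np.stack(specs, axis=0)
print(stacked.shape)  # (channels, frequency bins, time frames)
```

An LSTM would then consume the CNN features of consecutive windows to model temporal dependency across samples.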
Cite
Text
Abebe and Cavallaro. "A Long Short-Term Memory Convolutional Neural Network for First-Person Vision Activity Recognition." IEEE/CVF International Conference on Computer Vision Workshops, 2017. doi:10.1109/ICCVW.2017.159
Markdown
[Abebe and Cavallaro. "A Long Short-Term Memory Convolutional Neural Network for First-Person Vision Activity Recognition." IEEE/CVF International Conference on Computer Vision Workshops, 2017.](https://mlanthology.org/iccvw/2017/abebe2017iccvw-long/) doi:10.1109/ICCVW.2017.159
BibTeX
@inproceedings{abebe2017iccvw-long,
title = {{A Long Short-Term Memory Convolutional Neural Network for First-Person Vision Activity Recognition}},
author = {Abebe, Girmaw and Cavallaro, Andrea},
booktitle = {IEEE/CVF International Conference on Computer Vision Workshops},
year = {2017},
pages = {1339-1346},
doi = {10.1109/ICCVW.2017.159},
url = {https://mlanthology.org/iccvw/2017/abebe2017iccvw-long/}
}