Deep Spatial Pyramid Ensemble for Cultural Event Recognition
Abstract
Semantic event recognition based only on image-based cues is a challenging problem in computer vision. To capture rich information and exploit important cues such as human poses, human garments and scene categories, we propose the Deep Spatial Pyramid Ensemble framework, which builds mainly on our previous work, Deep Spatial Pyramid (DSP). DSP builds universal and powerful image representations from CNN models. Specifically, we employ five deep networks trained on different data sources to extract five corresponding DSP representations for event recognition images. To combine the complementary information from the different DSP representations, we ensemble these features by both "early fusion" and "late fusion". Finally, based on the proposed framework, we present a solution for the Cultural Event Recognition track of the ChaLearn Looking at People (LAP) challenge, held in association with ICCV 2015. Our framework achieved one of the best cultural event recognition performances in this challenge.
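The abstract does not say how the two fusion steps are implemented, so the following is only a minimal sketch of what "early fusion" and "late fusion" commonly mean: early fusion concatenates the per-model feature vectors before classification, and late fusion averages the per-model class scores. The function names, the L2 normalization step, and the toy numbers are assumptions made for illustration and are not taken from the paper.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit L2 norm (assumed preprocessing, not from the paper)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def early_fusion(features):
    """Early fusion: concatenate the normalized feature vectors of all models."""
    fused = []
    for vec in features:
        fused.extend(l2_normalize(vec))
    return fused


def late_fusion(scores):
    """Late fusion: average the class scores produced by each model."""
    n_models = len(scores)
    n_classes = len(scores[0])
    return [sum(s[c] for s in scores) / n_models for c in range(n_classes)]


# Toy stand-ins for per-model representations of one image (hypothetical values).
features = [[3.0, 4.0], [0.0, 2.0, 0.0]]
fused = early_fusion(features)

# Toy per-model class scores for the same image (hypothetical values).
scores = [[0.2, 0.8], [0.6, 0.4], [0.1, 0.9]]
avg = late_fusion(scores)
predicted = max(range(len(avg)), key=avg.__getitem__)
```

In this sketch the fused vector would be passed to a single classifier, while the averaged scores give a prediction directly; the framework described above uses both kinds of fusion across its five representations.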
Cite
Text
Wei et al. "Deep Spatial Pyramid Ensemble for Cultural Event Recognition." IEEE/CVF International Conference on Computer Vision Workshops, 2015. doi:10.1109/ICCVW.2015.45
Markdown
[Wei et al. "Deep Spatial Pyramid Ensemble for Cultural Event Recognition." IEEE/CVF International Conference on Computer Vision Workshops, 2015.](https://mlanthology.org/iccvw/2015/wei2015iccvw-deep/) doi:10.1109/ICCVW.2015.45
BibTeX
@inproceedings{wei2015iccvw-deep,
title = {{Deep Spatial Pyramid Ensemble for Cultural Event Recognition}},
author = {Wei, Xiu-Shen and Gao, Bin-Bin and Wu, Jianxin},
booktitle = {IEEE/CVF International Conference on Computer Vision Workshops},
year = {2015},
pages = {280-286},
doi = {10.1109/ICCVW.2015.45},
url = {https://mlanthology.org/iccvw/2015/wei2015iccvw-deep/}
}