Inferring Human Attention by Learning Latent Intentions
Abstract
This paper addresses the problem of inferring 3D human attention in RGB-D videos at scene scale. 3D human attention describes where a human is looking in a 3D scene. We propose a probabilistic method that jointly models attention, intentions, and their interactions. Latent intentions guide human attention, which in turn reveals features of the intention. This mutual interaction makes attention inference a joint optimization with latent intentions. An EM-based approach is adopted to learn the latent intentions and model parameters. Given an RGB-D video with 3D human skeletons, a joint-state dynamic programming algorithm jointly infers the latent intentions, the 3D attention directions, and the attention voxels in scene point clouds. Experiments on a new 3D human attention dataset demonstrate the strength of our method.
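The abstract does not spell out the joint-state dynamic programming step, so the following is only a minimal sketch of the general idea, not the authors' algorithm: a Viterbi-style recursion over joint states, where each state pairs a latent intention with an attention direction. The function name `joint_viterbi`, the `unary` and `pairwise` log-score callables, and the discretized direction set are all assumptions made for illustration. The paper's actual state space, scoring terms, and attention-voxel inference are not reproduced here.

```python
from itertools import product


def joint_viterbi(n_steps, intentions, directions, unary, pairwise):
    """Return the best-scoring sequence of (intention, direction) states.

    Hypothetical interface (not from the paper):
      unary(t, state)          -> log-score of `state` at frame t
      pairwise(prev, state)    -> log-score of moving from `prev` to `state`
    """
    # Joint state space: every (intention, direction) combination.
    states = list(product(intentions, directions))

    # best[s] = best log-score of any path that ends in state s at frame t.
    best = {s: unary(0, s) for s in states}
    backpointers = []

    for t in range(1, n_steps):
        new_best, pointer = {}, {}
        for s in states:
            # Pick the predecessor that maximizes the accumulated score.
            prev = max(states, key=lambda p: best[p] + pairwise(p, s))
            new_best[s] = best[prev] + pairwise(prev, s) + unary(t, s)
            pointer[s] = prev
        best = new_best
        backpointers.append(pointer)

    # Backtrack from the best final state to recover the full sequence.
    last = max(states, key=lambda s: best[s])
    path = [last]
    for pointer in reversed(backpointers):
        path.append(pointer[path[-1]])
    path.reverse()
    return path, best[last]
```

Treating the intention and the direction as one joint state lets a single recursion optimize both sequences together, at the cost of a state space whose size is the product of the two label sets. In an EM-style training loop, a decoder of this kind could plausibly serve as the step that assigns latent intentions under the current parameters, though the abstract does not confirm that arrangement.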
Cite
Text
Wei et al. "Inferring Human Attention by Learning Latent Intentions." International Joint Conference on Artificial Intelligence, 2017. doi:10.24963/IJCAI.2017/180
Markdown
[Wei et al. "Inferring Human Attention by Learning Latent Intentions." International Joint Conference on Artificial Intelligence, 2017.](https://mlanthology.org/ijcai/2017/wei2017ijcai-inferring/) doi:10.24963/IJCAI.2017/180
BibTeX
@inproceedings{wei2017ijcai-inferring,
title = {{Inferring Human Attention by Learning Latent Intentions}},
author = {Wei, Ping and Xie, Dan and Zheng, Nanning and Zhu, Song-Chun},
booktitle = {International Joint Conference on Artificial Intelligence},
year = {2017},
pages = {1297-1303},
doi = {10.24963/IJCAI.2017/180},
url = {https://mlanthology.org/ijcai/2017/wei2017ijcai-inferring/}
}