HuMoCon: Concept Discovery for Human Motion Understanding
Abstract
We present HuMoCon, a novel motion-video understanding framework designed for advanced human behavior analysis. The core of our method is a human motion concept discovery framework that efficiently trains multi-modal encoders to extract semantically meaningful and generalizable features. HuMoCon addresses key challenges in motion concept discovery for understanding and reasoning, including the lack of explicit multi-modality feature alignment and the loss of high-frequency information in masked autoencoding frameworks. Our approach integrates a feature alignment strategy that leverages video for contextual understanding and motion for fine-grained interaction modeling, together with a velocity reconstruction mechanism that enhances high-frequency feature expression and mitigates temporal over-smoothing. Comprehensive experiments on standard benchmarks demonstrate that HuMoCon enables effective motion concept discovery and significantly outperforms state-of-the-art methods in training large models for human motion understanding.
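The abstract does not say how velocity reconstruction is formulated, so the following is only an illustrative sketch of why a velocity term is sensitive to high-frequency detail. It is not the paper's loss. It assumes velocity is a first-order finite difference over frames, and the helper names `velocity` and `position_and_velocity_mse` are hypothetical.

```python
import numpy as np


def velocity(motion):
    """First-order finite difference along the time axis (axis 0)."""
    return motion[1:] - motion[:-1]


def position_and_velocity_mse(pred, target):
    """Return (position MSE, velocity MSE) between two motion sequences."""
    pos_err = float(np.mean((pred - target) ** 2))
    vel_err = float(np.mean((velocity(pred) - velocity(target)) ** 2))
    return pos_err, vel_err


# Toy 1-D trajectory: a slow drift plus a small frame-to-frame jitter.
t = np.arange(8, dtype=float)
jitter = 0.05 * np.where(t % 2 == 0, 1.0, -1.0)
target = 0.1 * t + jitter

# An over-smoothed reconstruction keeps the drift but drops the jitter.
pred = 0.1 * t

pos_err, vel_err = position_and_velocity_mse(pred, target)
print(pos_err, vel_err)
```

In this toy case the velocity error is four times the position error, because differencing doubles the amplitude of the alternating jitter. This is one way to see how a velocity-based objective can penalize temporal over-smoothing more strongly than a position-only objective does.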
Cite
Text
Fang et al. "HuMoCon: Concept Discovery for Human Motion Understanding." Conference on Computer Vision and Pattern Recognition, 2025. doi:10.1109/CVPR52734.2025.00673
Markdown
[Fang et al. "HuMoCon: Concept Discovery for Human Motion Understanding." Conference on Computer Vision and Pattern Recognition, 2025.](https://mlanthology.org/cvpr/2025/fang2025cvpr-humocon/) doi:10.1109/CVPR52734.2025.00673
BibTeX
@inproceedings{fang2025cvpr-humocon,
title = {{HuMoCon: Concept Discovery for Human Motion Understanding}},
author = {Fang, Qihang and Tang, Chengcheng and Tekin, Bugra and Ma, Shugao and Yang, Yanchao},
booktitle = {Conference on Computer Vision and Pattern Recognition},
year = {2025},
pages = {7179-7190},
doi = {10.1109/CVPR52734.2025.00673},
url = {https://mlanthology.org/cvpr/2025/fang2025cvpr-humocon/}
}