RobustHAR: Multi-Scale Spatial-Temporal Masked Self-Supervised Pre-Training for Robust Human Activity Recognition

Abstract

Human activity recognition (HAR) is prone to performance degradation in real-world applications due to missing data across intra-sensor and inter-sensor channels. Masked modeling, a mainstream paradigm of self-supervised pre-training, can learn robust cross-sensor representations in missing-data scenarios by reconstructing the masked content from the unmasked part. However, existing methods predominantly emphasize the temporal dynamics of human activities, which limits their ability to capture the spatial interdependencies among multiple sensors. Moreover, different human activities often span diverse spatial-temporal scales, so a single-scale activity recognizer fails to capture intricate spatial-temporal semantic information. To address these issues, we propose RobustHAR, a new HAR model with multi-scale spatial-temporal masked self-supervised pre-training designed to improve performance under missing data. RobustHAR involves three main steps: (1) RobustHAR constructs location-inspired spatial-temporal 3D-variation modeling to capture spatially and temporally correlated information in human activity data. (2) RobustHAR then designs multi-scale spatial-temporal masked self-supervised pre-training with semantic-consistent multi-scale feature co-learning for learning robust features at different scales. (3) Finally, RobustHAR fine-tunes the pre-trained model with adaptive multi-scale feature fusion for human activity recognition. Extensive experiments on three public multi-sensor datasets demonstrate that RobustHAR outperforms existing state-of-the-art methods.
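To make the masked-modeling idea concrete, below is a minimal sketch of spatial-temporal masked reconstruction pre-training on multi-sensor signals. It illustrates the general paradigm the abstract describes, not the authors' architecture: the token layout, Transformer encoder, mask ratio, and all names (MaskedSTPretrainer, patch_len, etc.) are assumptions for this example.

# Minimal sketch of spatial-temporal masked reconstruction pre-training on
# multi-sensor signals (illustrative only; not the RobustHAR architecture).
import torch
import torch.nn as nn

class MaskedSTPretrainer(nn.Module):
    def __init__(self, patch_len=16, d_model=64, mask_ratio=0.5):
        super().__init__()
        self.patch_len = patch_len
        self.mask_ratio = mask_ratio
        # Each (sensor, time-patch) pair becomes one token, so masking can
        # drop content along both the spatial and temporal axes.
        self.embed = nn.Linear(patch_len, d_model)
        self.mask_token = nn.Parameter(torch.zeros(d_model))
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.decoder = nn.Linear(d_model, patch_len)  # reconstruct raw patches

    def forward(self, x):
        # x: (batch, sensors, time); cut time into non-overlapping patches.
        B, S, T = x.shape
        P = T // self.patch_len
        patches = x[:, :, :P * self.patch_len].reshape(B, S * P, self.patch_len)
        tokens = self.embed(patches)                       # (B, S*P, d_model)
        mask = torch.rand(B, S * P, device=x.device) < self.mask_ratio
        tokens = torch.where(mask.unsqueeze(-1), self.mask_token, tokens)
        recon = self.decoder(self.encoder(tokens))         # (B, S*P, patch_len)
        # Reconstruction loss only on the masked (sensor, patch) positions.
        return ((recon - patches) ** 2)[mask].mean()

# Usage: one pre-training step on a dummy batch of 9-channel signals.
model = MaskedSTPretrainer()
loss = model(torch.randn(8, 9, 128))
loss.backward()

Because whole (sensor, patch) tokens are hidden and the loss is computed only at those positions, the encoder is pushed to infer missing channels from the surviving ones, which is the robustness property the paper targets.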

Cite

Text

Liu et al. "RobustHAR: Multi-Scale Spatial-Temporal Masked Self-Supervised Pre-Training for Robust Human Activity Recognition." International Joint Conference on Artificial Intelligence, 2025. doi:10.24963/IJCAI.2025/952

Markdown

[Liu et al. "RobustHAR: Multi-Scale Spatial-Temporal Masked Self-Supervised Pre-Training for Robust Human Activity Recognition." International Joint Conference on Artificial Intelligence, 2025.](https://mlanthology.org/ijcai/2025/liu2025ijcai-robusthar/) doi:10.24963/IJCAI.2025/952

BibTeX

@inproceedings{liu2025ijcai-robusthar,
  title     = {{RobustHAR: Multi-Scale Spatial-Temporal Masked Self-Supervised Pre-Training for Robust Human Activity Recognition}},
  author    = {Liu, Xiao and Yuan, Guan and Zhang, Yanmei and Liu, Shang and Yan, Qiuyan},
  booktitle = {International Joint Conference on Artificial Intelligence},
  year      = {2025},
  pages     = {8563--8571},
  doi       = {10.24963/IJCAI.2025/952},
  url       = {https://mlanthology.org/ijcai/2025/liu2025ijcai-robusthar/}
}