MMAD: Multi-Label Micro-Action Detection in Videos

ICCV 2025 pp. 13225-13236

Abstract

Human body actions are an important form of non-verbal communication in social interactions. This paper specifically focuses on a subset of body actions known as micro-actions, which are subtle, low-intensity body movements with promising applications in human emotion analysis. In real-world scenarios, human micro-actions often temporally co-occur, with multiple micro-actions overlapping in time, such as concurrent head and hand movements. However, current research primarily focuses on recognizing individual micro-actions while overlooking their co-occurring nature. To address this gap, we propose a new task named Multi-label Micro-Action Detection (MMAD), which involves identifying all micro-actions in a given short video, determining their start and end times, and categorizing them. Accomplishing this requires a model capable of accurately capturing both long-term and short-term action relationships to detect multiple overlapping micro-actions. To facilitate the MMAD task, we introduce a new dataset named Multi-label Micro-Action-52 (MMA-52) and propose a baseline method equipped with a dual-path spatial-temporal adapter to address the challenges of subtle visual changes in MMAD. We hope that MMA-52 can stimulate research on micro-action analysis in videos and prompt the development of spatio-temporal modeling in human-centric video understanding. The proposed MMA-52 dataset is available at: https://github.com/VUT-HFUT/Micro-Action
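
To make the task definition concrete, the following is a minimal sketch of what an MMAD output might look like: a set of labeled temporal segments that are allowed to overlap. The `MicroAction` class, the `overlap` and `co_occurring` helpers, the category names, and the timestamps are all hypothetical and chosen for illustration; they are not taken from the paper, the MMA-52 annotations, or the released code.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class MicroAction:
    """One detected instance (hypothetical structure, not the MMA-52 format)."""
    label: str    # illustrative category name
    start: float  # start time in seconds
    end: float    # end time in seconds


def overlap(a: MicroAction, b: MicroAction) -> float:
    """Length in seconds of the temporal intersection (0.0 if disjoint)."""
    return max(0.0, min(a.end, b.end) - max(a.start, b.start))


def co_occurring(instances):
    """Return the label pairs whose time spans overlap."""
    return [
        (a.label, b.label)
        for a, b in combinations(instances, 2)
        if overlap(a, b) > 0
    ]


# Made-up detections for a single short video.
detections = [
    MicroAction("head movement", 1.0, 3.0),
    MicroAction("hand movement", 2.0, 4.5),
    MicroAction("leg movement", 5.0, 6.0),
]

pairs = co_occurring(detections)
print(pairs)
```

In this toy example the head and hand segments share one second of video, which is the kind of co-occurrence a single-label recognizer would not report.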

Cite

Text

Li et al. "MMAD: Multi-Label Micro-Action Detection in Videos." International Conference on Computer Vision, 2025.

Markdown

[Li et al. "MMAD: Multi-Label Micro-Action Detection in Videos." International Conference on Computer Vision, 2025.](https://mlanthology.org/iccv/2025/li2025iccv-mmad/)

BibTeX

@inproceedings{li2025iccv-mmad,
  title     = {{MMAD: Multi-Label Micro-Action Detection in Videos}},
  author    = {Li, Kun and Liu, Pengyu and Guo, Dan and Wang, Fei and Wu, Zhiliang and Fan, Hehe and Wang, Meng},
  booktitle = {International Conference on Computer Vision},
  year      = {2025},
  pages     = {13225--13236},
  url       = {https://mlanthology.org/iccv/2025/li2025iccv-mmad/}
}