Open-Vocabulary Fine-Grained Hand Action Detection

Zhe, Ting; Han, Mengya; Hao, Xiaoshuai; Luo, Yong; He, Zheng; Cai, Xiantao; Zhang, Jing

doi:10.24963/IJCAI.2025/276

IJCAI 2025 pp. 2476-2484

Abstract

In this work, we address the new challenge of open-vocabulary fine-grained hand action detection, which aims to recognize hand actions from both known and novel categories using textual descriptions. Traditional hand action detection methods are limited to closed-set detection, making it difficult for them to generalize to new, unseen hand action categories. While current open-vocabulary detection (OVD) methods are effective at detecting novel objects, they face challenges with fine-grained action recognition, particularly when data is limited and heterogeneous. This often leads to poor generalization and performance bias between base and novel categories. To address these issues, we propose a novel approach, Open-FGHA (Open-vocabulary Fine-Grained Hand Action), which learns to distinguish fine-grained features across multiple modalities from limited heterogeneous data. It then identifies optimal matching relationships among these features, enabling accurate open-vocabulary fine-grained hand action detection. Specifically, we introduce three key components: Hierarchical Heterogeneous Low-Rank Adaptation, Bidirectional Selection and Fusion Mechanism, and Cross-Modality Query Generator. These components work in unison to enhance the alignment and fusion of multimodal fine-grained features. Extensive experiments demonstrate that Open-FGHA outperforms existing OVD methods, showing its strong potential for open-vocabulary hand action detection. The source code is available at OV-FGHAD.
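As a rough illustration of the general open-vocabulary idea the abstract builds on (scoring a detected region against textual category descriptions so that novel categories can be added as text alone), here is a minimal sketch. It is not the Open-FGHA method: the function names, the category descriptions, and the 3-dimensional embeddings are all hypothetical, and a real system would obtain embeddings from trained visual and text encoders.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def classify_region(region_embedding, text_embeddings):
    """Return (best_label, scores) for one detected hand region.

    text_embeddings maps a category description to its embedding.
    A novel category is supported by adding a new entry; nothing is retrained.
    """
    scores = {
        label: cosine(region_embedding, emb)
        for label, emb in text_embeddings.items()
    }
    best_label = max(scores, key=scores.get)
    return best_label, scores


# Hypothetical, hand-made embeddings purely for illustration.
text_embeddings = {
    "pinching a small object": [1.0, 0.0, 0.0],  # assumed base category
    "twisting a bottle cap": [0.0, 1.0, 0.0],    # assumed base category
    "swiping on a screen": [0.0, 0.0, 1.0],      # assumed novel category
}
region = [0.1, 0.2, 0.9]  # assumed embedding of one detected hand region

label, scores = classify_region(region, text_embeddings)
```

With these toy vectors the region is assigned to the description whose embedding points in the most similar direction. The components named in the abstract concern how such multimodal features are adapted, selected, and fused, which this sketch does not attempt to reproduce.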

Cite

Text

Zhe et al. "Open-Vocabulary Fine-Grained Hand Action Detection." International Joint Conference on Artificial Intelligence, 2025. doi:10.24963/IJCAI.2025/276

Markdown

[Zhe et al. "Open-Vocabulary Fine-Grained Hand Action Detection." International Joint Conference on Artificial Intelligence, 2025.](https://mlanthology.org/ijcai/2025/zhe2025ijcai-open/) doi:10.24963/IJCAI.2025/276

BibTeX

@inproceedings{zhe2025ijcai-open,
  title     = {{Open-Vocabulary Fine-Grained Hand Action Detection}},
  author    = {Zhe, Ting and Han, Mengya and Hao, Xiaoshuai and Luo, Yong and He, Zheng and Cai, Xiantao and Zhang, Jing},
  booktitle = {International Joint Conference on Artificial Intelligence},
  year      = {2025},
  pages     = {2476-2484},
  doi       = {10.24963/IJCAI.2025/276},
  url       = {https://mlanthology.org/ijcai/2025/zhe2025ijcai-open/}
}