Cross-View Isolated Sign Language Recognition via View Synthesis and Feature Disentanglement
Abstract
Cross-view isolated sign language recognition (CV-ISLR) addresses the challenge of identifying isolated signs from viewpoints unseen during training, a problem aggravated by the scarcity of multi-view data in existing benchmarks. To bridge this gap, we introduce a novel two-stage framework comprising View Synthesis and Contrastive Multi-task View-Semantics Recognition. In the View Synthesis stage, we simulate unseen viewpoints by extracting 3D keypoints from the front-view training dataset and synthesizing common-view 2D skeleton sequences with virtual camera rotation, which enriches view diversity without the cost of multi-camera setups. However, direct training on these synthetic samples leads to limited improvement, as viewpoint-specific and semantics-specific features remain entangled. To overcome this drawback, we present a Contrastive Multi-task View-Semantics Recognition (CMVSR) module that disentangles viewpoint-dependent features from sign semantics. In this way, CMVSR obtains view-invariant representations of the sign video, leading to robust recognition performance against various camera viewpoints. We evaluate our approach on the MM-WLAuslan dataset, the first benchmark for CV-ISLR, and on our extended protocol MTV-Test that includes additional multi-view data captured in the wild. Experimental results demonstrate that our method not only improves the accuracy of front-view skeleton-based isolated sign language recognition, but also exhibits superior generalization to novel viewpoints.
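The View Synthesis step described above (3D keypoints rotated by a virtual camera, then projected to 2D skeletons) can be illustrated with a minimal sketch. The abstract does not specify the camera model, so this sketch assumes a yaw rotation about the vertical axis around the sequence centroid, followed by an orthographic projection. The names `rotation_y` and `synthesize_view` are hypothetical and do not come from the authors' code.

```python
import numpy as np


def rotation_y(yaw_deg):
    """Rotation matrix for a yaw of `yaw_deg` degrees about the vertical (y) axis."""
    t = np.deg2rad(yaw_deg)
    c, s = np.cos(t), np.sin(t)
    return np.array([[c, 0.0, s],
                     [0.0, 1.0, 0.0],
                     [-s, 0.0, c]])


def synthesize_view(keypoints_3d, yaw_deg):
    """Simulate a rotated camera for a 3D skeleton sequence.

    keypoints_3d: array of shape (T, J, 3), T frames and J joints.
    Returns a 2D skeleton sequence of shape (T, J, 2).
    Assumption: orthographic projection (depth is simply dropped).
    """
    kp = np.asarray(keypoints_3d, dtype=np.float64)
    # Rotate about the centroid of the whole sequence so the signer stays in frame.
    center = kp.mean(axis=(0, 1), keepdims=True)
    rotated = (kp - center) @ rotation_y(yaw_deg).T + center
    # Orthographic projection: keep x and y, discard z.
    return rotated[..., :2]


# Example: generate several synthetic views from one front-view sequence.
rng = np.random.default_rng(0)
front_view_3d = rng.normal(size=(16, 27, 3))  # placeholder data, not real keypoints
views = {yaw: synthesize_view(front_view_3d, yaw) for yaw in (-45, 0, 45)}
```

A perspective projection or additional pitch rotation could be substituted in the same place. The orthographic choice here only keeps the sketch short.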
Cite
Text
Shen et al. "Cross-View Isolated Sign Language Recognition via View Synthesis and Feature Disentanglement." International Conference on Computer Vision, 2025.
Markdown
[Shen et al. "Cross-View Isolated Sign Language Recognition via View Synthesis and Feature Disentanglement." International Conference on Computer Vision, 2025.](https://mlanthology.org/iccv/2025/shen2025iccv-crossview/)
BibTeX
@inproceedings{shen2025iccv-crossview,
title = {{Cross-View Isolated Sign Language Recognition via View Synthesis and Feature Disentanglement}},
author = {Shen, Xin and Wang, Xinyu and Shen, Lei and Zhang, Kaihao and Yu, Xin},
booktitle = {International Conference on Computer Vision},
year = {2025},
pages = {20647-20657},
url = {https://mlanthology.org/iccv/2025/shen2025iccv-crossview/}
}