Cross-View Isolated Sign Language Recognition via View Synthesis and Feature Disentanglement

Abstract

Cross-view isolated sign language recognition (CV-ISLR) addresses the challenge of identifying isolated signs from viewpoints unseen during training, a problem aggravated by the scarcity of multi-view data in existing benchmarks. To bridge this gap, we introduce a novel two-stage framework comprising View Synthesis and Contrastive Multi-task View-Semantics Recognition (CMVSR). In the View Synthesis stage, we simulate unseen viewpoints by extracting 3D keypoints from the front-view training dataset and synthesizing common-view 2D skeleton sequences with virtual camera rotation, which enriches view diversity without the cost of multi-camera setups. However, training directly on these synthetic samples yields only limited improvement, as viewpoint-specific and semantics-specific features remain entangled. To overcome this drawback, we present the CMVSR module, which disentangles viewpoint-dependent features from sign semantics. In this way, CMVSR obtains view-invariant representations of the sign video, leading to robust recognition across camera viewpoints. We evaluate our approach on the MM-WLAuslan dataset, the first benchmark for CV-ISLR, and on our extended protocol MTV-Test, which includes additional multi-view data captured in the wild. Experimental results demonstrate that our method not only improves the accuracy of front-view skeleton-based isolated sign language recognition, but also exhibits superior generalization to novel viewpoints.
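
The View Synthesis stage described above (3D keypoints from front-view data, re-rendered as 2D skeletons under a virtual camera rotation) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the restriction to a yaw rotation about the vertical axis, and the orthographic projection are all assumptions made for illustration, since the abstract does not specify the camera model or the rotation range.

```python
import math
from typing import List, Tuple

Point3D = Tuple[float, float, float]
Point2D = Tuple[float, float]


def rotate_about_y(point: Point3D, yaw_deg: float) -> Point3D:
    """Rotate a 3D keypoint about the vertical (y) axis by yaw_deg degrees."""
    theta = math.radians(yaw_deg)
    x, y, z = point
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    return (cos_t * x + sin_t * z, y, -sin_t * x + cos_t * z)


def project_orthographic(point: Point3D) -> Point2D:
    """Drop the depth coordinate (assumed orthographic camera)."""
    x, y, _ = point
    return (x, y)


def synthesize_view(
    sequence: List[List[Point3D]], yaw_deg: float
) -> List[List[Point2D]]:
    """Map a sequence of 3D skeleton frames to 2D skeletons seen from a
    virtual camera rotated by yaw_deg (hypothetical helper)."""
    return [
        [project_orthographic(rotate_about_y(p, yaw_deg)) for p in frame]
        for frame in sequence
    ]


if __name__ == "__main__":
    # One frame with two toy keypoints; real skeletons have many more joints.
    front_view = [[(1.0, 2.0, 0.0), (0.0, 0.5, 1.0)]]
    for yaw in (0.0, 30.0, 90.0):
        print(yaw, synthesize_view(front_view, yaw))
```

Under these assumptions, a yaw of 0 degrees reproduces the front-view 2D skeleton, while other angles produce synthetic views that could serve as additional training samples, each labeled with both its sign class and its synthetic viewpoint.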

Cite

Text

Shen et al. "Cross-View Isolated Sign Language Recognition via View Synthesis and Feature Disentanglement." International Conference on Computer Vision, 2025.

Markdown

[Shen et al. "Cross-View Isolated Sign Language Recognition via View Synthesis and Feature Disentanglement." International Conference on Computer Vision, 2025.](https://mlanthology.org/iccv/2025/shen2025iccv-crossview/)

BibTeX

@inproceedings{shen2025iccv-crossview,
  title     = {{Cross-View Isolated Sign Language Recognition via View Synthesis and Feature Disentanglement}},
  author    = {Shen, Xin and Wang, Xinyu and Shen, Lei and Zhang, Kaihao and Yu, Xin},
  booktitle = {International Conference on Computer Vision},
  year      = {2025},
  pages     = {20647--20657},
  url       = {https://mlanthology.org/iccv/2025/shen2025iccv-crossview/}
}