Bridging Sign and Spoken Languages: Pseudo Gloss Generation for Sign Language Translation

Abstract

Sign Language Translation (SLT) aims to map sign language videos to spoken language text. A common approach uses gloss annotations as an intermediate representation, decomposing SLT into two sub-tasks: video-to-gloss recognition and gloss-to-text translation. While effective, this paradigm depends on expert-annotated gloss labels, which are costly to produce and rarely available in existing datasets, limiting its scalability. To address this challenge, we propose a gloss-free pseudo gloss generation framework that eliminates the need for human-annotated glosses while preserving the structured intermediate representation. Specifically, we prompt a Large Language Model (LLM) with a few example text-gloss pairs via in-context learning to produce draft sign glosses from spoken language text. To improve the correspondence between the LLM-generated pseudo glosses and the sign sequences in the video, we reorder the pseudo glosses through a weakly supervised learning process so that their order better matches the signing. This reordering enables auxiliary alignment objectives and efficient supervision with a Connectionist Temporal Classification (CTC) loss. We train our SLT model, consisting of a vision encoder and a translator, through a three-stage pipeline that progressively narrows the modality gap between sign language and spoken language. Despite its simplicity, our approach outperforms previous state-of-the-art gloss-free frameworks on two SLT benchmarks and achieves competitive results compared to gloss-based methods.
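The abstract does not give the prompt itself. The sketch below shows one plausible way to assemble a few-shot in-context learning prompt from text-gloss demonstration pairs. The instruction wording, the `Text:`/`Gloss:` layout, the helper names `build_few_shot_prompt` and `parse_gloss_output`, and the toy demonstration pairs are all hypothetical, and no LLM is actually called.

```python
from typing import List, Tuple

# Hypothetical instruction text; the paper's actual prompt is not given in the abstract.
INSTRUCTION = (
    "Convert each spoken-language sentence into a sequence of sign glosses. "
    "Use uppercase gloss tokens separated by spaces."
)


def build_few_shot_prompt(examples: List[Tuple[str, str]], query: str) -> str:
    """Assemble an in-context learning prompt from (text, gloss) demonstration pairs."""
    lines = [INSTRUCTION, ""]
    for text, gloss in examples:
        lines.append(f"Text: {text}")
        lines.append(f"Gloss: {gloss}")
        lines.append("")
    # The query ends with an open "Gloss:" slot for the LLM to complete.
    lines.append(f"Text: {query}")
    lines.append("Gloss:")
    return "\n".join(lines)


def parse_gloss_output(completion: str) -> List[str]:
    """Turn an LLM completion into gloss tokens, keeping only its first line."""
    stripped = completion.strip()
    if not stripped:
        return []
    return [tok.upper() for tok in stripped.splitlines()[0].split()]


# Toy demonstration pairs, invented for illustration (not from any dataset).
demos = [
    ("tomorrow it will rain in the north", "TOMORROW NORTH RAIN"),
    ("the wind is weak today", "TODAY WIND WEAK"),
]
prompt = build_few_shot_prompt(demos, "it will be sunny in the south")
```

The resulting `prompt` would be sent to an LLM and its completion passed to `parse_gloss_output` to obtain a draft pseudo gloss sequence. The order of those draft glosses may not match the order of signs in the video, which is the motivation for the reordering step.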
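CTC assumes that the target labels appear in the same monotonic order as the video frames, which is why reordering the pseudo glosses is what allows CTC supervision to be used. As a minimal illustration, the sketch below implements the standard CTC forward algorithm in log space for a single sequence. It is not the authors' implementation; the function name `ctc_loss`, blank index 0, and the T x V list-of-lists input layout are assumptions. In practice a library implementation such as PyTorch's `torch.nn.CTCLoss` would be used.

```python
import math
from typing import List, Sequence


def logsumexp(values: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(v))) over a non-empty sequence."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))


def ctc_loss(log_probs: List[List[float]], target: List[int], blank: int = 0) -> float:
    """Negative log-likelihood of `target` under per-frame log_probs (T x V), via the CTC forward pass."""
    # Extended label sequence: blank, y1, blank, y2, ..., yL, blank
    ext = [blank]
    for label in target:
        ext += [label, blank]
    S, T = len(ext), len(log_probs)
    alpha = [-math.inf] * S
    alpha[0] = log_probs[0][blank]
    if S > 1:
        alpha[1] = log_probs[0][ext[1]]
    for t in range(1, T):
        new_alpha = [-math.inf] * S
        for s in range(S):
            candidates = [alpha[s]]  # stay on the same extended label
            if s >= 1:
                candidates.append(alpha[s - 1])  # advance by one position
            # Skipping a blank is allowed only between two different labels.
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                candidates.append(alpha[s - 2])
            new_alpha[s] = logsumexp(candidates) + log_probs[t][ext[s]]
        alpha = new_alpha
    # Valid paths end on the last label or on the trailing blank.
    final = [alpha[S - 1]]
    if S > 1:
        final.append(alpha[S - 2])
    return -logsumexp(final)


# Uniform distribution over 3 classes (blank=0) for 2 frames.
uniform = [[math.log(1 / 3)] * 3 for _ in range(2)]
loss = ctc_loss(uniform, [1])  # equals log(3) ≈ 1.0986 up to floating-point rounding
```

For target `[1]` over two frames there are three valid alignments (`1 1`, `blank 1`, `1 blank`), each with probability 1/9, so the loss is -log(3/9) = log 3. A repeated label such as `[1, 1]` needs a blank between its copies and therefore cannot fit in two frames, so its loss is infinite.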

Cite

Text

Guo et al. "Bridging Sign and Spoken Languages: Pseudo Gloss Generation for Sign Language Translation." Advances in Neural Information Processing Systems, 2025.

Markdown

[Guo et al. "Bridging Sign and Spoken Languages: Pseudo Gloss Generation for Sign Language Translation." Advances in Neural Information Processing Systems, 2025.](https://mlanthology.org/neurips/2025/guo2025neurips-bridging/)

BibTeX

@inproceedings{guo2025neurips-bridging,
  title     = {{Bridging Sign and Spoken Languages: Pseudo Gloss Generation for Sign Language Translation}},
  author    = {Guo, Jianyuan and Li, Peike and Cohn, Trevor},
  booktitle = {Advances in Neural Information Processing Systems},
  year      = {2025},
  url       = {https://mlanthology.org/neurips/2025/guo2025neurips-bridging/}
}