Improving Gloss-Free Sign Language Translation by Reducing Representation Density
Abstract
Gloss-free sign language translation (SLT) aims to build well-performing SLT systems without costly gloss annotations, but it still lags significantly behind gloss-based approaches. In this paper, we identify a representation density problem that may be a bottleneck limiting the performance of gloss-free SLT. Specifically, the visual representations of semantically distinct sign gestures tend to be closely packed together in feature space, which makes it hard for gloss-free methods to distinguish different sign gestures and leads to a sharp performance drop. To address this problem, we introduce a simple but effective contrastive learning strategy, SignCL, which encourages gloss-free models to learn more discriminative feature representations in a self-supervised manner. Our experiments demonstrate that SignCL significantly reduces representation density and improves performance across various translation frameworks. Specifically, SignCL improves the BLEU score of the Sign Language Transformer and GFSLT-VLP on the CSL-Daily dataset by 39% and 46%, respectively, without any increase in model parameters. Compared to Sign2GPT, a state-of-the-art method based on large-scale pre-trained vision and language models, SignCL achieves better performance with only 35% of its parameters. We will release our code and model to facilitate further research.
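The abstract does not say how SignCL forms its contrastive pairs, so the following is only a minimal sketch of a generic margin-based contrastive objective on per-frame features, not the authors' implementation. It assumes, purely for illustration, that the next frame serves as the positive and a frame `min_gap` steps later serves as the negative. The names `clip_contrastive_loss`, `min_gap`, and `margin` are hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor: Vector, positive: Vector, negative: Vector,
                 margin: float = 1.0) -> float:
    """Margin-based loss: pull the positive closer than the negative by `margin`."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


def clip_contrastive_loss(frames: List[Vector], min_gap: int = 3,
                          margin: float = 1.0) -> float:
    """Average triplet loss over one clip of frame features.

    Assumption (not from the paper's abstract): for anchor frame t, the
    positive is frame t + 1 and the negative is frame t + min_gap.
    """
    losses = [
        triplet_loss(frames[t], frames[t + 1], frames[t + min_gap], margin)
        for t in range(len(frames) - min_gap)
    ]
    return sum(losses) / len(losses) if losses else 0.0


# Densely packed features: every frame maps to the same point.
dense = [[0.0, 0.0] for _ in range(6)]
# Better separated features: two distinct clusters along the clip.
separated = [[0.0, 0.0]] * 3 + [[5.0, 0.0]] * 3

dense_loss = clip_contrastive_loss(dense)
separated_loss = clip_contrastive_loss(separated)
```

Under these assumptions the densely packed clip incurs a higher loss than the separated one, which is the pressure a contrastive term would exert against closely packed representations.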
Cite
Text
Ye et al. "Improving Gloss-Free Sign Language Translation by Reducing Representation Density." Neural Information Processing Systems, 2024. doi:10.52202/079017-3411
Markdown
[Ye et al. "Improving Gloss-Free Sign Language Translation by Reducing Representation Density." Neural Information Processing Systems, 2024.](https://mlanthology.org/neurips/2024/ye2024neurips-improving/) doi:10.52202/079017-3411
BibTeX
@inproceedings{ye2024neurips-improving,
title = {{Improving Gloss-Free Sign Language Translation by Reducing Representation Density}},
author = {Ye, Jinhui and Wang, Xing and Jiao, Wenxiang and Liang, Junwei and Xiong, Hui},
booktitle = {Neural Information Processing Systems},
year = {2024},
doi = {10.52202/079017-3411},
url = {https://mlanthology.org/neurips/2024/ye2024neurips-improving/}
}