Language Imbalance Driven Rewarding for Multilingual Self-Improving
Abstract
Large Language Models (LLMs) have achieved state-of-the-art performance across numerous tasks. However, these advancements have predominantly benefited "first-class" languages such as English and Chinese, leaving many other languages underrepresented. While this imbalance limits broader applications, it also generates a natural preference ranking between languages, offering an opportunity to bootstrap the multilingual capabilities of LLMs in a self-improving manner. Thus, we propose $\textit{Language Imbalance Driven Rewarding}$, where the inherent imbalance between dominant and non-dominant languages within LLMs is leveraged as a reward signal. Iterative DPO training demonstrates that this approach not only enhances LLM performance in non-dominant languages but also improves the dominant language's capacity, thereby yielding an iterative reward signal. Fine-tuning Meta-Llama-3-8B-Instruct over two iterations of this approach results in continuous improvements in multilingual performance across instruction-following and arithmetic reasoning tasks, evidenced by an average improvement of a 7.46\% win rate on the X-AlpacaEval leaderboard and 13.9\% accuracy on the MGSM benchmark. This work serves as an initial exploration, paving the way for multilingual self-improvement of LLMs.
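To make the idea concrete, below is a minimal sketch of how the language imbalance itself can supply preference labels for DPO, based only on the description in the abstract. The helpers `generate` and `translate`, the language codes, and the `PreferencePair` structure are illustrative assumptions, not the authors' released code.

```python
# Sketch: building DPO preference pairs from the dominant/non-dominant language gap.
# Assumption: routing a query through the dominant language (e.g., English) yields a
# stronger answer than answering directly in the non-dominant language, so the
# translated-back answer is "chosen" and the direct answer is "rejected".

from dataclasses import dataclass


@dataclass
class PreferencePair:
    prompt: str    # prompt in the non-dominant language
    chosen: str    # dominant-language answer translated back (preferred)
    rejected: str  # answer generated directly in the non-dominant language


def generate(model, prompt: str) -> str:
    """Placeholder for model inference (e.g., a chat-completion call)."""
    raise NotImplementedError


def translate(model, text: str, src: str, tgt: str) -> str:
    """Placeholder for translation, e.g., by prompting the same model."""
    raise NotImplementedError


def build_pair(model, prompt_nd: str,
               dominant: str = "en", non_dominant: str = "sw") -> PreferencePair:
    # 1) Answer directly in the non-dominant language (typically weaker).
    rejected = generate(model, prompt_nd)
    # 2) Route through the dominant language: translate the prompt, answer it,
    #    then translate the answer back into the non-dominant language.
    prompt_dom = translate(model, prompt_nd, src=non_dominant, tgt=dominant)
    answer_dom = generate(model, prompt_dom)
    chosen = translate(model, answer_dom, src=dominant, tgt=non_dominant)
    # The imbalance itself provides the preference label: chosen > rejected.
    return PreferencePair(prompt=prompt_nd, chosen=chosen, rejected=rejected)
```

Pairs produced this way would feed an iterative DPO loop (for instance with a standard DPO trainer), with the fine-tuned model regenerating fresh pairs for the next iteration, matching the abstract's two-iteration setup.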
Cite
Text
Yang et al. "Language Imbalance Driven Rewarding for Multilingual Self-Improving." International Conference on Learning Representations, 2025.
Markdown
[Yang et al. "Language Imbalance Driven Rewarding for Multilingual Self-Improving." International Conference on Learning Representations, 2025.](https://mlanthology.org/iclr/2025/yang2025iclr-language/)
BibTeX
@inproceedings{yang2025iclr-language,
  title     = {{Language Imbalance Driven Rewarding for Multilingual Self-Improving}},
  author    = {Yang, Wen and Wu, Junhong and Wang, Chen and Zong, Chengqing and Zhang, Jiajun},
  booktitle = {International Conference on Learning Representations},
  year      = {2025},
  url       = {https://mlanthology.org/iclr/2025/yang2025iclr-language/}
}