DiscoX: Benchmarking Discourse-Level Translation in Expert Domains

Zhao, Xiying; Wen, Zhoufutu; Chen, Zhixuan; Ding, Jingzhe; Jiao, Jianpeng; Li, Shuai; Li, Xi; Liang, Danni; Long, Shengda; Liu, Qianqian; Wu, Xianbo; Gao, Hongwan; Gao, Xiang; Hu, Liang; Liu, Jiashuo; Liumengyun; Shi, Weiran; Yang, Chenghao; Yang, Qianyu; Zhang, Xuanliang; Zhang, Ge; Huang, Wenhao

ICLR 2026

Abstract

The evaluation of discourse-level translation in expert domains remains inadequate, despite its centrality to knowledge dissemination and cross-lingual scholarly communication. While these translations demand discourse-level coherence and strict terminological precision, current evaluation methods predominantly focus on segment-level accuracy and fluency. To address this limitation, we introduce DiscoX, a new benchmark for discourse-level and expert-level Chinese-English translation. It comprises 200 professionally-curated texts from 7 domains, with an average length exceeding 1700 tokens. To evaluate performance on DiscoX, we also develop Metric-S, a reference-free system that provides fine-grained automatic assessments across accuracy, fluency, and appropriateness. Metric-S demonstrates strong consistency with human judgments, significantly outperforming existing metrics. Our experiments reveal a remarkable performance gap: even the most advanced LLMs still trail human experts on these tasks. This finding validates the difficulty of DiscoX and underscores the challenges that remain in achieving professional-grade machine translation. The proposed benchmark and evaluation system provide a robust framework for more rigorous evaluation, facilitating future advancements in LLM-based translation. Our data and code are available at https://github.com/ByteDance-Seed/DiscoX.

Cite

Text

Zhao et al. "DiscoX: Benchmarking Discourse-Level Translation in Expert Domains." International Conference on Learning Representations, 2026.

Markdown

[Zhao et al. "DiscoX: Benchmarking Discourse-Level Translation in Expert Domains." International Conference on Learning Representations, 2026.](https://mlanthology.org/iclr/2026/zhao2026iclr-discox/)

BibTeX

@inproceedings{zhao2026iclr-discox,
  title     = {{DiscoX: Benchmarking Discourse-Level Translation in Expert Domains}},
  author    = {Zhao, Xiying and Wen, Zhoufutu and Chen, Zhixuan and Ding, Jingzhe and Jiao, Jianpeng and Li, Shuai and Li, Xi and Liang, Danni and Long, Shengda and Liu, Qianqian and Wu, Xianbo and Gao, Hongwan and Gao, Xiang and Hu, Liang and Liu, Jiashuo and Liumengyun and Shi, Weiran and Yang, Chenghao and Yang, Qianyu and Zhang, Xuanliang and Zhang, Ge and Huang, Wenhao},
  booktitle = {International Conference on Learning Representations},
  year      = {2026},
  url       = {https://mlanthology.org/iclr/2026/zhao2026iclr-discox/}
}