LDP: Generalizing to Multilingual Visual Information Extraction by Language Decoupled Pretraining

Abstract

Visual Information Extraction (VIE) plays a crucial role in the comprehension of semi-structured documents, and several pre-trained models have been developed to enhance performance. However, most of these works are monolingual (usually English). Because the quantity and quality of pre-training corpora are extremely unbalanced between English and other languages, few works extend to non-English scenarios. In this paper, we conduct systematic experiments to show that the vision and layout modalities are invariant across document images in different languages. When language bias is decoupled from document images, a vision-layout-based model can achieve impressive cross-lingual generalization. Accordingly, we present a simple but effective multilingual training paradigm, LDP (Language Decoupled Pre-training), for better utilization of monolingual pre-training data. Our proposed model, LDM (Language Decoupled Model), is first pre-trained on language-independent data, where the language knowledge is decoupled by a diffusion model, and is then fine-tuned on the downstream languages. Extensive experiments show that LDM outperforms all SOTA multilingual pre-trained models and also remains competitive on downstream monolingual/English benchmarks.

Cite

Text

Shen et al. "LDP: Generalizing to Multilingual Visual Information Extraction by Language Decoupled Pretraining." AAAI Conference on Artificial Intelligence, 2025. doi:10.1609/AAAI.V39I7.32730

Markdown

[Shen et al. "LDP: Generalizing to Multilingual Visual Information Extraction by Language Decoupled Pretraining." AAAI Conference on Artificial Intelligence, 2025.](https://mlanthology.org/aaai/2025/shen2025aaai-ldp/) doi:10.1609/AAAI.V39I7.32730

BibTeX

@inproceedings{shen2025aaai-ldp,
  title     = {{LDP: Generalizing to Multilingual Visual Information Extraction by Language Decoupled Pretraining}},
  author    = {Shen, Huawen and Li, Gengluo and Zhong, Jinwen and Zhou, Yu},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2025},
  pages     = {6805--6813},
  doi       = {10.1609/AAAI.V39I7.32730},
  url       = {https://mlanthology.org/aaai/2025/shen2025aaai-ldp/}
}