What Limits Bidirectional Model’s Generative Capabilities? A Uni-Bi-Directional Mixture-of-Expert Method for Bidirectional Fine-Tuning

Abstract

Large Language Models (LLMs) excel in generation tasks, yet their causal attention mechanisms limit performance in embedding tasks. While bidirectional modeling may enhance embeddings, naively fine-tuning unidirectional models bidirectionally severely degrades generative performance. To investigate this trade-off, we analyze attention weights as dependence indicators and find that bidirectional fine-tuning increases tokens' dependence on subsequent context, impairing unidirectional generation. Through systematic evaluation of Transformer modules, we find that the FFN layer is the least affected by this dependence. Leveraging this finding, we propose UBMoE-LLM, a novel Uni-Bi-directional Mixture-of-Experts LLM, which integrates the original unidirectional FFN with a bidirectionally fine-tuned FFN via unsupervised contrastive learning. This MoE-based approach enhances embedding performance while preserving robust generation. Extensive experiments across diverse datasets and model scales validate our attention dependence metric and demonstrate UBMoE-LLM’s superior generative quality and reduced hallucination. Code is available at: https://github.com/heiyonghua/ubmoe_llm.
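The abstract's central architectural idea, mixing the backbone's original unidirectional FFN with a bidirectionally fine-tuned copy through a learned router, can be sketched as follows. This is a minimal PyTorch illustration of that expert-mixing step, not the authors' implementation: the module names (FFN, UniBiMoEFFN), the soft two-way router, and the frozen unidirectional expert are assumptions made for illustration, and the unsupervised contrastive training of the router described in the paper is omitted.

# Minimal sketch (assumed names and shapes) of a Uni-Bi-directional MoE FFN layer:
# a per-token router softly mixes the original unidirectional FFN (frozen) with a
# bidirectionally fine-tuned copy. Not the authors' released code.
import torch
import torch.nn as nn


class FFN(nn.Module):
    """Standard Transformer feed-forward block."""
    def __init__(self, d_model: int, d_hidden: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(d_model, d_hidden),
            nn.GELU(),
            nn.Linear(d_hidden, d_model),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)


class UniBiMoEFFN(nn.Module):
    """Mixture of a frozen unidirectional FFN and a bidirectionally fine-tuned FFN.

    The router assigns each token a weight over the two experts; in the paper's
    setting, generation-style inputs should lean on the unidirectional expert and
    embedding-style inputs on the bidirectional one.
    """
    def __init__(self, uni_ffn: FFN, bi_ffn: FFN, d_model: int):
        super().__init__()
        self.uni_ffn = uni_ffn  # original FFN, kept frozen to preserve generation
        self.bi_ffn = bi_ffn    # copy fine-tuned under bidirectional attention
        self.router = nn.Linear(d_model, 2)
        for p in self.uni_ffn.parameters():
            p.requires_grad_(False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        gate = torch.softmax(self.router(x), dim=-1)  # (batch, seq, 2)
        return gate[..., 0:1] * self.uni_ffn(x) + gate[..., 1:2] * self.bi_ffn(x)


if __name__ == "__main__":
    d_model, d_hidden = 64, 256
    layer = UniBiMoEFFN(FFN(d_model, d_hidden), FFN(d_model, d_hidden), d_model)
    x = torch.randn(2, 10, d_model)
    print(layer(x).shape)  # torch.Size([2, 10, 64])

A soft (dense) mixture is used here for simplicity; whether the routing is soft or top-1, and how the router is regularized by the attention-dependence metric, are design details specified in the paper rather than in this sketch.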

Cite

Text

Li et al. "What Limits Bidirectional Model’s Generative Capabilities? A Uni-Bi-Directional Mixture-of-Expert Method for Bidirectional Fine-Tuning." Proceedings of the 42nd International Conference on Machine Learning, 2025.

Markdown

[Li et al. "What Limits Bidirectional Model’s Generative Capabilities? A Uni-Bi-Directional Mixture-of-Expert Method for Bidirectional Fine-Tuning." Proceedings of the 42nd International Conference on Machine Learning, 2025.](https://mlanthology.org/icml/2025/li2025icml-limits/)

BibTeX

@inproceedings{li2025icml-limits,
  title     = {{What Limits Bidirectional Model’s Generative Capabilities? A Uni-Bi-Directional Mixture-of-Expert Method for Bidirectional Fine-Tuning}},
  author    = {Li, Zuchao and Hei, Yonghua and Li, Qiwei and Zhang, Lefei and Wang, Ping and Zhao, Hai and Qi, Baoyuan and Liu, Guoming},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  year      = {2025},
  pages     = {34564--34577},
  volume    = {267},
  url       = {https://mlanthology.org/icml/2025/li2025icml-limits/}
}