What Limits Bidirectional Model’s Generative Capabilities? a Uni-Bi-Directional Mixture-of-Expert Method for Bidirectional Fine-Tuning
Abstract
Large Language Models (LLMs) excel in generation tasks, yet their causal attention mechanisms limit performance in embedding tasks. While bidirectional modeling may enhance embeddings, naively fine-tuning unidirectional models bidirectionally severely degrades generative performance. To investigate this trade-off, we analyze attention weights as dependence indicators and find that bidirectional fine-tuning increases dependence on subsequent tokens, impairing unidirectional generation. Through systematic evaluations of Transformer modules, we discover that the FFN layer is least affected by such dependence. Leveraging this discovery, we propose UBMoE-LLM, a novel Uni-Bi-directional Mixture-of-Experts LLM, which integrates the original unidirectional FFN with a bidirectionally fine-tuned FFN via unsupervised contrastive learning. This MoE-based approach enhances embedding performance while preserving robust generation. Extensive experiments across diverse datasets and model scales validate our attention dependence metric and demonstrate UBMoE-LLM’s superior generative quality and reduced hallucination. Code is available at: https://github.com/heiyonghua/ubmoe_llm.
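To make the described architecture concrete, the sketch below shows one way a per-token gate could mix a frozen unidirectional FFN expert with a bidirectionally fine-tuned FFN expert inside a Transformer layer. It is a minimal, illustrative PyTorch example, not the authors' released implementation (see the repository linked above for that); the class names (`FFN`, `UniBiFFNMoE`) and the simple softmax gate are assumptions made here for illustration.

```python
# Minimal sketch (illustrative, not the authors' released code): a Transformer FFN
# sub-layer that mixes a frozen unidirectional FFN expert with a bidirectionally
# fine-tuned FFN expert via a learned per-token gate.
import torch
import torch.nn as nn


class FFN(nn.Module):
    """A standard position-wise feed-forward network (assumed design)."""

    def __init__(self, d_model: int, d_ff: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(d_model, d_ff),
            nn.GELU(),
            nn.Linear(d_ff, d_model),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)


class UniBiFFNMoE(nn.Module):
    """Mixture of a unidirectional FFN expert (from the original causal LLM)
    and a bidirectionally fine-tuned FFN expert, combined per token."""

    def __init__(self, uni_ffn: nn.Module, bi_ffn: nn.Module, d_model: int):
        super().__init__()
        self.uni_ffn = uni_ffn          # expert taken from the causal checkpoint
        self.bi_ffn = bi_ffn            # expert fine-tuned with bidirectional attention
        self.gate = nn.Linear(d_model, 2)
        for p in self.uni_ffn.parameters():
            p.requires_grad = False     # keep the unidirectional expert frozen

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model); gate weights sum to 1 over the two experts
        w = torch.softmax(self.gate(x), dim=-1)                       # (batch, seq_len, 2)
        return w[..., 0:1] * self.uni_ffn(x) + w[..., 1:2] * self.bi_ffn(x)


if __name__ == "__main__":
    d_model, d_ff = 64, 256
    layer = UniBiFFNMoE(FFN(d_model, d_ff), FFN(d_model, d_ff), d_model)
    h = torch.randn(2, 8, d_model)
    print(layer(h).shape)  # torch.Size([2, 8, 64])
```

In this sketch the unidirectional expert is frozen so the causal model's generative behaviour is left untouched, while the bidirectional expert can supply embedding-oriented features; how the gate and bidirectional expert are trained (the abstract mentions unsupervised contrastive learning) is not shown here.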
Cite
Text

Li et al. "What Limits Bidirectional Model’s Generative Capabilities? a Uni-Bi-Directional Mixture-of-Expert Method for Bidirectional Fine-Tuning." Proceedings of the 42nd International Conference on Machine Learning, 2025.

Markdown

[Li et al. "What Limits Bidirectional Model’s Generative Capabilities? a Uni-Bi-Directional Mixture-of-Expert Method for Bidirectional Fine-Tuning." Proceedings of the 42nd International Conference on Machine Learning, 2025.](https://mlanthology.org/icml/2025/li2025icml-limits/)

BibTeX
@inproceedings{li2025icml-limits,
title = {{What Limits Bidirectional Model’s Generative Capabilities? a Uni-Bi-Directional Mixture-of-Expert Method for Bidirectional Fine-Tuning}},
author = {Li, Zuchao and Hei, Yonghua and Li, Qiwei and Zhang, Lefei and Wang, Ping and Zhao, Hai and Qi, Baoyuan and Liu, Guoming},
booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
year = {2025},
pages = {34564--34577},
volume = {267},
url = {https://mlanthology.org/icml/2025/li2025icml-limits/}
}