Bao, Yilin

1 publication

ICLRW 2025: Offline Reinforcement Learning for LLM Multi-Step Reasoning. Huaijie Wang, Shibo Hao, Hanze Dong, Shenao Zhang, Yilin Bao, Ziran Yang, Yi Wu