Cheng, Qi

1 publication

ICLR 2026. Slow-Fast Policy Optimization: Reposition-Before-Update for LLM Reasoning. Ziyan Wang, Zheng Wang, Xingwei Qu, Qi Cheng, Jie Fu, Shengpu Tang, Minjia Zhang, Xiaoming Huo.