Huang, Zhuoxu

1 publication

ICLR 2026. Controllable Exploration in Hybrid-Policy RLVR for Multi-Modal Reasoning. Zhuoxu Huang, Mengxi Jia, Hao Sun, Xuelong Li, Jungong Han.