Shi, Yuling

2 publications

ICLR 2026. Attention as a Compass: Efficient Exploration for Process-Supervised RL in Reasoning Models. Runze Liu, Jiakang Wang, Yuling Shi, Zhihui Xie, Chenxin An, Kaiyan Zhang, Jian Zhao, Xiaodong Gu, Lei Lin, Wenping Hu, Xiu Li, Fuzheng Zhang, Guorui Zhou, Kun Gai
ICLR 2026. Robust Preference Alignment via Directional Neighborhood Consensus. Ruochen Mao, Yuling Shi, Xiaodong Gu, Jiaheng Wei