Liu, Runze

11 publications

ICLR 2026 Attention as a Compass: Efficient Exploration for Process-Supervised RL in Reasoning Models Runze Liu, Jiakang Wang, Yuling Shi, Zhihui Xie, Chenxin An, Kaiyan Zhang, Jian Zhao, Xiaodong Gu, Lei Lin, Wenping Hu, Xiu Li, Fuzheng Zhang, Guorui Zhou, Kun Gai
ICLR 2026 MARTI: A Framework for Multi-Agent LLM Systems Reinforced Training and Inference Kaiyan Zhang, Kai Tian, Runze Liu, Sihang Zeng, Xuekai Zhu, Guoli Jia, Yuchen Fan, Xingtai Lv, Yuxin Zuo, Che Jiang, Yuru Wang, Jianyu Wang, Ermo Hua, Xinwei Long, Junqi Gao, Youbang Sun, Zhiyuan Ma, Ganqu Cui, Ning Ding, Biqing Qi, Bowen Zhou
ICLR 2026 Robust Adversarial Attacks Against Unknown Disturbance via Inverse Gradient Sample Zhaoyang Zhang, Shen Wang, Runze Liu, Guopu Zhu, Fanghui Sun, Ye Lu, Zeyue Wang, Yihan Yan
NeurIPS 2025 Bohdi: Heterogeneous LLM Fusion with Automatic Data Exploration Junqi Gao, Zhichang Guo, Dazhi Zhang, Dong Li, Runze Liu, Pengfei Li, Kai Tian, Biqing Qi
ICLRW 2025 Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling Runze Liu, Junqi Gao, Jian Zhao, Kaiyan Zhang, Xiu Li, Biqing Qi, Wanli Ouyang, Bowen Zhou
ICLR 2025 Cross-Domain Offline Policy Adaptation with Optimal Transport and Dataset Constraint Jiafei Lyu, Mengbei Yan, Zhongjian Qiao, Runze Liu, Xiaoteng Ma, Deheng Ye, Jing-Wen Yang, Zongqing Lu, Xiu Li
AAAI 2025 RAT: Adversarial Attacks on Deep Reinforcement Agents for Targeted Behaviors Fengshuo Bai, Runze Liu, Yali Du, Ying Wen, Yaodong Yang
ICML 2024 PEARL: Zero-Shot Cross-Task Preference Alignment and Robust Reward Learning for Robotic Manipulation Runze Liu, Yali Du, Fengshuo Bai, Jiafei Lyu, Xiu Li
ICLR 2024 SEABO: A Simple Search-Based Method for Offline Imitation Learning Jiafei Lyu, Xiaoteng Ma, Le Wan, Runze Liu, Xiu Li, Zongqing Lu
NeurIPSW 2023 Zero-Shot Cross-Task Preference Alignment for Offline RL via Optimal Transport Runze Liu, Yali Du, Fengshuo Bai, Jiafei Lyu, Xiu Li
NeurIPS 2022 Meta-Reward-Net: Implicitly Differentiable Reward Learning for Preference-Based Reinforcement Learning Runze Liu, Fengshuo Bai, Yali Du, Yaodong Yang