ML Anthology
Li, Yafu
9 publications
ICLR
2026
Conditional Advantage Estimation for Reinforcement Learning in Large Reasoning Models
Guanxu Chen, Yafu Li, Yuxian Jiang, Chen Qian, Qihan Ren, Yang JingYi, Yu Cheng, Dongrui Liu, Jing Shao
ICLR
2026
Diversity-Incentivized Exploration for Versatile Reasoning
Zican Hu, Shilin Zhang, Yafu Li, Jianhao Yan, Xuyang Hu, Leyang Cui, Xiaoye Qu, Chunlin Chen, Yu Cheng, Zhi Wang
ICLR
2026
ExGRPO: Learning to Reason from Experience
Runzhe Zhan, Yafu Li, Zhi Wang, Xiaoye Qu, Dongrui Liu, Jing Shao, Derek F. Wong, Yu Cheng
ICLR
2026
FrameThinker: Learning to Think with Long Videos via Multi-Turn Frame Spotlighting
Zefeng He, Xiaoye Qu, Yafu Li, Siyuan Huang, Daizong Liu, Yu Cheng
ICLR
2026
Revisual-R1: Advancing Multimodal Reasoning from Optimized Cold Start to Staged Reinforcement Learning
Shuang Chen, Hangyu Guo, Zhaochen Su, Yafu Li, Jiacheng Chen, Yulun Wu, Weijie Wang, ZhiYuan Feng, Xiaoye Qu, Yu Cheng
ICLR
2026
Spotlight on Token Perception for Multimodal Reinforcement Learning
Siyuan Huang, Xiaoye Qu, Yafu Li, Yun Luo, Zefeng He, Daizong Liu, Yu Cheng
NeurIPS
2025
Learning to Reason Under Off-Policy Guidance
Jianhao Yan, Yafu Li, Zican Hu, Zhi Wang, Ganqu Cui, Xiaoye Qu, Yu Cheng, Yue Zhang
ICML
2025
Test-Time Preference Optimization: On-the-Fly Alignment via Iterative Textual Feedback
Yafu Li, Xuyang Hu, Xiaoye Qu, Linjie Li, Yu Cheng
ICLR
2024
Understanding In-Context Learning from Repetitions
Jianhao Yan, Jin Xu, Chiyu Song, Chenming Wu, Yafu Li, Yue Zhang