Li, Ziniu
19 publications
ICLR
2026
Exploration vs Exploitation: Rethinking RLVR Through Clipping, Entropy, and Spurious Reward
NeurIPSW
2024
Entropic Distribution Matching for Supervised Fine-Tuning of LLMs: Less Overfitting and Better Diversity