Dang, Haoran

1 publication

ICLR 2026. Temperature as a Meta-Policy: Adaptive Temperature in LLM Reinforcement Learning. Haoran Dang, Cuiling Lan, Hai Wan, Xibin Zhao, Yan Lu.