Gong, Boying

1 publication

ICLR 2026. Token Hidden Reward: Steering Exploration-Exploitation in Group Relative Deep Reinforcement Learning. Wenlong Deng, Yi Ren, Yushu Li, Boying Gong, Danica J. Sutherland, Xiaoxiao Li, Christos Thrampoulidis