Ge, Zhenxing
7 publications
NeurIPS
2025
Improving Reward Models with Proximal Policy Exploration for Preference-Based Reinforcement Learning
NeurIPS
2025
Last-Iterate Convergence of Smooth Regret Matching$^+$ Variants in Learning Nash Equilibria