Zhang, Yuheng
13 publications
ICLR
2025
Iterative Nash Policy Optimization: Aligning LLMs with General Preferences via No-Regret Learning
NeurIPS
2024
On the Curses of Future and History in Future-Dependent Value Functions for Off-Policy Evaluation
NeurIPS
2024
Online Iterative Reinforcement Learning from Human Feedback with General Preference Model