Xie, Tengyang
23 publications
ICML
2025
Do We Need to Verify Step by Step? Rethinking Process Supervision from a Theoretical Perspective
ICLR
2025
Exploratory Preference Optimization: Harnessing Implicit Q*-Approximation for Sample-Efficient RLHF
NeurIPS
2025
Trajectory Bellman Residual Minimization: A Simple Value-Based Method for LLM Reasoning
NeurIPSW
2022
AMORE: A Model-Based Framework for Improving Arbitrary Baseline Policies with Offline Data