Jiang, Nan
106 publications
NeurIPS 2025: A Snapshot of Influence: A Local Data Attribution Framework for Online Reinforcement Learning
ICML 2025: Is Best-of-N the Best of Them? Coverage, Scaling, and Optimality in Inference-Time Alignment
ICLR 2025: Iterative Nash Policy Optimization: Aligning LLMs with General Preferences via No-Regret Learning
NeurIPS 2025: Optimizing Chain-of-Thought Reasoners via Gradient Variance Minimization in Rejection Sampling and RL
NeurIPS 2024: On the Curses of Future and History in Future-Dependent Value Functions for Off-Policy Evaluation
NeurIPS 2024: Online Iterative Reinforcement Learning from Human Feedback with General Preference Model
NeurIPS 2024: Reinforcement Learning Under Latent Dynamics: Toward Statistical and Algorithmic Modularity
AAAI 2023: Learning Markov Random Fields for Combinatorial Structures via Sampling Through Lovász Local Lemma
NeurIPSW 2023: Solving Satisfiability Modulo Counting Problems in Computational Sustainability with Guarantees
AISTATS 2022: On the Convergence Rate of Off-Policy Policy Optimization Methods with Density-Ratio Correction
NeurIPS 2022: A Few Expert Queries Suffices for Sample-Efficient RL with Resets and Linear Value Approximation
NeurIPSW 2022: AMORE: A Model-Based Framework for Improving Arbitrary Baseline Policies with Offline Data
NeurIPS 2022: Beyond the Return: Off-Policy Function Estimation Under User-Specified Error-Measuring Distributions
ICMLW 2022: Beyond the Return: Off-Policy Function Estimation Under User-Specified Error-Measuring Distributions
UAI 2022: Offline Reinforcement Learning Under Value and Density-Ratio Realizability: The Power of Gaps
NeurIPS 2022: Tiered Reinforcement Learning: Pessimism in the Face of Uncertainty and Constant Regret
COLT 2021: On Query-Efficient Planning in MDPs Under Linear Realizability of the Optimal State-Value Function