Luo, Haipeng
98 publications
NeurIPS
2025
Adapting to Stochastic and Adversarial Losses in Episodic MDPs with Aggregate Bandit Feedback
NeurIPS
2025
Comparator-Adaptive $\Phi$-Regret: Improved Bounds, Simpler Algorithms, and Applications to Games
NeurIPS
2025
From Average-Iterate to Last-Iterate Convergence in Games: A Reduction and Its Applications
NeurIPS
2025
Improved Regret and Contextual Linear Extension for Pandora's Box and Prophet Inequality
COLT
2025
Instance-Dependent Regret Bounds for Learning Two-Player Zero-Sum Games with Bandit Feedback
ICLR
2025
WizardMath: Empowering Mathematical Reasoning for Large Language Models via Reinforced Evol-Instruct
AISTATS
2024
Near-Optimal Policy Optimization for Correlated Equilibrium in General-Sum Markov Games
NeurIPS
2023
Uncoupled and Convergent Learning in Two-Player Zero-Sum Markov Games with Bandit Feedback
NeurIPSW
2022
Clairvoyant Regret Minimization: Equivalence with Nemirovski’s Conceptual Prox Method and Extension to General Convex Games
NeurIPS
2022
Follow-the-Perturbed-Leader for Adversarial Markov Decision Processes with Bandit Feedback
NeurIPS
2021
Implicit Finite-Horizon Approximation and Efficient Optimal Algorithms for Stochastic Shortest Path
COLT
2021
Non-Stationary Reinforcement Learning Without Prior Knowledge: An Optimal Black-Box Approach
NeurIPS
2021
The Best of Both Worlds: Stochastic and Adversarial Episodic MDPs with Unknown Transition
NeurIPS
2020
Bias No More: High-Probability Data-Dependent Regret Bounds for Adversarial Bandits and MDPs
ICML
2020
Learning Adversarial Markov Decision Processes with Bandit Feedback and Unknown Transition
ICML
2020
Model-Free Reinforcement Learning in Infinite-Horizon Average-Reward Markov Decision Processes