ML Anthology
Li, Lihong
67 publications
NeurIPS
2025
Ask a Strong LLM Judge When Your Reward Model Is Uncertain
Zhenghao Xu, Qin Lu, Qingru Zhang, Liang Qiu, Ilgee Hong, Changlong Yu, Wenlin Yao, Yao Liu, Haoming Jiang, Lihong Li, Hyokun Yun, Tuo Zhao
ICLR
2022
Understanding Domain Randomization for Sim-to-Real Transfer
Xiaoyu Chen, Jiachen Hu, Chi Jin, Lihong Li, Liwei Wang
AISTATS
2021
Off-Policy Evaluation in Infinite-Horizon Reinforcement Learning with Latent Confounders
Andrew Bennett, Nathan Kallus, Lihong Li, Ali Mousavi
ICLR
2021
Efficient Reinforcement Learning in Factored MDPs with Application to Constrained RL
Xiaoyu Chen, Jiachen Hu, Lihong Li, Liwei Wang
MLJ
2021
Guest Editorial: Special Issue on Reinforcement Learning for Real Life
Yuxi Li, Alborz Geramifard, Lihong Li, Csaba Szepesvári, Tao Wang
ICML
2021
Near-Optimal Representation Learning for Linear Bandits and Linear RL
Jiachen Hu, Xiaoyu Chen, Chi Jin, Lihong Li, Liwei Wang
ICLR
2021
Neural Thompson Sampling
Weitong Zhang, Dongruo Zhou, Lihong Li, Quanquan Gu
ICML
2021
On the Optimality of Batch Policy Optimization Algorithms
Chenjun Xiao, Yifan Wu, Jincheng Mei, Bo Dai, Tor Lattimore, Lihong Li, Csaba Szepesvari, Dale Schuurmans
ICML
2020
Batch Stationary Distribution Estimation
Junfeng Wen, Bo Dai, Lihong Li, Dale Schuurmans
ICLR
2020
Black-Box Off-Policy Estimation for Infinite-Horizon Reinforcement Learning
Ali Mousavi, Lihong Li, Qiang Liu, Denny Zhou
NeurIPS
2020
CoinDICE: Off-Policy Confidence Interval Estimation
Bo Dai, Ofir Nachum, Yinlam Chow, Lihong Li, Csaba Szepesvari, Dale Schuurmans
ICLR
2020
Doubly Robust Bias Reduction in Infinite Horizon Off-Policy Estimation
Ziyang Tang, Yihao Feng, Lihong Li, Dengyong Zhou, Qiang Liu
NeurIPS
2020
Escaping the Gravitational Pull of SoftMax
Jincheng Mei, Chenjun Xiao, Bo Dai, Lihong Li, Csaba Szepesvari, Dale Schuurmans
ICLR
2020
GenDICE: Generalized Offline Estimation of Stationary Values
Ruiyi Zhang, Bo Dai, Lihong Li, Dale Schuurmans
ICML
2020
Neural Contextual Bandits with UCB-Based Exploration
Dongruo Zhou, Lihong Li, Quanquan Gu
NeurIPS
2020
Off-Policy Evaluation via the Regularized Lagrangian
Mengjiao Yang, Ofir Nachum, Bo Dai, Lihong Li, Dale Schuurmans
AISTATS
2020
Randomized Exploration in Generalized Linear Bandits
Branislav Kveton, Manzil Zaheer, Csaba Szepesvari, Lihong Li, Mohammad Ghavamzadeh, Craig Boutilier
NeurIPS
2019
A Kernel Loss for Solving the Bellman Equation
Yihao Feng, Lihong Li, Qiang Liu
NeurIPS
2019
DualDICE: Behavior-Agnostic Estimation of Discounted Stationary Distribution Corrections
Ofir Nachum, Yinlam Chow, Bo Dai, Lihong Li
ICMLW
2019
DualDICE: Efficient Estimation of Off-Policy Stationary Distribution Corrections
Ofir Nachum, Yinlam Chow, Bo Dai, Lihong Li
ICLR
2019
Neural Logic Machines
Honghua Dong, Jiayuan Mao, Tian Lin, Chong Wang, Lihong Li, Denny Zhou
ICML
2019
Policy Certificates: Towards Accountable Reinforcement Learning
Christoph Dann, Lihong Li, Wei Wei, Emma Brunskill
NeurIPS
2018
Adversarial Attacks on Stochastic Bandits
Kwang-Sung Jun, Lihong Li, Yuzhe Ma, Xiaojin Zhu
AAAI
2018
BBQ-Networks: Efficient Exploration in Deep Reinforcement Learning for Task-Oriented Dialogue Systems
Zachary C. Lipton, Xiujun Li, Jianfeng Gao, Lihong Li, Faisal Ahmed, Li Deng
ICLR
2018
Boosting the Actor with Dual Critic
Bo Dai, Albert Shaw, Niao He, Lihong Li, Le Song
NeurIPS
2018
Breaking the Curse of Horizon: Infinite-Horizon Off-Policy Estimation
Qiang Liu, Lihong Li, Ziyang Tang, Dengyong Zhou
ICML
2018
SBEED: Convergent Reinforcement Learning with Nonlinear Function Approximation
Bo Dai, Albert Shaw, Lihong Li, Lin Xiao, Niao He, Zhen Liu, Jianshu Chen, Le Song
ICML
2018
Scalable Bilinear Pi Learning Using State and Action Features
Yichen Chen, Lihong Li, Mengdi Wang
ICLR
2017
Neuro-Symbolic Program Synthesis
Emilio Parisotto, Abdel-rahman Mohamed, Rishabh Singh, Lihong Li, Dengyong Zhou, Pushmeet Kohli
ICML
2017
Provably Optimal Algorithms for Generalized Linear Contextual Bandits
Lihong Li, Yu Lu, Dengyong Zhou
NeurIPS
2017
Q-LDA: Uncovering Latent Patterns in Text-Based Sequential Decision Processes
Jianshu Chen, Chong Wang, Lin Xiao, Ji He, Lihong Li, Li Deng
ICML
2017
Stochastic Variance Reduction Methods for Policy Evaluation
Simon S. Du, Jianshu Chen, Lihong Li, Lin Xiao, Dengyong Zhou
NeurIPS
2016
Active Learning with Oracle Epiphany
Tzu-Kuo Huang, Lihong Li, Ara Vartanian, Saleema Amershi, Xiaojin Zhu
COLT
2016
An Efficient Algorithm for Contextual Bandits with Knapsacks, and an Extension to Concave Objectives
Shipra Agrawal, Nikhil R. Devanur, Lihong Li
ICML
2016
Doubly Robust Off-Policy Value Evaluation for Reinforcement Learning
Nan Jiang, Lihong Li
ALT
2016
On the Prior Sensitivity of Thompson Sampling
Che-Yu Liu, Lihong Li
AISTATS
2015
Toward Minimax Off-Policy Value Estimation
Lihong Li, Rémi Munos, Csaba Szepesvári
ICML
2014
PAC-Inspired Option Discovery in Lifelong Reinforcement Learning
Emma Brunskill, Lihong Li
ICML
2014
Taming the Monster: A Fast and Simple Algorithm for Contextual Bandits
Alekh Agarwal, Daniel Hsu, Satyen Kale, John Langford, Lihong Li, Robert Schapire
UAI
2013
Sample Complexity of Multi-Task Reinforcement Learning
Emma Brunskill, Lihong Li
COLT
2012
Open Problem: Regret Bounds for Thompson Sampling
Lihong Li, Olivier Chapelle
UAI
2012
Sample-Efficient Nonstationary Policy Evaluation for Contextual Bandits
Miroslav Dudík, Dumitru Erhan, John Langford, Lihong Li
NeurIPS
2011
An Empirical Evaluation of Thompson Sampling
Olivier Chapelle, Lihong Li
AISTATS
2011
Contextual Bandit Algorithms with Supervised Learning Guarantees
Alina Beygelzimer, John Langford, Lihong Li, Lev Reyzin, Robert Schapire
AISTATS
2011
Contextual Bandits with Linear Payoff Functions
Wei Chu, Lihong Li, Lev Reyzin, Robert Schapire
ICML
2011
Doubly Robust Policy Evaluation and Learning
Miroslav Dudík, John Langford, Lihong Li
MLJ
2011
Knows What It Knows: A Framework for Self-Aware Learning
Lihong Li, Michael L. Littman, Thomas J. Walsh, Alexander L. Strehl
AISTATS
2011
Linear-Time Estimators for Propensity Scores
Deepak Agarwal, Lihong Li, Alexander Smola
NeurIPS
2010
Learning from Logged Implicit Exploration Data
Alex Strehl, John Langford, Lihong Li, Sham M. Kakade
NeurIPS
2010
Parallelized Stochastic Gradient Descent
Martin Zinkevich, Markus Weimer, Lihong Li, Alex J. Smola
UAI
2009
A Bayesian Sampling Approach to Exploration in Reinforcement Learning
John Asmuth, Lihong Li, Michael L. Littman, Ali Nouri, David Wingate
JMLR
2009
Provably Efficient Learning with Typed Parametric Models
Emma Brunskill, Bethany R. Leffler, Lihong Li, Michael L. Littman, Nicholas Roy
JMLR
2009
Reinforcement Learning in Finite MDPs: PAC Analysis
Alexander L. Strehl, Lihong Li, Michael L. Littman
JMLR
2009
Sparse Online Learning via Truncated Gradient
John Langford, Lihong Li, Tong Zhang
ICML
2009
The Adaptive K-Meteorologists Problem and Its Application to Structure Learning and Feature Selection in Reinforcement Learning
Carlos Diuk, Lihong Li, Bethany R. Leffler
ICML
2009
Workshop Summary: Results of the 2009 Reinforcement Learning Competition
David Wingate, Carlos Diuk, Lihong Li, Matthew Taylor, Jordan Frank
ICML
2008
A Worst-Case Comparison Between Temporal Difference and Residual Gradient with Linear Function Approximation
Lihong Li
ICML
2008
An Analysis of Linear Models, Linear Value-Function Approximation, and Feature Selection for Reinforcement Learning
Ronald Parr, Lihong Li, Gavin Taylor, Christopher Painter-Wakefield, Michael L. Littman
UAI
2008
CORL: A Continuous-State Offset-Dynamics Reinforcement Learner
Emma Brunskill, Bethany R. Leffler, Lihong Li, Michael L. Littman, Nicholas Roy
ICML
2008
Knows What It Knows: A Framework for Self-Aware Learning
Lihong Li, Michael L. Littman, Thomas J. Walsh
NeurIPS
2008
Sparse Online Learning via Truncated Gradient
John Langford, Lihong Li, Tong Zhang
ICML
2007
Analyzing Feature Generation for Value-Function Approximation
Ronald Parr, Christopher Painter-Wakefield, Lihong Li, Michael L. Littman
UAI
2006
Incremental Model-Based Learners with Formal Learning-Time Guarantees
Alexander L. Strehl, Lihong Li, Michael L. Littman
ICML
2006
PAC Model-Free Reinforcement Learning
Alexander L. Strehl, Lihong Li, Eric Wiewiora, John Langford, Michael L. Littman
AAAI
2005
Lazy Approximation for Solving Continuous Finite-Horizon MDPs
Lihong Li, Michael L. Littman
ECML-PKDD
2004
Batch Reinforcement Learning with State Importance
Lihong Li, Vadim Bulitko, Russell Greiner
IJCAI
2003
Lookahead Pathologies for Single Agent Search
Vadim Bulitko, Lihong Li, Russell Greiner, Ilya Levner