Li, Lihong

67 publications

NeurIPS 2025 Ask a Strong LLM Judge When Your Reward Model Is Uncertain Zhenghao Xu, Qin Lu, Qingru Zhang, Liang Qiu, Ilgee Hong, Changlong Yu, Wenlin Yao, Yao Liu, Haoming Jiang, Lihong Li, Hyokun Yun, Tuo Zhao
ICLR 2022 Understanding Domain Randomization for Sim-to-Real Transfer Xiaoyu Chen, Jiachen Hu, Chi Jin, Lihong Li, Liwei Wang
AISTATS 2021 Off-Policy Evaluation in Infinite-Horizon Reinforcement Learning with Latent Confounders Andrew Bennett, Nathan Kallus, Lihong Li, Ali Mousavi
ICLR 2021 Efficient Reinforcement Learning in Factored MDPs with Application to Constrained RL Xiaoyu Chen, Jiachen Hu, Lihong Li, Liwei Wang
MLJ 2021 Guest Editorial: Special Issue on Reinforcement Learning for Real Life Yuxi Li, Alborz Geramifard, Lihong Li, Csaba Szepesvári, Tao Wang
ICML 2021 Near-Optimal Representation Learning for Linear Bandits and Linear RL Jiachen Hu, Xiaoyu Chen, Chi Jin, Lihong Li, Liwei Wang
ICLR 2021 Neural Thompson Sampling Weitong Zhang, Dongruo Zhou, Lihong Li, Quanquan Gu
ICML 2021 On the Optimality of Batch Policy Optimization Algorithms Chenjun Xiao, Yifan Wu, Jincheng Mei, Bo Dai, Tor Lattimore, Lihong Li, Csaba Szepesvári, Dale Schuurmans
ICML 2020 Batch Stationary Distribution Estimation Junfeng Wen, Bo Dai, Lihong Li, Dale Schuurmans
ICLR 2020 Black-Box Off-Policy Estimation for Infinite-Horizon Reinforcement Learning Ali Mousavi, Lihong Li, Qiang Liu, Denny Zhou
NeurIPS 2020 CoinDICE: Off-Policy Confidence Interval Estimation Bo Dai, Ofir Nachum, Yinlam Chow, Lihong Li, Csaba Szepesvári, Dale Schuurmans
ICLR 2020 Doubly Robust Bias Reduction in Infinite Horizon Off-Policy Estimation Ziyang Tang, Yihao Feng, Lihong Li, Dengyong Zhou, Qiang Liu
NeurIPS 2020 Escaping the Gravitational Pull of SoftMax Jincheng Mei, Chenjun Xiao, Bo Dai, Lihong Li, Csaba Szepesvári, Dale Schuurmans
ICLR 2020 GenDICE: Generalized Offline Estimation of Stationary Values Ruiyi Zhang, Bo Dai, Lihong Li, Dale Schuurmans
ICML 2020 Neural Contextual Bandits with UCB-Based Exploration Dongruo Zhou, Lihong Li, Quanquan Gu
NeurIPS 2020 Off-Policy Evaluation via the Regularized Lagrangian Mengjiao Yang, Ofir Nachum, Bo Dai, Lihong Li, Dale Schuurmans
AISTATS 2020 Randomized Exploration in Generalized Linear Bandits Branislav Kveton, Manzil Zaheer, Csaba Szepesvári, Lihong Li, Mohammad Ghavamzadeh, Craig Boutilier
NeurIPS 2019 A Kernel Loss for Solving the Bellman Equation Yihao Feng, Lihong Li, Qiang Liu
NeurIPS 2019 DualDICE: Behavior-Agnostic Estimation of Discounted Stationary Distribution Corrections Ofir Nachum, Yinlam Chow, Bo Dai, Lihong Li
ICMLW 2019 DualDICE: Efficient Estimation of Off-Policy Stationary Distribution Corrections Ofir Nachum, Yinlam Chow, Bo Dai, Lihong Li
ICLR 2019 Neural Logic Machines Honghua Dong, Jiayuan Mao, Tian Lin, Chong Wang, Lihong Li, Denny Zhou
ICML 2019 Policy Certificates: Towards Accountable Reinforcement Learning Christoph Dann, Lihong Li, Wei Wei, Emma Brunskill
NeurIPS 2018 Adversarial Attacks on Stochastic Bandits Kwang-Sung Jun, Lihong Li, Yuzhe Ma, Xiaojin Zhu
AAAI 2018 BBQ-Networks: Efficient Exploration in Deep Reinforcement Learning for Task-Oriented Dialogue Systems Zachary C. Lipton, Xiujun Li, Jianfeng Gao, Lihong Li, Faisal Ahmed, Li Deng
ICLR 2018 Boosting the Actor with Dual Critic Bo Dai, Albert Shaw, Niao He, Lihong Li, Le Song
NeurIPS 2018 Breaking the Curse of Horizon: Infinite-Horizon Off-Policy Estimation Qiang Liu, Lihong Li, Ziyang Tang, Dengyong Zhou
ICML 2018 SBEED: Convergent Reinforcement Learning with Nonlinear Function Approximation Bo Dai, Albert Shaw, Lihong Li, Lin Xiao, Niao He, Zhen Liu, Jianshu Chen, Le Song
ICML 2018 Scalable Bilinear Pi Learning Using State and Action Features Yichen Chen, Lihong Li, Mengdi Wang
ICLR 2017 Neuro-Symbolic Program Synthesis Emilio Parisotto, Abdel-rahman Mohamed, Rishabh Singh, Lihong Li, Dengyong Zhou, Pushmeet Kohli
ICML 2017 Provably Optimal Algorithms for Generalized Linear Contextual Bandits Lihong Li, Yu Lu, Dengyong Zhou
NeurIPS 2017 Q-LDA: Uncovering Latent Patterns in Text-Based Sequential Decision Processes Jianshu Chen, Chong Wang, Lin Xiao, Ji He, Lihong Li, Li Deng
ICML 2017 Stochastic Variance Reduction Methods for Policy Evaluation Simon S. Du, Jianshu Chen, Lihong Li, Lin Xiao, Dengyong Zhou
NeurIPS 2016 Active Learning with Oracle Epiphany Tzu-Kuo Huang, Lihong Li, Ara Vartanian, Saleema Amershi, Xiaojin Zhu
COLT 2016 An Efficient Algorithm for Contextual Bandits with Knapsacks, and an Extension to Concave Objectives Shipra Agrawal, Nikhil R. Devanur, Lihong Li
ICML 2016 Doubly Robust Off-Policy Value Evaluation for Reinforcement Learning Nan Jiang, Lihong Li
ALT 2016 On the Prior Sensitivity of Thompson Sampling Che-Yu Liu, Lihong Li
AISTATS 2015 Toward Minimax Off-Policy Value Estimation Lihong Li, Rémi Munos, Csaba Szepesvári
ICML 2014 PAC-Inspired Option Discovery in Lifelong Reinforcement Learning Emma Brunskill, Lihong Li
ICML 2014 Taming the Monster: A Fast and Simple Algorithm for Contextual Bandits Alekh Agarwal, Daniel Hsu, Satyen Kale, John Langford, Lihong Li, Robert Schapire
UAI 2013 Sample Complexity of Multi-Task Reinforcement Learning Emma Brunskill, Lihong Li
COLT 2012 Open Problem: Regret Bounds for Thompson Sampling Lihong Li, Olivier Chapelle
UAI 2012 Sample-Efficient Nonstationary Policy Evaluation for Contextual Bandits Miroslav Dudík, Dumitru Erhan, John Langford, Lihong Li
NeurIPS 2011 An Empirical Evaluation of Thompson Sampling Olivier Chapelle, Lihong Li
AISTATS 2011 Contextual Bandit Algorithms with Supervised Learning Guarantees Alina Beygelzimer, John Langford, Lihong Li, Lev Reyzin, Robert Schapire
AISTATS 2011 Contextual Bandits with Linear Payoff Functions Wei Chu, Lihong Li, Lev Reyzin, Robert Schapire
ICML 2011 Doubly Robust Policy Evaluation and Learning Miroslav Dudík, John Langford, Lihong Li
MLJ 2011 Knows What It Knows: A Framework for Self-Aware Learning Lihong Li, Michael L. Littman, Thomas J. Walsh, Alexander L. Strehl
AISTATS 2011 Linear-Time Estimators for Propensity Scores Deepak Agarwal, Lihong Li, Alexander Smola
NeurIPS 2010 Learning from Logged Implicit Exploration Data Alex Strehl, John Langford, Lihong Li, Sham M. Kakade
NeurIPS 2010 Parallelized Stochastic Gradient Descent Martin Zinkevich, Markus Weimer, Lihong Li, Alex J. Smola
UAI 2009 A Bayesian Sampling Approach to Exploration in Reinforcement Learning John Asmuth, Lihong Li, Michael L. Littman, Ali Nouri, David Wingate
JMLR 2009 Provably Efficient Learning with Typed Parametric Models Emma Brunskill, Bethany R. Leffler, Lihong Li, Michael L. Littman, Nicholas Roy
JMLR 2009 Reinforcement Learning in Finite MDPs: PAC Analysis Alexander L. Strehl, Lihong Li, Michael L. Littman
JMLR 2009 Sparse Online Learning via Truncated Gradient John Langford, Lihong Li, Tong Zhang
ICML 2009 The Adaptive K-Meteorologists Problem and Its Application to Structure Learning and Feature Selection in Reinforcement Learning Carlos Diuk, Lihong Li, Bethany R. Leffler
ICML 2009 Workshop Summary: Results of the 2009 Reinforcement Learning Competition David Wingate, Carlos Diuk, Lihong Li, Matthew Taylor, Jordan Frank
ICML 2008 A Worst-Case Comparison Between Temporal Difference and Residual Gradient with Linear Function Approximation Lihong Li
ICML 2008 An Analysis of Linear Models, Linear Value-Function Approximation, and Feature Selection for Reinforcement Learning Ronald Parr, Lihong Li, Gavin Taylor, Christopher Painter-Wakefield, Michael L. Littman
UAI 2008 CORL: A Continuous-State Offset-Dynamics Reinforcement Learner Emma Brunskill, Bethany R. Leffler, Lihong Li, Michael L. Littman, Nicholas Roy
ICML 2008 Knows What It Knows: A Framework for Self-Aware Learning Lihong Li, Michael L. Littman, Thomas J. Walsh
NeurIPS 2008 Sparse Online Learning via Truncated Gradient John Langford, Lihong Li, Tong Zhang
ICML 2007 Analyzing Feature Generation for Value-Function Approximation Ronald Parr, Christopher Painter-Wakefield, Lihong Li, Michael L. Littman
UAI 2006 Incremental Model-Based Learners with Formal Learning-Time Guarantees Alexander L. Strehl, Lihong Li, Michael L. Littman
ICML 2006 PAC Model-Free Reinforcement Learning Alexander L. Strehl, Lihong Li, Eric Wiewiora, John Langford, Michael L. Littman
AAAI 2005 Lazy Approximation for Solving Continuous Finite-Horizon MDPs Lihong Li, Michael L. Littman
ECML-PKDD 2004 Batch Reinforcement Learning with State Importance Lihong Li, Vadim Bulitko, Russell Greiner
IJCAI 2003 Lookahead Pathologies for Single Agent Search Vadim Bulitko, Lihong Li, Russell Greiner, Ilya Levner