Agarwal, Alekh

97 publications

ICML 2025 Catoni Contextual Bandits Are Robust to Heavy-Tailed Rewards Chenlu Ye, Yujia Jin, Alekh Agarwal, Tong Zhang
ICML 2025 Design Considerations in Offline Preference-Based RL Alekh Agarwal, Christoph Dann, Teodor Vanislavov Marinov
TMLR 2025 Preserving Expert-Level Privacy in Offline Reinforcement Learning Navodita Sharma, Vishnu Vinod, Abhradeep Guha Thakurta, Alekh Agarwal, Borja Balle, Christoph Dann, Aravindan Raghuveer
ICLR 2025 Rewarding Progress: Scaling Automated Process Verifiers for LLM Reasoning Amrith Setlur, Chirag Nagpal, Adam Fisch, Xinyang Geng, Jacob Eisenstein, Rishabh Agarwal, Alekh Agarwal, Jonathan Berant, Aviral Kumar
TMLR 2025 Robust Preference Optimization Through Reward Model Distillation Adam Fisch, Jacob Eisenstein, Vicky Zayats, Alekh Agarwal, Ahmad Beirami, Chirag Nagpal, Peter Shaw, Jonathan Berant
ICML 2025 Theoretical Guarantees on the Best-of-N Alignment Policy Ahmad Beirami, Alekh Agarwal, Jonathan Berant, Alexander D’Amour, Jacob Eisenstein, Chirag Nagpal, Ananda Theertha Suresh
ALT 2024 A Mechanism for Sample-Efficient In-Context Learning for Sparse Retrieval Tasks Jacob Abernethy, Alekh Agarwal, Teodor Vanislavov Marinov, Manfred K. Warmuth
ICML 2024 A Minimaximalist Approach to Reinforcement Learning from Human Feedback Gokul Swamy, Christoph Dann, Rahul Kidambi, Steven Wu, Alekh Agarwal
NeurIPSW 2024 Conditional Language Policy: A General Framework for Steerable Multi-Objective Finetuning Kaiwen Wang, Rahul Kidambi, Ryan Sullivan, Alekh Agarwal, Christoph Dann, Andrea Michi, Marco Gelmi, Yunxuan Li, Raghav Gupta, Kumar Avinava Dubey, Alexandre Rame, Johan Ferret, Geoffrey Cideron, Le Hou, Hongkun Yu, Amr Ahmed, Aranyak Mehta, Leonard Hussenot, Olivier Bachem, Edouard Leurent
JMLR 2024 Model-Free Representation Learning and Exploration in Low-Rank MDPs Aditya Modi, Jinglin Chen, Akshay Krishnamurthy, Nan Jiang, Alekh Agarwal
ICML 2024 More Benefits of Being Distributional: Second-Order Bounds for Reinforcement Learning Kaiwen Wang, Owen Oertell, Alekh Agarwal, Nathan Kallus, Wen Sun
NeurIPSW 2024 P3O: Pessimistic Preference-Based Policy Optimization for Robust Alignment from Preferences Dhawal Gupta, Christoph Dann, Alekh Agarwal
NeurIPS 2024 Small Steps No More: Global Convergence of Stochastic Gradient Bandits for Arbitrary Learning Rates Jincheng Mei, Bo Dai, Alekh Agarwal, Sharan Vaswani, Anant Raj, Csaba Szepesvári, Dale Schuurmans
ICML 2024 The Non-Linear $f$-Design and Applications to Interactive Learning Alekh Agarwal, Jian Qian, Alexander Rakhlin, Tong Zhang
NeurIPSW 2023 An Empirical Evaluation of Federated Contextual Bandit Algorithms Alekh Agarwal, Hugh McMahan, Zheng Xu
ICML 2023 Learning in POMDPs Is Sample-Efficient with Hindsight Observability Jonathan Lee, Alekh Agarwal, Christoph Dann, Tong Zhang
NeurIPS 2023 Ordering-Based Conditions for Global Convergence of Policy Gradient Methods Jincheng Mei, Bo Dai, Alekh Agarwal, Mohammad Ghavamzadeh, Csaba Szepesvári, Dale Schuurmans
COLT 2023 Provable Benefits of Representational Transfer in Reinforcement Learning Alekh Agarwal, Yuda Song, Wen Sun, Kaiwen Wang, Mengdi Wang, Xuezhou Zhang
NeurIPSW 2023 Reward Model Underspecification in Language Model Alignment Jacob Eisenstein, Jonathan Berant, Chirag Nagpal, Alekh Agarwal, Ahmad Beirami, Alexander Nicholas D'Amour, Krishnamurthy Dj Dvijotham, Katherine A Heller, Stephen Robert Pfohl, Deepak Ramachandran
ICML 2023 Stochastic Gradient Succeeds for Bandits Jincheng Mei, Zixin Zhong, Bo Dai, Alekh Agarwal, Csaba Szepesvári, Dale Schuurmans
COLT 2023 VO$Q$L: Towards Optimal Regret in Model-Free RL with Nonlinear Function Approximation Alekh Agarwal, Yujia Jin, Tong Zhang
ICML 2022 Adversarially Trained Actor Critic for Offline Reinforcement Learning Ching-An Cheng, Tengyang Xie, Nan Jiang, Alekh Agarwal
ICML 2022 Efficient Reinforcement Learning in Block MDPs: A Model-Free Representation Learning Approach Xuezhou Zhang, Yuda Song, Masatoshi Uehara, Mengdi Wang, Alekh Agarwal, Wen Sun
COLT 2022 Minimax Regret Optimization for Robust Machine Learning Under Distribution Shift Alekh Agarwal, Tong Zhang
NeurIPS 2022 Model-Based RL with Optimistic Posterior Sampling: Structural Conditions and Sample Complexity Alekh Agarwal, Tong Zhang
COLT 2022 Non-Linear Reinforcement Learning in Large Action Spaces: Structural Conditions and Sample-Efficiency of Posterior Sampling Alekh Agarwal, Tong Zhang
NeurIPS 2022 On the Statistical Efficiency of Reward-Free Exploration in Non-Linear RL Jinglin Chen, Aditya Modi, Akshay Krishnamurthy, Nan Jiang, Alekh Agarwal
NeurIPSW 2022 Provable Benefits of Representational Transfer in Reinforcement Learning Alekh Agarwal, Yuda Song, Kaiwen Wang, Mengdi Wang, Wen Sun, Xuezhou Zhang
ICLR 2022 Provably Filtering Exogenous Distractors Using Multistep Inverse Dynamics Yonathan Efroni, Dipendra Misra, Akshay Krishnamurthy, Alekh Agarwal, John Langford
JMLR 2021 A Contextual Bandit Bake-Off Alberto Bietti, Alekh Agarwal, John Langford
NeurIPS 2021 Bellman-Consistent Pessimism for Offline Reinforcement Learning Tengyang Xie, Ching-An Cheng, Nan Jiang, Paul Mineiro, Alekh Agarwal
COLT 2021 Cautiously Optimistic Policy Optimization and Exploration with Linear Function Approximation Andrea Zanette, Ching-An Cheng, Alekh Agarwal
JMLR 2021 On the Theory of Policy Gradient Methods: Optimality, Approximation, and Distribution Shift Alekh Agarwal, Sham M. Kakade, Jason D. Lee, Gaurav Mahajan
ICML 2021 Provably Correct Optimization and Exploration with Non-Linear Policies Fei Feng, Wotao Yin, Alekh Agarwal, Lin Yang
COLT 2021 Towards a Dimension-Free Understanding of Adaptive Linear Control Juan C Perdomo, Max Simchowitz, Alekh Agarwal, Peter Bartlett
ICLR 2020 Deep Batch Active Learning by Diverse, Uncertain Gradient Lower Bounds Jordan T. Ash, Chicheng Zhang, Akshay Krishnamurthy, John Langford, Alekh Agarwal
NeurIPS 2020 FLAMBE: Structural Complexity and Representation Learning of Low Rank MDPs Alekh Agarwal, Sham Kakade, Akshay Krishnamurthy, Wen Sun
AAAI 2020 Metareasoning in Modular Software Systems: On-the-Fly Configuration Using Reinforcement Learning with Rich Contextual Representations Aditya Modi, Debadeepta Dey, Alekh Agarwal, Adith Swaminathan, Besmira Nushi, Sean Andrist, Eric Horvitz
COLT 2020 Model-Based Reinforcement Learning with a Generative Model Is Minimax Optimal Alekh Agarwal, Sham Kakade, Lin F. Yang
COLT 2020 Optimality and Approximation with Policy Gradient Methods in Markov Decision Processes Alekh Agarwal, Sham M Kakade, Jason D Lee, Gaurav Mahajan
NeurIPS 2020 PC-PG: Policy Cover Directed Exploration for Provable Policy Gradient Learning Alekh Agarwal, Mikael Henaff, Sham Kakade, Wen Sun
NeurIPS 2020 Policy Improvement via Imitation of Multiple Oracles Ching-An Cheng, Andrey Kolobov, Alekh Agarwal
NeurIPS 2020 Provably Good Batch Off-Policy Reinforcement Learning Without Great Exploration Yao Liu, Adith Swaminathan, Alekh Agarwal, Emma Brunskill
NeurIPS 2020 Safe Reinforcement Learning via Curriculum Induction Matteo Turchetta, Andrey Kolobov, Shital Shah, Andreas Krause, Alekh Agarwal
COLT 2020 Taking a Hint: How to Leverage Loss Predictors in Contextual Bandits? Chen-Yu Wei, Haipeng Luo, Alekh Agarwal
JMLR 2019 Active Learning for Cost-Sensitive Classification Akshay Krishnamurthy, Alekh Agarwal, Tzu-Kuo Huang, Hal Daumé III, John Langford
NeurIPS 2019 Bias Correction of Learned Generative Models Using Likelihood-Free Importance Weighting Aditya Grover, Jiaming Song, Ashish Kapoor, Kenneth Tran, Alekh Agarwal, Eric J Horvitz, Stefano Ermon
ICLRW 2019 Bias Correction of Learned Generative Models via Likelihood-Free Importance Weighting Aditya Grover, Jiaming Song, Ashish Kapoor, Kenneth Tran, Alekh Agarwal, Eric Horvitz, Stefano Ermon
ICML 2019 Fair Regression: Quantitative Definitions and Reduction-Based Algorithms Alekh Agarwal, Miroslav Dudik, Zhiwei Steven Wu
COLT 2019 Model-Based RL in Contextual Decision Processes: PAC Bounds and Exponential Improvements over Model-Free Approaches Wen Sun, Nan Jiang, Akshay Krishnamurthy, Alekh Agarwal, John Langford
ICMLW 2019 Off-Policy Policy Gradient with State Distribution Correction Yao Liu, Adith Swaminathan, Alekh Agarwal, Emma Brunskill
UAI 2019 Off-Policy Policy Gradient with Stationary Distribution Correction Yao Liu, Adith Swaminathan, Alekh Agarwal, Emma Brunskill
ICML 2019 Provably Efficient RL with Rich Observations via Latent State Decoding Simon Du, Akshay Krishnamurthy, Nan Jiang, Alekh Agarwal, Miroslav Dudik, John Langford
ICML 2019 Warm-Starting Contextual Bandits: Robustly Combining Supervised and Bandit Feedback Chicheng Zhang, Alekh Agarwal, Hal Daumé III, John Langford, Sahand Negahban
ICML 2018 A Reductions Approach to Fair Classification Alekh Agarwal, Alina Beygelzimer, Miroslav Dudik, John Langford, Hanna Wallach
COLT 2018 Efficient Contextual Bandits in Non-Stationary Worlds Haipeng Luo, Chen-Yu Wei, Alekh Agarwal, John Langford
ICML 2018 Hierarchical Imitation and Reinforcement Learning Hoang Le, Nan Jiang, Alekh Agarwal, Miroslav Dudik, Yisong Yue, Hal Daumé
NeurIPS 2018 On Oracle-Efficient PAC RL with Rich Observations Christoph Dann, Nan Jiang, Akshay Krishnamurthy, Alekh Agarwal, John Langford, Robert E. Schapire
COLT 2018 Open Problem: The Dependence of Sample Complexity Lower Bounds on Planning Horizon Nan Jiang, Alekh Agarwal
ICML 2018 Practical Contextual Bandits with Regression Oracles Dylan Foster, Alekh Agarwal, Miroslav Dudik, Haipeng Luo, Robert Schapire
ICML 2017 Active Learning for Cost-Sensitive Classification Akshay Krishnamurthy, Alekh Agarwal, Tzu-Kuo Huang, Hal Daumé, John Langford
ICML 2017 Contextual Decision Processes with Low Bellman Rank Are PAC-Learnable Nan Jiang, Akshay Krishnamurthy, Alekh Agarwal, John Langford, Robert E. Schapire
COLT 2017 Corralling a Band of Bandit Algorithms Alekh Agarwal, Haipeng Luo, Behnam Neyshabur, Robert E. Schapire
NeurIPS 2017 Off-Policy Evaluation for Slate Recommendation Adith Swaminathan, Akshay Krishnamurthy, Alekh Agarwal, Miro Dudik, John Langford, Damien Jose, Imed Zitouni
COLT 2017 Open Problem: First-Order Regret Bounds for Contextual Bandits Alekh Agarwal, Akshay Krishnamurthy, John Langford, Haipeng Luo, Robert E. Schapire
ICML 2017 Optimal and Adaptive Off-Policy Evaluation in Contextual Bandits Yu-Xiang Wang, Alekh Agarwal, Miroslav Dudík
NeurIPS 2016 Contextual Semibandits via Supervised Learning Oracles Akshay Krishnamurthy, Alekh Agarwal, Miro Dudik
NeurIPS 2016 Efficient Second Order Online Learning by Sketching Haipeng Luo, Alekh Agarwal, Nicolò Cesa-Bianchi, John Langford
NeurIPS 2016 PAC Reinforcement Learning with Rich Observations Akshay Krishnamurthy, Alekh Agarwal, John Langford
ICML 2015 A Lower Bound for the Optimization of Finite Sums Alekh Agarwal, Leon Bottou
NeurIPS 2015 Efficient and Parsimonious Agnostic Active Learning Tzu-Kuo Huang, Alekh Agarwal, Daniel J. Hsu, John Langford, Robert E. Schapire
NeurIPS 2015 Fast Convergence of Regularized Learning in Games Vasilis Syrgkanis, Alekh Agarwal, Haipeng Luo, Robert E. Schapire
ICML 2015 Learning to Search Better than Your Teacher Kai-Wei Chang, Akshay Krishnamurthy, Alekh Agarwal, Hal Daumé, John Langford
JMLR 2014 A Reliable Effective Terascale Linear Learning System Alekh Agarwal, Olivier Chapelle, Miroslav Dudík, John Langford
COLT 2014 Learning Sparsely Used Overcomplete Dictionaries Alekh Agarwal, Animashree Anandkumar, Prateek Jain, Praneeth Netrapalli, Rashish Tandon
ICML 2014 Least Squares Revisited: Scalable Approaches for Multi-Class Prediction Alekh Agarwal, Sham Kakade, Nikos Karampatziakis, Le Song, Gregory Valiant
COLT 2014 Robust Multi-Objective Learning with Mentor Feedback Alekh Agarwal, Ashwinkumar Badanidiyuru, Miroslav Dudík, Robert E. Schapire, Aleksandrs Slivkins
NeurIPS 2014 Scalable Non-Linear Learning with Adaptive Polynomial Expansions Alekh Agarwal, Alina Beygelzimer, Daniel J. Hsu, John Langford, Matus J Telgarsky
ICML 2014 Taming the Monster: A Fast and Simple Algorithm for Contextual Bandits Alekh Agarwal, Daniel Hsu, Satyen Kale, John Langford, Lihong Li, Robert Schapire
ICML 2013 Selective Sampling Algorithms for Cost-Sensitive Multiclass Prediction Alekh Agarwal
AISTATS 2012 Contextual Bandit Learning with Predictable Rewards Alekh Agarwal, Miroslav Dudik, Satyen Kale, John Langford, Robert Schapire
NeurIPS 2012 Stochastic Optimization and Sparse Statistical Recovery: Optimal Algorithms for High Dimensions Alekh Agarwal, Sahand Negahban, Martin J. Wainwright
NeurIPS 2011 Distributed Delayed Stochastic Optimization Alekh Agarwal, John C. Duchi
UAI 2011 Learning with Missing Features Afshin Rostamizadeh, Alekh Agarwal, Peter L. Bartlett
ICML 2011 Noisy Matrix Decomposition via Convex Relaxation: Optimal Rates in High Dimensions Alekh Agarwal, Sahand N. Negahban, Martin J. Wainwright
COLT 2011 Oracle Inequalities for Computationally Budgeted Model Selection Alekh Agarwal, John C. Duchi, Peter L. Bartlett, Clement Levrard
NeurIPS 2011 Stochastic Convex Optimization with Bandit Feedback Alekh Agarwal, Dean P. Foster, Daniel J. Hsu, Sham M. Kakade, Alexander Rakhlin
NeurIPS 2010 Distributed Dual Averaging in Networks Alekh Agarwal, Martin J. Wainwright, John C. Duchi
NeurIPS 2010 Fast Global Convergence Rates of Gradient Methods for High-Dimensional Statistical Recovery Alekh Agarwal, Sahand Negahban, Martin J. Wainwright
JMLR 2010 Message-Passing for Graph-Structured Linear Programs: Proximal Methods and Rounding Schemes Pradeep Ravikumar, Alekh Agarwal, Martin J. Wainwright
COLT 2010 Optimal Algorithms for Online Convex Optimization with Multi-Point Bandit Feedback Alekh Agarwal, Ofer Dekel, Lin Xiao
AISTATS 2010 Optimal Allocation Strategies for the Dark Pool Problem Alekh Agarwal, Peter Bartlett, Max Dama
COLT 2009 A Stochastic View of Optimal Regret Through Minimax Duality Jacob D. Abernethy, Alekh Agarwal, Peter L. Bartlett, Alexander Rakhlin
NeurIPS 2009 Information-Theoretic Lower Bounds on the Oracle Complexity of Convex Optimization Alekh Agarwal, Martin J. Wainwright, Peter L. Bartlett, Pradeep K. Ravikumar
ICML 2008 Message-Passing for Graph-Structured Linear Programs: Proximal Projections, Convergence and Rounding Schemes Pradeep Ravikumar, Alekh Agarwal, Martin J. Wainwright
NeurIPS 2007 An Analysis of Inference with the Universum Olivier Chapelle, Alekh Agarwal, Fabian H. Sinz, Bernhard Schölkopf
ICML 2007 Learning Random Walks to Rank Nodes in Graphs Alekh Agarwal, Soumen Chakrabarti