Agarwal, Alekh

97 publications

ICML 2025 Catoni Contextual Bandits Are Robust to Heavy-Tailed Rewards Chenlu Ye, Yujia Jin, Alekh Agarwal, Tong Zhang

ICML 2025 Design Considerations in Offline Preference-Based RL Alekh Agarwal, Christoph Dann, Teodor Vanislavov Marinov

TMLR 2025 Preserving Expert-Level Privacy in Offline Reinforcement Learning Navodita Sharma, Vishnu Vinod, Abhradeep Guha Thakurta, Alekh Agarwal, Borja Balle, Christoph Dann, Aravindan Raghuveer

ICLR 2025 Rewarding Progress: Scaling Automated Process Verifiers for LLM Reasoning Amrith Setlur, Chirag Nagpal, Adam Fisch, Xinyang Geng, Jacob Eisenstein, Rishabh Agarwal, Alekh Agarwal, Jonathan Berant, Aviral Kumar

TMLR 2025 Robust Preference Optimization Through Reward Model Distillation Adam Fisch, Jacob Eisenstein, Vicky Zayats, Alekh Agarwal, Ahmad Beirami, Chirag Nagpal, Peter Shaw, Jonathan Berant

ICML 2025 Theoretical Guarantees on the Best-of-N Alignment Policy Ahmad Beirami, Alekh Agarwal, Jonathan Berant, Alexander D’Amour, Jacob Eisenstein, Chirag Nagpal, Ananda Theertha Suresh

ALT 2024 A Mechanism for Sample-Efficient In-Context Learning for Sparse Retrieval Tasks Jacob Abernethy, Alekh Agarwal, Teodor Vanislavov Marinov, Manfred K. Warmuth

ICML 2024 A Minimaximalist Approach to Reinforcement Learning from Human Feedback Gokul Swamy, Christoph Dann, Rahul Kidambi, Steven Wu, Alekh Agarwal

NeurIPSW 2024 Conditional Language Policy: A General Framework for Steerable Multi-Objective Finetuning Kaiwen Wang, Rahul Kidambi, Ryan Sullivan, Alekh Agarwal, Christoph Dann, Andrea Michi, Marco Gelmi, Yunxuan Li, Raghav Gupta, Kumar Avinava Dubey, Alexandre Rame, Johan Ferret, Geoffrey Cideron, Le Hou, Hongkun Yu, Amr Ahmed, Aranyak Mehta, Leonard Hussenot, Olivier Bachem, Edouard Leurent

JMLR 2024 Model-Free Representation Learning and Exploration in Low-Rank MDPs Aditya Modi, Jinglin Chen, Akshay Krishnamurthy, Nan Jiang, Alekh Agarwal

ICML 2024 More Benefits of Being Distributional: Second-Order Bounds for Reinforcement Learning Kaiwen Wang, Owen Oertell, Alekh Agarwal, Nathan Kallus, Wen Sun

NeurIPSW 2024 P3O: Pessimistic Preference-Based Policy Optimization for Robust Alignment from Preferences Dhawal Gupta, Christoph Dann, Alekh Agarwal

NeurIPS 2024 Small Steps No More: Global Convergence of Stochastic Gradient Bandits for Arbitrary Learning Rates Jincheng Mei, Bo Dai, Alekh Agarwal, Sharan Vaswani, Anant Raj, Csaba Szepesvári, Dale Schuurmans

ICML 2024 The Non-Linear $f$-Design and Applications to Interactive Learning Alekh Agarwal, Jian Qian, Alexander Rakhlin, Tong Zhang

NeurIPSW 2023 An Empirical Evaluation of Federated Contextual Bandit Algorithms Alekh Agarwal, Hugh McMahan, Zheng Xu

ICML 2023 Learning in POMDPs Is Sample-Efficient with Hindsight Observability Jonathan Lee, Alekh Agarwal, Christoph Dann, Tong Zhang

NeurIPS 2023 Ordering-Based Conditions for Global Convergence of Policy Gradient Methods Jincheng Mei, Bo Dai, Alekh Agarwal, Mohammad Ghavamzadeh, Csaba Szepesvári, Dale Schuurmans

COLT 2023 Provable Benefits of Representational Transfer in Reinforcement Learning Alekh Agarwal, Yuda Song, Wen Sun, Kaiwen Wang, Mengdi Wang, Xuezhou Zhang

NeurIPSW 2023 Reward Model Underspecification in Language Model Alignment Jacob Eisenstein, Jonathan Berant, Chirag Nagpal, Alekh Agarwal, Ahmad Beirami, Alexander Nicholas D'Amour, Krishnamurthy Dj Dvijotham, Katherine A Heller, Stephen Robert Pfohl, Deepak Ramachandran

ICML 2023 Stochastic Gradient Succeeds for Bandits Jincheng Mei, Zixin Zhong, Bo Dai, Alekh Agarwal, Csaba Szepesvári, Dale Schuurmans

COLT 2023 VO$Q$L: Towards Optimal Regret in Model-Free RL with Nonlinear Function Approximation Alekh Agarwal, Yujia Jin, Tong Zhang

ICML 2022 Adversarially Trained Actor Critic for Offline Reinforcement Learning Ching-An Cheng, Tengyang Xie, Nan Jiang, Alekh Agarwal

ICML 2022 Efficient Reinforcement Learning in Block MDPs: A Model-Free Representation Learning Approach Xuezhou Zhang, Yuda Song, Masatoshi Uehara, Mengdi Wang, Alekh Agarwal, Wen Sun

COLT 2022 Minimax Regret Optimization for Robust Machine Learning Under Distribution Shift Alekh Agarwal, Tong Zhang

NeurIPS 2022 Model-Based RL with Optimistic Posterior Sampling: Structural Conditions and Sample Complexity Alekh Agarwal, Tong Zhang

COLT 2022 Non-Linear Reinforcement Learning in Large Action Spaces: Structural Conditions and Sample-Efficiency of Posterior Sampling Alekh Agarwal, Tong Zhang

NeurIPS 2022 On the Statistical Efficiency of Reward-Free Exploration in Non-Linear RL Jinglin Chen, Aditya Modi, Akshay Krishnamurthy, Nan Jiang, Alekh Agarwal

NeurIPSW 2022 Provable Benefits of Representational Transfer in Reinforcement Learning Alekh Agarwal, Yuda Song, Kaiwen Wang, Mengdi Wang, Wen Sun, Xuezhou Zhang

ICLR 2022 Provably Filtering Exogenous Distractors Using Multistep Inverse Dynamics Yonathan Efroni, Dipendra Misra, Akshay Krishnamurthy, Alekh Agarwal, John Langford

JMLR 2021 A Contextual Bandit Bake-Off Alberto Bietti, Alekh Agarwal, John Langford

NeurIPS 2021 Bellman-Consistent Pessimism for Offline Reinforcement Learning Tengyang Xie, Ching-An Cheng, Nan Jiang, Paul Mineiro, Alekh Agarwal

COLT 2021 Cautiously Optimistic Policy Optimization and Exploration with Linear Function Approximation Andrea Zanette, Ching-An Cheng, Alekh Agarwal

JMLR 2021 On the Theory of Policy Gradient Methods: Optimality, Approximation, and Distribution Shift Alekh Agarwal, Sham M. Kakade, Jason D. Lee, Gaurav Mahajan

ICML 2021 Provably Correct Optimization and Exploration with Non-Linear Policies Fei Feng, Wotao Yin, Alekh Agarwal, Lin Yang

COLT 2021 Towards a Dimension-Free Understanding of Adaptive Linear Control Juan C Perdomo, Max Simchowitz, Alekh Agarwal, Peter Bartlett

ICLR 2020 Deep Batch Active Learning by Diverse, Uncertain Gradient Lower Bounds Jordan T. Ash, Chicheng Zhang, Akshay Krishnamurthy, John Langford, Alekh Agarwal

NeurIPS 2020 FLAMBE: Structural Complexity and Representation Learning of Low Rank MDPs Alekh Agarwal, Sham Kakade, Akshay Krishnamurthy, Wen Sun

AAAI 2020 Metareasoning in Modular Software Systems: On-the-Fly Configuration Using Reinforcement Learning with Rich Contextual Representations Aditya Modi, Debadeepta Dey, Alekh Agarwal, Adith Swaminathan, Besmira Nushi, Sean Andrist, Eric Horvitz

COLT 2020 Model-Based Reinforcement Learning with a Generative Model Is Minimax Optimal Alekh Agarwal, Sham Kakade, Lin F. Yang

COLT 2020 Optimality and Approximation with Policy Gradient Methods in Markov Decision Processes Alekh Agarwal, Sham M Kakade, Jason D Lee, Gaurav Mahajan

NeurIPS 2020 PC-PG: Policy Cover Directed Exploration for Provable Policy Gradient Learning Alekh Agarwal, Mikael Henaff, Sham Kakade, Wen Sun

NeurIPS 2020 Policy Improvement via Imitation of Multiple Oracles Ching-An Cheng, Andrey Kolobov, Alekh Agarwal

NeurIPS 2020 Provably Good Batch Off-Policy Reinforcement Learning Without Great Exploration Yao Liu, Adith Swaminathan, Alekh Agarwal, Emma Brunskill

NeurIPS 2020 Safe Reinforcement Learning via Curriculum Induction Matteo Turchetta, Andrey Kolobov, Shital Shah, Andreas Krause, Alekh Agarwal

COLT 2020 Taking a Hint: How to Leverage Loss Predictors in Contextual Bandits? Chen-Yu Wei, Haipeng Luo, Alekh Agarwal

JMLR 2019 Active Learning for Cost-Sensitive Classification Akshay Krishnamurthy, Alekh Agarwal, Tzu-Kuo Huang, Hal Daumé III, John Langford

NeurIPS 2019 Bias Correction of Learned Generative Models Using Likelihood-Free Importance Weighting Aditya Grover, Jiaming Song, Ashish Kapoor, Kenneth Tran, Alekh Agarwal, Eric J Horvitz, Stefano Ermon

ICLRW 2019 Bias Correction of Learned Generative Models via Likelihood-Free Importance Weighting Aditya Grover, Jiaming Song, Ashish Kapoor, Kenneth Tran, Alekh Agarwal, Eric Horvitz, Stefano Ermon

ICML 2019 Fair Regression: Quantitative Definitions and Reduction-Based Algorithms Alekh Agarwal, Miroslav Dudik, Zhiwei Steven Wu

COLT 2019 Model-Based RL in Contextual Decision Processes: PAC Bounds and Exponential Improvements over Model-Free Approaches Wen Sun, Nan Jiang, Akshay Krishnamurthy, Alekh Agarwal, John Langford

ICMLW 2019 Off-Policy Policy Gradient with State Distribution Correction Yao Liu, Adith Swaminathan, Alekh Agarwal, Emma Brunskill

UAI 2019 Off-Policy Policy Gradient with Stationary Distribution Correction Yao Liu, Adith Swaminathan, Alekh Agarwal, Emma Brunskill

ICML 2019 Provably Efficient RL with Rich Observations via Latent State Decoding Simon Du, Akshay Krishnamurthy, Nan Jiang, Alekh Agarwal, Miroslav Dudik, John Langford

ICML 2019 Warm-Starting Contextual Bandits: Robustly Combining Supervised and Bandit Feedback Chicheng Zhang, Alekh Agarwal, Hal Daumé III, John Langford, Sahand Negahban

ICML 2018 A Reductions Approach to Fair Classification Alekh Agarwal, Alina Beygelzimer, Miroslav Dudik, John Langford, Hanna Wallach

COLT 2018 Efficient Contextual Bandits in Non-Stationary Worlds Haipeng Luo, Chen-Yu Wei, Alekh Agarwal, John Langford

ICML 2018 Hierarchical Imitation and Reinforcement Learning Hoang Le, Nan Jiang, Alekh Agarwal, Miroslav Dudik, Yisong Yue, Hal Daumé

NeurIPS 2018 On Oracle-Efficient PAC RL with Rich Observations Christoph Dann, Nan Jiang, Akshay Krishnamurthy, Alekh Agarwal, John Langford, Robert E. Schapire

COLT 2018 Open Problem: The Dependence of Sample Complexity Lower Bounds on Planning Horizon Nan Jiang, Alekh Agarwal

ICML 2018 Practical Contextual Bandits with Regression Oracles Dylan Foster, Alekh Agarwal, Miroslav Dudik, Haipeng Luo, Robert Schapire

ICML 2017 Active Learning for Cost-Sensitive Classification Akshay Krishnamurthy, Alekh Agarwal, Tzu-Kuo Huang, Hal Daumé, John Langford

ICML 2017 Contextual Decision Processes with Low Bellman Rank Are PAC-Learnable Nan Jiang, Akshay Krishnamurthy, Alekh Agarwal, John Langford, Robert E. Schapire

COLT 2017 Corralling a Band of Bandit Algorithms Alekh Agarwal, Haipeng Luo, Behnam Neyshabur, Robert E. Schapire

NeurIPS 2017 Off-Policy Evaluation for Slate Recommendation Adith Swaminathan, Akshay Krishnamurthy, Alekh Agarwal, Miro Dudik, John Langford, Damien Jose, Imed Zitouni

COLT 2017 Open Problem: First-Order Regret Bounds for Contextual Bandits Alekh Agarwal, Akshay Krishnamurthy, John Langford, Haipeng Luo, Robert E. Schapire

ICML 2017 Optimal and Adaptive Off-Policy Evaluation in Contextual Bandits Yu-Xiang Wang, Alekh Agarwal, Miroslav Dudík

NeurIPS 2016 Contextual Semibandits via Supervised Learning Oracles Akshay Krishnamurthy, Alekh Agarwal, Miro Dudik

NeurIPS 2016 Efficient Second Order Online Learning by Sketching Haipeng Luo, Alekh Agarwal, Nicolò Cesa-Bianchi, John Langford

NeurIPS 2016 PAC Reinforcement Learning with Rich Observations Akshay Krishnamurthy, Alekh Agarwal, John Langford

ICML 2015 A Lower Bound for the Optimization of Finite Sums Alekh Agarwal, Leon Bottou

NeurIPS 2015 Efficient and Parsimonious Agnostic Active Learning Tzu-Kuo Huang, Alekh Agarwal, Daniel J. Hsu, John Langford, Robert E. Schapire

NeurIPS 2015 Fast Convergence of Regularized Learning in Games Vasilis Syrgkanis, Alekh Agarwal, Haipeng Luo, Robert E. Schapire

ICML 2015 Learning to Search Better than Your Teacher Kai-Wei Chang, Akshay Krishnamurthy, Alekh Agarwal, Hal Daumé, John Langford

JMLR 2014 A Reliable Effective Terascale Linear Learning System Alekh Agarwal, Olivier Chapelle, Miroslav Dudík, John Langford

COLT 2014 Learning Sparsely Used Overcomplete Dictionaries Alekh Agarwal, Animashree Anandkumar, Prateek Jain, Praneeth Netrapalli, Rashish Tandon

ICML 2014 Least Squares Revisited: Scalable Approaches for Multi-Class Prediction Alekh Agarwal, Sham Kakade, Nikos Karampatziakis, Le Song, Gregory Valiant

COLT 2014 Robust Multi-Objective Learning with Mentor Feedback Alekh Agarwal, Ashwinkumar Badanidiyuru, Miroslav Dudík, Robert E. Schapire, Aleksandrs Slivkins

NeurIPS 2014 Scalable Non-Linear Learning with Adaptive Polynomial Expansions Alekh Agarwal, Alina Beygelzimer, Daniel J. Hsu, John Langford, Matus J Telgarsky

ICML 2014 Taming the Monster: A Fast and Simple Algorithm for Contextual Bandits Alekh Agarwal, Daniel Hsu, Satyen Kale, John Langford, Lihong Li, Robert Schapire

ICML 2013 Selective Sampling Algorithms for Cost-Sensitive Multiclass Prediction Alekh Agarwal

AISTATS 2012 Contextual Bandit Learning with Predictable Rewards Alekh Agarwal, Miroslav Dudik, Satyen Kale, John Langford, Robert Schapire

NeurIPS 2012 Stochastic Optimization and Sparse Statistical Recovery: Optimal Algorithms for High Dimensions Alekh Agarwal, Sahand Negahban, Martin J. Wainwright

NeurIPS 2011 Distributed Delayed Stochastic Optimization Alekh Agarwal, John C. Duchi

UAI 2011 Learning with Missing Features Afshin Rostamizadeh, Alekh Agarwal, Peter L. Bartlett

ICML 2011 Noisy Matrix Decomposition via Convex Relaxation: Optimal Rates in High Dimensions Alekh Agarwal, Sahand N. Negahban, Martin J. Wainwright

COLT 2011 Oracle Inequalities for Computationally Budgeted Model Selection Alekh Agarwal, John C. Duchi, Peter L. Bartlett, Clement Levrard

NeurIPS 2011 Stochastic Convex Optimization with Bandit Feedback Alekh Agarwal, Dean P. Foster, Daniel J. Hsu, Sham M. Kakade, Alexander Rakhlin

NeurIPS 2010 Distributed Dual Averaging in Networks Alekh Agarwal, Martin J. Wainwright, John C. Duchi

NeurIPS 2010 Fast Global Convergence Rates of Gradient Methods for High-Dimensional Statistical Recovery Alekh Agarwal, Sahand Negahban, Martin J. Wainwright

JMLR 2010 Message-Passing for Graph-Structured Linear Programs: Proximal Methods and Rounding Schemes Pradeep Ravikumar, Alekh Agarwal, Martin J. Wainwright

COLT 2010 Optimal Algorithms for Online Convex Optimization with Multi-Point Bandit Feedback Alekh Agarwal, Ofer Dekel, Lin Xiao

AISTATS 2010 Optimal Allocation Strategies for the Dark Pool Problem Alekh Agarwal, Peter Bartlett, Max Dama

COLT 2009 A Stochastic View of Optimal Regret Through Minimax Duality Jacob D. Abernethy, Alekh Agarwal, Peter L. Bartlett, Alexander Rakhlin

NeurIPS 2009 Information-Theoretic Lower Bounds on the Oracle Complexity of Convex Optimization Alekh Agarwal, Martin J. Wainwright, Peter L. Bartlett, Pradeep K. Ravikumar

ICML 2008 Message-Passing for Graph-Structured Linear Programs: Proximal Projections, Convergence and Rounding Schemes Pradeep Ravikumar, Alekh Agarwal, Martin J. Wainwright

NeurIPS 2007 An Analysis of Inference with the Universum Olivier Chapelle, Alekh Agarwal, Fabian H. Sinz, Bernhard Schölkopf

ICML 2007 Learning Random Walks to Rank Nodes in Graphs Alekh Agarwal, Soumen Chakrabarti