Agarwal, Rishabh

65 publications

ICLR 2025 Asynchronous RLHF: Faster and More Efficient Off-Policy RL for Language Models Michael Noukhovitch, Shengyi Huang, Sophie Xhonneux, Arian Hosseini, Rishabh Agarwal, Aaron Courville

ICLRW 2025 Don't Throw Away Data: Improving Sequence Knowledge Distillation with Minimum Bayes Risk Decoding Jun Wang, Eleftheria Briakou, Hamid Dadkhahi, Rishabh Agarwal, Colin Cherry, Trevor Cohn

ICLR 2025 Generative Verifiers: Reward Modeling as Next-Token Prediction Lunjun Zhang, Arian Hosseini, Hritik Bansal, Mehran Kazemi, Aviral Kumar, Rishabh Agarwal

ICLR 2025 Inference-Aware Fine-Tuning for Best-of-N Sampling in Large Language Models Yinlam Chow, Guy Tennenholtz, Izzeddin Gur, Vincent Zhuang, Bo Dai, Aviral Kumar, Rishabh Agarwal, Sridhar Thiagarajan, Craig Boutilier, Aleksandra Faust

ICML 2025 Reward-Guided Prompt Evolving in Reinforcement Learning for LLMs Ziyu Ye, Rishabh Agarwal, Tianqi Liu, Rishabh Joshi, Sarmishta Velury, Quoc V Le, Qijun Tan, Yuan Liu

ICLR 2025 Rewarding Progress: Scaling Automated Process Verifiers for LLM Reasoning Amrith Setlur, Chirag Nagpal, Adam Fisch, Xinyang Geng, Jacob Eisenstein, Rishabh Agarwal, Alekh Agarwal, Jonathan Berant, Aviral Kumar

ICLR 2025 Smaller, Weaker, yet Better: Training LLM Reasoners via Compute-Optimal Sampling Hritik Bansal, Arian Hosseini, Rishabh Agarwal, Vinh Q. Tran, Mehran Kazemi

ICLR 2025 Speculative Knowledge Distillation: Bridging the Teacher-Student Gap Through Interleaved Sampling Wenda Xu, Rujun Han, Zifeng Wang, Long Le, Dhruv Madeka, Lei Li, William Yang Wang, Rishabh Agarwal, Chen-Yu Lee, Tomas Pfister

ICLR 2025 Training Language Models to Self-Correct via Reinforcement Learning Aviral Kumar, Vincent Zhuang, Rishabh Agarwal, Yi Su, John D Co-Reyes, Avi Singh, Kate Baumli, Shariq Iqbal, Colton Bishop, Rebecca Roelofs, Lei M Zhang, Kay McKinney, Disha Shrivastava, Cosmin Paduraru, George Tucker, Doina Precup, Feryal Behbahani, Aleksandra Faust

TMLR 2024 Beyond Human Data: Scaling Self-Training for Problem-Solving with Language Models Avi Singh, John D Co-Reyes, Rishabh Agarwal, Ankesh Anand, Piyush Patil, Xavier Garcia, Peter J Liu, James Harrison, Jaehoon Lee, Kelvin Xu, Aaron T Parisi, Abhishek Kumar, Alexander A Alemi, Alex Rizkowsky, Azade Nova, Ben Adlam, Bernd Bohnet, Gamaleldin Fathy Elsayed, Hanie Sedghi, Igor Mordatch, Isabelle Simpson, Izzeddin Gur, Jasper Snoek, Jeffrey Pennington, Jiri Hron, Kathleen Kenealy, Kevin Swersky, Kshiteej Mahajan, Laura A Culp, Lechao Xiao, Maxwell Bileschi, Noah Constant, Roman Novak, Rosanne Liu, Tris Warkentin, Yamini Bansal, Ethan Dyer, Behnam Neyshabur, Jascha Sohl-Dickstein, Noah Fiedel

ICLRW 2024 Beyond Human Data: Scaling Self-Training for Problem-Solving with Language Models Avi Singh, John D Co-Reyes, Rishabh Agarwal

ICLR 2024 DistillSpec: Improving Speculative Decoding via Knowledge Distillation Yongchao Zhou, Kaifeng Lyu, Ankit Singh Rawat, Aditya Krishna Menon, Afshin Rostamizadeh, Sanjiv Kumar, Jean-François Kagy, Rishabh Agarwal

NeurIPSW 2024 Evolving Alignment via Asymmetric Self-Play Ziyu Ye, Rishabh Agarwal, Tianqi Liu, Rishabh Joshi, Sarmishta Velury, Quoc V Le, Qijun Tan, Yuan Liu

NeurIPSW 2024 Faster, More Efficient RLHF Through Off-Policy Asynchronous Learning Michael Noukhovitch, Shengyi Huang, Sophie Xhonneux, Arian Hosseini, Rishabh Agarwal, Aaron Courville

NeurIPSW 2024 Generative Verifiers: Reward Modeling as Next-Token Prediction Lunjun Zhang, Arian Hosseini, Hritik Bansal, Mehran Kazemi, Aviral Kumar, Rishabh Agarwal

NeurIPS 2024 Many-Shot In-Context Learning Rishabh Agarwal, Avi Singh, Lei Zhang, Bernd Bohnet, Luis Rosias, Stephanie Chan, Biao Zhang, Ankesh Anand, Zaheer Abbas, Azade Nova, John D. Co-Reyes, Eric Chu, Feryal Behbahani, Aleksandra Faust, Hugo Larochelle

ICMLW 2024 Many-Shot In-Context Learning Rishabh Agarwal, Avi Singh, Lei M Zhang, Bernd Bohnet, Luis Rosias, Stephanie C.Y. Chan, Biao Zhang, Aleksandra Faust, Hugo Larochelle

ICMLW 2024 Many-Shot In-Context Learning Rishabh Agarwal, Avi Singh, Lei M Zhang, Bernd Bohnet, Luis Rosias, Stephanie C.Y. Chan, Biao Zhang, Ankesh Anand, Zaheer Abbas, Azade Nova, John D Co-Reyes, Eric Chu, Feryal Behbahani, Aleksandra Faust, Hugo Larochelle

NeurIPSW 2024 Not All LLM Reasoners Are Created Equal Arian Hosseini, Alessandro Sordoni, Daniel Kenji Toyama, Aaron Courville, Rishabh Agarwal

NeurIPS 2024 On Scalable Oversight with Weak LLMs Judging Strong LLMs Zachary Kenton, Noah Y. Siegel, János Kramár, Jonah Brown-Cohen, Samuel Albanie, Jannis Bulian, Rishabh Agarwal, David Lindner, Yunhao Tang, Noah D. Goodman, Rohin Shah

ICLR 2024 On-Policy Distillation of Language Models: Learning from Self-Generated Mistakes Rishabh Agarwal, Nino Vieillard, Yongchao Zhou, Piotr Stanczyk, Sabela Ramos Garea, Matthieu Geist, Olivier Bachem

ICML 2024 SiT: Symmetry-Invariant Transformers for Generalisation in Reinforcement Learning Matthias Weissenbacher, Rishabh Agarwal, Yoshinobu Kawahara

NeurIPSW 2024 Smaller, Weaker, yet Better: Training LLM Reasoners via Compute-Optimal Sampling Hritik Bansal, Arian Hosseini, Rishabh Agarwal, Vinh Q. Tran, Mehran Kazemi

ICML 2024 Stop Regressing: Training Value Functions via Classification for Scalable Deep RL Jesse Farebrother, Jordi Orbay, Quan Vuong, Adrien Ali Taiga, Yevgen Chebotar, Ted Xiao, Alex Irpan, Sergey Levine, Pablo Samuel Castro, Aleksandra Faust, Aviral Kumar, Rishabh Agarwal

ICLRW 2024 Transformers Can Achieve Length Generalization but Not Robustly Yongchao Zhou, Uri Alon, Xinyun Chen, Xuezhi Wang, Rishabh Agarwal, Denny Zhou

AISTATS 2023 A Novel Stochastic Gradient Descent Algorithm for Learning Principal Subspaces Charline Le Lan, Joshua Greaves, Jesse Farebrother, Mark Rowland, Fabian Pedregosa, Rishabh Agarwal, Marc G. Bellemare

ICML 2023 Bigger, Better, Faster: Human-Level Atari with Human-Level Efficiency Max Schwarzer, Johan Samir Obando Ceron, Aaron Courville, Marc G Bellemare, Rishabh Agarwal, Pablo Samuel Castro

ICML 2023 Bootstrapped Representations in Reinforcement Learning Charline Le Lan, Stephen Tu, Mark Rowland, Anna Harutyunyan, Rishabh Agarwal, Marc G Bellemare, Will Dabney

ICLRW 2023 Bootstrapped Representations in Reinforcement Learning Charline Le Lan, Stephen Tu, Mark Rowland, Anna Harutyunyan, Rishabh Agarwal, Marc G Bellemare, Will Dabney

ICLR 2023 Investigating Multi-Task Pretraining and Generalization in Reinforcement Learning Adrien Ali Taiga, Rishabh Agarwal, Jesse Farebrother, Aaron Courville, Marc G Bellemare

NeurIPSW 2023 Learning Silicon Dopant Transitions in Graphene Using Scanning Transmission Electron Microscopy Max Schwarzer, Jesse Farebrother, Joshua Greaves, Kevin Roccapriore, Ekin Cubuk, Rishabh Agarwal, Aaron Courville, Marc Bellemare, Sergei Kalinin, Igor Mordatch, Pablo Castro

ICLR 2023 Offline Q-Learning on Diverse Multi-Task Data Both Scales and Generalizes Aviral Kumar, Rishabh Agarwal, Xinyang Geng, George Tucker, Sergey Levine

ICLR 2023 Proto-Value Networks: Scaling Representation Learning with Auxiliary Tasks Jesse Farebrother, Joshua Greaves, Rishabh Agarwal, Charline Le Lan, Ross Goroshin, Pablo Samuel Castro, Marc G Bellemare

ICML 2023 Revisiting Bellman Errors for Offline Model Selection Joshua P Zitovsky, Daniel De Marchi, Rishabh Agarwal, Michael Rene Kosorok

NeurIPSW 2023 Scaling Offline Q-Learning with Vision Transformers Yingjie Miao, Jordi Orbay, Rishabh Agarwal, Aviral Kumar, George Tucker, Aleksandra Faust

ICML 2023 The Dormant Neuron Phenomenon in Deep Reinforcement Learning Ghada Sokar, Rishabh Agarwal, Pablo Samuel Castro, Utku Evci

NeurIPS 2023 Waymax: An Accelerated, Data-Driven Simulator for Large-Scale Autonomous Driving Research Cole Gulino, Justin Fu, Wenjie Luo, George Tucker, Eli Bronstein, Yiren Lu, Jean Harb, Xinlei Pan, Yan Wang, Xiangyu Chen, John Co-Reyes, Rishabh Agarwal, Rebecca Roelofs, Yao Lu, Nico Montali, Paul Mougin, Zoey Yang, Brandyn White, Aleksandra Faust, Rowan McAllister, Dragomir Anguelov, Benjamin Sapp

AISTATS 2022 On the Generalization of Representations in Reinforcement Learning Charline Le Lan, Stephen Tu, Adam Oberman, Rishabh Agarwal, Marc G. Bellemare

NeurIPSW 2022 A Novel Stochastic Gradient Descent Algorithm for Learning Principal Subspaces Charline Le Lan, Joshua Greaves, Jesse Farebrother, Mark Rowland, Fabian Pedregosa, Rishabh Agarwal, Marc G Bellemare

AAAI 2022 Control-Oriented Model-Based Reinforcement Learning with Implicit Differentiation Evgenii Nikishin, Romina Abachi, Rishabh Agarwal, Pierre-Luc Bacon

ICLR 2022 DR3: Value-Based Deep Reinforcement Learning Requires Explicit Regularization Aviral Kumar, Rishabh Agarwal, Tengyu Ma, Aaron Courville, George Tucker, Sergey Levine

NeurIPSW 2022 Democratizing RL Research by Reusing Prior Computation Rishabh Agarwal

NeurIPSW 2022 Investigating Multi-Task Pretraining and Generalization in Reinforcement Learning Adrien Ali Taiga, Rishabh Agarwal, Jesse Farebrother, Aaron Courville, Marc G Bellemare

NeurIPSW 2022 Offline Q-Learning on Diverse Multi-Task Data Both Scales and Generalizes Aviral Kumar, Rishabh Agarwal, Xinyang Geng, George Tucker, Sergey Levine

NeurIPSW 2022 Proto-Value Networks: Scaling Representation Learning with Auxiliary Tasks Jesse Farebrother, Joshua Greaves, Rishabh Agarwal, Charline Le Lan, Ross Goroshin, Pablo Samuel Castro, Marc G Bellemare

NeurIPS 2022 Reincarnating Reinforcement Learning: Reusing Prior Computation to Accelerate Progress Rishabh Agarwal, Max Schwarzer, Pablo Samuel Castro, Aaron C. Courville, Marc Bellemare

NeurIPSW 2022 Revisiting Bellman Errors for Offline Model Selection Joshua P. Zitovsky, Daniel de Marchi, Rishabh Agarwal, Michael Rene Kosorok

NeurIPSW 2022 Revisiting Bellman Errors for Offline Model Selection Joshua P Zitovsky, Rishabh Agarwal, Daniel de Marchi, Michael R Kosorok

NeurIPSW 2021 Behavior Predictive Representations for Generalization in Reinforcement Learning Siddhant Agarwal, Aaron Courville, Rishabh Agarwal

ICLR 2021 Contrastive Behavioral Similarity Embeddings for Generalization in Reinforcement Learning Rishabh Agarwal, Marlos C. Machado, Pablo Samuel Castro, Marc G Bellemare

ICMLW 2021 Control-Oriented Model-Based Reinforcement Learning with Implicit Differentiation Evgenii Nikishin, Romina Abachi, Rishabh Agarwal, Pierre-Luc Bacon

NeurIPSW 2021 DR3: Value-Based Deep Reinforcement Learning Requires Explicit Regularization Aviral Kumar, Rishabh Agarwal, Tengyu Ma, Aaron Courville, George Tucker, Sergey Levine

NeurIPS 2021 Deep Reinforcement Learning at the Edge of the Statistical Precipice Rishabh Agarwal, Max Schwarzer, Pablo Samuel Castro, Aaron C. Courville, Marc Bellemare

ICLR 2021 Implicit Under-Parameterization Inhibits Data-Efficient Deep Reinforcement Learning Aviral Kumar, Rishabh Agarwal, Dibya Ghosh, Sergey Levine

NeurIPS 2021 Neural Additive Models: Interpretable Machine Learning with Neural Nets Rishabh Agarwal, Levi Melnick, Nicholas Frosst, Xuezhou Zhang, Ben Lengerich, Rich Caruana, Geoffrey E. Hinton

ICML 2020 An Optimistic Perspective on Offline Reinforcement Learning Rishabh Agarwal, Dale Schuurmans, Mohammad Norouzi

NeurIPS 2020 RL Unplugged: A Suite of Benchmarks for Offline Reinforcement Learning Caglar Gulcehre, Ziyu Wang, Alexander Novikov, Thomas Paine, Sergio Gómez, Konrad Zolna, Rishabh Agarwal, Josh S Merel, Daniel J Mankowitz, Cosmin Paduraru, Gabriel Dulac-Arnold, Jerry Li, Mohammad Norouzi, Matthew Hoffman, Nicolas Heess, Nando de Freitas

ICML 2020 Revisiting Fundamentals of Experience Replay William Fedus, Prajit Ramachandran, Rishabh Agarwal, Yoshua Bengio, Hugo Larochelle, Mark Rowland, Will Dabney

ICML 2019 Learning to Generalize from Sparse and Underspecified Rewards Rishabh Agarwal, Chen Liang, Dale Schuurmans, Mohammad Norouzi