Paduraru, Cosmin

10 publications

ICLR 2025. Training Language Models to Self-Correct via Reinforcement Learning. Aviral Kumar, Vincent Zhuang, Rishabh Agarwal, Yi Su, John D Co-Reyes, Avi Singh, Kate Baumli, Shariq Iqbal, Colton Bishop, Rebecca Roelofs, Lei M Zhang, Kay McKinney, Disha Shrivastava, Cosmin Paduraru, George Tucker, Doina Precup, Feryal Behbahani, Aleksandra Faust
ICML 2023. Transformers Meet Directed Graphs. Simon Geisler, Yujia Li, Daniel J Mankowitz, Ali Taylan Cemgil, Stephan Günnemann, Cosmin Paduraru
ICLR 2022. COptiDICE: Offline Constrained Reinforcement Learning via Stationary Distribution Correction Estimation. Jongmin Lee, Cosmin Paduraru, Daniel J Mankowitz, Nicolas Heess, Doina Precup, Kee-Eung Kim, Arthur Guez
NeurIPS 2021. Active Offline Policy Selection. Ksenia Konyushkova, Yutian Chen, Thomas Paine, Caglar Gulcehre, Cosmin Paduraru, Daniel J Mankowitz, Misha Denil, Nando de Freitas
ICLR 2021. Autoregressive Dynamics Models for Offline Policy Evaluation and Optimization. Michael R Zhang, Thomas Paine, Ofir Nachum, Cosmin Paduraru, George Tucker, Ziyu Wang, Mohammad Norouzi
ICLR 2021. Benchmarks for Deep Off-Policy Evaluation. Justin Fu, Mohammad Norouzi, Ofir Nachum, George Tucker, Ziyu Wang, Alexander Novikov, Mengjiao Yang, Michael R Zhang, Yutian Chen, Aviral Kumar, Cosmin Paduraru, Sergey Levine, Thomas Paine
MLJ 2021. Challenges of Real-World Reinforcement Learning: Definitions, Benchmarks and Analysis. Gabriel Dulac-Arnold, Nir Levine, Daniel J. Mankowitz, Jerry Li, Cosmin Paduraru, Sven Gowal, Todd Hester
NeurIPS 2020. RL Unplugged: A Suite of Benchmarks for Offline Reinforcement Learning. Caglar Gulcehre, Ziyu Wang, Alexander Novikov, Thomas Paine, Sergio Gómez, Konrad Zolna, Rishabh Agarwal, Josh S Merel, Daniel J Mankowitz, Cosmin Paduraru, Gabriel Dulac-Arnold, Jerry Li, Mohammad Norouzi, Matthew Hoffman, Nicolas Heess, Nando de Freitas
IJCAI 2007. Grounding Abstractions in Predictive State Representations. Brian Tanner, Vadim Bulitko, Anna Koop, Cosmin Paduraru
NeurIPS 2005. Off-Policy Learning with Options and Recognizers. Doina Precup, Cosmin Paduraru, Anna Koop, Richard S. Sutton, Satinder P. Singh