Schulman, John

21 publications

NeurIPS 2025. Quantifying Elicitation of Latent Capabilities in Language Models. Elizabeth Donoway, Hailey Joren, Arushi Somani, Henry Sleight, Julian Michael, Michael R DeWeese, John Schulman, Ethan Perez, Fabien Roger, Jan Leike
ICLR 2024. Let's Verify Step by Step. Hunter Lightman, Vineet Kosaraju, Yuri Burda, Harrison Edwards, Bowen Baker, Teddy Lee, Jan Leike, John Schulman, Ilya Sutskever, Karl Cobbe
ICMLW 2024. Rule Based Rewards for Fine-Grained LLM Safety. Tong Mu, Alec Helyar, Johannes Heidecke, Joshua Achiam, Andrea Vallone, Ian D Kivlichan, Molly Lin, Alex Beutel, John Schulman, Lilian Weng
NeurIPS 2024. Rule Based Rewards for Language Model Safety. Tong Mu, Alec Helyar, Johannes Heidecke, Joshua Achiam, Andrea Vallone, Ian Kivlichan, Molly Lin, Alex Beutel, John Schulman, Lilian Weng
ICML 2023. Scaling Laws for Reward Model Overoptimization. Leo Gao, John Schulman, Jacob Hilton
NeurIPS 2022. Batch Size-Invariance for Policy Optimization. Jacob Hilton, Karl Cobbe, John Schulman
NeurIPS 2022. Training Language Models to Follow Instructions with Human Feedback. Long Ouyang, Jeffrey Wu, Xu Jiang, Diogo Almeida, Carroll Wainwright, Pamela Mishkin, Chong Zhang, Sandhini Agarwal, Katarina Slama, Alex Ray, John Schulman, Jacob Hilton, Fraser Kelton, Luke Miller, Maddie Simens, Amanda Askell, Peter Welinder, Paul F Christiano, Jan Leike, Ryan Lowe
ICML 2021. Phasic Policy Gradient. Karl W Cobbe, Jacob Hilton, Oleg Klimov, John Schulman
ICML 2020. Distribution Augmentation for Generative Modeling. Heewoo Jun, Rewon Child, Mark Chen, John Schulman, Aditya Ramesh, Alec Radford, Ilya Sutskever
ICML 2020. Leveraging Procedural Generation to Benchmark Reinforcement Learning. Karl Cobbe, Chris Hesse, Jacob Hilton, John Schulman
ICML 2019. Quantifying Generalization in Reinforcement Learning. Karl Cobbe, Oleg Klimov, Chris Hesse, Taehoon Kim, John Schulman
ICLR 2018. Meta Learning Shared Hierarchies. Kevin Frans, Jonathan Ho, Xi Chen, Pieter Abbeel, John Schulman
CoRL 2018. Model-Based Reinforcement Learning via Meta-Policy Optimization. Ignasi Clavera, Jonas Rothfuss, John Schulman, Yasuhiro Fujita, Tamim Asfour, Pieter Abbeel
NeurIPS 2017. #Exploration: A Study of Count-Based Exploration for Deep Reinforcement Learning. Haoran Tang, Rein Houthooft, Davis Foote, Adam Stooke, Xi Chen, Yan Duan, John Schulman, Filip De Turck, Pieter Abbeel
ICLR 2017. Variational Lossy Autoencoder. Xi Chen, Diederik P. Kingma, Tim Salimans, Yan Duan, Prafulla Dhariwal, John Schulman, Ilya Sutskever, Pieter Abbeel
ICML 2016. Benchmarking Deep Reinforcement Learning for Continuous Control. Yan Duan, Xi Chen, Rein Houthooft, John Schulman, Pieter Abbeel
ICLR 2016. High-Dimensional Continuous Control Using Generalized Advantage Estimation. John Schulman, Philipp Moritz, Sergey Levine, Michael I. Jordan, Pieter Abbeel
NeurIPS 2016. InfoGAN: Interpretable Representation Learning by Information Maximizing Generative Adversarial Nets. Xi Chen, Yan Duan, Rein Houthooft, John Schulman, Ilya Sutskever, Pieter Abbeel
NeurIPS 2016. VIME: Variational Information Maximizing Exploration. Rein Houthooft, Xi Chen, Yan Duan, John Schulman, Filip De Turck, Pieter Abbeel
NeurIPS 2015. Gradient Estimation Using Stochastic Computation Graphs. John Schulman, Nicolas Heess, Theophane Weber, Pieter Abbeel
ICML 2015. Trust Region Policy Optimization. John Schulman, Sergey Levine, Pieter Abbeel, Michael I. Jordan, Philipp Moritz