Xu, Kelvin

14 publications

ICML 2025 LMRL Gym: Benchmarks for Multi-Turn Reinforcement Learning with Language Models Marwa Abdulhai, Isadora White, Charlie Victor Snell, Charles Sun, Joey Hong, Yuexiang Zhai, Kelvin Xu, Sergey Levine
ICLR 2025 Scaling LLM Test-Time Compute Optimally Can Be More Effective than Scaling Parameters for Reasoning Charlie Victor Snell, Jaehoon Lee, Kelvin Xu, Aviral Kumar
TMLR 2024 Beyond Human Data: Scaling Self-Training for Problem-Solving with Language Models Avi Singh, John D Co-Reyes, Rishabh Agarwal, Ankesh Anand, Piyush Patil, Xavier Garcia, Peter J Liu, James Harrison, Jaehoon Lee, Kelvin Xu, Aaron T Parisi, Abhishek Kumar, Alexander A Alemi, Alex Rizkowsky, Azade Nova, Ben Adlam, Bernd Bohnet, Gamaleldin Fathy Elsayed, Hanie Sedghi, Igor Mordatch, Isabelle Simpson, Izzeddin Gur, Jasper Snoek, Jeffrey Pennington, Jiri Hron, Kathleen Kenealy, Kevin Swersky, Kshiteej Mahajan, Laura A Culp, Lechao Xiao, Maxwell Bileschi, Noah Constant, Roman Novak, Rosanne Liu, Tris Warkentin, Yamini Bansal, Ethan Dyer, Behnam Neyshabur, Jascha Sohl-Dickstein, Noah Fiedel
ICLR 2024 Small-Scale Proxies for Large-Scale Transformer Training Instabilities Mitchell Wortsman, Peter J Liu, Lechao Xiao, Katie E Everett, Alexander A Alemi, Ben Adlam, John D Co-Reyes, Izzeddin Gur, Abhishek Kumar, Roman Novak, Jeffrey Pennington, Jascha Sohl-Dickstein, Kelvin Xu, Jaehoon Lee, Justin Gilmer, Simon Kornblith
ICLR 2022 Autonomous Reinforcement Learning: Formalism and Benchmarking Archit Sharma, Kelvin Xu, Nikhil Sardana, Abhishek Gupta, Karol Hausman, Sergey Levine, Chelsea Finn
NeurIPS 2020 Continual Learning of Control Primitives : Skill Discovery via Reset-Games Kelvin Xu, Siddharth Verma, Chelsea Finn, Sergey Levine
ICLR 2020 Meta-Dataset: A Dataset of Datasets for Learning to Learn from Few Examples Eleni Triantafillou, Tyler Zhu, Vincent Dumoulin, Pascal Lamblin, Utku Evci, Kelvin Xu, Ross Goroshin, Carles Gelada, Kevin Swersky, Pierre-Antoine Manzagol, Hugo Larochelle
ICML 2019 Learning a Prior over Intent via Meta-Inverse Reinforcement Learning Kelvin Xu, Ellis Ratner, Anca Dragan, Sergey Levine, Chelsea Finn
NeurIPS 2018 Probabilistic Model-Agnostic Meta-Learning Chelsea Finn, Kelvin Xu, Sergey Levine
ICLR 2018 Trust-PCL: An Off-Policy Trust Region Method for Continuous Control Ofir Nachum, Mohammad Norouzi, Kelvin Xu, Dale Schuurmans
ICLR 2017 An Actor-Critic Algorithm for Sequence Prediction Dzmitry Bahdanau, Philemon Brakel, Kelvin Xu, Anirudh Goyal, Ryan Lowe, Joelle Pineau, Aaron C. Courville, Yoshua Bengio
NeurIPS 2017 Bridging the Gap Between Value and Policy Based Reinforcement Learning Ofir Nachum, Mohammad Norouzi, Kelvin Xu, Dale Schuurmans
ICLR 2017 Unsupervised Perceptual Rewards for Imitation Learning Pierre Sermanet, Kelvin Xu, Sergey Levine
ICML 2015 Show, Attend and Tell: Neural Image Caption Generation with Visual Attention Kelvin Xu, Jimmy Ba, Ryan Kiros, Kyunghyun Cho, Aaron Courville, Ruslan Salakhudinov, Rich Zemel, Yoshua Bengio