Piot, Bilal

37 publications

ICLR 2025. Building Math Agents with Multi-Turn Iterative Preference Learning. Wei Xiong, Chengshuai Shi, Jiaming Shen, Aviv Rosenberg, Zhen Qin, Daniele Calandriello, Misha Khalman, Rishabh Joshi, Bilal Piot, Mohammad Saleh, Chi Jin, Tong Zhang, Tianqi Liu.
ICLR 2025. Learning from Negative Feedback, or Positive Feedback or Both. Abbas Abdolmaleki, Bilal Piot, Bobak Shahriari, Jost Tobias Springenberg, Tim Hertweck, Michael Bloesch, Rishabh Joshi, Thomas Lampe, Junhyuk Oh, Nicolas Heess, Jonas Buchli, Martin Riedmiller.
ICLR 2025. RRM: Robust Reward Model Training Mitigates Reward Hacking. Tianqi Liu, Wei Xiong, Jie Ren, Lichang Chen, Junru Wu, Rishabh Joshi, Yang Gao, Jiaming Shen, Zhen Qin, Tianhe Yu, Daniel Sohn, Anastasia Makarova, Jeremiah Zhe Liu, Yuan Liu, Bilal Piot, Abe Ittycheriah, Aviral Kumar, Mohammad Saleh.
AISTATS 2024. A General Theoretical Paradigm to Understand Learning from Human Preferences. Mohammad Gheshlaghi Azar, Zhaohan Daniel Guo, Bilal Piot, Remi Munos, Mark Rowland, Michal Valko, Daniele Calandriello.
ICML 2024. Generalized Preference Optimization: A Unified Approach to Offline Alignment. Yunhao Tang, Zhaohan Daniel Guo, Zeyu Zheng, Daniele Calandriello, Remi Munos, Mark Rowland, Pierre Harvey Richemond, Michal Valko, Bernardo Avila Pires, Bilal Piot.
ICML 2024. Human Alignment of Large Language Models Through Online Preference Optimisation. Daniele Calandriello, Zhaohan Daniel Guo, Remi Munos, Mark Rowland, Yunhao Tang, Bernardo Avila Pires, Pierre Harvey Richemond, Charline Le Lan, Michal Valko, Tianqi Liu, Rishabh Joshi, Zeyu Zheng, Bilal Piot.
NeurIPS 2024. Multi-Turn Reinforcement Learning from Preference Human Feedback. Lior Shani, Aviv Rosenberg, Asaf Cassel, Oran Lang, Daniele Calandriello, Avital Zipori, Hila Noga, Orgad Keller, Bilal Piot, Idan Szpektor, Avinatan Hassidim, Yossi Matias, Rémi Munos.
ICML 2024. Nash Learning from Human Feedback. Remi Munos, Michal Valko, Daniele Calandriello, Mohammad Gheshlaghi Azar, Mark Rowland, Zhaohan Daniel Guo, Yunhao Tang, Matthieu Geist, Thomas Mesnard, Côme Fiegel, Andrea Michi, Marco Selvi, Sertan Girgin, Nikola Momchev, Olivier Bachem, Daniel J Mankowitz, Doina Precup, Bilal Piot.
ICLR 2024. Unlocking the Power of Representations in Long-Term Novelty-Based Exploration. Alaa Saade, Steven Kapturowski, Daniele Calandriello, Charles Blundell, Pablo Sprechmann, Leopoldo Sarra, Oliver Groth, Michal Valko, Bilal Piot.
ICML 2023. The Edge of Orthogonality: A Simple View of What Makes BYOL Tick. Pierre Harvey Richemond, Allison Tam, Yunhao Tang, Florian Strub, Bilal Piot, Felix Hill.
ICML 2023. Understanding Self-Predictive Learning for Reinforcement Learning. Yunhao Tang, Zhaohan Daniel Guo, Pierre Harvey Richemond, Bernardo Avila Pires, Yash Chandak, Remi Munos, Mark Rowland, Mohammad Gheshlaghi Azar, Charline Le Lan, Clare Lyle, András György, Shantanu Thakoor, Will Dabney, Bilal Piot, Daniele Calandriello, Michal Valko.
NeurIPSW 2023. Unlocking the Power of Representations in Long-Term Novelty-Based Exploration. Steven Kapturowski, Alaa Saade, Daniele Calandriello, Charles Blundell, Pablo Sprechmann, Leopoldo Sarra, Oliver Groth, Michal Valko, Bilal Piot.
NeurIPSW 2022. BLaDE: Robust Exploration via Diffusion Models. Bilal Piot, Zhaohan Daniel Guo, Shantanu Thakoor, Mohammad Gheshlaghi Azar.
NeurIPS 2022. BYOL-Explore: Exploration by Bootstrapped Prediction. Zhaohan Guo, Shantanu Thakoor, Miruna Pislar, Bernardo Avila Pires, Florent Altché, Corentin Tallec, Alaa Saade, Daniele Calandriello, Jean-Bastien Grill, Yunhao Tang, Michal Valko, Remi Munos, Mohammad Gheshlaghi Azar, Bilal Piot.
ICLR 2022. Emergent Communication at Scale. Rahma Chaabouni, Florian Strub, Florent Altché, Eugene Tarassov, Corentin Tallec, Elnaz Davoodi, Kory Wallace Mathewson, Olivier Tieleman, Angeliki Lazaridou, Bilal Piot.
ICML 2020. Agent57: Outperforming the Atari Human Benchmark. Adrià Puigdomènech Badia, Bilal Piot, Steven Kapturowski, Pablo Sprechmann, Alex Vitvitskyi, Zhaohan Daniel Guo, Charles Blundell.
ICML 2020. Bootstrap Latent-Predictive Representations for Multitask Reinforcement Learning. Zhaohan Daniel Guo, Bernardo Avila Pires, Bilal Piot, Jean-Bastien Grill, Florent Altché, Remi Munos, Mohammad Gheshlaghi Azar.
NeurIPS 2020. Bootstrap Your Own Latent - A New Approach to Self-Supervised Learning. Jean-Bastien Grill, Florian Strub, Florent Altché, Corentin Tallec, Pierre Richemond, Elena Buchatskaya, Carl Doersch, Bernardo Avila Pires, Zhaohan Guo, Mohammad Gheshlaghi Azar, Bilal Piot, Koray Kavukcuoglu, Remi Munos, Michal Valko.
ICLR 2020. Never Give Up: Learning Directed Exploration Strategies. Adrià Puigdomènech Badia, Pablo Sprechmann, Alex Vitvitskyi, Daniel Guo, Bilal Piot, Steven Kapturowski, Olivier Tieleman, Martín Arjovsky, Alexander Pritzel, Andrew Bolt, Charles Blundell.
NeurIPS 2019. Hindsight Credit Assignment. Anna Harutyunyan, Will Dabney, Thomas Mesnard, Mohammad Gheshlaghi Azar, Bilal Piot, Nicolas Heess, Hado P van Hasselt, Gregory Wayne, Satinder Singh, Doina Precup, Remi Munos.
AISTATS 2018. Actor-Critic Fictitious Play in Simultaneous Move Multistage Games. Julien Pérolat, Bilal Piot, Olivier Pietquin.
AAAI 2018. Deep Q-Learning from Demonstrations. Todd Hester, Matej Vecerík, Olivier Pietquin, Marc Lanctot, Tom Schaul, Bilal Piot, Dan Horgan, John Quan, Andrew Sendonaris, Ian Osband, Gabriel Dulac-Arnold, John P. Agapiou, Joel Z. Leibo, Audrunas Gruslys.
ICLR 2018. Noisy Networks for Exploration. Meire Fortunato, Mohammad Gheshlaghi Azar, Bilal Piot, Jacob Menick, Matteo Hessel, Ian Osband, Alex Graves, Volodymyr Mnih, Remi Munos, Demis Hassabis, Olivier Pietquin, Charles Blundell, Shane Legg.
AAAI 2018. Rainbow: Combining Improvements in Deep Reinforcement Learning. Matteo Hessel, Joseph Modayil, Hado van Hasselt, Tom Schaul, Georg Ostrovski, Will Dabney, Dan Horgan, Bilal Piot, Mohammad Gheshlaghi Azar, David Silver.
ICLR 2018. The Reactor: A Fast and Sample-Efficient Actor-Critic Agent for Reinforcement Learning. Audrunas Gruslys, Will Dabney, Mohammad Gheshlaghi Azar, Bilal Piot, Marc Bellemare, Remi Munos.
IJCAI 2017. End-to-End Optimization of Goal-Driven and Visually Grounded Dialogue Systems. Florian Strub, Harm de Vries, Jérémie Mary, Bilal Piot, Aaron C. Courville, Olivier Pietquin.
NeurIPS 2017. Is the Bellman Residual a Bad Proxy? Matthieu Geist, Bilal Piot, Olivier Pietquin.
AISTATS 2017. Learning Nash Equilibrium for General-Sum Markov Games from Batch Data. Julien Pérolat, Florian Strub, Bilal Piot, Olivier Pietquin.
AISTATS 2016. On the Use of Non-Stationary Strategies for Solving Two-Player Zero-Sum Markov Games. Julien Pérolat, Bilal Piot, Bruno Scherrer, Olivier Pietquin.
ICML 2016. Softened Approximate Policy Iteration for Markov Games. Julien Pérolat, Bilal Piot, Matthieu Geist, Bruno Scherrer, Olivier Pietquin.
ICML 2015. Approximate Dynamic Programming for Two-Player Zero-Sum Markov Games. Julien Pérolat, Bruno Scherrer, Bilal Piot, Olivier Pietquin.
IJCAI 2015. Inverse Reinforcement Learning in Relational Domains. Thibaut Munzer, Bilal Piot, Matthieu Geist, Olivier Pietquin, Manuel Lopes.
ECML-PKDD 2014. Boosted Bellman Residual Minimization Handling Expert Demonstrations. Bilal Piot, Matthieu Geist, Olivier Pietquin.
NeurIPS 2014. Difference of Convex Functions Programming for Reinforcement Learning. Bilal Piot, Matthieu Geist, Olivier Pietquin.
ECML-PKDD 2013. A Cascaded Supervised Learning Approach to Inverse Reinforcement Learning. Edouard Klein, Bilal Piot, Matthieu Geist, Olivier Pietquin.
ECML-PKDD 2013. Learning from Demonstrations: Is It Worth Estimating a Reward Function? Bilal Piot, Matthieu Geist, Olivier Pietquin.
NeurIPS 2012. Inverse Reinforcement Learning Through Structured Classification. Edouard Klein, Matthieu Geist, Bilal Piot, Olivier Pietquin.