Munos, Rémi

172 publications

NeurIPS 2025 Asymmetric REINFORCE for Off-Policy Reinforcement Learning: Balancing Positive and Negative Rewards Charles Arnal, Gaëtan Narozniak, Vivien Cabannes, Yunhao Tang, Julia Kempe, Remi Munos
NeurIPS 2025 Beyond Verifiable Rewards: Scaling Reinforcement Learning in Language Models to Unverifiable Data Yunhao Tang, Sid Wang, Lovish Madaan, Remi Munos
ICML 2025 Optimizing Language Models for Inference Time Objectives Using Reinforcement Learning Yunhao Tang, Kunhao Zheng, Gabriel Synnaeve, Remi Munos
JMLR 2025 Optimizing Return Distributions with Distributional Dynamic Programming Bernardo Ávila Pires, Mark Rowland, Diana Borsa, Zhaohan Daniel Guo, Khimya Khetarpal, André Barreto, David Abel, Rémi Munos, Will Dabney
ICML 2025 Temporal Difference Flows Jesse Farebrother, Matteo Pirotta, Andrea Tirinzoni, Remi Munos, Alessandro Lazaric, Ahmed Touati
ICLRW 2025 Temporal Difference Flows Jesse Farebrother, Matteo Pirotta, Andrea Tirinzoni, Remi Munos, Alessandro Lazaric, Ahmed Touati
AISTATS 2024 A General Theoretical Paradigm to Understand Learning from Human Preferences Mohammad Gheshlaghi Azar, Zhaohan Daniel Guo, Bilal Piot, Remi Munos, Mark Rowland, Michal Valko, Daniele Calandriello
JMLR 2024 An Analysis of Quantile Temporal-Difference Learning Mark Rowland, Rémi Munos, Mohammad Gheshlaghi Azar, Yunhao Tang, Georg Ostrovski, Anna Harutyunyan, Karl Tuyls, Marc G. Bellemare, Will Dabney
ICML 2024 Generalized Preference Optimization: A Unified Approach to Offline Alignment Yunhao Tang, Zhaohan Daniel Guo, Zeyu Zheng, Daniele Calandriello, Remi Munos, Mark Rowland, Pierre Harvey Richemond, Michal Valko, Bernardo Avila Pires, Bilal Piot
ICML 2024 Human Alignment of Large Language Models Through Online Preference Optimisation Daniele Calandriello, Zhaohan Daniel Guo, Remi Munos, Mark Rowland, Yunhao Tang, Bernardo Avila Pires, Pierre Harvey Richemond, Charline Le Lan, Michal Valko, Tianqi Liu, Rishabh Joshi, Zeyu Zheng, Bilal Piot
NeurIPS 2024 Local and Adaptive Mirror Descents in Extensive-Form Games Côme Fiegel, Pierre Ménard, Tadashi Kozuno, Rémi Munos, Vianney Perchet, Michal Valko
NeurIPS 2024 Multi-Turn Reinforcement Learning with Preference Human Feedback Lior Shani, Aviv Rosenberg, Asaf Cassel, Oran Lang, Daniele Calandriello, Avital Zipori, Hila Noga, Orgad Keller, Bilal Piot, Idan Szpektor, Avinatan Hassidim, Yossi Matias, Rémi Munos
ICML 2024 Nash Learning from Human Feedback Remi Munos, Michal Valko, Daniele Calandriello, Mohammad Gheshlaghi Azar, Mark Rowland, Zhaohan Daniel Guo, Yunhao Tang, Matthieu Geist, Thomas Mesnard, Côme Fiegel, Andrea Michi, Marco Selvi, Sertan Girgin, Nikola Momchev, Olivier Bachem, Daniel J Mankowitz, Doina Precup, Bilal Piot
NeurIPS 2024 Near-Minimax-Optimal Distributional Reinforcement Learning with a Generative Model Mark Rowland, Li Kevin Wenliang, Rémi Munos, Clare Lyle, Yunhao Tang, Will Dabney
ICML 2023 Adapting to Game Trees in Zero-Sum Imperfect Information Games Côme Fiegel, Pierre Menard, Tadashi Kozuno, Remi Munos, Vianney Perchet, Michal Valko
ICML 2023 Curiosity in Hindsight: Intrinsic Exploration in Stochastic Environments Daniel Jarrett, Corentin Tallec, Florent Altché, Thomas Mesnard, Remi Munos, Michal Valko
ICML 2023 DoMo-AC: Doubly Multi-Step Off-Policy Actor-Critic Algorithm Yunhao Tang, Tadashi Kozuno, Mark Rowland, Anna Harutyunyan, Remi Munos, Bernardo Avila Pires, Michal Valko
ICML 2023 Fast Rates for Maximum Entropy Exploration Daniil Tiapkin, Denis Belomestny, Daniele Calandriello, Eric Moulines, Remi Munos, Alexey Naumov, Pierre Perrault, Yunhao Tang, Michal Valko, Pierre Menard
NeurIPS 2023 Model-Free Posterior Sampling via Learning Rate Randomization Daniil Tiapkin, Denis Belomestny, Daniele Calandriello, Eric Moulines, Remi Munos, Alexey Naumov, Pierre Perrault, Michal Valko, Pierre Ménard
ICML 2023 Quantile Credit Assignment Thomas Mesnard, Wenqi Chen, Alaa Saade, Yunhao Tang, Mark Rowland, Theophane Weber, Clare Lyle, Audrunas Gruslys, Michal Valko, Will Dabney, Georg Ostrovski, Eric Moulines, Remi Munos
ICML 2023 Regularization and Variance-Weighted Regression Achieves Minimax Optimality in Linear MDPs: Theory and Practice Toshinori Kitamura, Tadashi Kozuno, Yunhao Tang, Nino Vieillard, Michal Valko, Wenhao Yang, Jincheng Mei, Pierre Menard, Mohammad Gheshlaghi Azar, Remi Munos, Olivier Pietquin, Matthieu Geist, Csaba Szepesvari, Wataru Kumagai, Yutaka Matsuo
ICML 2023 Representations and Exploration for Deep Reinforcement Learning Using Singular Value Decomposition Yash Chandak, Shantanu Thakoor, Zhaohan Daniel Guo, Yunhao Tang, Remi Munos, Will Dabney, Diana L Borsa
ICML 2023 The Statistical Benefits of Quantile Temporal-Difference Learning for Value Estimation Mark Rowland, Yunhao Tang, Clare Lyle, Remi Munos, Marc G Bellemare, Will Dabney
ICML 2023 Towards a Better Understanding of Representation Dynamics Under TD-Learning Yunhao Tang, Remi Munos
ICML 2023 Understanding Self-Predictive Learning for Reinforcement Learning Yunhao Tang, Zhaohan Daniel Guo, Pierre Harvey Richemond, Bernardo Avila Pires, Yash Chandak, Remi Munos, Mark Rowland, Mohammad Gheshlaghi Azar, Charline Le Lan, Clare Lyle, András György, Shantanu Thakoor, Will Dabney, Bilal Piot, Daniele Calandriello, Michal Valko
ICML 2023 VA-Learning as a More Efficient Alternative to Q-Learning Yunhao Tang, Remi Munos, Mark Rowland, Michal Valko
AISTATS 2022 Marginalized Operators for Off-Policy Reinforcement Learning Yunhao Tang, Mark Rowland, Remi Munos, Michal Valko
NeurIPS 2022 BYOL-Explore: Exploration by Bootstrapped Prediction Zhaohan Guo, Shantanu Thakoor, Miruna Pislar, Bernardo Avila Pires, Florent Altché, Corentin Tallec, Alaa Saade, Daniele Calandriello, Jean-Bastien Grill, Yunhao Tang, Michal Valko, Remi Munos, Mohammad Gheshlaghi Azar, Bilal Piot
NeurIPSW 2022 Curiosity in Hindsight Daniel Jarrett, Corentin Tallec, Florent Altché, Thomas Mesnard, Remi Munos, Michal Valko
ICML 2022 Generalised Policy Improvement with Geometric Policy Composition Shantanu Thakoor, Mark Rowland, Diana Borsa, Will Dabney, Remi Munos, Andre Barreto
ICLR 2022 Large-Scale Representation Learning on Graphs via Bootstrapping Shantanu Thakoor, Corentin Tallec, Mohammad Gheshlaghi Azar, Mehdi Azabou, Eva L Dyer, Remi Munos, Petar Veličković, Michal Valko
NeurIPS 2022 Optimistic Posterior Sampling for Reinforcement Learning with Few Samples and Tight Guarantees Daniil Tiapkin, Denis Belomestny, Daniele Calandriello, Eric Moulines, Remi Munos, Alexey Naumov, Mark Rowland, Michal Valko, Pierre Ménard
NeurIPS 2022 The Nature of Temporal Difference Errors in Multi-Step Distributional Reinforcement Learning Yunhao Tang, Remi Munos, Mark Rowland, Bernardo Avila Pires, Will Dabney, Marc Bellemare
ICLRW 2021 Bootstrapped Representation Learning on Graphs Shantanu Thakoor, Corentin Tallec, Mohammad Gheshlaghi Azar, Remi Munos, Petar Veličković, Michal Valko
MLJ 2021 Concentration Bounds for Temporal Difference Learning with Linear Function Approximation: The Case of Batch Data and Uniform Sampling L. A. Prashanth, Nathaniel Korda, Rémi Munos
ICML 2021 Counterfactual Credit Assignment in Model-Free Reinforcement Learning Thomas Mesnard, Theophane Weber, Fabio Viola, Shantanu Thakoor, Alaa Saade, Anna Harutyunyan, Will Dabney, Thomas S Stepleton, Nicolas Heess, Arthur Guez, Eric Moulines, Marcus Hutter, Lars Buesing, Remi Munos
ICMLW 2021 Density-Based Bonuses on Learned Representations for Reward-Free Exploration in Deep Reinforcement Learning Omar Darwiche Domingues, Corentin Tallec, Remi Munos, Michal Valko
ICML 2021 From Poincaré Recurrence to Convergence in Imperfect Information Games: Finding Equilibrium via Regularization Julien Perolat, Remi Munos, Jean-Baptiste Lespiau, Shayegan Omidshafiei, Mark Rowland, Pedro Ortega, Neil Burch, Thomas Anthony, David Balduzzi, Bart De Vylder, Georgios Piliouras, Marc Lanctot, Karl Tuyls
JAIR 2021 Game Plan: What AI Can Do for Football, and What Football Can Do for AI Karl Tuyls, Shayegan Omidshafiei, Paul Muller, Zhe Wang, Jerome T. Connor, Daniel Hennes, Ian Graham, William Spearman, Tim Waskett, Dafydd Steele, Pauline Luc, Adrià Recasens, Alexandre Galashov, Gregory Thornton, Romuald Elie, Pablo Sprechmann, Pol Moreno, Kris Cao, Marta Garnelo, Praneet Dutta, Michal Valko, Nicolas Heess, Alex Bridgland, Julien Pérolat, Bart De Vylder, S. M. Ali Eslami, Mark Rowland, Andrew Jaegle, Rémi Munos, Trevor Back, Razia Ahamed, Simon Bouton, Nathalie Beauguerlange, Jackson Broshear, Thore Graepel, Demis Hassabis
NeurIPS 2021 Learning in Two-Player Zero-Sum Partially Observable Markov Games with Perfect Recall Tadashi Kozuno, Pierre Ménard, Remi Munos, Michal Valko
ICML 2021 Revisiting Peng's Q(λ) for Modern Reinforcement Learning Tadashi Kozuno, Yunhao Tang, Mark Rowland, Remi Munos, Steven Kapturowski, Will Dabney, Michal Valko, David Abel
ICML 2021 Taylor Expansion of Discount Factors Yunhao Tang, Mark Rowland, Remi Munos, Michal Valko
NeurIPS 2021 Unifying Gradient Estimators for Meta-Reinforcement Learning via Off-Policy Evaluation Yunhao Tang, Tadashi Kozuno, Mark Rowland, Remi Munos, Michal Valko
ICLR 2020 A Generalized Training Approach for Multiagent Learning Paul Muller, Shayegan Omidshafiei, Mark Rowland, Karl Tuyls, Julien Perolat, Siqi Liu, Daniel Hennes, Luke Marris, Marc Lanctot, Edward Hughes, Zhe Wang, Guy Lever, Nicolas Heess, Thore Graepel, Remi Munos
AISTATS 2020 Adaptive Trade-Offs in Off-Policy Learning Mark Rowland, Will Dabney, Remi Munos
ICML 2020 Bootstrap Latent-Predictive Representations for Multitask Reinforcement Learning Zhaohan Daniel Guo, Bernardo Avila Pires, Bilal Piot, Jean-Bastien Grill, Florent Altché, Remi Munos, Mohammad Gheshlaghi Azar
NeurIPS 2020 Bootstrap Your Own Latent - A New Approach to Self-Supervised Learning Jean-Bastien Grill, Florian Strub, Florent Altché, Corentin Tallec, Pierre Richemond, Elena Buchatskaya, Carl Doersch, Bernardo Avila Pires, Zhaohan Guo, Mohammad Gheshlaghi Azar, Bilal Piot, Koray Kavukcuoglu, Remi Munos, Michal Valko
AISTATS 2020 Conditional Importance Sampling for Off-Policy Learning Mark Rowland, Anna Harutyunyan, Hado van Hasselt, Diana Borsa, Tom Schaul, Remi Munos, Will Dabney
ICML 2020 Fast Computation of Nash Equilibria in Imperfect Information Games Remi Munos, Julien Perolat, Jean-Baptiste Lespiau, Mark Rowland, Bart De Vylder, Marc Lanctot, Finbarr Timbers, Daniel Hennes, Shayegan Omidshafiei, Audrunas Gruslys, Mohammad Gheshlaghi Azar, Edward Lockhart, Karl Tuyls
NeurIPS 2020 Leverage the Average: An Analysis of KL Regularization in Reinforcement Learning Nino Vieillard, Tadashi Kozuno, Bruno Scherrer, Olivier Pietquin, Remi Munos, Matthieu Geist
ICML 2020 Monte-Carlo Tree Search as Regularized Policy Optimization Jean-Bastien Grill, Florent Altché, Yunhao Tang, Thomas Hubert, Michal Valko, Ioannis Antonoglou, Remi Munos
JMLR 2020 Spectral Bandits Tomáš Kocák, Rémi Munos, Branislav Kveton, Shipra Agrawal, Michal Valko
ICML 2020 Taylor Expansion Policy Optimization Yunhao Tang, Michal Valko, Remi Munos
NeurIPS 2019 Hindsight Credit Assignment Anna Harutyunyan, Will Dabney, Thomas Mesnard, Mohammad Gheshlaghi Azar, Bilal Piot, Nicolas Heess, Hado P van Hasselt, Gregory Wayne, Satinder Singh, Doina Precup, Remi Munos
NeurIPS 2019 Multiagent Evaluation Under Incomplete Information Mark Rowland, Shayegan Omidshafiei, Karl Tuyls, Julien Perolat, Michal Valko, Georgios Piliouras, Remi Munos
NeurIPS 2019 Planning in Entropy-Regularized Markov Decision Processes and Games Jean-Bastien Grill, Omar Darwiche Domingues, Pierre Menard, Remi Munos, Michal Valko
ICLR 2019 Recurrent Experience Replay in Distributed Reinforcement Learning Steven Kapturowski, Georg Ostrovski, John Quan, Remi Munos, Will Dabney
ICML 2019 Statistics and Samples in Distributional Reinforcement Learning Mark Rowland, Robert Dadashi, Saurabh Kumar, Remi Munos, Marc G. Bellemare, Will Dabney
AISTATS 2019 The Termination Critic Anna Harutyunyan, Will Dabney, Diana Borsa, Nicolas Heess, Remi Munos, Doina Precup
ICLR 2019 Universal Successor Features Approximators Diana Borsa, Andre Barreto, John Quan, Daniel J. Mankowitz, Hado van Hasselt, Remi Munos, David Silver, Tom Schaul
NeurIPS 2018 Actor-Critic Policy Optimization in Partially Observable Multiagent Environments Sriram Srinivasan, Marc Lanctot, Vinicius Zambaldi, Julien Perolat, Karl Tuyls, Remi Munos, Michael Bowling
AISTATS 2018 An Analysis of Categorical Distributional Reinforcement Learning Mark Rowland, Marc G. Bellemare, Will Dabney, Rémi Munos, Yee Whye Teh
ICML 2018 Autoregressive Quantile Networks for Generative Modeling Georg Ostrovski, Will Dabney, Remi Munos
AAAI 2018 Distributional Reinforcement Learning with Quantile Regression Will Dabney, Mark Rowland, Marc G. Bellemare, Rémi Munos
ICML 2018 IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures Lasse Espeholt, Hubert Soyer, Remi Munos, Karen Simonyan, Vlad Mnih, Tom Ward, Yotam Doron, Vlad Firoiu, Tim Harley, Iain Dunning, Shane Legg, Koray Kavukcuoglu
ICML 2018 Implicit Quantile Networks for Distributional Reinforcement Learning Will Dabney, Georg Ostrovski, David Silver, Remi Munos
ICML 2018 Learning to Search with MCTSnets Arthur Guez, Theophane Weber, Ioannis Antonoglou, Karen Simonyan, Oriol Vinyals, Daan Wierstra, Remi Munos, David Silver
ICLR 2018 Maximum a Posteriori Policy Optimisation Abbas Abdolmaleki, Jost Tobias Springenberg, Yuval Tassa, Remi Munos, Nicolas Heess, Martin Riedmiller
ICLR 2018 Noisy Networks for Exploration Meire Fortunato, Mohammad Gheshlaghi Azar, Bilal Piot, Jacob Menick, Matteo Hessel, Ian Osband, Alex Graves, Volodymyr Mnih, Remi Munos, Demis Hassabis, Olivier Pietquin, Charles Blundell, Shane Legg
NeurIPS 2018 Optimistic Optimization of a Brownian Jean-Bastien Grill, Michal Valko, Remi Munos
ICLR 2018 The Reactor: A Fast and Sample-Efficient Actor-Critic Agent for Reinforcement Learning Audrunas Gruslys, Will Dabney, Mohammad Gheshlaghi Azar, Bilal Piot, Marc Bellemare, Remi Munos
ICML 2018 The Uncertainty Bellman Equation and Exploration Brendan O’Donoghue, Ian Osband, Remi Munos, Vlad Mnih
ICML 2018 Transfer in Deep Reinforcement Learning Using Successor Features and Generalised Policy Improvement Andre Barreto, Diana Borsa, John Quan, Tom Schaul, David Silver, Matteo Hessel, Daniel Mankowitz, Augustin Zidek, Remi Munos
ICML 2017 A Distributional Perspective on Reinforcement Learning Marc G. Bellemare, Will Dabney, Rémi Munos
ICML 2017 Automated Curriculum Learning for Neural Networks Alex Graves, Marc G. Bellemare, Jacob Menick, Rémi Munos, Koray Kavukcuoglu
ICLR 2017 Combining Policy Gradient and Q-Learning Brendan O'Donoghue, Rémi Munos, Koray Kavukcuoglu, Volodymyr Mnih
ICML 2017 Count-Based Exploration with Neural Density Models Georg Ostrovski, Marc G. Bellemare, Aäron van den Oord, Rémi Munos
ICML 2017 Minimax Regret Bounds for Reinforcement Learning Mohammad Gheshlaghi Azar, Ian Osband, Rémi Munos
ICLR 2017 Sample Efficient Actor-Critic with Experience Replay Ziyu Wang, Victor Bapst, Nicolas Heess, Volodymyr Mnih, Rémi Munos, Koray Kavukcuoglu, Nando de Freitas
NeurIPS 2017 Successor Features for Transfer in Reinforcement Learning Andre Barreto, Will Dabney, Remi Munos, Jonathan J Hunt, Tom Schaul, Hado P van Hasselt, David Silver
JMLR 2016 Analysis of Classification-Based Policy Iteration Algorithms Alessandro Lazaric, Mohammad Ghavamzadeh, Rémi Munos
NeurIPS 2016 Blazing the Trails Before Beating the Path: Sample-Efficient Monte-Carlo Planning Jean-Bastien Grill, Michal Valko, Remi Munos
AAAI 2016 Generalized Emphatic Temporal Difference Learning: Bias-Variance Analysis Assaf Hallak, Aviv Tamar, Rémi Munos, Shie Mannor
AAAI 2016 Increasing the Action Gap: New Operators for Reinforcement Learning Marc G. Bellemare, Georg Ostrovski, Arthur Guez, Philip S. Thomas, Rémi Munos
NeurIPS 2016 Memory-Efficient Backpropagation Through Time Audrunas Gruslys, Remi Munos, Ivo Danihelka, Marc Lanctot, Alex Graves
ALT 2016 Q(λ) with Off-Policy Corrections Anna Harutyunyan, Marc G. Bellemare, Tom Stepleton, Rémi Munos
NeurIPS 2016 Safe and Efficient Off-Policy Reinforcement Learning Remi Munos, Tom Stepleton, Anna Harutyunyan, Marc Bellemare
NeurIPS 2016 Unifying Count-Based Exploration and Intrinsic Motivation Marc Bellemare, Sriram Srinivasan, Georg Ostrovski, Tom Schaul, David Saxton, Remi Munos
JMLR 2015 Adaptive Strategy for Stratified Monte Carlo Sampling Alexandra Carpentier, Remi Munos, András Antos
NeurIPS 2015 Black-Box Optimization of Noisy Functions with Unknown Smoothness Jean-Bastien Grill, Michal Valko, Remi Munos
ICML 2015 Cheap Bandits Manjesh Hanawal, Venkatesh Saligrama, Michal Valko, Remi Munos
AAAI 2015 Fast Gradient Descent for Drifting Least Squares Regression, with Application to Bandits Nathaniel Korda, Prashanth L. A., Rémi Munos
AISTATS 2015 Toward Minimax Off-Policy Value Estimation Lihong Li, Rémi Munos, Csaba Szepesvári
NeurIPS 2014 Active Regression by Stratification Sivan Sabato, Remi Munos
NeurIPS 2014 Best-Arm Identification in Linear Bandits Marta Soare, Alessandro Lazaric, Remi Munos
NeurIPS 2014 Bounded Regret for Finite-Armed Structured Bandits Tor Lattimore, Remi Munos
NeurIPS 2014 Efficient Learning by Implicit Exploration in Bandit Problems with Side Observations Tomáš Kocák, Gergely Neu, Michal Valko, Remi Munos
ECML-PKDD 2014 Fast LSTD Using Stochastic Approximation: Finite Time Analysis and Application to Traffic Control L. A. Prashanth, Nathaniel Korda, Rémi Munos
FnTML 2014 From Bandits to Monte-Carlo Tree Search: The Optimistic Principle Applied to Optimization and Planning Rémi Munos
NeurIPS 2014 Optimistic Planning in Markov Decision Processes Using a Generative Model Balázs Szörényi, Gunnar Kedenburg, Remi Munos
ICML 2014 Relative Upper Confidence Bound for the K-Armed Dueling Bandit Problem Masrour Zoghi, Shimon Whiteson, Remi Munos, Maarten de Rijke
ICML 2014 Spectral Bandits for Smooth Graph Functions Michal Valko, Remi Munos, Branislav Kveton, Tomáš Kocák
AAAI 2014 Spectral Thompson Sampling Tomás Kocák, Michal Valko, Rémi Munos, Shipra Agrawal
NeurIPS 2013 Aggregating Optimistic Planning Trees for Solving Markov Decision Processes Gunnar Kedenburg, Raphael Fonteneau, Remi Munos
ALT 2013 Algorithmic Learning Theory - 24th International Conference, ALT 2013, Singapore, October 6-9, 2013. Proceedings Sanjay Jain, Rémi Munos, Frank Stephan, Thomas Zeugmann
ALT 2013 Editors' Introduction Sanjay Jain, Rémi Munos, Frank Stephan, Thomas Zeugmann
UAI 2013 Finite-Time Analysis of Kernelised Contextual Bandits Michal Valko, Nathaniel Korda, Rémi Munos, Ilias N. Flaounas, Nello Cristianini
MLJ 2013 Minimax PAC Bounds on the Sample Complexity of Reinforcement Learning with a Generative Model Mohammad Gheshlaghi Azar, Rémi Munos, Hilbert J. Kappen
ICML 2013 Stochastic Simultaneous Optimistic Optimization Michal Valko, Alexandra Carpentier, Rémi Munos
NeurIPS 2013 Thompson Sampling for 1-Dimensional Exponential Family Bandits Nathaniel Korda, Emilie Kaufmann, Remi Munos
ICML 2013 Toward Optimal Stratification for Stratified Monte-Carlo Integration Alexandra Carpentier, Rémi Munos
NeurIPS 2012 Adaptive Stratified Sampling for Monte-Carlo Integration of Differentiable Functions Alexandra Carpentier, Rémi Munos
NeurIPS 2012 Bandit Algorithms Boost Brain Computer Interfaces for Motor-Task Selection of a Brain-Controlled Button Joan Fruitet, Alexandra Carpentier, Maureen Clerc, Rémi Munos
AISTATS 2012 Bandit Theory Meets Compressed Sensing for High Dimensional Stochastic Linear Bandit Alexandra Carpentier, Remi Munos
JMLR 2012 Finite-Sample Analysis of Least-Squares Policy Iteration Alessandro Lazaric, Mohammad Ghavamzadeh, Rémi Munos
JMLR 2012 Linear Regression with Random Projections Odalric-Ambrym Maillard, Rémi Munos
ALT 2012 Minimax Number of Strata for Online Stratified Sampling Given Noisy Samples Alexandra Carpentier, Rémi Munos
ICML 2012 On the Sample Complexity of Reinforcement Learning with a Generative Model Mohammad Gheshlaghi Azar, Rémi Munos, Bert Kappen
AISTATS 2012 Optimistic Planning for Markov Decision Processes Lucian Busoniu, Remi Munos
ALT 2012 Regret Bounds for Restless Markov Bandits Ronald Ortner, Daniil Ryabko, Peter Auer, Rémi Munos
NeurIPS 2012 Risk-Aversion in Multi-Armed Bandits Amir Sani, Alessandro Lazaric, Rémi Munos
ALT 2012 Thompson Sampling: An Asymptotically Optimal Finite-Time Analysis Emilie Kaufmann, Nathaniel Korda, Rémi Munos
COLT 2011 A Finite-Time Analysis of Multi-Armed Bandits Problems with Kullback-Leibler Divergences Odalric-Ambrym Maillard, Rémi Munos, Gilles Stoltz
AISTATS 2011 Adaptive Bandits: Towards the Best History-Dependent Strategy Odalric-Ambrym Maillard, Rémi Munos
NeurIPS 2011 Finite Time Analysis of Stratified Sampling for Monte Carlo Alexandra Carpentier, Rémi Munos
ICML 2011 Finite-Sample Analysis of Lasso-TD Mohammad Ghavamzadeh, Alessandro Lazaric, Rémi Munos, Matthew W. Hoffman
NeurIPS 2011 Optimistic Optimization of a Deterministic Function Without the Knowledge of Its Smoothness Rémi Munos
NeurIPS 2011 Selecting the State-Representation in Reinforcement Learning Odalric-Ambrym Maillard, Daniil Ryabko, Rémi Munos
NeurIPS 2011 Sparse Recovery with Brownian Sensing Alexandra Carpentier, Odalric-Ambrym Maillard, Rémi Munos
NeurIPS 2011 Speedy Q-Learning Mohammad Ghavamzadeh, Hilbert J. Kappen, Mohammad G. Azar, Rémi Munos
ALT 2011 Upper-Confidence-Bound Algorithms for Active Learning in Multi-Armed Bandits Alexandra Carpentier, Alessandro Lazaric, Mohammad Ghavamzadeh, Rémi Munos, Peter Auer
JMLR 2011 X-Armed Bandits Sébastien Bubeck, Rémi Munos, Gilles Stoltz, Csaba Szepesvári
ICML 2010 Analysis of a Classification-Based Policy Iteration Algorithm Alessandro Lazaric, Mohammad Ghavamzadeh, Rémi Munos
COLT 2010 Best Arm Identification in Multi-Armed Bandits Jean-Yves Audibert, Sébastien Bubeck, Rémi Munos
NeurIPS 2010 Error Propagation for Approximate Policy and Value Iteration Amir-massoud Farahmand, Csaba Szepesvári, Rémi Munos
ACML 2010 Finite-Sample Analysis of Bellman Residual Minimization Odalric-Ambrym Maillard, Remi Munos, Alessandro Lazaric, Mohammad Ghavamzadeh
ICML 2010 Finite-Sample Analysis of LSTD Alessandro Lazaric, Mohammad Ghavamzadeh, Rémi Munos
NeurIPS 2010 LSTD with Random Projections Mohammad Ghavamzadeh, Alessandro Lazaric, Odalric Maillard, Rémi Munos
ECML-PKDD 2010 Online Learning in Adversarial Lipschitz Environments Odalric-Ambrym Maillard, Rémi Munos
COLT 2010 Open Loop Optimistic Planning Sébastien Bubeck, Rémi Munos
NeurIPS 2010 Scrambled Objects for Least-Squares Regression Odalric Maillard, Rémi Munos
NeurIPS 2009 Compressed Least-Squares Regression Odalric Maillard, Rémi Munos
COLT 2009 Hybrid Stochastic-Adversarial On-Line Learning Alessandro Lazaric, Rémi Munos
ALT 2009 Pure Exploration in Multi-Armed Bandits Problems Sébastien Bubeck, Rémi Munos, Gilles Stoltz
NeurIPS 2009 Sensitivity Analysis in HMMs with Application to Likelihood Maximization Pierre-Arnaud Coquelin, Romain Deguest, Rémi Munos
ICML 2009 Workshop Summary: On-Line Learning with Limited Feedback Jean-Yves Audibert, Peter Auer, Alessandro Lazaric, Rémi Munos, Daniil Ryabko, Csaba Szepesvári
NeurIPS 2008 Algorithms for Infinitely Many-Armed Bandits Yizao Wang, Jean-yves Audibert, Rémi Munos
JMLR 2008 Finite-Time Bounds for Fitted Value Iteration Rémi Munos, Csaba Szepesvári
MLJ 2008 Learning Near-Optimal Policies with Bellman-Residual Minimization Based Fitted Policy Iteration and a Single Sample Path András Antos, Csaba Szepesvári, Rémi Munos
NeurIPS 2008 Online Optimization in X-Armed Bandits Sébastien Bubeck, Gilles Stoltz, Csaba Szepesvári, Rémi Munos
NeurIPS 2008 Particle Filter-Based Policy Gradient in POMDPs Pierre-Arnaud Coquelin, Romain Deguest, Rémi Munos
UAI 2007 Bandit Algorithms for Tree Search Pierre-Arnaud Coquelin, Rémi Munos
NeurIPS 2007 Fitted Q-Iteration in Continuous Action-Space MDPs András Antos, Csaba Szepesvári, Rémi Munos
ALT 2007 Tuning Bandit Algorithms in Stochastic Environments Jean-Yves Audibert, Rémi Munos, Csaba Szepesvári
JMLR 2006 Geometric Variance Reduction in Markov Chains: Application to Value Function and Gradient Estimation Rémi Munos
COLT 2006 Learning Near-Optimal Policies with Bellman-Residual Minimization Based Fitted Policy Iteration and a Single Sample Path András Antos, Csaba Szepesvári, Rémi Munos
JMLR 2006 Policy Gradient in Continuous Time Rémi Munos
AAAI 2005 Error Bounds for Approximate Value Iteration Rémi Munos
ICML 2005 Finite Time Bounds for Sampling Based Fitted Value Iteration Csaba Szepesvári, Rémi Munos
AAAI 2005 Geometric Variance Reduction in Markov Chains. Application to Value Function and Gradient Estimation Rémi Munos
ICML 2003 Error Bounds for Approximate Policy Iteration Rémi Munos
MLJ 2002 Variable Resolution Discretization in Optimal Control Rémi Munos, Andrew W. Moore
NeurIPS 2001 Efficient Resources Allocation for Markov Decision Processes Rémi Munos
MLJ 2000 A Study of Reinforcement Learning in the Continuous Case by the Means of Viscosity Solutions Rémi Munos
ICML 2000 Rates of Convergence for Variable Resolution Schemes in Optimal Control Rémi Munos, Andrew W. Moore
IJCAI 1999 Variable Resolution Discretization for High-Accuracy Solutions of Optimal Control Problems Rémi Munos, Andrew W. Moore
ECML-PKDD 1998 A General Convergence Method for Reinforcement Learning in the Continuous Case Rémi Munos
NeurIPS 1998 Barycentric Interpolators for Continuous Space and Time Reinforcement Learning Rémi Munos, Andrew W. Moore
IJCAI 1997 A Convergent Reinforcement Learning Algorithm in the Continuous Case Based on a Finite Difference Method Rémi Munos
ECML-PKDD 1997 Finite-Element Methods with Local Triangulation Refinement for Continuous Reinforcement Learning Problems Rémi Munos
NeurIPS 1997 Reinforcement Learning for Continuous Stochastic Control Problems Rémi Munos, Paul Bourgine
ICML 1996 A Convergent Reinforcement Learning Algorithm in the Continuous Case: The Finite-Element Reinforcement Learning Rémi Munos