Maillard, Odalric-Ambrym

52 publications

ICLRW 2025 Hierarchical Subspaces of Policies for Continual Offline Reinforcement Learning Anthony Kobanda, Rémy Portelas, Odalric-Ambrym Maillard, Ludovic Denoyer
ICML 2025 Monte-Carlo Tree Search with Uncertainty Propagation via Optimal Transport Tuan Quang Dam, Pascal Stenger, Lukas Schneider, Joni Pajarinen, Carlo D’Eramo, Odalric-Ambrym Maillard
TMLR 2024 AdaStop: Adaptive Statistical Testing for Sound Comparisons of Deep RL Agents Timothée Mathieu, Matheus Medeiros Centa, Riccardo Della Vecchia, Hector Kohler, Alena Shilova, Odalric-Ambrym Maillard, Philippe Preux
TMLR 2024 Bandits Corrupted by Nature: Lower Bounds on Regret and Robust Optimistic Algorithms Timothée Mathieu, Debabrota Basu, Odalric-Ambrym Maillard
ALT 2024 CRIMED: Lower and Upper Bounds on Regret for Bandits with Unbounded Stochastic Corruption Shubhada Agrawal, Timothée Mathieu, Debabrota Basu, Odalric-Ambrym Maillard
ICMLW 2024 Distributional Monte-Carlo Planning with Thompson Sampling in Stochastic Environments Tuan Quang Dam, Brahim Driss, Odalric-Ambrym Maillard
UAI 2024 Power Mean Estimation in Stochastic Monte-Carlo Tree Search Tuan Dam, Odalric-Ambrym Maillard, Emilie Kaufmann
ICMLW 2024 Power Mean Estimation in Stochastic Monte-Carlo Tree Search Tuan Quang Dam, Odalric-Ambrym Maillard, Emilie Kaufmann
AISTATS 2023 Exploration in Reward Machines with Low Regret Hippolyte Bourel, Anders Jonsson, Odalric-Ambrym Maillard, Mohammad Sadegh Talebi
NeurIPS 2023 Fast Asymptotically Optimal Algorithms for Non-Parametric Stochastic Bandits Dorian Baudry, Fabien Pesquerel, Rémy Degenne, Odalric-Ambrym Maillard
ACML 2023 Logarithmic Regret in Communicating MDPs: Leveraging Known Dynamics with Bandits Hassan Saber, Fabien Pesquerel, Odalric-Ambrym Maillard, Mohammad Sadegh Talebi
TMLR 2022 Collaborative Algorithms for Online Personalized Mean Estimation Mahsa Asadi, Aurélien Bellet, Odalric-Ambrym Maillard, Marc Tommasi
JMLR 2022 Efficient Change-Point Detection for Tackling Piecewise-Stationary Bandits Lilian Besson, Emilie Kaufmann, Odalric-Ambrym Maillard, Julien Seznec
ICMLW 2022 Exploration in Reward Machines with Low Regret Hippolyte Bourel, Anders Jonsson, Odalric-Ambrym Maillard, Mohammad Sadegh Talebi
NeurIPS 2022 IMED-RL: Regret Optimal Learning of Ergodic Markov Decision Processes Fabien Pesquerel, Odalric-Ambrym Maillard
AISTATS 2021 Reinforcement Learning in Parametric MDPs with Exponential Families Sayak Ray Chowdhury, Aditya Gopalan, Odalric-Ambrym Maillard
NeurIPS 2021 From Optimality to Robustness: Adaptive Re-Sampling Strategies in Stochastic Bandits Dorian Baudry, Patrick Saux, Odalric-Ambrym Maillard
NeurIPS 2021 Indexed Minimum Empirical Divergence for Unimodal Bandits Hassan Saber, Pierre Ménard, Odalric-Ambrym Maillard
ICLR 2021 Learning Value Functions in Deep Policy Gradients Using Residual Variance Yannis Flet-Berliac, Reda Ouhamma, Odalric-Ambrym Maillard, Philippe Preux
ECML-PKDD 2021 Routine Bandits: Minimizing Regret on Recurring Problems Hassan Saber, Léo Saci, Odalric-Ambrym Maillard, Audrey Durand
NeurIPS 2021 Stochastic Bandits with Groups of Similar Arms Fabien Pesquerel, Hassan Saber, Odalric-Ambrym Maillard
NeurIPS 2021 Stochastic Online Linear Regression: The Forward Algorithm to Replace Ridge Reda Ouhamma, Odalric-Ambrym Maillard, Vianney Perchet
ACML 2020 Monte-Carlo Graph Search: The Value of Merging Similar States Edouard Leurent, Odalric-Ambrym Maillard
NeurIPS 2020 Robust-Adaptive Control of Linear Systems: Beyond Quadratic Costs Edouard Leurent, Odalric-Ambrym Maillard, Denis Efimov
NeurIPS 2020 Sub-Sampling for Efficient Non-Parametric Bandit Exploration Dorian Baudry, Emilie Kaufmann, Odalric-Ambrym Maillard
NeurIPS 2019 Budgeted Reinforcement Learning in Continuous State Space Nicolas Carrara, Edouard Leurent, Romain Laroche, Tanguy Urvoy, Odalric-Ambrym Maillard, Olivier Pietquin
NeurIPS 2019 Learning Multiple Markov Chains via Adaptive Allocation Mohammad Sadegh Talebi, Odalric-Ambrym Maillard
ACML 2019 Model-Based Reinforcement Learning Exploiting State-Action Equivalence Mahsa Asadi, Mohammad Sadegh Talebi, Hippolyte Bourel, Odalric-Ambrym Maillard
ECML-PKDD 2019 Practical Open-Loop Optimistic Planning Edouard Leurent, Odalric-Ambrym Maillard
NeurIPS 2019 Regret Bounds for Learning State Representations in Reinforcement Learning Ronald Ortner, Matteo Pirotta, Alessandro Lazaric, Ronan Fruit, Odalric-Ambrym Maillard
ALT 2019 Sequential Change-Point Detection: Laplace Concentration of Scan Statistics and Non-Asymptotic Delay Bounds Odalric-Ambrym Maillard
JMLR 2018 Streaming Kernel Regression with Provably Adaptive Mean, Variance, and Regularization Audrey Durand, Odalric-Ambrym Maillard, Joelle Pineau
ALT 2018 Variance-Aware Regret Bounds for Undiscounted Reinforcement Learning in MDPs Mohammad Sadegh Talebi, Odalric-Ambrym Maillard
ALT 2017 Boundary Crossing for General Exponential Families Odalric-Ambrym Maillard
ALT 2017 Efficient Tracking of a Growing Number of Experts Jaouad Mourtada, Odalric-Ambrym Maillard
ICML 2017 Spectral Learning from a Single Trajectory Under Finite-State Policies Borja Balle, Odalric-Ambrym Maillard
NeurIPS 2014 How Hard Is My MDP? The Distribution-Norm to the Rescue Odalric-Ambrym Maillard, Timothy A Mann, Shie Mannor
ICML 2014 Latent Bandits Odalric-Ambrym Maillard, Shie Mannor
ALT 2014 Selecting Near-Optimal Approximate State Representations in Reinforcement Learning Ronald Ortner, Odalric-Ambrym Maillard, Daniil Ryabko
ECML-PKDD 2014 Sub-Sampling for Multi-Armed Bandits Akram Baransi, Odalric-Ambrym Maillard, Shie Mannor
AISTATS 2013 Competing with an Infinite Set of Models in Reinforcement Learning Phuong Nguyen, Odalric-Ambrym Maillard, Daniil Ryabko, Ronald Ortner
ICML 2013 Optimal Regret Bounds for Selecting the State Representation in Reinforcement Learning Odalric-Ambrym Maillard, Phuong Nguyen, Ronald Ortner, Daniil Ryabko
ALT 2013 Robust Risk-Averse Stochastic Multi-Armed Bandits Odalric-Ambrym Maillard
NeurIPS 2012 Hierarchical Optimistic Region Selection Driven by Curiosity Odalric-Ambrym Maillard
JMLR 2012 Linear Regression with Random Projections Odalric-Ambrym Maillard, Rémi Munos
NeurIPS 2012 Online Allocation and Homogeneous Partitioning for Piecewise Constant Mean-Approximation Alexandra Carpentier, Odalric-Ambrym Maillard
COLT 2011 A Finite-Time Analysis of Multi-Armed Bandits Problems with Kullback-Leibler Divergences Odalric-Ambrym Maillard, Rémi Munos, Gilles Stoltz
NeurIPS 2011 Selecting the State-Representation in Reinforcement Learning Odalric-Ambrym Maillard, Daniil Ryabko, Rémi Munos
NeurIPS 2011 Sparse Recovery with Brownian Sensing Alexandra Carpentier, Odalric-Ambrym Maillard, Rémi Munos
ACML 2010 Finite-Sample Analysis of Bellman Residual Minimization Odalric-Ambrym Maillard, Rémi Munos, Alessandro Lazaric, Mohammad Ghavamzadeh
ECML-PKDD 2010 Online Learning in Adversarial Lipschitz Environments Odalric-Ambrym Maillard, Rémi Munos
ALT 2009 Complexity Versus Agreement for Many Views Odalric-Ambrym Maillard, Nicolas Vayatis