ML Anthology
Maillard, Odalric-Ambrym
52 publications
ICLRW
2025
Hierarchical Subspaces of Policies for Continual Offline Reinforcement Learning
Anthony Kobanda, Rémy Portelas, Odalric-Ambrym Maillard, Ludovic Denoyer
ICML
2025
Monte-Carlo Tree Search with Uncertainty Propagation via Optimal Transport
Tuan Quang Dam, Pascal Stenger, Lukas Schneider, Joni Pajarinen, Carlo D’Eramo, Odalric-Ambrym Maillard
TMLR
2024
AdaStop: Adaptive Statistical Testing for Sound Comparisons of Deep RL Agents
Timothée Mathieu, Matheus Medeiros Centa, Riccardo Della Vecchia, Hector Kohler, Alena Shilova, Odalric-Ambrym Maillard, Philippe Preux
TMLR
2024
Bandits Corrupted by Nature: Lower Bounds on Regret and Robust Optimistic Algorithms
Timothée Mathieu, Debabrota Basu, Odalric-Ambrym Maillard
ALT
2024
CRIMED: Lower and Upper Bounds on Regret for Bandits with Unbounded Stochastic Corruption
Shubhada Agrawal, Timothée Mathieu, Debabrota Basu, Odalric-Ambrym Maillard
ICMLW
2024
Distributional Monte-Carlo Planning with Thompson Sampling in Stochastic Environments
Tuan Quang Dam, Brahim Driss, Odalric-Ambrym Maillard
UAI
2024
Power Mean Estimation in Stochastic Monte-Carlo Tree Search
Tuan Dam, Odalric-Ambrym Maillard, Emilie Kaufmann
ICMLW
2024
Power Mean Estimation in Stochastic Monte-Carlo Tree Search
Tuan Quang Dam, Odalric-Ambrym Maillard, Emilie Kaufmann
AISTATS
2023
Exploration in Reward Machines with Low Regret
Hippolyte Bourel, Anders Jonsson, Odalric-Ambrym Maillard, Mohammad Sadegh Talebi
NeurIPS
2023
Fast Asymptotically Optimal Algorithms for Non-Parametric Stochastic Bandits
Dorian Baudry, Fabien Pesquerel, Rémy Degenne, Odalric-Ambrym Maillard
ACML
2023
Logarithmic Regret in Communicating MDPs: Leveraging Known Dynamics with Bandits
Hassan Saber, Fabien Pesquerel, Odalric-Ambrym Maillard, Mohammad Sadegh Talebi
TMLR
2022
Collaborative Algorithms for Online Personalized Mean Estimation
Mahsa Asadi, Aurélien Bellet, Odalric-Ambrym Maillard, Marc Tommasi
JMLR
2022
Efficient Change-Point Detection for Tackling Piecewise-Stationary Bandits
Lilian Besson, Emilie Kaufmann, Odalric-Ambrym Maillard, Julien Seznec
ICMLW
2022
Exploration in Reward Machines with Low Regret
Hippolyte Bourel, Anders Jonsson, Odalric-Ambrym Maillard, Mohammad Sadegh Talebi
NeurIPS
2022
IMED-RL: Regret Optimal Learning of Ergodic Markov Decision Processes
Fabien Pesquerel, Odalric-Ambrym Maillard
AISTATS
2021
Reinforcement Learning in Parametric MDPs with Exponential Families
Sayak Ray Chowdhury, Aditya Gopalan, Odalric-Ambrym Maillard
NeurIPS
2021
From Optimality to Robustness: Adaptive Re-Sampling Strategies in Stochastic Bandits
Dorian Baudry, Patrick Saux, Odalric-Ambrym Maillard
NeurIPS
2021
Indexed Minimum Empirical Divergence for Unimodal Bandits
Hassan Saber, Pierre Ménard, Odalric-Ambrym Maillard
ICLR
2021
Learning Value Functions in Deep Policy Gradients Using Residual Variance
Yannis Flet-Berliac, Reda Ouhamma, Odalric-Ambrym Maillard, Philippe Preux
ECML-PKDD
2021
Routine Bandits: Minimizing Regret on Recurring Problems
Hassan Saber, Léo Saci, Odalric-Ambrym Maillard, Audrey Durand
NeurIPS
2021
Stochastic Bandits with Groups of Similar Arms
Fabien Pesquerel, Hassan Saber, Odalric-Ambrym Maillard
NeurIPS
2021
Stochastic Online Linear Regression: The Forward Algorithm to Replace Ridge
Reda Ouhamma, Odalric-Ambrym Maillard, Vianney Perchet
ACML
2020
Monte-Carlo Graph Search: The Value of Merging Similar States
Edouard Leurent, Odalric-Ambrym Maillard
NeurIPS
2020
Robust-Adaptive Control of Linear Systems: Beyond Quadratic Costs
Edouard Leurent, Odalric-Ambrym Maillard, Denis Efimov
NeurIPS
2020
Sub-Sampling for Efficient Non-Parametric Bandit Exploration
Dorian Baudry, Emilie Kaufmann, Odalric-Ambrym Maillard
NeurIPS
2019
Budgeted Reinforcement Learning in Continuous State Space
Nicolas Carrara, Edouard Leurent, Romain Laroche, Tanguy Urvoy, Odalric-Ambrym Maillard, Olivier Pietquin
NeurIPS
2019
Learning Multiple Markov Chains via Adaptive Allocation
Mohammad Sadegh Talebi, Odalric-Ambrym Maillard
ACML
2019
Model-Based Reinforcement Learning Exploiting State-Action Equivalence
Mahsa Asadi, Mohammad Sadegh Talebi, Hippolyte Bourel, Odalric-Ambrym Maillard
ECML-PKDD
2019
Practical Open-Loop Optimistic Planning
Edouard Leurent, Odalric-Ambrym Maillard
NeurIPS
2019
Regret Bounds for Learning State Representations in Reinforcement Learning
Ronald Ortner, Matteo Pirotta, Alessandro Lazaric, Ronan Fruit, Odalric-Ambrym Maillard
ALT
2019
Sequential Change-Point Detection: Laplace Concentration of Scan Statistics and Non-Asymptotic Delay Bounds
Odalric-Ambrym Maillard
JMLR
2018
Streaming Kernel Regression with Provably Adaptive Mean, Variance, and Regularization
Audrey Durand, Odalric-Ambrym Maillard, Joelle Pineau
ALT
2018
Variance-Aware Regret Bounds for Undiscounted Reinforcement Learning in MDPs
Mohammad Sadegh Talebi, Odalric-Ambrym Maillard
ALT
2017
Boundary Crossing for General Exponential Families
Odalric-Ambrym Maillard
ALT
2017
Efficient Tracking of a Growing Number of Experts
Jaouad Mourtada, Odalric-Ambrym Maillard
ICML
2017
Spectral Learning from a Single Trajectory Under Finite-State Policies
Borja Balle, Odalric-Ambrym Maillard
NeurIPS
2014
How Hard Is My MDP? The Distribution-Norm to the Rescue
Odalric-Ambrym Maillard, Timothy A. Mann, Shie Mannor
ICML
2014
Latent Bandits
Odalric-Ambrym Maillard, Shie Mannor
ALT
2014
Selecting Near-Optimal Approximate State Representations in Reinforcement Learning
Ronald Ortner, Odalric-Ambrym Maillard, Daniil Ryabko
ECML-PKDD
2014
Sub-Sampling for Multi-Armed Bandits
Akram Baransi, Odalric-Ambrym Maillard, Shie Mannor
AISTATS
2013
Competing with an Infinite Set of Models in Reinforcement Learning
Phuong Nguyen, Odalric-Ambrym Maillard, Daniil Ryabko, Ronald Ortner
ICML
2013
Optimal Regret Bounds for Selecting the State Representation in Reinforcement Learning
Odalric-Ambrym Maillard, Phuong Nguyen, Ronald Ortner, Daniil Ryabko
ALT
2013
Robust Risk-Averse Stochastic Multi-Armed Bandits
Odalric-Ambrym Maillard
NeurIPS
2012
Hierarchical Optimistic Region Selection Driven by Curiosity
Odalric-Ambrym Maillard
JMLR
2012
Linear Regression with Random Projections
Odalric-Ambrym Maillard, Rémi Munos
NeurIPS
2012
Online Allocation and Homogeneous Partitioning for Piecewise Constant Mean-Approximation
Alexandra Carpentier, Odalric-Ambrym Maillard
COLT
2011
A Finite-Time Analysis of Multi-Armed Bandits Problems with Kullback-Leibler Divergences
Odalric-Ambrym Maillard, Rémi Munos, Gilles Stoltz
NeurIPS
2011
Selecting the State-Representation in Reinforcement Learning
Odalric-Ambrym Maillard, Daniil Ryabko, Rémi Munos
NeurIPS
2011
Sparse Recovery with Brownian Sensing
Alexandra Carpentier, Odalric-Ambrym Maillard, Rémi Munos
ACML
2010
Finite-Sample Analysis of Bellman Residual Minimization
Odalric-Ambrym Maillard, Rémi Munos, Alessandro Lazaric, Mohammad Ghavamzadeh
ECML-PKDD
2010
Online Learning in Adversarial Lipschitz Environments
Odalric-Ambrym Maillard, Rémi Munos
ALT
2009
Complexity Versus Agreement for Many Views
Odalric-Ambrym Maillard, Nicolas Vayatis