Ghavamzadeh, Mohammad
114 publications
TMLR
2026
$\texttt{C2-DPO}$: Constrained Controlled Direct Preference Optimization
Kavosh Asadi
,
Xingzi Xu
,
Julien Han
,
Ege Beyazit
,
Idan Pipano
,
Dominique Perrault-Joncas
,
Shoham Sabach
,
Mohammad Ghavamzadeh
,
Karim Bouyarmane
L4DC
2025
Bridging Distributionally Robust Learning and Offline RL: An Approach to Mitigate Distribution Shift and Partial Data Coverage
Kishan Panaganti
,
Zaiyan Xu
,
Dileep Kalathil
,
Mohammad Ghavamzadeh
ICLR
2025
Conservative Contextual Bandits: Beyond Linear Representations
Rohan Deb
,
Mohammad Ghavamzadeh
,
Arindam Banerjee
JMLR
2025
Contextual Bandits with Stage-Wise Constraints
Aldo Pacchiano
,
Mohammad Ghavamzadeh
,
Peter Bartlett
NeurIPS
2025
Does Thinking More Always Help? Mirage of Test-Time Scaling in Reasoning Models
Soumya Suvra Ghosal
,
Souradip Chakraborty
,
Avinash Reddy
,
Yifu Lu
,
Mengdi Wang
,
Dinesh Manocha
,
Furong Huang
,
Mohammad Ghavamzadeh
,
Amrit Singh Bedi
AISTATS
2025
Q-Learning for Quantile MDPs: A Decomposition, Performance, and Convergence Analysis
Jia Lin Hau
,
Erick Delage
,
Esther Derman
,
Mohammad Ghavamzadeh
,
Marek Petrik
ICML
2024
Bayesian Regret Minimization in Offline Bandits
Marek Petrik
,
Guy Tennenholtz
,
Mohammad Ghavamzadeh
ICMLW
2024
Bridging Distributionally Robust Learning and Offline RL: An Approach to Mitigate Distribution Shift and Partial Data Coverage
Kishan Panaganti
,
Zaiyan Xu
,
Dileep Kalathil
,
Mohammad Ghavamzadeh
ICLR
2024
Confidence-Aware Reward Optimization for Fine-Tuning Text-to-Image Models
Kyuyoung Kim
,
Jongheon Jeong
,
Minyong An
,
Mohammad Ghavamzadeh
,
Krishnamurthy Dj Dvijotham
,
Jinwoo Shin
,
Kimin Lee
ICLR
2024
Maximum Entropy Model Correction in Reinforcement Learning
Amin Rakhsha
,
Mete Kemertas
,
Mohammad Ghavamzadeh
,
Amir-massoud Farahmand
ICLR
2023
A Mixture-of-Expert Approach to RL-Based Dialogue Management
Yinlam Chow
,
Azamat Tulepbergenov
,
Ofir Nachum
,
Dhawal Gupta
,
Moonkyung Ryu
,
Mohammad Ghavamzadeh
,
Craig Boutilier
ICMLW
2023
Algorithms for Optimal Adaptation of Diffusion Models to Reward Functions
Krishnamurthy Dj Dvijotham
,
Shayegan Omidshafiei
,
Kimin Lee
,
Katherine M. Collins
,
Deepak Ramachandran
,
Adrian Weller
,
Mohammad Ghavamzadeh
,
Milad Nasr
,
Ying Fan
,
Jeremiah Zhe Liu
NeurIPS
2023
DPOK: Reinforcement Learning for Fine-Tuning Text-to-Image Diffusion Models
Ying Fan
,
Olivia Watkins
,
Yuqing Du
,
Hao Liu
,
Moonkyung Ryu
,
Craig Boutilier
,
Pieter Abbeel
,
Mohammad Ghavamzadeh
,
Kangwook Lee
,
Kimin Lee
AISTATS
2023
Entropic Risk Optimization in Discounted MDPs
Jia Lin Hau
,
Marek Petrik
,
Mohammad Ghavamzadeh
AAAI
2023
Meta-Learning for Simple Regret Minimization
Mohammad Javad Azizi
,
Branislav Kveton
,
Mohammad Ghavamzadeh
,
Sumeet Katariya
ICML
2023
Multi-Task Off-Policy Learning from Bandit Feedback
Joey Hong
,
Branislav Kveton
,
Manzil Zaheer
,
Sumeet Katariya
,
Mohammad Ghavamzadeh
AISTATS
2023
Multiple-Policy High-Confidence Policy Evaluation
Chris Dann
,
Mohammad Ghavamzadeh
,
Teodor V. Marinov
NeurIPSW
2023
Non-Adaptive Online Finetuning for Offline Reinforcement Learning
Audrey Huang
,
Mohammad Ghavamzadeh
,
Nan Jiang
,
Marek Petrik
NeurIPS
2023
Offline Reinforcement Learning for Mixture-of-Expert Dialogue Management
Dhawal Gupta
,
Yinlam Chow
,
Azamat Tulepbergenov
,
Mohammad Ghavamzadeh
,
Craig Boutilier
NeurIPS
2023
On Dynamic Programming Decompositions of Static Risk Measures in Markov Decision Processes
Jia Lin Hau
,
Erick Delage
,
Mohammad Ghavamzadeh
,
Marek Petrik
NeurIPS
2023
Ordering-Based Conditions for Global Convergence of Policy Gradient Methods
Jincheng Mei
,
Bo Dai
,
Alekh Agarwal
,
Mohammad Ghavamzadeh
,
Csaba Szepesvari
,
Dale Schuurmans
AISTATS
2022
Hierarchical Bayesian Bandits
Joey Hong
,
Branislav Kveton
,
Manzil Zaheer
,
Mohammad Ghavamzadeh
AISTATS
2022
Thompson Sampling with a Mixture Prior
Joey Hong
,
Branislav Kveton
,
Manzil Zaheer
,
Mohammad Ghavamzadeh
,
Craig Boutilier
NeurIPSW
2022
A Mixture-of-Expert Approach to RL-Based Dialogue Management
Yinlam Chow
,
Azamat Tulepbergenov
,
Ofir Nachum
,
Dhawal Gupta
,
Moonkyung Ryu
,
Mohammad Ghavamzadeh
,
Craig Boutilier
ICML
2022
Deep Hierarchy in Bandits
Joey Hong
,
Branislav Kveton
,
Sumeet Katariya
,
Manzil Zaheer
,
Mohammad Ghavamzadeh
NeurIPS
2022
Efficient Risk-Averse Reinforcement Learning
Ido Greenberg
,
Yinlam Chow
,
Mohammad Ghavamzadeh
,
Shie Mannor
ICML
2022
Feature and Parameter Selection in Stochastic Linear Bandits
Ahmadreza Moradipari
,
Berkay Turan
,
Yasin Abbasi-Yadkori
,
Mahnoosh Alizadeh
,
Mohammad Ghavamzadeh
IJCAI
2022
Fixed-Budget Best-Arm Identification in Structured Bandits
Mohammad Javad Azizi
,
Branislav Kveton
,
Mohammad Ghavamzadeh
ICLR
2022
Mirror Descent Policy Optimization
Manan Tomar
,
Lior Shani
,
Yonathan Efroni
,
Mohammad Ghavamzadeh
NeurIPS
2022
Operator Splitting Value Iteration
Amin Rakhsha
,
Andrew Wang
,
Mohammad Ghavamzadeh
,
Amir-massoud Farahmand
NeurIPS
2022
Private and Communication-Efficient Algorithms for Entropy Estimation
Gecia Bravo-Hermsdorff
,
Róbert Busa-Fekete
,
Mohammad Ghavamzadeh
,
Andres Munoz Medina
,
Umar Syed
NeurIPS
2022
Robust Reinforcement Learning Using Offline Data
Kishan Panaganti
,
Zaiyan Xu
,
Dileep Kalathil
,
Mohammad Ghavamzadeh
AISTATS
2021
Stochastic Bandits with Linear Constraints
Aldo Pacchiano
,
Mohammad Ghavamzadeh
,
Peter Bartlett
,
Heinrich Jiang
NeurIPS
2021
Adaptive Sampling for Minimax Fair Classification
Shubhanshu Shekhar
,
Greg Fields
,
Mohammad Ghavamzadeh
,
Tara Javidi
ICLR
2021
Control-Aware Representations for Model-Based Reinforcement Learning
Brandon Cui
,
Yinlam Chow
,
Mohammad Ghavamzadeh
AAAI
2021
Deep Bayesian Quadrature Policy Optimization
Ravi Tej Akella
,
Kamyar Azizzadenesheli
,
Mohammad Ghavamzadeh
,
Animashree Anandkumar
,
Yisong Yue
L4DC
2021
Neural Lyapunov Redesign
Arash Mehrjou
,
Mohammad Ghavamzadeh
,
Bernhard Schölkopf
ICML
2021
PID Accelerated Value Iteration Algorithm
Amir-Massoud Farahmand
,
Mohammad Ghavamzadeh
IJCAI
2021
Variational Model-Based Policy Optimization
Yinlam Chow
,
Brandon Cui
,
Moonkyung Ryu
,
Mohammad Ghavamzadeh
UAI
2020
Active Model Estimation in Markov Decision Processes
Jean Tarbouriech
,
Shubhanshu Shekhar
,
Matteo Pirotta
,
Mohammad Ghavamzadeh
,
Alessandro Lazaric
ICML
2020
Adaptive Sampling for Estimating Probability Distributions
Shubhanshu Shekhar
,
Tara Javidi
,
Mohammad Ghavamzadeh
AISTATS
2020
Conservative Exploration in Reinforcement Learning
Evrard Garcelon
,
Mohammad Ghavamzadeh
,
Alessandro Lazaric
,
Matteo Pirotta
AAAI
2020
Improved Algorithms for Conservative Exploration in Bandits
Evrard Garcelon
,
Mohammad Ghavamzadeh
,
Alessandro Lazaric
,
Matteo Pirotta
ICML
2020
Multi-Step Greedy Reinforcement Learning Algorithms
Manan Tomar
,
Yonathan Efroni
,
Mohammad Ghavamzadeh
NeurIPS
2020
Online Planning with Lookahead Policies
Yonathan Efroni
,
Mohammad Ghavamzadeh
,
Shie Mannor
ICLR
2020
Prediction, Consistency, Curvature: Representation Learning for Locally-Linear Control
Nir Levine
,
Yinlam Chow
,
Rui Shu
,
Ang Li
,
Mohammad Ghavamzadeh
,
Hung Bui
ICML
2020
Predictive Coding for Locally-Linear Control
Rui Shu
,
Tung Nguyen
,
Yinlam Chow
,
Tuan Pham
,
Khoat Than
,
Mohammad Ghavamzadeh
,
Stefano Ermon
,
Hung Bui
AISTATS
2020
Randomized Exploration in Generalized Linear Bandits
Branislav Kveton
,
Manzil Zaheer
,
Csaba Szepesvari
,
Lihong Li
,
Mohammad Ghavamzadeh
,
Craig Boutilier
CoRL
2020
Safe Policy Learning for Continuous Control
Yinlam Chow
,
Ofir Nachum
,
Aleksandra Faust
,
Edgar Dueñez-Guzman
,
Mohammad Ghavamzadeh
ICML
2019
Garbage in, Reward Out: Bootstrapping Exploration in Multi-Armed Bandits
Branislav Kveton
,
Csaba Szepesvari
,
Sharan Vaswani
,
Zheng Wen
,
Tor Lattimore
,
Mohammad Ghavamzadeh
ICMLW
2019
Lyapunov-Based Safe Policy Optimization for Continuous Control
Yinlam Chow
,
Ofir Nachum
,
Aleksandra Faust
,
Edgar Duenez-Guzman
,
Mohammad Ghavamzadeh
AISTATS
2019
Optimizing over a Restricted Policy Class in MDPs
Ershad Banijamali
,
Yasin Abbasi-Yadkori
,
Mohammad Ghavamzadeh
,
Nikos Vlassis
UAI
2019
Perturbed-History Exploration in Stochastic Linear Bandits
Branislav Kveton
,
Csaba Szepesvári
,
Mohammad Ghavamzadeh
,
Craig Boutilier
IJCAI
2019
Perturbed-History Exploration in Stochastic Multi-Armed Bandits
Branislav Kveton
,
Csaba Szepesvári
,
Mohammad Ghavamzadeh
,
Craig Boutilier
AISTATS
2019
Risk-Sensitive Generative Adversarial Imitation Learning
Jonathan Lacotte
,
Mohammad Ghavamzadeh
,
Yinlam Chow
,
Marco Pavone
NeurIPS
2019
Tight Regret Bounds for Model-Based Reinforcement Learning with Greedy Policies
Yonathan Efroni
,
Nadav Merlis
,
Mohammad Ghavamzadeh
,
Shie Mannor
NeurIPS
2018
A Block Coordinate Ascent Algorithm for Mean-Variance Optimization
Tengyang Xie
,
Bo Liu
,
Yangyang Xu
,
Mohammad Ghavamzadeh
,
Yinlam Chow
,
Daoming Lyu
,
Daesub Yoon
NeurIPS
2018
A Lyapunov-Based Approach to Safe Reinforcement Learning
Yinlam Chow
,
Ofir Nachum
,
Edgar Duenez-Guzman
,
Mohammad Ghavamzadeh
ICML
2018
More Robust Doubly Robust Off-Policy Evaluation
Mehrdad Farajtabar
,
Yinlam Chow
,
Mohammad Ghavamzadeh
ICML
2018
Path Consistency Learning in Tsallis Entropy Regularized MDPs
Yinlam Chow
,
Ofir Nachum
,
Mohammad Ghavamzadeh
JAIR
2018
Proximal Gradient Temporal Difference Learning: Stable Reinforcement Learning with Polynomial Sample Complexity
Bo Liu
,
Ian Gemp
,
Mohammad Ghavamzadeh
,
Ji Liu
,
Sridhar Mahadevan
,
Marek Petrik
AISTATS
2018
Robust Locally-Linear Controllable Embedding
Ershad Banijamali
,
Rui Shu
,
Mohammad Ghavamzadeh
,
Hung Bui
,
Ali Ghodsi
ICML
2017
Active Learning for Accurate Estimation of Linear Models
Carlos Riquelme
,
Mohammad Ghavamzadeh
,
Alessandro Lazaric
AAAI
2017
Automated Data Cleansing Through Meta-Learning
Ian Gemp
,
Georgios Theocharous
,
Mohammad Ghavamzadeh
ICML
2017
Bottleneck Conditional Density Estimation
Rui Shu
,
Hung H. Bui
,
Mohammad Ghavamzadeh
NeurIPS
2017
Conservative Contextual Linear Bandits
Abbas Kazerouni
,
Mohammad Ghavamzadeh
,
Yasin Abbasi Yadkori
,
Benjamin Van Roy
ICML
2017
Model-Independent Online Learning for Influence Maximization
Sharan Vaswani
,
Branislav Kveton
,
Zheng Wen
,
Mohammad Ghavamzadeh
,
Laks V. S. Lakshmanan
,
Mark Schmidt
ICML
2017
Online Learning to Rank in Stochastic Click Models
Masrour Zoghi
,
Tomas Tunys
,
Mohammad Ghavamzadeh
,
Branislav Kveton
,
Csaba Szepesvari
,
Zheng Wen
AAAI
2017
Predictive Off-Policy Policy Evaluation for Nonstationary Decision Problems, with Applications to Digital Marketing
Philip S. Thomas
,
Georgios Theocharous
,
Mohammad Ghavamzadeh
,
Ishan Durugkar
,
Emma Brunskill
AISTATS
2017
Sequential Multiple Hypothesis Testing with Type I Error Control
Alan Malek
,
Sumeet Katariya
,
Yinlam Chow
,
Mohammad Ghavamzadeh
JMLR
2016
Analysis of Classification-Based Policy Iteration Algorithms
Alessandro Lazaric
,
Mohammad Ghavamzadeh
,
Rémi Munos
JMLR
2016
Bayesian Policy Gradient and Actor-Critic Algorithms
Mohammad Ghavamzadeh
,
Yaakov Engel
,
Michal Valko
ECML-PKDD
2016
Graphical Model Sketch
Branislav Kveton
,
Hung Bui
,
Mohammad Ghavamzadeh
,
Georgios Theocharous
,
S. Muthukrishnan
,
Siqi Sun
AISTATS
2016
Improved Learning Complexity in Combinatorial Pure Exploration Bandits
Victor Gabillon
,
Alessandro Lazaric
,
Mohammad Ghavamzadeh
,
Ronald Ortner
,
Peter L. Bartlett
IJCAI
2016
Proximal Gradient Temporal Difference Learning Algorithms
Bo Liu
,
Ji Liu
,
Mohammad Ghavamzadeh
,
Sridhar Mahadevan
,
Marek Petrik
JMLR
2016
Regularized Policy Iteration with Nonparametric Function Spaces
Amir-massoud Farahmand
,
Mohammad Ghavamzadeh
,
Csaba Szepesvári
,
Shie Mannor
NeurIPS
2016
Safe Policy Improvement by Minimizing Robust Baseline Regret
Mohammad Ghavamzadeh
,
Marek Petrik
,
Yinlam Chow
MLJ
2016
Variance-Constrained Actor-Critic Algorithms for Discounted and Average Reward MDPs
L. A. Prashanth
,
Mohammad Ghavamzadeh
JMLR
2015
Approximate Modified Policy Iteration and Its Application to the Game of Tetris
Bruno Scherrer
,
Mohammad Ghavamzadeh
,
Victor Gabillon
,
Boris Lesner
,
Matthieu Geist
FnTML
2015
Bayesian Reinforcement Learning: A Survey
Mohammad Ghavamzadeh
,
Shie Mannor
,
Joelle Pineau
,
Aviv Tamar
UAI
2015
Finite-Sample Analysis of Proximal Gradient TD Algorithms
Bo Liu
,
Ji Liu
,
Mohammad Ghavamzadeh
,
Sridhar Mahadevan
,
Marek Petrik
ICML
2015
High Confidence Policy Improvement
Philip Thomas
,
Georgios Theocharous
,
Mohammad Ghavamzadeh
AAAI
2015
High-Confidence Off-Policy Evaluation
Philip S. Thomas
,
Georgios Theocharous
,
Mohammad Ghavamzadeh
IJCAI
2015
Maximum Entropy Semi-Supervised Inverse Reinforcement Learning
Julien Audiffren
,
Michal Valko
,
Alessandro Lazaric
,
Mohammad Ghavamzadeh
IJCAI
2015
Personalized Ad Recommendation Systems for Life-Time Value Optimization with Guarantees
Georgios Theocharous
,
Philip S. Thomas
,
Mohammad Ghavamzadeh
NeurIPS
2015
Policy Gradient for Coherent Risk Measures
Aviv Tamar
,
Yinlam Chow
,
Mohammad Ghavamzadeh
,
Shie Mannor
NeurIPS
2014
Algorithms for CVaR Optimization in MDPs
Yinlam Chow
,
Mohammad Ghavamzadeh
ICML
2013
A Generalized Kernel Approach to Structured Output Learning
Hachem Kadri
,
Mohammad Ghavamzadeh
,
Philippe Preux
NeurIPS
2013
Actor-Critic Algorithms for Risk-Sensitive MDPs
Prashanth L.A.
,
Mohammad Ghavamzadeh
NeurIPS
2013
Approximate Dynamic Programming Finally Performs Well in the Game of Tetris
Victor Gabillon
,
Mohammad Ghavamzadeh
,
Bruno Scherrer
ICML
2013
Cost-Sensitive Multiclass Classification Risk Bounds
Bernardo Ávila Pires
,
Csaba Szepesvari
,
Mohammad Ghavamzadeh
ICML
2012
A Dantzig Selector Approach to Temporal Difference Learning
Matthieu Geist
,
Bruno Scherrer
,
Alessandro Lazaric
,
Mohammad Ghavamzadeh
ICML
2012
Approximate Modified Policy Iteration
Bruno Scherrer
,
Victor Gabillon
,
Mohammad Ghavamzadeh
,
Matthieu Geist
NeurIPS
2012
Best Arm Identification: A Unified Approach to Fixed Budget and Fixed Confidence
Victor Gabillon
,
Mohammad Ghavamzadeh
,
Alessandro Lazaric
AAAI
2012
Conservative and Greedy Approaches to Classification-Based Policy Iteration
Mohammad Ghavamzadeh
,
Alessandro Lazaric
JMLR
2012
Finite-Sample Analysis of Least-Squares Policy Iteration
Alessandro Lazaric
,
Mohammad Ghavamzadeh
,
Rémi Munos
ICML
2011
Classification-Based Policy Iteration with a Critic
Victor Gabillon
,
Alessandro Lazaric
,
Mohammad Ghavamzadeh
,
Bruno Scherrer
ICML
2011
Finite-Sample Analysis of Lasso-TD
Mohammad Ghavamzadeh
,
Alessandro Lazaric
,
Rémi Munos
,
Matthew W. Hoffman
NeurIPS
2011
Multi-Bandit Best Arm Identification
Victor Gabillon
,
Mohammad Ghavamzadeh
,
Alessandro Lazaric
,
Sébastien Bubeck
NeurIPS
2011
Speedy Q-Learning
Mohammad Ghavamzadeh
,
Hilbert J. Kappen
,
Mohammad G. Azar
,
Rémi Munos
ALT
2011
Upper-Confidence-Bound Algorithms for Active Learning in Multi-Armed Bandits
Alexandra Carpentier
,
Alessandro Lazaric
,
Mohammad Ghavamzadeh
,
Rémi Munos
,
Peter Auer
ICML
2010
Analysis of a Classification-Based Policy Iteration Algorithm
Alessandro Lazaric
,
Mohammad Ghavamzadeh
,
Rémi Munos
ICML
2010
Bayesian Multi-Task Reinforcement Learning
Alessandro Lazaric
,
Mohammad Ghavamzadeh
ACML
2010
Finite-Sample Analysis of Bellman Residual Minimization
Odalric-Ambrym Maillard
,
Remi Munos
,
Alessandro Lazaric
,
Mohammad Ghavamzadeh
ICML
2010
Finite-Sample Analysis of LSTD
Alessandro Lazaric
,
Mohammad Ghavamzadeh
,
Rémi Munos
NeurIPS
2010
LSTD with Random Projections
Mohammad Ghavamzadeh
,
Alessandro Lazaric
,
Odalric Maillard
,
Rémi Munos
NeurIPS
2008
Regularized Policy Iteration
Amir M. Farahmand
,
Mohammad Ghavamzadeh
,
Shie Mannor
,
Csaba Szepesvári
ICML
2007
Bayesian Actor-Critic Algorithms
Mohammad Ghavamzadeh
,
Yaakov Engel
JMLR
2007
Hierarchical Average Reward Reinforcement Learning
Mohammad Ghavamzadeh
,
Sridhar Mahadevan
NeurIPS
2007
Incremental Natural Actor-Critic Algorithms
Shalabh Bhatnagar
,
Mohammad Ghavamzadeh
,
Mark Lee
,
Richard S. Sutton
NeurIPS
2006
Bayesian Policy Gradient Algorithms
Mohammad Ghavamzadeh
,
Yaakov Engel
ICML
2003
Hierarchical Policy Gradient Algorithms
Mohammad Ghavamzadeh
,
Sridhar Mahadevan
ICML
2002
Hierarchically Optimal Average Reward Reinforcement Learning
Mohammad Ghavamzadeh
,
Sridhar Mahadevan
ICML
2001
Continuous-Time Hierarchical Reinforcement Learning
Mohammad Ghavamzadeh
,
Sridhar Mahadevan