Ghavamzadeh, Mohammad

114 publications

TMLR 2026 C2-DPO: Constrained Controlled Direct Preference Optimization Kavosh Asadi, Xingzi Xu, Julien Han, Ege Beyazit, Idan Pipano, Dominique Perrault-Joncas, Shoham Sabach, Mohammad Ghavamzadeh, Karim Bouyarmane
L4DC 2025 Bridging Distributionally Robust Learning and Offline RL: An Approach to Mitigate Distribution Shift and Partial Data Coverage Kishan Panaganti, Zaiyan Xu, Dileep Kalathil, Mohammad Ghavamzadeh
ICLR 2025 Conservative Contextual Bandits: Beyond Linear Representations Rohan Deb, Mohammad Ghavamzadeh, Arindam Banerjee
JMLR 2025 Contextual Bandits with Stage-Wise Constraints Aldo Pacchiano, Mohammad Ghavamzadeh, Peter Bartlett
NeurIPS 2025 Does Thinking More Always Help? Mirage of Test-Time Scaling in Reasoning Models Soumya Suvra Ghosal, Souradip Chakraborty, Avinash Reddy, Yifu Lu, Mengdi Wang, Dinesh Manocha, Furong Huang, Mohammad Ghavamzadeh, Amrit Singh Bedi
AISTATS 2025 Q-Learning for Quantile MDPs: A Decomposition, Performance, and Convergence Analysis Jia Lin Hau, Erick Delage, Esther Derman, Mohammad Ghavamzadeh, Marek Petrik
ICML 2024 Bayesian Regret Minimization in Offline Bandits Marek Petrik, Guy Tennenholtz, Mohammad Ghavamzadeh
ICMLW 2024 Bridging Distributionally Robust Learning and Offline RL: An Approach to Mitigate Distribution Shift and Partial Data Coverage Kishan Panaganti, Zaiyan Xu, Dileep Kalathil, Mohammad Ghavamzadeh
ICLR 2024 Confidence-Aware Reward Optimization for Fine-Tuning Text-to-Image Models Kyuyoung Kim, Jongheon Jeong, Minyong An, Mohammad Ghavamzadeh, Krishnamurthy Dj Dvijotham, Jinwoo Shin, Kimin Lee
ICLR 2024 Maximum Entropy Model Correction in Reinforcement Learning Amin Rakhsha, Mete Kemertas, Mohammad Ghavamzadeh, Amir-massoud Farahmand
ICLR 2023 A Mixture-of-Expert Approach to RL-Based Dialogue Management Yinlam Chow, Azamat Tulepbergenov, Ofir Nachum, Dhawal Gupta, Moonkyung Ryu, Mohammad Ghavamzadeh, Craig Boutilier
ICMLW 2023 Algorithms for Optimal Adaptation of Diffusion Models to Reward Functions Krishnamurthy Dj Dvijotham, Shayegan Omidshafiei, Kimin Lee, Katherine M. Collins, Deepak Ramachandran, Adrian Weller, Mohammad Ghavamzadeh, Milad Nasr, Ying Fan, Jeremiah Zhe Liu
NeurIPS 2023 DPOK: Reinforcement Learning for Fine-Tuning Text-to-Image Diffusion Models Ying Fan, Olivia Watkins, Yuqing Du, Hao Liu, Moonkyung Ryu, Craig Boutilier, Pieter Abbeel, Mohammad Ghavamzadeh, Kangwook Lee, Kimin Lee
AISTATS 2023 Entropic Risk Optimization in Discounted MDPs Jia Lin Hau, Marek Petrik, Mohammad Ghavamzadeh
AAAI 2023 Meta-Learning for Simple Regret Minimization Mohammad Javad Azizi, Branislav Kveton, Mohammad Ghavamzadeh, Sumeet Katariya
ICML 2023 Multi-Task Off-Policy Learning from Bandit Feedback Joey Hong, Branislav Kveton, Manzil Zaheer, Sumeet Katariya, Mohammad Ghavamzadeh
AISTATS 2023 Multiple-Policy High-Confidence Policy Evaluation Chris Dann, Mohammad Ghavamzadeh, Teodor V. Marinov
NeurIPSW 2023 Non-Adaptive Online Finetuning for Offline Reinforcement Learning Audrey Huang, Mohammad Ghavamzadeh, Nan Jiang, Marek Petrik
NeurIPS 2023 Offline Reinforcement Learning for Mixture-of-Expert Dialogue Management Dhawal Gupta, Yinlam Chow, Azamat Tulepbergenov, Mohammad Ghavamzadeh, Craig Boutilier
NeurIPS 2023 On Dynamic Programming Decompositions of Static Risk Measures in Markov Decision Processes Jia Lin Hau, Erick Delage, Mohammad Ghavamzadeh, Marek Petrik
NeurIPS 2023 Ordering-Based Conditions for Global Convergence of Policy Gradient Methods Jincheng Mei, Bo Dai, Alekh Agarwal, Mohammad Ghavamzadeh, Csaba Szepesvari, Dale Schuurmans
AISTATS 2022 Hierarchical Bayesian Bandits Joey Hong, Branislav Kveton, Manzil Zaheer, Mohammad Ghavamzadeh
AISTATS 2022 Thompson Sampling with a Mixture Prior Joey Hong, Branislav Kveton, Manzil Zaheer, Mohammad Ghavamzadeh, Craig Boutilier
NeurIPSW 2022 A Mixture-of-Expert Approach to RL-Based Dialogue Management Yinlam Chow, Azamat Tulepbergenov, Ofir Nachum, Dhawal Gupta, Moonkyung Ryu, Mohammad Ghavamzadeh, Craig Boutilier
ICML 2022 Deep Hierarchy in Bandits Joey Hong, Branislav Kveton, Sumeet Katariya, Manzil Zaheer, Mohammad Ghavamzadeh
NeurIPS 2022 Efficient Risk-Averse Reinforcement Learning Ido Greenberg, Yinlam Chow, Mohammad Ghavamzadeh, Shie Mannor
ICML 2022 Feature and Parameter Selection in Stochastic Linear Bandits Ahmadreza Moradipari, Berkay Turan, Yasin Abbasi-Yadkori, Mahnoosh Alizadeh, Mohammad Ghavamzadeh
IJCAI 2022 Fixed-Budget Best-Arm Identification in Structured Bandits Mohammad Javad Azizi, Branislav Kveton, Mohammad Ghavamzadeh
ICLR 2022 Mirror Descent Policy Optimization Manan Tomar, Lior Shani, Yonathan Efroni, Mohammad Ghavamzadeh
NeurIPS 2022 Operator Splitting Value Iteration Amin Rakhsha, Andrew Wang, Mohammad Ghavamzadeh, Amir-massoud Farahmand
NeurIPS 2022 Private and Communication-Efficient Algorithms for Entropy Estimation Gecia Bravo-Hermsdorff, Róbert Busa-Fekete, Mohammad Ghavamzadeh, Andres Munoz Medina, Umar Syed
NeurIPS 2022 Robust Reinforcement Learning Using Offline Data Kishan Panaganti, Zaiyan Xu, Dileep Kalathil, Mohammad Ghavamzadeh
AISTATS 2021 Stochastic Bandits with Linear Constraints Aldo Pacchiano, Mohammad Ghavamzadeh, Peter Bartlett, Heinrich Jiang
NeurIPS 2021 Adaptive Sampling for Minimax Fair Classification Shubhanshu Shekhar, Greg Fields, Mohammad Ghavamzadeh, Tara Javidi
ICLR 2021 Control-Aware Representations for Model-Based Reinforcement Learning Brandon Cui, Yinlam Chow, Mohammad Ghavamzadeh
AAAI 2021 Deep Bayesian Quadrature Policy Optimization Ravi Tej Akella, Kamyar Azizzadenesheli, Mohammad Ghavamzadeh, Animashree Anandkumar, Yisong Yue
L4DC 2021 Neural Lyapunov Redesign Arash Mehrjou, Mohammad Ghavamzadeh, Bernhard Schölkopf
ICML 2021 PID Accelerated Value Iteration Algorithm Amir-Massoud Farahmand, Mohammad Ghavamzadeh
IJCAI 2021 Variational Model-Based Policy Optimization Yinlam Chow, Brandon Cui, Moonkyung Ryu, Mohammad Ghavamzadeh
UAI 2020 Active Model Estimation in Markov Decision Processes Jean Tarbouriech, Shubhanshu Shekhar, Matteo Pirotta, Mohammad Ghavamzadeh, Alessandro Lazaric
ICML 2020 Adaptive Sampling for Estimating Probability Distributions Shubhanshu Shekhar, Tara Javidi, Mohammad Ghavamzadeh
AISTATS 2020 Conservative Exploration in Reinforcement Learning Evrard Garcelon, Mohammad Ghavamzadeh, Alessandro Lazaric, Matteo Pirotta
AAAI 2020 Improved Algorithms for Conservative Exploration in Bandits Evrard Garcelon, Mohammad Ghavamzadeh, Alessandro Lazaric, Matteo Pirotta
ICML 2020 Multi-Step Greedy Reinforcement Learning Algorithms Manan Tomar, Yonathan Efroni, Mohammad Ghavamzadeh
NeurIPS 2020 Online Planning with Lookahead Policies Yonathan Efroni, Mohammad Ghavamzadeh, Shie Mannor
ICLR 2020 Prediction, Consistency, Curvature: Representation Learning for Locally-Linear Control Nir Levine, Yinlam Chow, Rui Shu, Ang Li, Mohammad Ghavamzadeh, Hung Bui
ICML 2020 Predictive Coding for Locally-Linear Control Rui Shu, Tung Nguyen, Yinlam Chow, Tuan Pham, Khoat Than, Mohammad Ghavamzadeh, Stefano Ermon, Hung Bui
AISTATS 2020 Randomized Exploration in Generalized Linear Bandits Branislav Kveton, Manzil Zaheer, Csaba Szepesvari, Lihong Li, Mohammad Ghavamzadeh, Craig Boutilier
CoRL 2020 Safe Policy Learning for Continuous Control Yinlam Chow, Ofir Nachum, Aleksandra Faust, Edgar Dueñez-Guzman, Mohammad Ghavamzadeh
ICML 2019 Garbage in, Reward Out: Bootstrapping Exploration in Multi-Armed Bandits Branislav Kveton, Csaba Szepesvari, Sharan Vaswani, Zheng Wen, Tor Lattimore, Mohammad Ghavamzadeh
ICMLW 2019 Lyapunov-Based Safe Policy Optimization for Continuous Control Yinlam Chow, Ofir Nachum, Aleksandra Faust, Edgar Duenez-Guzman, Mohammad Ghavamzadeh
AISTATS 2019 Optimizing over a Restricted Policy Class in MDPs Ershad Banijamali, Yasin Abbasi-Yadkori, Mohammad Ghavamzadeh, Nikos Vlassis
UAI 2019 Perturbed-History Exploration in Stochastic Linear Bandits Branislav Kveton, Csaba Szepesvári, Mohammad Ghavamzadeh, Craig Boutilier
IJCAI 2019 Perturbed-History Exploration in Stochastic Multi-Armed Bandits Branislav Kveton, Csaba Szepesvári, Mohammad Ghavamzadeh, Craig Boutilier
AISTATS 2019 Risk-Sensitive Generative Adversarial Imitation Learning Jonathan Lacotte, Mohammad Ghavamzadeh, Yinlam Chow, Marco Pavone
NeurIPS 2019 Tight Regret Bounds for Model-Based Reinforcement Learning with Greedy Policies Yonathan Efroni, Nadav Merlis, Mohammad Ghavamzadeh, Shie Mannor
NeurIPS 2018 A Block Coordinate Ascent Algorithm for Mean-Variance Optimization Tengyang Xie, Bo Liu, Yangyang Xu, Mohammad Ghavamzadeh, Yinlam Chow, Daoming Lyu, Daesub Yoon
NeurIPS 2018 A Lyapunov-Based Approach to Safe Reinforcement Learning Yinlam Chow, Ofir Nachum, Edgar Duenez-Guzman, Mohammad Ghavamzadeh
ICML 2018 More Robust Doubly Robust Off-Policy Evaluation Mehrdad Farajtabar, Yinlam Chow, Mohammad Ghavamzadeh
ICML 2018 Path Consistency Learning in Tsallis Entropy Regularized MDPs Yinlam Chow, Ofir Nachum, Mohammad Ghavamzadeh
JAIR 2018 Proximal Gradient Temporal Difference Learning: Stable Reinforcement Learning with Polynomial Sample Complexity Bo Liu, Ian Gemp, Mohammad Ghavamzadeh, Ji Liu, Sridhar Mahadevan, Marek Petrik
AISTATS 2018 Robust Locally-Linear Controllable Embedding Ershad Banijamali, Rui Shu, Mohammad Ghavamzadeh, Hung Bui, Ali Ghodsi
ICML 2017 Active Learning for Accurate Estimation of Linear Models Carlos Riquelme, Mohammad Ghavamzadeh, Alessandro Lazaric
AAAI 2017 Automated Data Cleansing Through Meta-Learning Ian Gemp, Georgios Theocharous, Mohammad Ghavamzadeh
ICML 2017 Bottleneck Conditional Density Estimation Rui Shu, Hung H. Bui, Mohammad Ghavamzadeh
NeurIPS 2017 Conservative Contextual Linear Bandits Abbas Kazerouni, Mohammad Ghavamzadeh, Yasin Abbasi Yadkori, Benjamin Van Roy
ICML 2017 Model-Independent Online Learning for Influence Maximization Sharan Vaswani, Branislav Kveton, Zheng Wen, Mohammad Ghavamzadeh, Laks V. S. Lakshmanan, Mark Schmidt
ICML 2017 Online Learning to Rank in Stochastic Click Models Masrour Zoghi, Tomas Tunys, Mohammad Ghavamzadeh, Branislav Kveton, Csaba Szepesvari, Zheng Wen
AAAI 2017 Predictive Off-Policy Policy Evaluation for Nonstationary Decision Problems, with Applications to Digital Marketing Philip S. Thomas, Georgios Theocharous, Mohammad Ghavamzadeh, Ishan Durugkar, Emma Brunskill
AISTATS 2017 Sequential Multiple Hypothesis Testing with Type I Error Control Alan Malek, Sumeet Katariya, Yinlam Chow, Mohammad Ghavamzadeh
JMLR 2016 Analysis of Classification-Based Policy Iteration Algorithms Alessandro Lazaric, Mohammad Ghavamzadeh, Rémi Munos
JMLR 2016 Bayesian Policy Gradient and Actor-Critic Algorithms Mohammad Ghavamzadeh, Yaakov Engel, Michal Valko
ECML-PKDD 2016 Graphical Model Sketch Branislav Kveton, Hung Bui, Mohammad Ghavamzadeh, Georgios Theocharous, S. Muthukrishnan, Siqi Sun
AISTATS 2016 Improved Learning Complexity in Combinatorial Pure Exploration Bandits Victor Gabillon, Alessandro Lazaric, Mohammad Ghavamzadeh, Ronald Ortner, Peter L. Bartlett
IJCAI 2016 Proximal Gradient Temporal Difference Learning Algorithms Bo Liu, Ji Liu, Mohammad Ghavamzadeh, Sridhar Mahadevan, Marek Petrik
JMLR 2016 Regularized Policy Iteration with Nonparametric Function Spaces Amir-massoud Farahmand, Mohammad Ghavamzadeh, Csaba Szepesvári, Shie Mannor
NeurIPS 2016 Safe Policy Improvement by Minimizing Robust Baseline Regret Mohammad Ghavamzadeh, Marek Petrik, Yinlam Chow
MLJ 2016 Variance-Constrained Actor-Critic Algorithms for Discounted and Average Reward MDPs L. A. Prashanth, Mohammad Ghavamzadeh
JMLR 2015 Approximate Modified Policy Iteration and Its Application to the Game of Tetris Bruno Scherrer, Mohammad Ghavamzadeh, Victor Gabillon, Boris Lesner, Matthieu Geist
FnTML 2015 Bayesian Reinforcement Learning: A Survey Mohammad Ghavamzadeh, Shie Mannor, Joelle Pineau, Aviv Tamar
UAI 2015 Finite-Sample Analysis of Proximal Gradient TD Algorithms Bo Liu, Ji Liu, Mohammad Ghavamzadeh, Sridhar Mahadevan, Marek Petrik
ICML 2015 High Confidence Policy Improvement Philip Thomas, Georgios Theocharous, Mohammad Ghavamzadeh
AAAI 2015 High-Confidence Off-Policy Evaluation Philip S. Thomas, Georgios Theocharous, Mohammad Ghavamzadeh
IJCAI 2015 Maximum Entropy Semi-Supervised Inverse Reinforcement Learning Julien Audiffren, Michal Valko, Alessandro Lazaric, Mohammad Ghavamzadeh
IJCAI 2015 Personalized Ad Recommendation Systems for Life-Time Value Optimization with Guarantees Georgios Theocharous, Philip S. Thomas, Mohammad Ghavamzadeh
NeurIPS 2015 Policy Gradient for Coherent Risk Measures Aviv Tamar, Yinlam Chow, Mohammad Ghavamzadeh, Shie Mannor
NeurIPS 2014 Algorithms for CVaR Optimization in MDPs Yinlam Chow, Mohammad Ghavamzadeh
ICML 2013 A Generalized Kernel Approach to Structured Output Learning Hachem Kadri, Mohammad Ghavamzadeh, Philippe Preux
NeurIPS 2013 Actor-Critic Algorithms for Risk-Sensitive MDPs Prashanth L.A., Mohammad Ghavamzadeh
NeurIPS 2013 Approximate Dynamic Programming Finally Performs Well in the Game of Tetris Victor Gabillon, Mohammad Ghavamzadeh, Bruno Scherrer
ICML 2013 Cost-Sensitive Multiclass Classification Risk Bounds Bernardo Ávila Pires, Csaba Szepesvari, Mohammad Ghavamzadeh
ICML 2012 A Dantzig Selector Approach to Temporal Difference Learning Matthieu Geist, Bruno Scherrer, Alessandro Lazaric, Mohammad Ghavamzadeh
ICML 2012 Approximate Modified Policy Iteration Bruno Scherrer, Victor Gabillon, Mohammad Ghavamzadeh, Matthieu Geist
NeurIPS 2012 Best Arm Identification: A Unified Approach to Fixed Budget and Fixed Confidence Victor Gabillon, Mohammad Ghavamzadeh, Alessandro Lazaric
AAAI 2012 Conservative and Greedy Approaches to Classification-Based Policy Iteration Mohammad Ghavamzadeh, Alessandro Lazaric
JMLR 2012 Finite-Sample Analysis of Least-Squares Policy Iteration Alessandro Lazaric, Mohammad Ghavamzadeh, Rémi Munos
ICML 2011 Classification-Based Policy Iteration with a Critic Victor Gabillon, Alessandro Lazaric, Mohammad Ghavamzadeh, Bruno Scherrer
ICML 2011 Finite-Sample Analysis of Lasso-TD Mohammad Ghavamzadeh, Alessandro Lazaric, Rémi Munos, Matthew W. Hoffman
NeurIPS 2011 Multi-Bandit Best Arm Identification Victor Gabillon, Mohammad Ghavamzadeh, Alessandro Lazaric, Sébastien Bubeck
NeurIPS 2011 Speedy Q-Learning Mohammad Ghavamzadeh, Hilbert J. Kappen, Mohammad G. Azar, Rémi Munos
ALT 2011 Upper-Confidence-Bound Algorithms for Active Learning in Multi-Armed Bandits Alexandra Carpentier, Alessandro Lazaric, Mohammad Ghavamzadeh, Rémi Munos, Peter Auer
ICML 2010 Analysis of a Classification-Based Policy Iteration Algorithm Alessandro Lazaric, Mohammad Ghavamzadeh, Rémi Munos
ICML 2010 Bayesian Multi-Task Reinforcement Learning Alessandro Lazaric, Mohammad Ghavamzadeh
ACML 2010 Finite-Sample Analysis of Bellman Residual Minimization Odalric-Ambrym Maillard, Remi Munos, Alessandro Lazaric, Mohammad Ghavamzadeh
ICML 2010 Finite-Sample Analysis of LSTD Alessandro Lazaric, Mohammad Ghavamzadeh, Rémi Munos
NeurIPS 2010 LSTD with Random Projections Mohammad Ghavamzadeh, Alessandro Lazaric, Odalric Maillard, Rémi Munos
NeurIPS 2008 Regularized Policy Iteration Amir M. Farahmand, Mohammad Ghavamzadeh, Shie Mannor, Csaba Szepesvári
ICML 2007 Bayesian Actor-Critic Algorithms Mohammad Ghavamzadeh, Yaakov Engel
JMLR 2007 Hierarchical Average Reward Reinforcement Learning Mohammad Ghavamzadeh, Sridhar Mahadevan
NeurIPS 2007 Incremental Natural Actor-Critic Algorithms Shalabh Bhatnagar, Mohammad Ghavamzadeh, Mark Lee, Richard S. Sutton
NeurIPS 2006 Bayesian Policy Gradient Algorithms Mohammad Ghavamzadeh, Yaakov Engel
ICML 2003 Hierarchical Policy Gradient Algorithms Mohammad Ghavamzadeh, Sridhar Mahadevan
ICML 2002 Hierarchically Optimal Average Reward Reinforcement Learning Mohammad Ghavamzadeh, Sridhar Mahadevan
ICML 2001 Continuous-Time Hierarchical Reinforcement Learning Mohammad Ghavamzadeh, Sridhar Mahadevan