Uehara, Masatoshi

39 publications

ICLR 2025 Adding Conditional Control to Diffusion Models with Reinforcement Learning. Yulai Zhao, Masatoshi Uehara, Gabriele Scalia, Sunyuan Kung, Tommaso Biancalani, Sergey Levine, Ehsan Hajiramezanali
NeurIPS 2025 Derivative-Free Guidance in Continuous and Discrete Diffusion Models with Soft Value-Based Decoding. Xiner Li, Yulai Zhao, Chenyu Wang, Gabriele Scalia, Gökcen Eraslan, Surag Nair, Tommaso Biancalani, Shuiwang Ji, Aviv Regev, Sergey Levine, Masatoshi Uehara
ICLR 2025 Fine-Tuning Discrete Diffusion Models via Reward Optimization with Applications to DNA and Protein Design. Chenyu Wang, Masatoshi Uehara, Yichun He, Amy Wang, Avantika Lal, Tommi Jaakkola, Sergey Levine, Aviv Regev, Hanchen Wang, Tommaso Biancalani
ICML 2025 Reward-Guided Iterative Refinement in Diffusion Models at Test-Time with Applications to Protein and DNA Design. Masatoshi Uehara, Xingyu Su, Yulai Zhao, Xiner Li, Aviv Regev, Shuiwang Ji, Sergey Levine, Tommaso Biancalani
NeurIPS 2024 Bridging Model-Based Optimization and Generative Modeling via Conservative Fine-Tuning of Diffusion Models. Masatoshi Uehara, Yulai Zhao, Ehsan Hajiramezanali, Gabriele Scalia, Gökcen Eraslan, Avantika Lal, Sergey Levine, Tommaso Biancalani
NeurIPSW 2024 Derivative-Free Guidance in Continuous and Discrete Diffusion Models with Soft Value-Based Decoding. Xiner Li, Yulai Zhao, Chenyu Wang, Gabriele Scalia, Gökcen Eraslan, Surag Nair, Tommaso Biancalani, Shuiwang Ji, Aviv Regev, Sergey Levine, Masatoshi Uehara
ICML 2024 Feedback Efficient Online Fine-Tuning of Diffusion Models. Masatoshi Uehara, Yulai Zhao, Kevin Black, Ehsan Hajiramezanali, Gabriele Scalia, Nathaniel Lee Diamant, Alex M Tseng, Sergey Levine, Tommaso Biancalani
NeurIPSW 2024 Fine-Tuning Discrete Diffusion Models via Reward Optimization with Applications to DNA and Protein Design. Chenyu Wang, Masatoshi Uehara, Yichun He, Amy Wang, Tommaso Biancalani, Avantika Lal, Tommi Jaakkola, Sergey Levine, Hanchen Wang, Aviv Regev
AISTATS 2024 Functional Graphical Models: Structure Enables Offline Data-Driven Optimization. Kuba Grudzien, Masatoshi Uehara, Sergey Levine, Pieter Abbeel
JMLR 2024 Localized Debiased Machine Learning: Efficient Inference on Quantile Treatment Effects and Beyond. Nathan Kallus, Xiaojie Mao, Masatoshi Uehara
ICLR 2024 Provable Offline Preference-Based Reinforcement Learning. Wenhao Zhan, Masatoshi Uehara, Nathan Kallus, Jason D. Lee, Wen Sun
ICLR 2024 Provable Reward-Agnostic Preference-Based Reinforcement Learning. Wenhao Zhan, Masatoshi Uehara, Wen Sun, Jason D. Lee
ICML 2023 Computationally Efficient PAC RL in POMDPs with Latent Determinism and Conditional Embeddings. Masatoshi Uehara, Ayush Sekhari, Jason D. Lee, Nathan Kallus, Wen Sun
ICML 2023 Distributional Offline Policy Evaluation with Predictive Error Guarantees. Runzhe Wu, Masatoshi Uehara, Wen Sun
NeurIPS 2023 Future-Dependent Value-Based Off-Policy Evaluation in POMDPs. Masatoshi Uehara, Haruka Kiyohara, Andrew Bennett, Victor Chernozhukov, Nan Jiang, Nathan Kallus, Chengchun Shi, Wen Sun
ICMLW 2023 How to Query Human Feedback Efficiently in RL? Wenhao Zhan, Masatoshi Uehara, Wen Sun, Jason D. Lee
COLT 2023 Inference on Strongly Identified Functionals of Weakly Identified Functions. Andrew Bennett, Nathan Kallus, Xiaojie Mao, Whitney Newey, Vasilis Syrgkanis, Masatoshi Uehara
COLT 2023 Minimax Instrumental Variable Regression and $L_2$ Convergence Guarantees Without Identification or Closedness. Andrew Bennett, Nathan Kallus, Xiaojie Mao, Whitney Newey, Vasilis Syrgkanis, Masatoshi Uehara
NeurIPS 2023 Offline Minimax Soft-Q-Learning Under Realizability and Partial Coverage. Masatoshi Uehara, Nathan Kallus, Jason D. Lee, Wen Sun
ICLR 2023 PAC Reinforcement Learning for Predictive State Representations. Wenhao Zhan, Masatoshi Uehara, Wen Sun, Jason D. Lee
ICMLW 2023 Provable Offline Reinforcement Learning with Human Feedback. Wenhao Zhan, Masatoshi Uehara, Nathan Kallus, Jason D. Lee, Wen Sun
ICML 2022 A Minimax Learning Approach to Off-Policy Evaluation in Confounded Partially Observable Markov Decision Processes. Chengchun Shi, Masatoshi Uehara, Jiawei Huang, Nan Jiang
ICML 2022 Efficient Reinforcement Learning in Block MDPs: A Model-Free Representation Learning Approach. Xuezhou Zhang, Yuda Song, Masatoshi Uehara, Mengdi Wang, Alekh Agarwal, Wen Sun
ICLR 2022 Pessimistic Model-Based Offline Reinforcement Learning Under Partial Coverage. Masatoshi Uehara, Wen Sun
NeurIPS 2022 Provably Efficient Reinforcement Learning in Partially Observable Dynamical Systems. Masatoshi Uehara, Ayush Sekhari, Jason D. Lee, Nathan Kallus, Wen Sun
ICLR 2022 Representation Learning for Online and Offline RL in Low-Rank MDPs. Masatoshi Uehara, Xuezhou Zhang, Wen Sun
COLT 2021 Fast Rates for the Regret of Offline Reinforcement Learning. Yichun Hu, Nathan Kallus, Masatoshi Uehara
JMLR 2021 Information Criteria for Non-Normalized Models. Takeru Matsuda, Masatoshi Uehara, Aapo Hyvärinen
NeurIPS 2021 Mitigating Covariate Shift in Imitation Learning via Offline Data with Partial Coverage. Jonathan Chang, Masatoshi Uehara, Dhruv Sreenivas, Rahul Kidambi, Wen Sun
ICML 2021 Optimal Off-Policy Evaluation from Multiple Logging Policies. Nathan Kallus, Yuta Saito, Masatoshi Uehara
AISTATS 2020 A Unified Statistically Efficient Estimation Framework for Unnormalized Models. Masatoshi Uehara, Takafumi Kanamori, Takashi Takenouchi, Takeru Matsuda
JMLR 2020 Double Reinforcement Learning for Efficient Off-Policy Evaluation in Markov Decision Processes. Nathan Kallus, Masatoshi Uehara
ICML 2020 Double Reinforcement Learning for Efficient and Robust Off-Policy Evaluation. Nathan Kallus, Masatoshi Uehara
NeurIPS 2020 Doubly Robust Off-Policy Value and Gradient Estimation for Deterministic Policies. Nathan Kallus, Masatoshi Uehara
AISTATS 2020 Imputation Estimators for Unnormalized Models with Missing Data. Masatoshi Uehara, Takeru Matsuda, Jae Kwang Kim
ICML 2020 Minimax Weight and Q-Function Learning for Off-Policy Evaluation. Masatoshi Uehara, Jiawei Huang, Nan Jiang
NeurIPS 2020 Off-Policy Evaluation and Learning for External Validity Under a Covariate Shift. Masatoshi Uehara, Masahiro Kato, Shota Yasui
ICML 2020 Statistically Efficient Off-Policy Policy Gradients. Nathan Kallus, Masatoshi Uehara
NeurIPS 2019 Intrinsically Efficient, Stable, and Bounded Off-Policy Evaluation for Reinforcement Learning. Nathan Kallus, Masatoshi Uehara