Tang, Yunhao

49 publications

AISTATS 2025 A Unifying Framework for Action-Conditional Self-Predictive Reinforcement Learning Khimya Khetarpal, Zhaohan Daniel Guo, Bernardo Avila Pires, Yunhao Tang, Clare Lyle, Mark Rowland, Nicolas Heess, Diana L Borsa, Arthur Guez, Will Dabney
NeurIPS 2025 Asymmetric REINFORCE for Off-Policy Reinforcement Learning: Balancing Positive and Negative Rewards Charles Arnal, Gaëtan Narozniak, Vivien Cabannes, Yunhao Tang, Julia Kempe, Remi Munos
NeurIPS 2025 Beyond Verifiable Rewards: Scaling Reinforcement Learning in Language Models to Unverifiable Data Yunhao Tang, Sid Wang, Lovish Madaan, Remi Munos
ICML 2025 Categorical Distributional Reinforcement Learning with Kullback-Leibler Divergence: Convergence and Asymptotics Tyler Kastner, Mark Rowland, Yunhao Tang, Murat A Erdogdu, Amir-Massoud Farahmand
ICML 2025 Optimizing Language Models for Inference Time Objectives Using Reinforcement Learning Yunhao Tang, Kunhao Zheng, Gabriel Synnaeve, Remi Munos
ICML 2024 A Distributional Analogue to the Successor Representation Harley Wiltzer, Jesse Farebrother, Arthur Gretton, Yunhao Tang, Andre Barreto, Will Dabney, Marc G Bellemare, Mark Rowland
NeurIPSW 2024 A Unifying Framework for Action-Conditional Self-Predictive Reinforcement Learning Khimya Khetarpal, Zhaohan Daniel Guo, Bernardo Avila Pires, Yunhao Tang, Clare Lyle, Mark Rowland, Nicolas Heess, Diana L Borsa, Arthur Guez, Will Dabney
JMLR 2024 An Analysis of Quantile Temporal-Difference Learning Mark Rowland, Rémi Munos, Mohammad Gheshlaghi Azar, Yunhao Tang, Georg Ostrovski, Anna Harutyunyan, Karl Tuyls, Marc G. Bellemare, Will Dabney
ICML 2024 Generalized Preference Optimization: A Unified Approach to Offline Alignment Yunhao Tang, Zhaohan Daniel Guo, Zeyu Zheng, Daniele Calandriello, Remi Munos, Mark Rowland, Pierre Harvey Richemond, Michal Valko, Bernardo Avila Pires, Bilal Piot
ICML 2024 Human Alignment of Large Language Models Through Online Preference Optimisation Daniele Calandriello, Zhaohan Daniel Guo, Remi Munos, Mark Rowland, Yunhao Tang, Bernardo Avila Pires, Pierre Harvey Richemond, Charline Le Lan, Michal Valko, Tianqi Liu, Rishabh Joshi, Zeyu Zheng, Bilal Piot
AAAI 2024 Learning Uncertainty-Aware Temporally-Extended Actions Joongkyu Lee, Seung Joon Park, Yunhao Tang, Min-hwan Oh
ICML 2024 Nash Learning from Human Feedback Remi Munos, Michal Valko, Daniele Calandriello, Mohammad Gheshlaghi Azar, Mark Rowland, Zhaohan Daniel Guo, Yunhao Tang, Matthieu Geist, Thomas Mesnard, Côme Fiegel, Andrea Michi, Marco Selvi, Sertan Girgin, Nikola Momchev, Olivier Bachem, Daniel J Mankowitz, Doina Precup, Bilal Piot
NeurIPS 2024 Near-Minimax-Optimal Distributional Reinforcement Learning with a Generative Model Mark Rowland, Li Kevin Wenliang, Rémi Munos, Clare Lyle, Yunhao Tang, Will Dabney
NeurIPS 2024 On Scalable Oversight with Weak LLMs Judging Strong LLMs Zachary Kenton, Noah Y. Siegel, János Kramár, Jonah Brown-Cohen, Samuel Albanie, Jannis Bulian, Rishabh Agarwal, David Lindner, Yunhao Tang, Noah D. Goodman, Rohin Shah
ICML 2023 DoMo-AC: Doubly Multi-Step Off-Policy Actor-Critic Algorithm Yunhao Tang, Tadashi Kozuno, Mark Rowland, Anna Harutyunyan, Remi Munos, Bernardo Avila Pires, Michal Valko
ICML 2023 Fast Rates for Maximum Entropy Exploration Daniil Tiapkin, Denis Belomestny, Daniele Calandriello, Eric Moulines, Remi Munos, Alexey Naumov, Pierre Perrault, Yunhao Tang, Michal Valko, Pierre Menard
ICML 2023 Quantile Credit Assignment Thomas Mesnard, Wenqi Chen, Alaa Saade, Yunhao Tang, Mark Rowland, Theophane Weber, Clare Lyle, Audrunas Gruslys, Michal Valko, Will Dabney, Georg Ostrovski, Eric Moulines, Remi Munos
ICML 2023 Regularization and Variance-Weighted Regression Achieves Minimax Optimality in Linear MDPs: Theory and Practice Toshinori Kitamura, Tadashi Kozuno, Yunhao Tang, Nino Vieillard, Michal Valko, Wenhao Yang, Jincheng Mei, Pierre Menard, Mohammad Gheshlaghi Azar, Remi Munos, Olivier Pietquin, Matthieu Geist, Csaba Szepesvari, Wataru Kumagai, Yutaka Matsuo
ICML 2023 Representations and Exploration for Deep Reinforcement Learning Using Singular Value Decomposition Yash Chandak, Shantanu Thakoor, Zhaohan Daniel Guo, Yunhao Tang, Remi Munos, Will Dabney, Diana L Borsa
ICML 2023 The Edge of Orthogonality: A Simple View of What Makes BYOL Tick Pierre Harvey Richemond, Allison Tam, Yunhao Tang, Florian Strub, Bilal Piot, Felix Hill
ICML 2023 The Statistical Benefits of Quantile Temporal-Difference Learning for Value Estimation Mark Rowland, Yunhao Tang, Clare Lyle, Remi Munos, Marc G Bellemare, Will Dabney
ICML 2023 Towards a Better Understanding of Representation Dynamics Under TD-Learning Yunhao Tang, Remi Munos
NeurIPSW 2023 Uncertainty-Aware Action Repeating Options Joongkyu Lee, Seung Joon Park, Yunhao Tang, Min-hwan Oh
ICML 2023 Understanding Self-Predictive Learning for Reinforcement Learning Yunhao Tang, Zhaohan Daniel Guo, Pierre Harvey Richemond, Bernardo Avila Pires, Yash Chandak, Remi Munos, Mark Rowland, Mohammad Gheshlaghi Azar, Charline Le Lan, Clare Lyle, András György, Shantanu Thakoor, Will Dabney, Bilal Piot, Daniele Calandriello, Michal Valko
ICML 2023 VA-Learning as a More Efficient Alternative to Q-Learning Yunhao Tang, Remi Munos, Mark Rowland, Michal Valko
NeurIPS 2022 BYOL-Explore: Exploration by Bootstrapped Prediction Zhaohan Guo, Shantanu Thakoor, Miruna Pislar, Bernardo Avila Pires, Florent Altché, Corentin Tallec, Alaa Saade, Daniele Calandriello, Jean-Bastien Grill, Yunhao Tang, Michal Valko, Remi Munos, Mohammad Gheshlaghi Azar, Bilal Piot
ICML 2022 Biased Gradient Estimate with Drastic Variance Reduction for Meta Reinforcement Learning Yunhao Tang
ICML 2022 From Dirichlet to Rubin: Optimistic Exploration in RL Without Bonuses Daniil Tiapkin, Denis Belomestny, Eric Moulines, Alexey Naumov, Sergey Samsonov, Yunhao Tang, Michal Valko, Pierre Menard
AISTATS 2022 Marginalized Operators for Off-Policy Reinforcement Learning Yunhao Tang, Mark Rowland, Remi Munos, Michal Valko
NeurIPS 2022 The Nature of Temporal Difference Errors in Multi-Step Distributional Reinforcement Learning Yunhao Tang, Remi Munos, Mark Rowland, Bernardo Avila Pires, Will Dabney, Marc Bellemare
AISTATS 2021 Hindsight Expectation Maximization for Goal-Conditioned Reinforcement Learning Yunhao Tang, Alp Kucukelbir
ICML 2021 Revisiting Peng's Q(λ) for Modern Reinforcement Learning Tadashi Kozuno, Yunhao Tang, Mark Rowland, Remi Munos, Steven Kapturowski, Will Dabney, Michal Valko, David Abel
ICML 2021 Taylor Expansion of Discount Factors Yunhao Tang, Mark Rowland, Remi Munos, Michal Valko
NeurIPS 2021 Unifying Gradient Estimators for Meta-Reinforcement Learning via Off-Policy Evaluation Yunhao Tang, Tadashi Kozuno, Mark Rowland, Remi Munos, Michal Valko
AISTATS 2020 Discrete Action On-Policy Learning with Action-Value Critic Yuguang Yue, Yunhao Tang, Mingzhang Yin, Mingyuan Zhou
AAAI 2020 Discretizing Continuous Action Space for On-Policy Optimization Yunhao Tang, Shipra Agrawal
ICLR 2020 ES-MAML: Simple Hessian-Free Meta Learning Xingyou Song, Wenbo Gao, Yuxiang Yang, Krzysztof Choromanski, Aldo Pacchiano, Yunhao Tang
ICML 2020 Learning to Score Behaviors for Guided Policy Optimization Aldo Pacchiano, Jack Parker-Holder, Yunhao Tang, Krzysztof Choromanski, Anna Choromanska, Michael Jordan
ICML 2020 Monte-Carlo Tree Search as Regularized Policy Optimization Jean-Bastien Grill, Florent Altché, Yunhao Tang, Thomas Hubert, Michal Valko, Ioannis Antonoglou, Remi Munos
AISTATS 2020 Practical Nonisotropic Monte Carlo Sampling in High Dimensions via Determinantal Point Processes Krzysztof Choromanski, Aldo Pacchiano, Jack Parker-Holder, Yunhao Tang
ICML 2020 Reinforcement Learning for Integer Programming: Learning to Cut Yunhao Tang, Shipra Agrawal, Yuri Faenza
NeurIPS 2020 Self-Imitation Learning via Generalized Lower Bound Q-Learning Yunhao Tang
ICML 2020 Taylor Expansion Policy Optimization Yunhao Tang, Michal Valko, Remi Munos
AISTATS 2020 Variance Reduction for Evolution Strategies via Structured Control Variates Yunhao Tang, Krzysztof Choromanski, Alp Kucukelbir
NeurIPS 2019 From Complexity to Simplicity: Adaptive ES-Active Subspaces for Blackbox Optimization Krzysztof M Choromanski, Aldo Pacchiano, Jack Parker-Holder, Yunhao Tang, Vikas Sindhwani
AISTATS 2019 KAMA-NNs: Low-Dimensional Rotation Based Neural Networks Krzysztof Choromanski, Aldo Pacchiano, Jeffrey Pennington, Yunhao Tang
AISTATS 2019 Orthogonal Estimation of Wasserstein Distances Mark Rowland, Jiri Hron, Yunhao Tang, Krzysztof Choromanski, Tamas Sarlos, Adrian Weller
CoRL 2019 Provably Robust Blackbox Optimization for Reinforcement Learning Krzysztof Choromanski, Aldo Pacchiano, Jack Parker-Holder, Yunhao Tang, Deepali Jain, Yuxiang Yang, Atil Iscen, Jasmine Hsu, Vikas Sindhwani
IJCAI 2018 Exploration by Distributional Reinforcement Learning Yunhao Tang, Shipra Agrawal