Tang, Yunhao

49 publications

AISTATS 2025 A Unifying Framework for Action-Conditional Self-Predictive Reinforcement Learning Khimya Khetarpal, Zhaohan Daniel Guo, Bernardo Avila Pires, Yunhao Tang, Clare Lyle, Mark Rowland, Nicolas Heess, Diana L Borsa, Arthur Guez, Will Dabney

NeurIPS 2025 Asymmetric REINFORCE for Off-Policy Reinforcement Learning: Balancing Positive and Negative Rewards Charles Arnal, Gaëtan Narozniak, Vivien Cabannes, Yunhao Tang, Julia Kempe, Remi Munos

NeurIPS 2025 Beyond Verifiable Rewards: Scaling Reinforcement Learning in Language Models to Unverifiable Data Yunhao Tang, Sid Wang, Lovish Madaan, Remi Munos

ICML 2025 Categorical Distributional Reinforcement Learning with Kullback-Leibler Divergence: Convergence and Asymptotics Tyler Kastner, Mark Rowland, Yunhao Tang, Murat A Erdogdu, Amir-Massoud Farahmand

ICML 2025 Optimizing Language Models for Inference Time Objectives Using Reinforcement Learning Yunhao Tang, Kunhao Zheng, Gabriel Synnaeve, Remi Munos

ICML 2024 A Distributional Analogue to the Successor Representation Harley Wiltzer, Jesse Farebrother, Arthur Gretton, Yunhao Tang, Andre Barreto, Will Dabney, Marc G Bellemare, Mark Rowland

NeurIPSW 2024 A Unifying Framework for Action-Conditional Self-Predictive Reinforcement Learning Khimya Khetarpal, Zhaohan Daniel Guo, Bernardo Avila Pires, Yunhao Tang, Clare Lyle, Mark Rowland, Nicolas Heess, Diana L Borsa, Arthur Guez, Will Dabney

JMLR 2024 An Analysis of Quantile Temporal-Difference Learning Mark Rowland, Rémi Munos, Mohammad Gheshlaghi Azar, Yunhao Tang, Georg Ostrovski, Anna Harutyunyan, Karl Tuyls, Marc G. Bellemare, Will Dabney

ICML 2024 Generalized Preference Optimization: A Unified Approach to Offline Alignment Yunhao Tang, Zhaohan Daniel Guo, Zeyu Zheng, Daniele Calandriello, Remi Munos, Mark Rowland, Pierre Harvey Richemond, Michal Valko, Bernardo Avila Pires, Bilal Piot

ICML 2024 Human Alignment of Large Language Models Through Online Preference Optimisation Daniele Calandriello, Zhaohan Daniel Guo, Remi Munos, Mark Rowland, Yunhao Tang, Bernardo Avila Pires, Pierre Harvey Richemond, Charline Le Lan, Michal Valko, Tianqi Liu, Rishabh Joshi, Zeyu Zheng, Bilal Piot

AAAI 2024 Learning Uncertainty-Aware Temporally-Extended Actions Joongkyu Lee, Seung Joon Park, Yunhao Tang, Min-hwan Oh

ICML 2024 Nash Learning from Human Feedback Remi Munos, Michal Valko, Daniele Calandriello, Mohammad Gheshlaghi Azar, Mark Rowland, Zhaohan Daniel Guo, Yunhao Tang, Matthieu Geist, Thomas Mesnard, Côme Fiegel, Andrea Michi, Marco Selvi, Sertan Girgin, Nikola Momchev, Olivier Bachem, Daniel J Mankowitz, Doina Precup, Bilal Piot

NeurIPS 2024 Near-Minimax-Optimal Distributional Reinforcement Learning with a Generative Model Mark Rowland, Li Kevin Wenliang, Rémi Munos, Clare Lyle, Yunhao Tang, Will Dabney

NeurIPS 2024 On Scalable Oversight with Weak LLMs Judging Strong LLMs Zachary Kenton, Noah Y. Siegel, János Kramár, Jonah Brown-Cohen, Samuel Albanie, Jannis Bulian, Rishabh Agarwal, David Lindner, Yunhao Tang, Noah D. Goodman, Rohin Shah

ICML 2023 DoMo-AC: Doubly Multi-Step Off-Policy Actor-Critic Algorithm Yunhao Tang, Tadashi Kozuno, Mark Rowland, Anna Harutyunyan, Remi Munos, Bernardo Avila Pires, Michal Valko

ICML 2023 Fast Rates for Maximum Entropy Exploration Daniil Tiapkin, Denis Belomestny, Daniele Calandriello, Eric Moulines, Remi Munos, Alexey Naumov, Pierre Perrault, Yunhao Tang, Michal Valko, Pierre Menard

ICML 2023 Quantile Credit Assignment Thomas Mesnard, Wenqi Chen, Alaa Saade, Yunhao Tang, Mark Rowland, Theophane Weber, Clare Lyle, Audrunas Gruslys, Michal Valko, Will Dabney, Georg Ostrovski, Eric Moulines, Remi Munos

ICML 2023 Regularization and Variance-Weighted Regression Achieves Minimax Optimality in Linear MDPs: Theory and Practice Toshinori Kitamura, Tadashi Kozuno, Yunhao Tang, Nino Vieillard, Michal Valko, Wenhao Yang, Jincheng Mei, Pierre Menard, Mohammad Gheshlaghi Azar, Remi Munos, Olivier Pietquin, Matthieu Geist, Csaba Szepesvari, Wataru Kumagai, Yutaka Matsuo

ICML 2023 Representations and Exploration for Deep Reinforcement Learning Using Singular Value Decomposition Yash Chandak, Shantanu Thakoor, Zhaohan Daniel Guo, Yunhao Tang, Remi Munos, Will Dabney, Diana L Borsa

ICML 2023 The Edge of Orthogonality: A Simple View of What Makes BYOL Tick Pierre Harvey Richemond, Allison Tam, Yunhao Tang, Florian Strub, Bilal Piot, Felix Hill

ICML 2023 The Statistical Benefits of Quantile Temporal-Difference Learning for Value Estimation Mark Rowland, Yunhao Tang, Clare Lyle, Remi Munos, Marc G Bellemare, Will Dabney

ICML 2023 Towards a Better Understanding of Representation Dynamics Under TD-Learning Yunhao Tang, Remi Munos

NeurIPSW 2023 Uncertainty-Aware Action Repeating Options Joongkyu Lee, Seung Joon Park, Yunhao Tang, Min-hwan Oh

ICML 2023 Understanding Self-Predictive Learning for Reinforcement Learning Yunhao Tang, Zhaohan Daniel Guo, Pierre Harvey Richemond, Bernardo Avila Pires, Yash Chandak, Remi Munos, Mark Rowland, Mohammad Gheshlaghi Azar, Charline Le Lan, Clare Lyle, András György, Shantanu Thakoor, Will Dabney, Bilal Piot, Daniele Calandriello, Michal Valko

ICML 2023 VA-Learning as a More Efficient Alternative to Q-Learning Yunhao Tang, Remi Munos, Mark Rowland, Michal Valko

AISTATS 2022 Marginalized Operators for Off-Policy Reinforcement Learning Yunhao Tang, Mark Rowland, Remi Munos, Michal Valko

NeurIPS 2022 BYOL-Explore: Exploration by Bootstrapped Prediction Zhaohan Guo, Shantanu Thakoor, Miruna Pislar, Bernardo Avila Pires, Florent Altché, Corentin Tallec, Alaa Saade, Daniele Calandriello, Jean-Bastien Grill, Yunhao Tang, Michal Valko, Remi Munos, Mohammad Gheshlaghi Azar, Bilal Piot

ICML 2022 Biased Gradient Estimate with Drastic Variance Reduction for Meta Reinforcement Learning Yunhao Tang

ICML 2022 From Dirichlet to Rubin: Optimistic Exploration in RL Without Bonuses Daniil Tiapkin, Denis Belomestny, Eric Moulines, Alexey Naumov, Sergey Samsonov, Yunhao Tang, Michal Valko, Pierre Menard

NeurIPS 2022 The Nature of Temporal Difference Errors in Multi-Step Distributional Reinforcement Learning Yunhao Tang, Remi Munos, Mark Rowland, Bernardo Avila Pires, Will Dabney, Marc Bellemare

AISTATS 2021 Hindsight Expectation Maximization for Goal-Conditioned Reinforcement Learning Yunhao Tang, Alp Kucukelbir

ICML 2021 Revisiting Peng’s Q(λ) for Modern Reinforcement Learning Tadashi Kozuno, Yunhao Tang, Mark Rowland, Remi Munos, Steven Kapturowski, Will Dabney, Michal Valko, David Abel

ICML 2021 Taylor Expansion of Discount Factors Yunhao Tang, Mark Rowland, Remi Munos, Michal Valko

NeurIPS 2021 Unifying Gradient Estimators for Meta-Reinforcement Learning via Off-Policy Evaluation Yunhao Tang, Tadashi Kozuno, Mark Rowland, Remi Munos, Michal Valko

AISTATS 2020 Discrete Action On-Policy Learning with Action-Value Critic Yuguang Yue, Yunhao Tang, Mingzhang Yin, Mingyuan Zhou

AAAI 2020 Discretizing Continuous Action Space for On-Policy Optimization Yunhao Tang, Shipra Agrawal

ICLR 2020 ES-MAML: Simple Hessian-Free Meta Learning Xingyou Song, Wenbo Gao, Yuxiang Yang, Krzysztof Choromanski, Aldo Pacchiano, Yunhao Tang

ICML 2020 Learning to Score Behaviors for Guided Policy Optimization Aldo Pacchiano, Jack Parker-Holder, Yunhao Tang, Krzysztof Choromanski, Anna Choromanska, Michael Jordan

ICML 2020 Monte-Carlo Tree Search as Regularized Policy Optimization Jean-Bastien Grill, Florent Altché, Yunhao Tang, Thomas Hubert, Michal Valko, Ioannis Antonoglou, Remi Munos

AISTATS 2020 Practical Nonisotropic Monte Carlo Sampling in High Dimensions via Determinantal Point Processes Krzysztof Choromanski, Aldo Pacchiano, Jack Parker-Holder, Yunhao Tang

ICML 2020 Reinforcement Learning for Integer Programming: Learning to Cut Yunhao Tang, Shipra Agrawal, Yuri Faenza

NeurIPS 2020 Self-Imitation Learning via Generalized Lower Bound Q-Learning Yunhao Tang

ICML 2020 Taylor Expansion Policy Optimization Yunhao Tang, Michal Valko, Remi Munos

AISTATS 2020 Variance Reduction for Evolution Strategies via Structured Control Variates Yunhao Tang, Krzysztof Choromanski, Alp Kucukelbir

NeurIPS 2019 From Complexity to Simplicity: Adaptive ES-Active Subspaces for Blackbox Optimization Krzysztof M Choromanski, Aldo Pacchiano, Jack Parker-Holder, Yunhao Tang, Vikas Sindhwani

AISTATS 2019 KAMA-NNs: Low-Dimensional Rotation Based Neural Networks Krzysztof Choromanski, Aldo Pacchiano, Jeffrey Pennington, Yunhao Tang

AISTATS 2019 Orthogonal Estimation of Wasserstein Distances Mark Rowland, Jiri Hron, Yunhao Tang, Krzysztof Choromanski, Tamas Sarlos, Adrian Weller

CoRL 2019 Provably Robust Blackbox Optimization for Reinforcement Learning Krzysztof Choromanski, Aldo Pacchiano, Jack Parker-Holder, Yunhao Tang, Deepali Jain, Yuxiang Yang, Atil Iscen, Jasmine Hsu, Vikas Sindhwani

IJCAI 2018 Exploration by Distributional Reinforcement Learning Yunhao Tang, Shipra Agrawal