Zhou, Runlong

12 publications

NeurIPS 2025 Sharp Gap-Dependent Variance-Aware Regret Bounds for Tabular MDPs Shulun Chen, Runlong Zhou, Zihan Zhang, Maryam Fazel, Simon Shaolei Du

ICLR 2025 The Crucial Role of Samplers in Online Direct Preference Optimization Ruizhe Shi, Runlong Zhou, Simon Shaolei Du

ICLR 2024 Free from Bellman Completeness: Trajectory Stitching via Model-Based Return-Conditioned Supervised Learning Zhaoyi Zhou, Chuning Zhu, Runlong Zhou, Qiwen Cui, Abhishek Gupta, Simon Shaolei Du

NeurIPSW 2024 The Crucial Role of Samplers in Online Direct Preference Optimization Ruizhe Shi, Runlong Zhou, Simon Shaolei Du

NeurIPSW 2024 The Crucial Role of Samplers in Online Direct Preference Optimization Ruizhe Shi, Runlong Zhou, Simon Shaolei Du

NeurIPSW 2024 Transformers Are Efficient Compilers, Provably Xiyu Zhai, Runlong Zhou, Liao Zhang, Simon Shaolei Du

NeurIPSW 2023 Free from Bellman Completeness: Trajectory Stitching via Model-Based Return-Conditioned Supervised Learning Zhaoyi Zhou, Chuning Zhu, Runlong Zhou, Qiwen Cui, Abhishek Gupta, Simon Shaolei Du

ICML 2023 Horizon-Free and Variance-Dependent Reinforcement Learning for Latent Markov Decision Processes Runlong Zhou, Ruosong Wang, Simon Shaolei Du

ICML 2023 Sharp Variance-Dependent Bounds in Reinforcement Learning: Best of Both Worlds in Stochastic and Deterministic Environments Runlong Zhou, Zihan Zhang, Simon Shaolei Du

TMLR 2023 Understanding Curriculum Learning in Policy Optimization for Online Combinatorial Optimization Runlong Zhou, Zelin He, Yuandong Tian, Yi Wu, Simon Shaolei Du

NeurIPSW 2022 Understanding Curriculum Learning in Policy Optimization for Online Combinatorial Optimization Runlong Zhou, Yuandong Tian, Yi Wu, Simon Shaolei Du

NeurIPS 2021 Stochastic Shortest Path: Minimax, Parameter-Free and Towards Horizon-Free Regret Jean Tarbouriech, Runlong Zhou, Simon Shaolei Du, Matteo Pirotta, Michal Valko, Alessandro Lazaric