Dong, Hanze

25 publications

ICLR 2025 Automatic Curriculum Expert Iteration for Reliable LLM Reasoning Zirui Zhao, Hanze Dong, Amrita Saha, Caiming Xiong, Doyen Sahoo
ICLRW 2025 BOLT: Bootstrap Long Chain-of-Thought in Language Models Without Distillation Bo Pang, Hanze Dong, Jiacheng Xu, Silvio Savarese, Yingbo Zhou, Caiming Xiong
TMLR 2025 Entropy-Regularized Process Reward Model Hanning Zhang, Pengcheng Wang, Shizhe Diao, Yong Lin, Rui Pan, Hanze Dong, Dylan Zhang, Pavlo Molchanov, Tong Zhang
ICLRW 2025 Offline Reinforcement Learning for LLM Multi-Step Reasoning Huaijie Wang, Shibo Hao, Hanze Dong, Shenao Zhang, Yilin Bao, Ziran Yang, Yi Wu
NeurIPS 2025 Optimizing Chain-of-Thought Reasoners via Gradient Variance Minimization in Rejection Sampling and RL Jiarui Yao, Yifan Hao, Hanning Zhang, Hanze Dong, Wei Xiong, Nan Jiang, Tong Zhang
ICML 2025 Reward-Guided Speculative Decoding for Efficient LLM Reasoning Baohao Liao, Yuhui Xu, Hanze Dong, Junnan Li, Christof Monz, Silvio Savarese, Doyen Sahoo, Caiming Xiong
ICLR 2025 ThinK: Thinner Key Cache by Query-Driven Pruning Yuhui Xu, Zhanming Jie, Hanze Dong, Lei Wang, Xudong Lu, Aojun Zhou, Amrita Saha, Caiming Xiong, Doyen Sahoo
COLT 2024 Faster Sampling Without Isoperimetry via Diffusion-Based Monte Carlo Xunpeng Huang, Difan Zou, Hanze Dong, Yi-An Ma, Tong Zhang
ICML 2024 Faster Sampling via Stochastic Gradient Proximal Sampler Xunpeng Huang, Difan Zou, Hanze Dong, Yian Ma, Tong Zhang
ICML 2024 Iterative Preference Learning from Human Feedback: Bridging Theory and Practice for RLHF Under KL-Constraint Wei Xiong, Hanze Dong, Chenlu Ye, Ziqi Wang, Han Zhong, Heng Ji, Nan Jiang, Tong Zhang
NeurIPS 2024 Online Iterative Reinforcement Learning from Human Feedback with General Preference Model Chenlu Ye, Wei Xiong, Yuheng Zhang, Hanze Dong, Nan Jiang, Tong Zhang
JMLR 2024 PAPAL: A Provable PArticle-Based Primal-Dual ALgorithm for Mixed Nash Equilibrium Shihong Ding, Hanze Dong, Cong Fang, Zhouchen Lin, Tong Zhang
TMLR 2024 RLHF Workflow: From Reward Modeling to Online RLHF Hanze Dong, Wei Xiong, Bo Pang, Haoxiang Wang, Han Zhao, Yingbo Zhou, Nan Jiang, Doyen Sahoo, Caiming Xiong, Tong Zhang
ICLR 2024 Reverse Diffusion Monte Carlo Xunpeng Huang, Hanze Dong, Yifan Hao, Yian Ma, Tong Zhang
NeurIPS 2024 Reverse Transition Kernel: A Flexible Framework to Accelerate Diffusion Inference Xunpeng Huang, Difan Zou, Hanze Dong, Yi Zhang, Yian Ma, Tong Zhang
ICMLW 2024 Reverse Transition Kernel: A Flexible Framework to Accelerate Diffusion Inference Xunpeng Huang, Difan Zou, Hanze Dong, Yi Zhang, Yian Ma, Tong Zhang
ICLR 2024 Spurious Feature Diversification Improves Out-of-Distribution Generalization Yong Lin, Lu Tan, Yifan Hao, Ho Nam Wong, Hanze Dong, Weizhong Zhang, Yujiu Yang, Tong Zhang
AISTATS 2023 Catalyst Acceleration of Error Compensated Methods Leads to Better Communication Complexity Xun Qian, Hanze Dong, Tong Zhang, Peter Richtárik
ICLR 2023 Particle-Based Variational Inference with Preconditioned Functional Gradient Flow Hanze Dong, Xi Wang, Yong Lin, Tong Zhang
TMLR 2023 RAFT: Reward rAnked FineTuning for Generative Foundation Model Alignment Hanze Dong, Wei Xiong, Deepanshu Goyal, Yihan Zhang, Winnie Chow, Rui Pan, Shizhe Diao, Jipeng Zhang, KaShun Shum, Tong Zhang
CVPR 2022 Bayesian Invariant Risk Minimization Yong Lin, Hanze Dong, Hao Wang, Tong Zhang
NeurIPSW 2022 How Powerful Is Implicit Denoising in Graph Neural Networks Songtao Liu, Zhitao Ying, Hanze Dong, Lu Lin, Jinghui Chen, Dinghao Wu
ICML 2022 Local Augmentation for Graph Neural Networks Songtao Liu, Rex Ying, Hanze Dong, Lanqing Li, Tingyang Xu, Yu Rong, Peilin Zhao, Junzhou Huang, Dinghao Wu
NeurIPSW 2022 Particle-Based Variational Inference with Preconditioned Functional Gradient Flow Hanze Dong, Xi Wang, Yong Lin, Tong Zhang
JMLR 2022 Weakly Supervised Disentangled Generative Causal Representation Learning Xinwei Shen, Furui Liu, Hanze Dong, Qing Lian, Zhitang Chen, Tong Zhang