Zhang, Qichao
6 publications
AAAI
2025
In-Dataset Trajectory Return Regularization for Offline Preference-Based Reinforcement Learning
NeurIPS
2025
Learning When to Think: Shaping Adaptive Reasoning in R1-Style Models via Multi-Stage RL
CoRL
2025
ReasonPlan: Unified Scene Prediction and Decision Reasoning for Closed-Loop Autonomous Driving
ICLR
2025
Unsupervised Zero-Shot Reinforcement Learning via Dual-Value Forward-Backward Representation