Diao, Shizhe
21 publications
Fast-dLLM: Training-Free Acceleration of Diffusion LLM by Enabling KV Cache and Parallel Decoding (ICLR 2026)
GPAS: Accelerating Convergence of LLM Pretraining via Gradient-Preserving Activation Scaling (NeurIPS 2025)
LongMamba: Enhancing Mamba's Long-Context Capabilities via Training-Free Receptive Field Enlargement (ICLR 2025)
Nemotron-CLIMB: Clustering-Based Iterative Data Mixture Bootstrapping for Language Model Pre-Training (NeurIPS 2025)
ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models (NeurIPS 2025)
UGPhysics: A Comprehensive Benchmark for Undergraduate Physics Reasoning with Large Language Models (ICML 2025)