Guo, Minyi
21 publications
NeurIPS
2025
ClusterFusion: Expanding Operator Fusion Scope for LLM Inference via Cluster-Level Collective Primitive
NeurIPS
2025
Communication-Efficient Diffusion Denoising Parallelization via Reuse-Then-Predict Mechanism
NeurIPS
2025
Yggdrasil: Bridging Dynamic Speculation and Static Runtime for Latency-Optimal Tree-Based LLM Decoding