Duanmu, Haojie

4 publications

ICLR 2026. FlexLinearAttention: Compiling a Unified Abstraction into Scalable Kernels for Linear Attention. Haojie Duanmu, Size Zheng, Ningxin Zheng, Jianqiao Lu, Xuegui Zheng, Xingcheng Zhang, Li-Wen Chang, Xin Liu, Dahua Lin
ICLR 2026. Scaling Large Vision-Language Model RL Training via Efficient Load Balancing. Zerui Wang, Qinghao Hu, Chang Chen, Jiecheng Zhou, Haojie Duanmu, Xingcheng Zhang, Peng Sun, Dahua Lin
ICML 2025. MxMoE: Mixed-Precision Quantization for MoE with Accuracy and Performance Co-Design. Haojie Duanmu, Xiuhong Li, Zhihang Yuan, Size Zheng, Jiangfei Duan, Xingcheng Zhang, Dahua Lin
ICML 2024. MuxServe: Flexible Spatial-Temporal Multiplexing for Multiple LLM Serving. Jiangfei Duan, Runyu Lu, Haojie Duanmu, Xiuhong Li, Xingcheng Zhang, Dahua Lin, Ion Stoica, Hao Zhang