Xi, Haocheng

12 publications

ICLR 2025 COAT: Compressing Optimizer States and Activations for Memory-Efficient FP8 Training Haocheng Xi, Han Cai, Ligeng Zhu, Yao Lu, Kurt Keutzer, Jianfei Chen, Song Han
NeurIPS 2025 Jet-Nemotron: Efficient Language Model with Post Neural Architecture Search Yuxian Gu, Qinghao Hu, Haocheng Xi, Junyu Chen, Shang Yang, Song Han, Han Cai
CVPR 2025 NVILA: Efficient Frontier Visual Language Models Zhijian Liu, Ligeng Zhu, Baifeng Shi, Zhuoyang Zhang, Yuming Lou, Shang Yang, Haocheng Xi, Shiyi Cao, Yuxian Gu, Dacheng Li, Xiuyu Li, Haotian Tang, Yunhao Fang, Yukang Chen, Cheng-Yu Hsieh, De-An Huang, An-Chieh Cheng, Jinyi Hu, Sifei Liu, Ranjay Krishna, Pavlo Molchanov, Jan Kautz, Hongxu Yin, Song Han, Yao Lu
ICML 2025 Oscillation-Reduced MXFP4 Training for Vision Transformers Yuxiang Chen, Haocheng Xi, Jun Zhu, Jianfei Chen
ICML 2025 QuantSpec: Self-Speculative Decoding with Hierarchical Quantized KV Cache Rishabh Tiwari, Haocheng Xi, Aditya Tomar, Coleman Richard Charles Hooper, Sehoon Kim, Maxwell Horton, Mahyar Najibi, Michael W. Mahoney, Kurt Keutzer, Amir Gholami
NeurIPS 2025 Radial Attention: $\mathcal{O}(n\log N)$ Sparse Attention with Energy Decay for Long Video Generation Xingyang Li, Muyang Li, Tianle Cai, Haocheng Xi, Shuo Yang, Yujun Lin, Lvmin Zhang, Songlin Yang, Jinbo Hu, Kelly Peng, Maneesh Agrawala, Ion Stoica, Kurt Keutzer, Song Han
ICML 2025 SpargeAttention: Accurate and Training-Free Sparse Attention Accelerating Any Model Inference Jintao Zhang, Chendong Xiang, Haofeng Huang, Jia Wei, Haocheng Xi, Jun Zhu, Jianfei Chen
ICLRW 2025 SpargeAttn: Training-Free Sparse Attention Accelerating Any Model Inference Jintao Zhang, Chendong Xiang, Haofeng Huang, Jia Wei, Haocheng Xi, Jun Zhu, Jianfei Chen
ICML 2025 Sparse Video-Gen: Accelerating Video Diffusion Transformers with Spatial-Temporal Sparsity Haocheng Xi, Shuo Yang, Yilong Zhao, Chenfeng Xu, Muyang Li, Xiuyu Li, Yujun Lin, Han Cai, Jintao Zhang, Dacheng Li, Jianfei Chen, Ion Stoica, Kurt Keutzer, Song Han
NeurIPS 2025 Sparse VideoGen2: Accelerate Video Generation with Sparse Attention via Semantic-Aware Permutation Shuo Yang, Haocheng Xi, Yilong Zhao, Muyang Li, Jintao Zhang, Han Cai, Yujun Lin, Xiuyu Li, Chenfeng Xu, Kelly Peng, Jianfei Chen, Song Han, Kurt Keutzer, Ion Stoica
ICML 2024 Jetfire: Efficient and Accurate Transformer Pretraining with INT8 Data Flow and Per-Block Quantization Haocheng Xi, Yuxiang Chen, Kang Zhao, Kai Jun Teh, Jianfei Chen, Jun Zhu
NeurIPS 2023 Training Transformers with 4-Bit Integers Haocheng Xi, ChangHao Li, Jianfei Chen, Jun Zhu