Wu, Kai-Chiang

8 publications

ICLR 2025 Palu: KV-Cache Compression with Low-Rank Projection Chi-Chih Chang, Wei-Cheng Lin, Chien-Yu Lin, Chong-Yan Chen, Yu-Fang Hu, Pei-Shuo Wang, Ning-Chi Huang, Luis Ceze, Mohamed S. Abdelfattah, Kai-Chiang Wu
ICML 2025 Quamba2: A Robust and Scalable Post-Training Quantization Framework for Selective State Space Models Hung-Yueh Chiang, Chi-Chih Chang, Natalia Frumkin, Kai-Chiang Wu, Mohamed S. Abdelfattah, Diana Marculescu
ICLR 2025 Quamba: A Post-Training Quantization Recipe for Selective State Space Models Hung-Yueh Chiang, Chi-Chih Chang, Natalia Frumkin, Kai-Chiang Wu, Diana Marculescu
NeurIPS 2025 Speculate Deep and Accurate: Lossless and Training-Free Acceleration for Offloaded LLMs via Substitute Speculative Decoding Pei-Shuo Wang, Jian-Jia Chen, Chun-Che Yang, Chi-Chih Chang, Ning-Chi Huang, Mohamed S. Abdelfattah, Kai-Chiang Wu
CVPRW 2024 ELSA: Exploiting Layer-Wise N:M Sparsity for Vision Transformer Acceleration Ning-Chi Huang, Chi-Chih Chang, Wei-Cheng Lin, Endri Taka, Diana Marculescu, Kai-Chiang Wu
WACV 2024 FLORA: Fine-Grained Low-Rank Architecture Search for Vision Transformer Chi-Chih Chang, Yuan-Yao Sung, Shixing Yu, Ning-Chi Huang, Diana Marculescu, Kai-Chiang Wu
ICCVW 2021 FOX-NAS: Fast, On-Device and Explainable Neural Architecture Search Chia-Hsiang Liu, Yu-Shin Han, Yuan-Yao Sung, Yi Lee, Hung-Yueh Chiang, Kai-Chiang Wu
ICLR 2020 Efficient Systolic Array Based on Decomposable MAC for Quantized Deep Neural Networks Ning-Chi Huang, Huan-Jan Chou, Kai-Chiang Wu