Tang, Jiaming

4 publications

ICLR 2025. DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads. Guangxuan Xiao, Jiaming Tang, Jingwei Zuo, Junxian Guo, Shang Yang, Haotian Tang, Yao Fu, Song Han
ICCV 2025. SparseVILA: Decoupling Visual Sparsity for Efficient VLM Inference. Samir Khaki, Junxian Guo, Jiaming Tang, Shang Yang, Yukang Chen, Konstantinos N. Plataniotis, Yao Lu, Song Han, Zhijian Liu
NeurIPS 2025. Twilight: Adaptive Attention Sparsity with Hierarchical Top-$p$ Pruning. Chaofan Lin, Jiaming Tang, Shuo Yang, Hanshuo Wang, Tian Tang, Boyu Tian, Ion Stoica, Song Han, Mingyu Gao
ICML 2024. QUEST: Query-Aware Sparsity for Efficient Long-Context LLM Inference. Jiaming Tang, Yilong Zhao, Kan Zhu, Guangxuan Xiao, Baris Kasikci, Song Han