Ahn, Surin

5 publications

ICML 2025. MMInference: Accelerating Pre-Filling for Long-Context Visual Language Models via Modality-Aware Permutation Sparse Attention. Yucheng Li, Huiqiang Jiang, Chengruidong Zhang, Qianhui Wu, Xufang Luo, Surin Ahn, Amir H. Abdi, Dongsheng Li, Jianfeng Gao, Yuqing Yang, Lili Qiu
ICLRW 2025. MMInference: Accelerating Pre-Filling for Long-Context Visual Language Models via Modality-Aware Permutation Sparse Attention. Yucheng Li, Huiqiang Jiang, Chengruidong Zhang, Qianhui Wu, Xufang Luo, Surin Ahn, Amir H. Abdi, Dongsheng Li, Jianfeng Gao, Yuqing Yang, Lili Qiu
ICLR 2025. SCBench: A KV Cache-Centric Analysis of Long-Context Methods. Yucheng Li, Huiqiang Jiang, Qianhui Wu, Xufang Luo, Surin Ahn, Chengruidong Zhang, Amir H. Abdi, Dongsheng Li, Jianfeng Gao, Yuqing Yang, Lili Qiu
NeurIPS 2024. MInference 1.0: Accelerating Pre-Filling for Long-Context LLMs via Dynamic Sparse Attention. Huiqiang Jiang, Yucheng Li, Chengruidong Zhang, Qianhui Wu, Xufang Luo, Surin Ahn, Zhenhua Han, Amir H. Abdi, Dongsheng Li, Chin-Yew Lin, Yuqing Yang, Lili Qiu
ICMLW 2024. MInference: Accelerating Pre-Filling for Long-Context LLMs via Dynamic Sparse Attention. Huiqiang Jiang, Yucheng Li, Chengruidong Zhang, Qianhui Wu, Xufang Luo, Surin Ahn, Zhenhua Han, Amir H. Abdi, Dongsheng Li, Chin-Yew Lin, Yuqing Yang, Lili Qiu