Kim, Sehoon

16 publications

NeurIPS 2025. Multipole Attention for Efficient Long Context Reasoning. Coleman Richard Charles Hooper, Sebastian Zhao, Luca Manolache, Sehoon Kim, Michael W. Mahoney, Sophia Shao, Kurt Keutzer, Amir Gholami
ICML 2025. Plan-and-Act: Improving Planning of Agents for Long-Horizon Tasks. Lutfi Eren Erdogan, Nicholas Lee, Sehoon Kim, Suhong Moon, Hiroki Furuta, Gopala Anumanchipalli, Kurt Keutzer, Amir Gholami
ICML 2025. QuantSpec: Self-Speculative Decoding with Hierarchical Quantized KV Cache. Rishabh Tiwari, Haocheng Xi, Aditya Tomar, Coleman Richard Charles Hooper, Sehoon Kim, Maxwell Horton, Mahyar Najibi, Michael W. Mahoney, Kurt Keutzer, Amir Gholami
ICMLW 2024. AdaNF: Quantization Group Adaptive NormalFloat for Low Bit Fine-Tuning of LLMs. Yeojoon Youn, Sehoon Kim, Suhong Moon, Sang Keun Choe, Ce Zhang
ICML 2024. An LLM Compiler for Parallel Function Calling. Sehoon Kim, Suhong Moon, Ryan Tabrizi, Nicholas Lee, Michael W. Mahoney, Kurt Keutzer, Amir Gholami
ICMLW 2024. Characterizing Prompt Compression Methods for Long Context Inference. Siddharth Jha, Lutfi Eren Erdogan, Sehoon Kim, Kurt Keutzer, Amir Gholami
NeurIPS 2024. KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization. Coleman Hooper, Sehoon Kim, Hiva Mohammadzadeh, Michael W. Mahoney, Yakun Sophia Shao, Kurt Keutzer, Amir Gholami
ICMLW 2024. Learned Best-Effort LLM Serving. Siddharth Jha, Coleman Richard Charles Hooper, Xiaoxuan Liu, Sehoon Kim, Kurt Keutzer
ICML 2024. SqueezeLLM: Dense-and-Sparse Quantization. Sehoon Kim, Coleman Richard Charles Hooper, Amir Gholami, Zhen Dong, Xiuyu Li, Sheng Shen, Michael W. Mahoney, Kurt Keutzer
NeurIPS 2023. Speculative Decoding with Big Little Decoder. Sehoon Kim, Karttikeya Mangalam, Suhong Moon, Jitendra Malik, Michael W. Mahoney, Amir Gholami, Kurt Keutzer
NeurIPS 2022. A Fast Post-Training Pruning Framework for Transformers. Woosuk Kwon, Sehoon Kim, Michael W. Mahoney, Joseph Hassoun, Kurt Keutzer, Amir Gholami
ECCV 2022. BigColor: Colorization Using a Generative Color Prior for Natural Images. Geonung Kim, Kyoungkook Kang, Seongtae Kim, Hwayoon Lee, Sehoon Kim, Jonghyun Kim, Seung-Hwan Baek, Sunghyun Cho
WACV 2022. Hessian-Aware Pruning and Optimal Neural Implant. Shixing Yu, Zhewei Yao, Amir Gholami, Zhen Dong, Sehoon Kim, Michael W. Mahoney, Kurt Keutzer
NeurIPS 2022. Squeezeformer: An Efficient Transformer for Automatic Speech Recognition. Sehoon Kim, Amir Gholami, Albert Shaw, Nicholas Lee, Karttikeya Mangalam, Jitendra Malik, Michael W. Mahoney, Kurt Keutzer
ICML 2021. I-BERT: Integer-Only BERT Quantization. Sehoon Kim, Amir Gholami, Zhewei Yao, Michael W. Mahoney, Kurt Keutzer
NeurIPS 2021. Terra: Imperative-Symbolic Co-Execution of Imperative Deep Learning Programs. Taebum Kim, Eunji Jeong, Geon-Woo Kim, Yunmo Koo, Sehoon Kim, Gyeongin Yu, Byung-Gon Chun