ML Anthology
Kim, Sehoon
16 publications
NeurIPS
2025
Multipole Attention for Efficient Long Context Reasoning
Coleman Richard Charles Hooper, Sebastian Zhao, Luca Manolache, Sehoon Kim, Michael W. Mahoney, Sophia Shao, Kurt Keutzer, Amir Gholami
ICML
2025
Plan-and-Act: Improving Planning of Agents for Long-Horizon Tasks
Lutfi Eren Erdogan, Nicholas Lee, Sehoon Kim, Suhong Moon, Hiroki Furuta, Gopala Anumanchipalli, Kurt Keutzer, Amir Gholami
ICML
2025
QuantSpec: Self-Speculative Decoding with Hierarchical Quantized KV Cache
Rishabh Tiwari, Haocheng Xi, Aditya Tomar, Coleman Richard Charles Hooper, Sehoon Kim, Maxwell Horton, Mahyar Najibi, Michael W. Mahoney, Kurt Keutzer, Amir Gholami
ICMLW
2024
AdaNF: Quantization Group Adaptive NormalFloat for Low Bit Fine-Tuning of LLMs
Yeojoon Youn, Sehoon Kim, Suhong Moon, Sang Keun Choe, Ce Zhang
ICML
2024
An LLM Compiler for Parallel Function Calling
Sehoon Kim, Suhong Moon, Ryan Tabrizi, Nicholas Lee, Michael W. Mahoney, Kurt Keutzer, Amir Gholami
ICMLW
2024
Characterizing Prompt Compression Methods for Long Context Inference
Siddharth Jha, Lutfi Eren Erdogan, Sehoon Kim, Kurt Keutzer, Amir Gholami
NeurIPS
2024
KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization
Coleman Hooper, Sehoon Kim, Hiva Mohammadzadeh, Michael W. Mahoney, Yakun Sophia Shao, Kurt Keutzer, Amir Gholami
ICMLW
2024
Learned Best-Effort LLM Serving
Siddharth Jha, Coleman Richard Charles Hooper, Xiaoxuan Liu, Sehoon Kim, Kurt Keutzer
ICML
2024
SqueezeLLM: Dense-and-Sparse Quantization
Sehoon Kim, Coleman Richard Charles Hooper, Amir Gholami, Zhen Dong, Xiuyu Li, Sheng Shen, Michael W. Mahoney, Kurt Keutzer
NeurIPS
2023
Speculative Decoding with Big Little Decoder
Sehoon Kim, Karttikeya Mangalam, Suhong Moon, Jitendra Malik, Michael W. Mahoney, Amir Gholami, Kurt Keutzer
NeurIPS
2022
A Fast Post-Training Pruning Framework for Transformers
Woosuk Kwon, Sehoon Kim, Michael W. Mahoney, Joseph Hassoun, Kurt Keutzer, Amir Gholami
ECCV
2022
BigColor: Colorization Using a Generative Color Prior for Natural Images
Geonung Kim, Kyoungkook Kang, Seongtae Kim, Hwayoon Lee, Sehoon Kim, Jonghyun Kim, Seung-Hwan Baek, Sunghyun Cho
WACV
2022
Hessian-Aware Pruning and Optimal Neural Implant
Shixing Yu, Zhewei Yao, Amir Gholami, Zhen Dong, Sehoon Kim, Michael W. Mahoney, Kurt Keutzer
NeurIPS
2022
Squeezeformer: An Efficient Transformer for Automatic Speech Recognition
Sehoon Kim, Amir Gholami, Albert Shaw, Nicholas Lee, Karttikeya Mangalam, Jitendra Malik, Michael W. Mahoney, Kurt Keutzer
ICML
2021
I-BERT: Integer-Only BERT Quantization
Sehoon Kim, Amir Gholami, Zhewei Yao, Michael W. Mahoney, Kurt Keutzer
NeurIPS
2021
Terra: Imperative-Symbolic Co-Execution of Imperative Deep Learning Programs
Taebum Kim, Eunji Jeong, Geon-Woo Kim, Yunmo Koo, Sehoon Kim, Gyeongin Yu, Byung-Gon Chun