Gholami, Amir

25 publications

NeurIPS 2025. Multipole Attention for Efficient Long Context Reasoning. Coleman Richard Charles Hooper, Sebastian Zhao, Luca Manolache, Sehoon Kim, Michael W. Mahoney, Sophia Shao, Kurt Keutzer, Amir Gholami.
ICML 2025. Plan-and-Act: Improving Planning of Agents for Long-Horizon Tasks. Lutfi Eren Erdogan, Nicholas Lee, Sehoon Kim, Suhong Moon, Hiroki Furuta, Gopala Anumanchipalli, Kurt Keutzer, Amir Gholami.
ICML 2025. QuantSpec: Self-Speculative Decoding with Hierarchical Quantized KV Cache. Rishabh Tiwari, Haocheng Xi, Aditya Tomar, Coleman Richard Charles Hooper, Sehoon Kim, Maxwell Horton, Mahyar Najibi, Michael W. Mahoney, Kurt Keutzer, Amir Gholami.
ICML 2024. An LLM Compiler for Parallel Function Calling. Sehoon Kim, Suhong Moon, Ryan Tabrizi, Nicholas Lee, Michael W. Mahoney, Kurt Keutzer, Amir Gholami.
ICMLW 2024. Characterizing Prompt Compression Methods for Long Context Inference. Siddharth Jha, Lutfi Eren Erdogan, Sehoon Kim, Kurt Keutzer, Amir Gholami.
NeurIPS 2024. KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization. Coleman Hooper, Sehoon Kim, Hiva Mohammadzadeh, Michael W. Mahoney, Yakun Sophia Shao, Kurt Keutzer, Amir Gholami.
ICML 2024. SqueezeLLM: Dense-and-Sparse Quantization. Sehoon Kim, Coleman Richard Charles Hooper, Amir Gholami, Zhen Dong, Xiuyu Li, Sheng Shen, Michael W. Mahoney, Kurt Keutzer.
NeurIPSW 2023. Rapid Fitting of Band-Excitation Piezoresponse Force Microscopy Using Physics Constrained Unsupervised Neural Networks. Alibek T Kaliyev, Ryan F Forelli, Shuyu Qin, Yichen Guo, Seda Memik, Michael W. Mahoney, Amir Gholami, Nhan Tran, Philip Harris, Martin Takáč, Joshua Agar.
NeurIPS 2023. Speculative Decoding with Big Little Decoder. Sehoon Kim, Karttikeya Mangalam, Suhong Moon, Jitendra Malik, Michael W. Mahoney, Amir Gholami, Kurt Keutzer.
NeurIPS 2023. Towards Foundation Models for Scientific Machine Learning: Characterizing Scaling and Transfer Behavior. Shashank Subramanian, Peter Harrington, Kurt Keutzer, Wahid Bhimji, Dmitriy Morozov, Michael W. Mahoney, Amir Gholami.
NeurIPS 2022. A Fast Post-Training Pruning Framework for Transformers. Woosuk Kwon, Sehoon Kim, Michael W. Mahoney, Joseph Hassoun, Kurt Keutzer, Amir Gholami.
WACV 2022. Hessian-Aware Pruning and Optimal Neural Implant. Shixing Yu, Zhewei Yao, Amir Gholami, Zhen Dong, Sehoon Kim, Michael W. Mahoney, Kurt Keutzer.
NeurIPS 2022. Squeezeformer: An Efficient Transformer for Automatic Speech Recognition. Sehoon Kim, Amir Gholami, Albert Shaw, Nicholas Lee, Karttikeya Mangalam, Jitendra Malik, Michael W. Mahoney, Kurt Keutzer.
AAAI 2021. ADAHESSIAN: An Adaptive Second Order Optimizer for Machine Learning. Zhewei Yao, Amir Gholami, Sheng Shen, Mustafa Mustafa, Kurt Keutzer, Michael W. Mahoney.
NeurIPS 2021. Characterizing Possible Failure Modes in Physics-Informed Neural Networks. Aditi Krishnapriyan, Amir Gholami, Shandian Zhe, Robert Kirby, Michael W. Mahoney.
ICML 2021. HAWQ-V3: Dyadic Neural Network Quantization. Zhewei Yao, Zhen Dong, Zhangcheng Zheng, Amir Gholami, Jiali Yu, Eric Tan, Leyuan Wang, Qijing Huang, Yida Wang, Michael Mahoney, Kurt Keutzer.
ICML 2021. I-BERT: Integer-Only BERT Quantization. Sehoon Kim, Amir Gholami, Zhewei Yao, Michael W. Mahoney, Kurt Keutzer.
NeurIPS 2020. Boundary Thickness and Robustness in Learning Models. Yaoqing Yang, Rajiv Khanna, Yaodong Yu, Amir Gholami, Kurt Keutzer, Joseph E Gonzalez, Kannan Ramchandran, Michael W. Mahoney.
NeurIPS 2020. HAWQ-V2: Hessian Aware Trace-Weighted Quantization of Neural Networks. Zhen Dong, Zhewei Yao, Daiyaan Arfeen, Amir Gholami, Michael W. Mahoney, Kurt Keutzer.
AAAI 2020. Inefficiency of K-FAC for Large Batch Size Training. Linjian Ma, Gabe Montague, Jiayu Ye, Zhewei Yao, Amir Gholami, Kurt Keutzer, Michael W. Mahoney.
ICML 2020. PowerNorm: Rethinking Batch Normalization in Transformers. Sheng Shen, Zhewei Yao, Amir Gholami, Michael Mahoney, Kurt Keutzer.
AAAI 2020. Q-BERT: Hessian Based Ultra Low Precision Quantization of BERT. Sheng Shen, Zhen Dong, Jiayu Ye, Linjian Ma, Zhewei Yao, Amir Gholami, Michael W. Mahoney, Kurt Keutzer.
NeurIPS 2019. ANODEV2: A Coupled Neural ODE Framework. Tianjun Zhang, Zhewei Yao, Amir Gholami, Joseph E Gonzalez, Kurt Keutzer, Michael W. Mahoney, George Biros.
NeurIPS 2018. Hessian-Based Analysis of Large Batch Training and Robustness to Adversaries. Zhewei Yao, Amir Gholami, Qi Lei, Kurt Keutzer, Michael W. Mahoney.
CVPRW 2018. SqueezeNext: Hardware-Aware Neural Network Design. Amir Gholami, Kiseok Kwon, Bichen Wu, Zizheng Tai, Xiangyu Yue, Peter H. Jin, Sicheng Zhao, Kurt Keutzer.