Kim, Taesu
10 publications
NeurIPS
2024
Mixture of Scales: Memory-Efficient Token-Adaptive Binarization for Large Language Models
ICML
2024
SLEB: Streamlining LLMs Through Redundancy Verification and Elimination of Transformer Blocks