Kundu, Souvik
47 publications
NeurIPS
2025
Top-H Decoding: Adapting the Creativity and Coherence with Bounded Entropy in Text Generation
TMLR
2024
Bit-by-Bit: Investigating the Vulnerabilities of Binary Neural Networks to Adversarial Bit Flipping
NeurIPSW
2024
CITER: Collaborative Inference for Efficient Large Language Model Decoding with Token-Level Routing
NeurIPS
2024
ShiftAddLLM: Accelerating Pretrained LLMs via Post-Training Multiplication-Less Reparameterization