Shrivastava, Anshumali
58 publications
NeurIPS 2025: 70% Size, 100% Accuracy: Lossless LLM Compression for Efficient GPU Inference via Dynamic-Length Float (DFloat11)
NeurIPS 2025: Breaking the Frozen Subspace: Importance Sampling for Low-Rank Optimization in LLM Pretraining
ICLR 2025: LeanQuant: Accurate and Scalable Large Language Model Quantization with Loss-Error-Aware Grid
NeurIPS 2024: KV Cache Is 1 Bit per Channel: Efficient Large Language Model Inference with Coupled Quantization
AISTATS 2023: A Tale of Two Efficient Value Iteration Algorithms for Solving Linear MDPs with Large Action Space
NeurIPS 2023: Scissorhands: Exploiting the Persistence of Importance Hypothesis for LLM KV Cache Compression at Test Time
NeurIPS 2022: The Trade-Offs of Model Size in Large Recommendation Models: 100GB to 10MB Criteo-TB DLRM Model
NeurIPS 2021: Raw Nav-Merge Seismic Data to Subsurface Properties with MLP Based Multi-Modal Information Unscrambler
AAAI 2020: FourierSAT: A Fourier Expansion-Based Algebraic Framework for Solving Hybrid Boolean Constraints
NeurIPS 2019: Extreme Classification in Log Memory Using Count-Min Sketch: A Case Study of Amazon Search with 50M Products