Nrusimha, Aniruddha

1 publication

NeurIPS 2024. Reducing Transformer Key-Value Cache Size with Cross-Layer Attention. William Brandon, Mayank Mishra, Aniruddha Nrusimha, Rameswar Panda, Jonathan Ragan-Kelley