Brandon, William

2 publications

ICML 2025. Ladder-Residual: Parallelism-Aware Architecture for Accelerating Large Model Inference with Communication Overlapping. Muru Zhang, Mayank Mishra, Zhongzhu Zhou, William Brandon, Jue Wang, Yoon Kim, Jonathan Ragan-Kelley, Shuaiwen Leon Song, Ben Athiwaratkun, Tri Dao
NeurIPS 2024. Reducing Transformer Key-Value Cache Size with Cross-Layer Attention. William Brandon, Mayank Mishra, Aniruddha Nrusimha, Rameswar Panda, Jonathan Ragan-Kelley