Somasundaram, Aditya

2 publications

TMLR 2025. Learning Using a Single Forward Pass. Aditya Somasundaram, Pushkal Mishra, Ayon Borthakur.
NeurIPS 2025. Small Batch Size Training for Language Models: When Vanilla SGD Works, and Why Gradient Accumulation Is Wasteful. Martin Marek, Sanae Lotfi, Aditya Somasundaram, Andrew Gordon Wilson, Micah Goldblum.