Belilovsky, Eugene
73 publications
NeurIPS
2025
ACCO: Accumulate While You Communicate for Communication-Overlapped Sharded LLM Training
CoLLAs
2025
Beyond Cosine Decay: On the Effectiveness of Infinite Learning Rate Schedule for Continual Pre-Training
NeurIPSW
2024
ACCO: Accumulate While You Communicate, Hiding Communications in Distributed LLM Training
ICMLW
2024
Model Breadcrumbs: Scalable Upcycling of Finetuned Foundation Models via Sparse Task Vectors Merging
NeurIPS
2023
A²CiD²: Accelerating Asynchronous Communication in Decentralized Deep Learning