Gosal, Gurpreet

3 publications

NeurIPS 2025. Power Lines: Scaling Laws for Weight Decay and Batch Size in LLM Pre-Training. Shane Bergsma, Nolan Simran Dey, Gurpreet Gosal, Gavia Gray, Daria Soboleva, Joel Hestness
ICLR 2025. Straight to Zero: Why Linearly Decaying the Learning Rate to Zero Works Best for LLMs. Shane Bergsma, Nolan Simran Dey, Gurpreet Gosal, Gavia Gray, Daria Soboleva, Joel Hestness
ICMLW 2024. Bilingual Adaptation of Monolingual Foundation Models. Gurpreet Gosal, Yishi Xu, Gokulakrishnan Ramakrishnan, Rituraj Joshi, Avraham Sheinin, Zhiming Chen, Biswajit Mishra, Sunil Kumar Sahu, Neha Sengupta, Natalia Vassilieva, Joel Hestness, Samujjwal Ghosh, Bokang Jia, Onkar Arun Pandit, Satheesh Katipomu, Samta Kamboj, Rahul Pal, Parvez Mullah, Soundar Balaji Doraiswamy, Karim Chami, Preslav Nakov