Gupta, Kshitij

7 publications

TMLR 2024. Simple and Scalable Strategies to Continually Pre-Train Large Language Models. Adam Ibrahim, Benjamin Thérien, Kshitij Gupta, Mats Leon Richter, Quentin Gregory Anthony, Eugene Belilovsky, Timothée Lesort, Irina Rish
NeurIPSW 2023. ARB: Advanced Reasoning Benchmark for Large Language Models. Tomohiro Sawada, Daniel Paleka, Alexander Havrilla, Pranav Tadepalli, Paula Vidas, Alexander Kranias, John Nay, Kshitij Gupta, Aran Komatsuzaki
ICLR 2023. Broken Neural Scaling Laws. Ethan Caballero, Kshitij Gupta, Irina Rish, David Krueger
ICLRW 2023. Broken Neural Scaling Laws. Ethan Caballero, Kshitij Gupta, Irina Rish, David Krueger
ICMLW 2023. Continual Pre-Training of Large Language Models: How to Re-Warm Your Model? Kshitij Gupta, Benjamin Thérien, Adam Ibrahim, Mats Leon Richter, Quentin Gregory Anthony, Eugene Belilovsky, Irina Rish, Timothée Lesort
NeurIPSW 2022. Broken Neural Scaling Laws. Ethan Caballero, Kshitij Gupta, Irina Rish, David Krueger
NeurIPS 2022. Temporal Latent Bottleneck: Synthesis of Fast and Slow Processing Mechanisms in Sequence Learning. Aniket Didolkar, Kshitij Gupta, Anirudh Goyal, Nitesh Bharadwaj Gundavarapu, Alex M Lamb, Nan Rosemary Ke, Yoshua Bengio