Paul, Mansheej

16 publications

ICML 2025 μnit Scaling: Simple and Scalable FP8 LLM Training Saaketh Narayan, Abhay Gupta, Mansheej Paul, Davis Blalock
ICLR 2025 Perplexed by Perplexity: Perplexity-Based Data Pruning with Small Reference Models Zachary Ankner, Cody Blakeney, Kartik Sreenivasan, Max Marion, Matthew L Leavitt, Mansheej Paul
ICLR 2025 Scaling Laws for Precision Tanishq Kumar, Zachary Ankner, Benjamin Frederick Spector, Blake Bordelon, Niklas Muennighoff, Mansheej Paul, Cengiz Pehlevan, Christopher Re, Aditi Raghunathan
NeurIPSW 2024 Critique-Out-Loud Reward Models Zachary Ankner, Mansheej Paul, Brandon Cui, Jonathan Daniel Chang, Prithviraj Ammanabrolu
ICMLW 2024 Does Your Data Spark Joy? Performance Gains from Domain Upsampling at the End of Training Cody Blakeney, Mansheej Paul, Brett W. Larsen, Sean Owen, Jonathan Frankle
TMLR 2024 LoRA Learns Less and Forgets Less Dan Biderman, Jacob Portes, Jose Javier Gonzalez Ortiz, Mansheej Paul, Philip Greengard, Connor Jennings, Daniel King, Sam Havens, Vitaliy Chiley, Jonathan Frankle, Cody Blakeney, John Patrick Cunningham
ICLRW 2024 Perplexed by Perplexity: Perplexity-Based Pruning with Small Reference Models Zachary Ankner, Cody Blakeney, Kartik Sreenivasan, Max Marion, Matthew L Leavitt, Mansheej Paul
ICMLW 2023 Predicting Task Forgetting in Large Language Models Anat Kleiman, Jonathan Frankle, Sham M. Kakade, Mansheej Paul
NeurIPS 2023 Pretraining Task Diversity and the Emergence of Non-Bayesian In-Context Learning for Regression Allan Raventós, Mansheej Paul, Feng Chen, Surya Ganguli
ICLRW 2023 The Effects of Pretraining Task Diversity on In-Context Learning of Ridge Regression Allan Raventós, Mansheej Paul, Feng Chen, Surya Ganguli
ICLR 2023 Unmasking the Lottery Ticket Hypothesis: What's Encoded in a Winning Ticket's Mask? Mansheej Paul, Feng Chen, Brett W. Larsen, Jonathan Frankle, Surya Ganguli, Gintare Karolina Dziugaite
NeurIPS 2022 Lottery Tickets on a Data Diet: Finding Initializations with Sparse Trainable Networks Mansheej Paul, Brett Larsen, Surya Ganguli, Jonathan Frankle, Gintare Karolina Dziugaite
ICMLW 2022 Pre-Training on a Data Diet: Identifying Sufficient Examples for Early Training Mansheej Paul, Brett W. Larsen, Surya Ganguli, Jonathan Frankle, Gintare Karolina Dziugaite
NeurIPSW 2022 Unmasking the Lottery Ticket Hypothesis: Efficient Adaptive Pruning for Finding Winning Tickets Mansheej Paul, Feng Chen, Brett W. Larsen, Jonathan Frankle, Surya Ganguli, Gintare Karolina Dziugaite
NeurIPS 2021 Deep Learning on a Data Diet: Finding Important Examples Early in Training Mansheej Paul, Surya Ganguli, Gintare Karolina Dziugaite
NeurIPS 2020 Deep Learning Versus Kernel Learning: An Empirical Study of Loss Landscape Geometry and the Time Evolution of the Neural Tangent Kernel Stanislav Fort, Gintare Karolina Dziugaite, Mansheej Paul, Sepideh Kharaghani, Daniel M. Roy, Surya Ganguli