Simig, Daniel

3 publications

NeurIPS 2023 · D4: Improving LLM Pretraining via Document De-Duplication and Diversification · Kushal Tirumala, Daniel Simig, Armen Aghajanyan, Ari Morcos
NeurIPS 2023 · MEGABYTE: Predicting Million-Byte Sequences with Multiscale Transformers · Lili Yu, Daniel Simig, Colin Flaherty, Armen Aghajanyan, Luke Zettlemoyer, Mike Lewis
ICLRW 2023 · SemDeDup: Data-Efficient Learning at Web-Scale Through Semantic Deduplication · Amro Kamal Mohamed Abbas, Kushal Tirumala, Daniel Simig, Surya Ganguli, Ari S. Morcos