ML Anthology
Simig, Daniel
3 publications
NeurIPS 2023
D4: Improving LLM Pretraining via Document De-Duplication and Diversification
Kushal Tirumala, Daniel Simig, Armen Aghajanyan, Ari Morcos
NeurIPS 2023
MEGABYTE: Predicting Million-Byte Sequences with Multiscale Transformers
Lili Yu, Daniel Simig, Colin Flaherty, Armen Aghajanyan, Luke Zettlemoyer, Mike Lewis
ICLRW 2023
SemDeDup: Data-Efficient Learning at Web-Scale Through Semantic Deduplication
Amro Kamal Mohamed Abbas, Kushal Tirumala, Daniel Simig, Surya Ganguli, Ari S. Morcos