Wang, Pingjie

1 publications

ICLR 2025 Combatting Dimensional Collapse in LLM Pre-Training Data via Submodular File Selection Ziqing Fan, Siyuan Du, Shengchao Hu, Pingjie Wang, Li Shen, Ya Zhang, Dacheng Tao, Yanfeng Wang