Chai, Chengliang
4 publications
ICLR
2026
Not All Documents Are What You Need for Extracting Instruction Tuning Data
Chi Zhang, Huaping Zhong, Hongtao Li, Chengliang Chai, Hongjiawei, Yu-Ping Wang, Yuhao Deng, Jiacheng Wang, Yizhou Yan, Qiu Jiantao, Conghui He, Lei Cao
ICLR
2025
Harnessing Diversity for Important Data Selection in Pretraining Large Language Models
Chi Zhang, Huaping Zhong, Kuan Zhang, Chengliang Chai, Rui Wang, Xinlin Zhuang, Tianyi Bai, Qiu Jiantao, Lei Cao, Ju Fan, Ye Yuan, Guoren Wang, Conghui He