Towards Safer Pretraining: Analyzing and Filtering Harmful Content in Webscale Datasets for Responsible LLMs

Cite

Text

Mendu et al. "Towards Safer Pretraining: Analyzing and Filtering Harmful Content in Webscale Datasets for Responsible LLMs." International Joint Conference on Artificial Intelligence, 2025. doi:10.24963/IJCAI.2025/53

Markdown

[Mendu et al. "Towards Safer Pretraining: Analyzing and Filtering Harmful Content in Webscale Datasets for Responsible LLMs." International Joint Conference on Artificial Intelligence, 2025.](https://mlanthology.org/ijcai/2025/mendu2025ijcai-safer/) doi:10.24963/IJCAI.2025/53

BibTeX

@inproceedings{mendu2025ijcai-safer,
  title     = {{Towards Safer Pretraining: Analyzing and Filtering Harmful Content in Webscale Datasets for Responsible LLMs}},
  author    = {Mendu, Sai Krishna and Yenala, Harish and Gulati, Aditi and Kumar, Shanu and Agrawal, Parag},
  booktitle = {International Joint Conference on Artificial Intelligence},
  year      = {2025},
  pages     = {466--474},
  doi       = {10.24963/IJCAI.2025/53},
  url       = {https://mlanthology.org/ijcai/2025/mendu2025ijcai-safer/}
}