ML Anthology
Agarwal, Sahil
3 publications
NeurIPSW 2024
Efficacy of the SAGE-RT Dataset for Model Safety Alignment: A Comparative Study
Tanay Baswa, Nitin Aravind Birur, Divyanshu Kumar, Jatan Loya, Anurakt Kumar, Prashanth Harshangi, Sahil Agarwal
NeurIPSW 2024
Investigating Implicit Bias in Large Language Models: A Large-Scale Study of over 50 LLMs
Divyanshu Kumar, Umang Jain, Sahil Agarwal, Prashanth Harshangi
NeurIPSW 2024
SAGE-RT: Synthetic Alignment Data Generation for Safety Evaluation and Red Teaming
Anurakt Kumar, Divyanshu Kumar, Jatan Loya, Nitin Aravind Birur, Tanay Baswa, Sahil Agarwal, Prashanth Harshangi