Mahdi, Salsabila

1 publication

NeurIPSW 2024. Steering Without Side Effects: Improving Post-Deployment Control of Language Models. Asa Cooper Stickland, Alexander Lyzhov, Jacob Pfau, Salsabila Mahdi, Samuel R. Bowman.