ML Anthology
Haimes, Jacob
2 publications
NeurIPS
2025
Noise Injection Reveals Hidden Capabilities of Sandbagging Language Models
Cameron Tice, Philipp Alexander Kreer, Nathan Helm-Burger, Prithviraj Singh Shahani, Fedor Ryzhenkov, Fabien Roger, Clement Neo, Jacob Haimes, Felix Hofstätter, Teun van der Weij
NeurIPSW
2024
Sandbag Detection Through Model Impairment
Cameron Tice, Philipp Alexander Kreer, Nathan Helm-Burger, Prithviraj Singh Shahani, Fedor Ryzhenkov, Teun van der Weij, Felix Hofstätter, Jacob Haimes