ML Anthology
Kaunismaa, Jackson
2 publications
ICLR 2026
Eliciting Harmful Capabilities by Fine-Tuning on Safeguarded Outputs
Jackson Kaunismaa, John Hughes, Christina Q Knight, Avery Griffin, Mrinank Sharma, Erik Jones
ICLRW 2025
A Benchmark for Scalable Oversight Mechanisms
Abhimanyu Pallavi Sudhir, Jackson Kaunismaa, Arjun Panickssery