ML Anthology
Jones, Erik
13 publications
ICML
2025
Adversaries Can Misuse Combinations of Safe Models
Erik Jones, Anca Dragan, Jacob Steinhardt
NeurIPS
2025
Best-of-N Jailbreaking
John Hughes, Sara Price, Aengus Lynch, Rylan Schaeffer, Fazl Barez, Arushi Somani, Sanmi Koyejo, Henry Sleight, Erik Jones, Ethan Perez, Mrinank Sharma
ICML
2025
How Do Large Language Monkeys Get Their Power (Laws)?
Rylan Schaeffer, Joshua Kazdan, John Hughes, Jordan Juravsky, Sara Price, Aengus Lynch, Erik Jones, Robert Kirk, Azalia Mirhoseini, Sanmi Koyejo
NeurIPS
2025
LLM Layers Immediately Correct Each Other
Arjun Patrawala, Jiahai Feng, Erik Jones, Jacob Steinhardt
ICLR
2025
Uncovering Gaps in How Humans and LLMs Interpret Subjective Language
Erik Jones, Arjun Patrawala, Jacob Steinhardt
ICLR
2024
Attention Satisfies: A Constraint-Satisfaction Lens on Factual Errors of Language Models
Mert Yuksekgonul, Varun Chandrasekaran, Erik Jones, Suriya Gunasekar, Ranjita Naik, Hamid Palangi, Ece Kamar, Besmira Nushi
ICML
2024
Feedback Loops with Language Models Drive In-Context Reward Hacking
Alexander Pan, Erik Jones, Meena Jagadeesan, Jacob Steinhardt
ICLR
2024
Teaching Language Models to Hallucinate Less with Synthetic Tasks
Erik Jones, Hamid Palangi, Clarisse Simões Ribeiro, Varun Chandrasekaran, Subhabrata Mukherjee, Arindam Mitra, Ahmed Hassan Awadallah, Ece Kamar
ICML
2023
Automatically Auditing Large Language Models via Discrete Optimization
Erik Jones, Anca Dragan, Aditi Raghunathan, Jacob Steinhardt
NeurIPS
2023
Mass-Producing Failures of Multimodal Systems with Language Models
Shengbang Tong, Erik Jones, Jacob Steinhardt
NeurIPS
2022
Capturing Failures of Large Language Models via Human Cognitive Biases
Erik Jones, Jacob Steinhardt
ICLR
2021
Selective Classification Can Magnify Disparities Across Groups
Erik Jones, Shiori Sagawa, Pang Wei Koh, Ananya Kumar, Percy Liang
NeurIPSW
2020
Selective Classification Can Magnify Disparities Across Groups
Erik Jones, Shiori Sagawa, Pang Wei Koh, Ananya Kumar, Percy Liang