ML Anthology
Minder, Julian
5 publications
ICLR
2026
Narrow Finetuning Leaves Clearly Readable Traces in Activation Differences
Julian Minder, Clément Dumas, Stewart Slocum, Helena Casademunt, Cameron Holmes, Robert West, Neel Nanda
ICLR
2025
Controllable Context Sensitivity and the Knob Behind It
Julian Minder, Kevin Du, Niklas Stoehr, Giovanni Monea, Chris Wendler, Robert West, Ryan Cotterell
NeurIPS
2025
Overcoming Sparsity Artifacts in Crosscoders to Interpret Chat-Tuning
Julian Minder, Clément Dumas, Caden Juang, Bilal Chughtai, Neel Nanda
ICLRW
2025
Robustly Identifying Concepts Introduced During Chat Fine-Tuning Using Crosscoders
Julian Minder, Clément Dumas, Bilal Chughtai, Neel Nanda
NeurIPS
2025
The Non-Linear Representation Dilemma: Is Causal Abstraction Enough for Mechanistic Interpretability?
Denis Sutter, Julian Minder, Thomas Hofmann, Tiago Pimentel