Satyanarayan, Arvind

4 publications

ICLR 2026. Semantic Regexes: Auto-Interpreting LLM Features with a Structured Language. Angie Boggust, Donghao Ren, Yannick Assogba, Dominik Moritz, Arvind Satyanarayan, Fred Hohman.

ECCVW 2024. Explanation Alignment: Quantifying the Correctness of Model Reasoning at Scale. Hyemin Bang, Angie W. Boggust, Arvind Satyanarayan.

AAAI 2022. Teaching Humans When to Defer to a Classifier via Exemplars. Hussein Mozannar, Arvind Satyanarayan, David A. Sontag.

Distill 2018. The Building Blocks of Interpretability. Chris Olah, Arvind Satyanarayan, Ian Johnson, Shan Carter, Ludwig Schubert, Katherine Ye, Alexander Mordvintsev.