Kim, Been

40 publications

TMLR 2025 Getting Aligned on Representational Alignment Ilia Sucholutsky, Lukas Muttenthaler, Adrian Weller, Andi Peng, Andreea Bobu, Been Kim, Bradley C. Love, Christopher J Cueva, Erin Grant, Iris Groen, Jascha Achterberg, Joshua B. Tenenbaum, Katherine M. Collins, Katherine Hermann, Kerem Oktar, Klaus Greff, Martin N Hebart, Nathan Cloos, Nikolaus Kriegeskorte, Nori Jacoby, Qiuyi Zhang, Raja Marjieh, Robert Geirhos, Sherol Chen, Simon Kornblith, Sunayana Rane, Talia Konkle, Thomas O'Connell, Thomas Unterthiner, Andrew Kyle Lampinen, Klaus-Robert Müller, Mariya Toneva, Thomas L. Griffiths
ICLR 2025 How New Data Permeates LLM Knowledge and How to Dilute It Chen Sun, Renat Aksitov, Andrey Zhmoginov, Nolan Andrew Miller, Max Vladymyrov, Ulrich Rueckert, Been Kim, Mark Sandler
ICML 2025 Position: We Can’t Understand AI Using Our Existing Vocabulary John Hewitt, Robert Geirhos, Been Kim
ICML 2025 Proactive Agents for Multi-Turn Text-to-Image Generation Under Uncertainty Meera Hahn, Wenjun Zeng, Nithish Kannen, Rich Galt, Kartikeya Badola, Been Kim, Zi Wang
NeurIPS 2025 QuestBench: Can LLMs Ask the Right Question to Acquire Information in Reasoning Tasks? Belinda Z. Li, Been Kim, Zi Wang
ICLRW 2024 Can Generative Multimodal Models Count to Ten? Sunayana Rane, Alexander Ku, Jason Michael Baldridge, Ian Tenney, Thomas L. Griffiths, Been Kim
ICML 2024 Don’t Trust Your Eyes: On the (un)reliability of Feature Visualizations Robert Geirhos, Roland S. Zimmermann, Blair Bilodeau, Wieland Brendel, Been Kim
NeurIPSW 2024 How New Data Pollutes LLM Knowledge and How to Dilute It Chen Sun, Renat Aksitov, Andrey Zhmoginov, Nolan Andrew Miller, Max Vladymyrov, Ulrich Rueckert, Been Kim, Mark Sandler
NeurIPS 2023 Does Localization Inform Editing? Surprising Differences in Causality-Based Localization vs. Knowledge Editing in Language Models Peter Hase, Mohit Bansal, Been Kim, Asma Ghandeharioun
ICMLW 2023 Don’t Trust Your Eyes: On the (un)reliability of Feature Visualizations Robert Geirhos, Roland S. Zimmermann, Blair Bilodeau, Wieland Brendel, Been Kim
NeurIPS 2023 Gaussian Process Probes (GPP) for Uncertainty-Aware Probing Zi Wang, Alexander Ku, Jason Baldridge, Tom Griffiths, Been Kim
ICML 2023 On the Relationship Between Explanation and Prediction: A Causal View Amir-Hossein Karimi, Krikamol Muandet, Simon Kornblith, Bernhard Schölkopf, Been Kim
NeurIPSW 2023 On the Relationship Between Explanation and Prediction: A Causal View Amir-Hossein Karimi, Krikamol Muandet, Simon Kornblith, Bernhard Schölkopf, Been Kim
NeurIPS 2023 State2Explanation: Concept-Based Explanations to Benefit Agent Learning and User Understanding Devleena Das, Sonia Chernova, Been Kim
TMLR 2023 TabCBM: Concept-Based Interpretable Neural Networks for Tabular Data Mateo Espinosa Zarlenga, Zohreh Shams, Michael Edward Nelson, Been Kim, Mateja Jamnik
ICMLW 2023 TabCBM: Concept-Based Interpretable Neural Networks for Tabular Data Mateo Espinosa Zarlenga, Zohreh Shams, Michael Edward Nelson, Been Kim, Mateja Jamnik
NeurIPS 2022 Beyond Rewards: A Hierarchical Perspective on Offline Multiagent Behavioral Analysis Shayegan Omidshafiei, Andrei Kapishnikov, Yannick Assogba, Lucas Dixon, Been Kim
NeurIPSW 2022 Concept-Based Understanding of Emergent Multi-Agent Behavior Niko Grupen, Natasha Jaques, Been Kim, Shayegan Omidshafiei
ICLR 2022 DISSECT: Disentangled Simultaneous Explanations via Concept Traversals Asma Ghandeharioun, Been Kim, Chun-Liang Li, Brendan Jou, Brian Eoff, Rosalind Picard
ICLR 2022 Post Hoc Explanations May Be Ineffective for Detecting Unknown Spurious Correlation Julius Adebayo, Michael Muelly, Harold Abelson, Been Kim
ICLRW 2022 Saliency Maps Contain Network "Fingerprints" Amy Widdicombe, Simon Julier, Been Kim
NeurIPSW 2021 Advanced Methods for Connectome-Based Predictive Modeling of Human Intelligence: A Novel Approach Based on Individual Differences in Cortical Topography Evan Anderson, Ramsey Wilcox, Anuj Nayak, Christopher Zwilling, Pablo Robles-Granda, Lav R. Varshney, Been Kim, Aron Barbey
ICML 2020 Concept Bottleneck Models Pang Wei Koh, Thao Nguyen, Yew Siang Tang, Stephen Mussmann, Emma Pierson, Been Kim, Percy Liang
NeurIPS 2020 Debugging Tests for Model Explanations Julius Adebayo, Michael Muelly, Ilaria Liccardi, Been Kim
NeurIPS 2020 On Completeness-Aware Concept-Based Explanations in Deep Neural Networks Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, Pradeep K. Ravikumar
NeurIPS 2019 A Benchmark for Interpretability Methods in Deep Neural Networks Sara Hooker, Dumitru Erhan, Pieter-Jan Kindermans, Been Kim
AISTATS 2019 Interpreting Black Box Predictions Using Fisher Kernels Rajiv Khanna, Been Kim, Joydeep Ghosh, Sanmi Koyejo
NeurIPS 2019 Towards Automatic Concept-Based Explanations Amirata Ghorbani, James Wexler, James Y Zou, Been Kim
NeurIPS 2019 Visualizing and Measuring the Geometry of BERT Emily Reif, Ann Yuan, Martin Wattenberg, Fernanda B Viegas, Andy Coenen, Adam Pearce, Been Kim
NeurIPS 2018 Human-in-the-Loop Interpretability Prior Isaac Lage, Andrew Ross, Samuel J Gershman, Been Kim, Finale Doshi-Velez
ICML 2018 Interpretability Beyond Feature Attribution: Quantitative Testing with Concept Activation Vectors (TCAV) Been Kim, Martin Wattenberg, Justin Gilmer, Carrie Cai, James Wexler, Fernanda Viegas, Rory Sayres
ICLR 2018 Learning How to Explain Neural Networks: PatternNet and PatternAttribution Pieter-Jan Kindermans, Kristof T. Schütt, Maximilian Alber, Klaus-Robert Müller, Dumitru Erhan, Been Kim, Sven Dähne
NeurIPS 2018 Sanity Checks for Saliency Maps Julius Adebayo, Justin Gilmer, Michael Muelly, Ian Goodfellow, Moritz Hardt, Been Kim
NeurIPS 2018 To Trust or Not to Trust a Classifier Heinrich Jiang, Been Kim, Melody Guan, Maya Gupta
NeurIPS 2016 Examples Are Not Enough, Learn to Criticize! Criticism for Interpretability Been Kim, Rajiv Khanna, Oluwasanmi O Koyejo
JAIR 2015 Inferring Team Task Plans from Human Meetings: A Generative Modeling Approach with Logic-Based Prior Been Kim, Caleb M. Chacha, Julie A. Shah
NeurIPS 2015 Mind the Gap: A Generative Approach to Interpretable Feature Selection and Extraction Been Kim, Julie A Shah, Finale Doshi-Velez
AAAI 2015 Scalable and Interpretable Data Representation for High-Dimensional, Complex Data Been Kim, Kayur Patel, Afshin Rostamizadeh, Julie A. Shah
NeurIPS 2014 The Bayesian Case Model: A Generative Approach for Case-Based Reasoning and Prototype Classification Been Kim, Cynthia Rudin, Julie A Shah
AAAI 2013 Inferring Robot Task Plans from Human Team Meetings: A Generative Modeling Approach with Logic-Based Prior Been Kim, Caleb M. Chacha, Julie A. Shah