Lakkaraju, Himabindu

65 publications

ICLRW 2025 Building Bridges, Not Walls: Advancing Interpretability by Unifying Feature, Data, and Model Component Attribution Shichang Zhang, Tessa Han, Usha Bhalla, Himabindu Lakkaraju
NeurIPS 2025 EvoLM: In Search of Lost Training Dynamics for Language Model Reasoning Zhenting Qi, Fan Nie, Alexandre Alahi, James Zou, Himabindu Lakkaraju, Yilun Du, Eric P. Xing, Sham M. Kakade, Hanlin Zhang
ICLR 2025 Follow My Instruction and Spill the Beans: Scalable Data Extraction from Retrieval-Augmented Generation Systems Zhenting Qi, Hanlin Zhang, Eric P. Xing, Sham M. Kakade, Himabindu Lakkaraju
NeurIPS 2025 Inference-Time Reward Hacking in Large Language Models Hadi Khalaf, Claudio Mayrink Verdun, Alex Oesterling, Himabindu Lakkaraju, Flavio Calmon
NeurIPS 2025 Measuring the Faithfulness of Thinking Drafts in Large Reasoning Models Zidi Xiong, Shan Chen, Zhenting Qi, Himabindu Lakkaraju
ICLR 2025 More RLHF, More Trust? On the Impact of Preference Alignment on Trustworthiness Aaron Jiaxun Li, Satyapriya Krishna, Himabindu Lakkaraju
ICLR 2025 Quantifying Generalization Complexity for Large Language Models Zhenting Qi, Hongyin Luo, Xuliang Huang, Zhuokai Zhao, Yibo Jiang, Xiangjun Fan, Himabindu Lakkaraju, James R. Glass
ICMLW 2024 All Roads Lead to Rome? Exploring Representational Similarities Between Latent Spaces of Generative Image Models Charumathi Badrinath, Usha Bhalla, Alex Oesterling, Suraj Srinivas, Himabindu Lakkaraju
UAI 2024 Characterizing Data Point Vulnerability as Average-Case Robustness Tessa Han, Suraj Srinivas, Himabindu Lakkaraju
ICMLW 2024 Explaining the Model, Protecting Your Data: Revealing and Mitigating the Data Privacy Risks of Post-Hoc Model Explanations via Membership Inference Catherine Huang, Martin Pawelczyk, Himabindu Lakkaraju
AISTATS 2024 Fair Machine Unlearning: Data Removal While Mitigating Disparities Alex Oesterling, Jiaqi Ma, Flavio Calmon, Himabindu Lakkaraju
ICLRW 2024 Follow My Instruction and Spill the Beans: Scalable Data Extraction from Retrieval-Augmented Generation Systems Zhenting Qi, Hanlin Zhang, Eric P. Xing, Sham M. Kakade, Himabindu Lakkaraju
ICML 2024 In-Context Unlearning: Language Models as Few-Shot Unlearners Martin Pawelczyk, Seth Neel, Himabindu Lakkaraju
NeurIPS 2024 Interpreting CLIP with Sparse Linear Concept Embeddings (SpLiCE) Usha Bhalla, Alex Oesterling, Suraj Srinivas, Flavio P. Calmon, Himabindu Lakkaraju
NeurIPS 2024 MedSafetyBench: Evaluating and Improving the Medical Safety of Large Language Models Tessa Han, Aounon Kumar, Chirag Agarwal, Himabindu Lakkaraju
ICMLW 2024 On the Difficulty of Faithful Chain-of-Thought Reasoning in Large Language Models Sree Harsha Tanneru, Dan Ley, Chirag Agarwal, Himabindu Lakkaraju
ICMLW 2024 On the Privacy Risks of Post-Hoc Explanations of Foundation Models Catherine Huang, Martin Pawelczyk, Himabindu Lakkaraju
AISTATS 2024 Quantifying Uncertainty in Natural Language Explanations of Large Language Models Sree Harsha Tanneru, Chirag Agarwal, Himabindu Lakkaraju
TMLR 2024 The Disagreement Problem in Explainable Machine Learning: A Practitioner’s Perspective Satyapriya Krishna, Tessa Han, Alex Gu, Steven Wu, Shahin Jabbari, Himabindu Lakkaraju
ICMLW 2024 Towards Safe Large Language Models for Medicine Tessa Han, Aounon Kumar, Chirag Agarwal, Himabindu Lakkaraju
ICML 2024 Understanding the Effects of Iterative Prompting on Truthfulness Satyapriya Krishna, Chirag Agarwal, Himabindu Lakkaraju
NeurIPS 2023 $\mathcal{M}^4$: A Unified XAI Benchmark for Faithfulness Evaluation of Feature Attribution Methods Across Metrics, Modalities and Models Xuhong Li, Mengnan Du, Jiamin Chen, Yekun Chai, Himabindu Lakkaraju, Haoyi Xiong
NeurIPSW 2023 A Study on the Calibration of In-Context Learning Hanlin Zhang, YiFan Zhang, Yaodong Yu, Dhruv Madeka, Dean Foster, Eric P. Xing, Himabindu Lakkaraju, Sham M. Kakade
ICMLW 2023 Accurate, Explainable, and Private Models: Providing Recourse While Minimizing Training Data Leakage Catherine Huang, Chelse Swoopes, Christina Xiao, Jiaqi Ma, Himabindu Lakkaraju
ICMLW 2023 Analyzing Chain-of-Thought Prompting in Large Language Models via Gradient-Based Feature Attributions Skyler Wu, Eric Meng Shen, Charumathi Badrinath, Jiaqi Ma, Himabindu Lakkaraju
NeurIPSW 2023 Are Large Language Models Post Hoc Explainers? Nicholas Kroeger, Dan Ley, Satyapriya Krishna, Chirag Agarwal, Himabindu Lakkaraju
ICMLW 2023 Consistent Explanations in the Face of Model Indeterminacy via Ensembling Dan Ley, Leonard Tang, Matthew Nazari, Hongjin Lin, Suraj Srinivas, Himabindu Lakkaraju
NeurIPS 2023 Discriminative Feature Attributions: Bridging Post Hoc Explainability and Inherent Interpretability Usha Bhalla, Suraj Srinivas, Himabindu Lakkaraju
ICMLW 2023 Efficient Estimation of Local Robustness of Machine Learning Models Tessa Han, Suraj Srinivas, Himabindu Lakkaraju
NeurIPSW 2023 Investigating the Fairness of Large Language Models for Predictions on Tabular Data Yanchen Liu, Srishti Gautam, Jiaqi Ma, Himabindu Lakkaraju
UAI 2023 On Minimizing the Impact of Dataset Shifts on Actionable Explanations Anna P. Meyer, Dan Ley, Suraj Srinivas, Himabindu Lakkaraju
ICML 2023 On the Impact of Algorithmic Recourse on Social Segregation Ruijiang Gao, Himabindu Lakkaraju
AISTATS 2023 On the Privacy Risks of Algorithmic Recourse Martin Pawelczyk, Himabindu Lakkaraju, Seth Neel
NeurIPS 2023 Post Hoc Explanations of Language Models Can Improve Language Models Satyapriya Krishna, Jiaqi Ma, Dylan Slack, Asma Ghandeharioun, Sameer Singh, Himabindu Lakkaraju
ICLR 2023 Probabilistically Robust Recourse: Navigating the Trade-Offs Between Costs and Robustness in Algorithmic Recourse Martin Pawelczyk, Teresa Datta, Johan Van den Heuvel, Gjergji Kasneci, Himabindu Lakkaraju
NeurIPSW 2023 Quantifying Uncertainty in Natural Language Explanations of Large Language Models Sree Harsha Tanneru, Chirag Agarwal, Himabindu Lakkaraju
ICML 2023 Towards Bridging the Gaps Between the Right to Explanation and the Right to Be Forgotten Satyapriya Krishna, Jiaqi Ma, Himabindu Lakkaraju
ICMLW 2023 Verifiable Feature Attributions: A Bridge Between Post Hoc Explainability and Inherent Interpretability Usha Bhalla, Suraj Srinivas, Himabindu Lakkaraju
TMLR 2023 When Does Uncertainty Matter?: Understanding the Impact of Predictive Uncertainty in ML Assisted Decision Making Sean McGrath, Parth Mehta, Alexandra Zytek, Isaac Lage, Himabindu Lakkaraju
NeurIPS 2023 Which Models Have Perceptually-Aligned Gradients? An Explanation via Off-Manifold Robustness Suraj Srinivas, Sebastian Bordt, Himabindu Lakkaraju
ICMLW 2023 Which Models Have Perceptually-Aligned Gradients? An Explanation via Off-Manifold Robustness Suraj Srinivas, Sebastian Bordt, Himabindu Lakkaraju
ICMLW 2023 Word-Level Explanations for Analyzing Bias in Text-to-Image Models Alexander Lin, Lucas Monteiro Paes, Sree Harsha Tanneru, Suraj Srinivas, Himabindu Lakkaraju
AISTATS 2022 Exploring Counterfactual Explanations Through the Lens of Adversarial Examples: A Theoretical and Empirical Analysis Martin Pawelczyk, Chirag Agarwal, Shalmali Joshi, Sohini Upadhyay, Himabindu Lakkaraju
AISTATS 2022 Probing GNN Explainers: A Rigorous Theoretical and Empirical Analysis of GNN Explanation Methods Chirag Agarwal, Marinka Zitnik, Himabindu Lakkaraju
ICLRW 2022 Data Poisoning Attacks on Off-Policy Policy Evaluation Algorithms Elita Lobo, Harvineet Singh, Marek Petrik, Cynthia Rudin, Himabindu Lakkaraju
UAI 2022 Data Poisoning Attacks on Off-Policy Policy Evaluation Methods Elita Lobo, Harvineet Singh, Marek Petrik, Cynthia Rudin, Himabindu Lakkaraju
NeurIPS 2022 Efficient Training of Low-Curvature Neural Networks Suraj Srinivas, Kyle Matoba, Himabindu Lakkaraju, François Fleuret
NeurIPSW 2022 On the Impact of Adversarially Robust Models on Algorithmic Recourse Satyapriya Krishna, Chirag Agarwal, Himabindu Lakkaraju
NeurIPS 2022 OpenXAI: Towards a Transparent Evaluation of Model Explanations Chirag Agarwal, Satyapriya Krishna, Eshika Saxena, Martin Pawelczyk, Nari Johnson, Isha Puri, Marinka Zitnik, Himabindu Lakkaraju
ICLRW 2022 Rethinking Stability for Attribution-Based Explanations Chirag Agarwal, Nari Johnson, Martin Pawelczyk, Satyapriya Krishna, Eshika Saxena, Marinka Zitnik, Himabindu Lakkaraju
NeurIPSW 2022 TalkToModel: Explaining Machine Learning Models with Interactive Natural Language Conversations Dylan Z Slack, Satyapriya Krishna, Himabindu Lakkaraju, Sameer Singh
NeurIPS 2022 Which Explanation Should I Choose? A Function Approximation Perspective to Characterizing Post Hoc Explanations Tessa Han, Suraj Srinivas, Himabindu Lakkaraju
NeurIPS 2021 Counterfactual Explanations Can Be Manipulated Dylan Slack, Anna Hilgard, Himabindu Lakkaraju, Sameer Singh
AAAI 2021 Fair Influence Maximization: A Welfare Optimization Approach Aida Rahmattalabi, Shahin Jabbari, Himabindu Lakkaraju, Phebe Vayanos, Max Izenberg, Ryan Brown, Eric Rice, Milind Tambe
NeurIPS 2021 Learning Models for Actionable Recourse Alexis Ross, Himabindu Lakkaraju, Osbert Bastani
NeurIPS 2021 Reliable Post Hoc Explanations: Modeling Uncertainty in Explainability Dylan Slack, Anna Hilgard, Sameer Singh, Himabindu Lakkaraju
NeurIPS 2021 Towards Robust and Reliable Algorithmic Recourse Sohini Upadhyay, Shalmali Joshi, Himabindu Lakkaraju
UAI 2021 Towards a Unified Framework for Fair and Stable Graph Representation Learning Chirag Agarwal, Himabindu Lakkaraju, Marinka Zitnik
ICML 2021 Towards the Unification and Robustness of Perturbation and Gradient Based Explanations Sushant Agarwal, Shahin Jabbari, Chirag Agarwal, Sohini Upadhyay, Steven Wu, Himabindu Lakkaraju
NeurIPS 2020 Beyond Individualized Recourse: Interpretable and Interactive Summaries of Actionable Recourses Kaivalya Rawal, Himabindu Lakkaraju
NeurIPS 2020 Incorporating Interpretable Output Constraints in Bayesian Neural Networks Wanqian Yang, Lars Lorch, Moritz Graule, Himabindu Lakkaraju, Finale Doshi-Velez
ICML 2020 Robust and Stable Black Box Explanations Himabindu Lakkaraju, Nino Arsov, Osbert Bastani
AAAI 2017 Identifying Unknown Unknowns in the Open World: Representations and Policies for Guided Exploration Himabindu Lakkaraju, Ece Kamar, Rich Caruana, Eric Horvitz
AISTATS 2017 Learning Cost-Effective and Interpretable Treatment Regimes Himabindu Lakkaraju, Cynthia Rudin
NeurIPS 2016 Confusions over Time: An Interpretable Bayesian Model to Characterize Trends in Decision Making Himabindu Lakkaraju, Jure Leskovec