Joint Distribution–Informed Shapley Values for Sparse Counterfactual Explanations

Abstract

Counterfactual explanations (CEs) aim to reveal how small input changes flip a model’s prediction, yet many methods modify more features than necessary, reducing clarity and actionability. We introduce COLA, a model- and generator-agnostic post-hoc framework that refines any given CE. COLA computes a coupling via optimal transport (OT) between the factual and counterfactual sets and uses it to drive a Shapley-based attribution, p-SHAP, which selects a minimal set of edits while preserving the target effect. Theoretically, we show that OT minimizes an upper bound on the $W_1$ divergence between factual and counterfactual outcomes and that, under mild conditions, refined counterfactuals are guaranteed not to move farther from the factuals than the originals. Empirically, across four datasets, twelve models, and five CE generators, COLA achieves the same target effects with only 26–45% of the original feature edits. On a small-scale benchmark, COLA performs near-optimally.
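The pipeline described in the abstract (couple factuals to counterfactuals with OT, attribute the change in model output to features with Shapley values weighted by that coupling, then keep only the most useful edits) can be illustrated with a small sketch. This is not the authors' p-SHAP or COLA implementation. All function names are hypothetical, and the following are simplifying assumptions: both sets have the same size with uniform weights (so an optimal plan is a permutation and the Hungarian algorithm suffices), the transport cost is the L1 distance in input space, exact Shapley values use a "replace feature by its counterfactual value" game (exponential in the number of features, so only toy dimensions are feasible), and the "target effect" is taken to be a mean model output that the refined set must still reach.

```python
import itertools
import math

import numpy as np
from scipy.optimize import linear_sum_assignment


def ot_coupling_uniform(X_fact, X_cf):
    """Hypothetical helper: exact OT plan for equal-size sets with uniform mass 1/n.

    Assumes L1 cost; with uniform weights an optimal plan is a permutation,
    so linear_sum_assignment (Hungarian algorithm) finds it.
    """
    cost = np.abs(X_fact[:, None, :] - X_cf[None, :, :]).sum(axis=2)
    rows, cols = linear_sum_assignment(cost)
    n = X_fact.shape[0]
    plan = np.zeros((n, n))
    plan[rows, cols] = 1.0 / n
    return plan


def shapley_for_pair(f, x, x_cf):
    """Exact Shapley values for moving x toward x_cf (assumed value function).

    v(S) = f(x with the features in S set to their counterfactual values).
    """
    d = x.shape[0]

    def value(subset):
        z = x.copy()
        idx = list(subset)
        if idx:
            z[idx] = x_cf[idx]
        return f(z)

    phi = np.zeros(d)
    for j in range(d):
        others = [k for k in range(d) if k != j]
        for r in range(d):
            weight = math.factorial(r) * math.factorial(d - r - 1) / math.factorial(d)
            for S in itertools.combinations(others, r):
                phi[j] += weight * (value(S + (j,)) - value(S))
    return phi


def coupled_attribution(f, X_fact, X_cf, plan):
    """Average per-pair Shapley values, weighted by the transport plan."""
    phi = np.zeros(X_fact.shape[1])
    for i, k in zip(*np.nonzero(plan)):
        phi += plan[i, k] * shapley_for_pair(f, X_fact[i], X_cf[k])
    return phi


def refine(f, X_fact, X_cf, plan, target, tol=1e-9):
    """Hypothetical greedy selection: apply edits in descending attribution order
    and stop at the fewest features whose mean model output still reaches target."""
    phi = coupled_attribution(f, X_fact, X_cf, plan)
    order = np.argsort(-phi, kind="stable")
    rows, cols = np.nonzero(plan)  # permutation plan: one partner per factual
    matched = np.empty_like(X_cf)
    matched[rows] = X_cf[cols]
    d = X_fact.shape[1]
    for k in range(d + 1):
        idx = order[:k]
        X_ref = X_fact.copy()
        X_ref[:, idx] = matched[:, idx]
        effect = np.mean([f(z) for z in X_ref])
        if effect >= target - tol:
            return X_ref, sorted(int(j) for j in idx)
    return matched, list(range(d))


if __name__ == "__main__":
    X_fact = np.array([[0.0, 0.0, 0.0], [10.0, 10.0, 10.0]])
    X_cf = np.array([[11.0, 12.0, 9.0], [1.0, -1.0, 2.0]])
    w = np.array([2.0, 0.0, 0.0])  # toy linear model: only feature 0 matters
    f = lambda z: float(w @ z)
    plan = ot_coupling_uniform(X_fact, X_cf)
    target = np.mean([f(z) for z in X_cf])
    X_ref, edited = refine(f, X_fact, X_cf, plan, target)
    print("edited features:", edited)
    print(X_ref)
```

In this toy example the original counterfactuals change all three features, but the model depends only on feature 0, so the greedy step keeps a single edit while the mean output still matches the original counterfactuals. A general (non-permutation) OT plan, for unequal set sizes or non-uniform weights, would need a proper OT solver and a different way of pairing points.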

Cite

Text

You et al. "Joint Distribution–Informed Shapley Values for Sparse Counterfactual Explanations." International Conference on Learning Representations, 2026.


BibTeX

@inproceedings{you2026iclr-joint,
  title     = {{Joint Distribution--Informed Shapley Values for Sparse Counterfactual Explanations}},
  author    = {You, Lei and Bian, Yijun and Cao, Lele},
  booktitle = {International Conference on Learning Representations},
  year      = {2026},
  url       = {https://mlanthology.org/iclr/2026/you2026iclr-joint/}
}