Enabling the Visualization of Distributional Shift Using Shapley Values

Abstract

In streaming data, distributional shifts can appear both in the univariate dimensions and in the joint distributions with the labels. However, in many real-time scenarios, labels are missing or delayed, so unsupervised drift detection methods are desired in those applications. We design slidSHAPs, a novel representation method for unlabelled data streams. Well known in machine learning, Shapley values offer a way to exploit correlation dependencies among random variables. We develop an unsupervised sliding Shapley value series for categorical time series that represents the data stream in a newly defined latent space and tracks the feature correlation changes. Transforming the original time series to the slidSHAPs allows us to track how distributional shifts affect the correlations among the input variables; the approach is independent of any kind of labeling. We show how abrupt distributional shifts in the input variables are transformed into smoother changes in the slidSHAPs. Moreover, slidSHAPs allow for intuitive visualization of the shifts when they are not observable in the original data.
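
The idea of a sliding Shapley value series can be illustrated with a minimal sketch. The abstract does not specify the value function, the window parameters, or any implementation details, so everything below is an assumption made for illustration: the value of a feature coalition is taken to be its empirical total correlation within a window, Shapley values are computed exactly by enumerating coalitions (feasible only for a handful of features), and the names `total_correlation`, `shapley_values`, `sliding_shapley`, `window_size`, and `step` are hypothetical. The toy stream is synthetic and is built so that every univariate marginal stays the same while the correlation structure changes halfway through.

```python
from collections import Counter
from itertools import combinations
from math import factorial, log2


def entropy(outcomes):
    """Empirical Shannon entropy (in bits) of a list of hashable outcomes."""
    n = len(outcomes)
    return -sum((c / n) * log2(c / n) for c in Counter(outcomes).values())


def total_correlation(window, subset):
    """Assumed value function: sum of marginal entropies minus joint entropy."""
    if len(subset) < 2:
        return 0.0
    marginals = sum(entropy([row[j] for row in window]) for j in subset)
    joint = entropy([tuple(row[j] for j in subset) for row in window])
    return marginals - joint


def shapley_values(window, n_features):
    """Exact Shapley value of every feature for the game defined above."""
    values = []
    for i in range(n_features):
        others = [j for j in range(n_features) if j != i]
        phi = 0.0
        for size in range(len(others) + 1):
            # Standard Shapley weight |S|! (n - |S| - 1)! / n!
            weight = (factorial(size) * factorial(n_features - size - 1)
                      / factorial(n_features))
            for coalition in combinations(others, size):
                gain = (total_correlation(window, coalition + (i,))
                        - total_correlation(window, coalition))
                phi += weight * gain
        values.append(phi)
    return values


def sliding_shapley(stream, window_size, step):
    """One Shapley vector per window; no labels are used anywhere."""
    n_features = len(stream[0])
    return [shapley_values(stream[s:s + window_size], n_features)
            for s in range(0, len(stream) - window_size + 1, step)]


# Synthetic categorical stream with three binary features.
# Rows 0-15: feature 1 copies feature 0. Rows 16-31: feature 1 copies feature 2.
# Each feature stays uniformly distributed over {0, 1} throughout.
stream = []
for t in range(32):
    a, c = t % 2, (t // 2) % 2
    stream.append((a, a, c) if t < 16 else (a, c, c))

series = sliding_shapley(stream, window_size=8, step=4)
for start, phi in zip(range(0, 32, 4), series):
    print(start, [round(v, 3) for v in phi])
```

In this toy setting the per-feature marginals never change, yet the Shapley vectors of the first and last windows differ because the correlated pair of features changes. Windows that straddle row 16 contain rows from both regimes, so with overlapping windows the series passes through in-between values rather than jumping at once. Exact enumeration scales exponentially in the number of features, so a practical version would likely need an approximation.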

Cite

Text

Li et al. "Enabling the Visualization of Distributional Shift Using Shapley Values." NeurIPS 2022 Workshops: DistShift, 2022.

Markdown

[Li et al. "Enabling the Visualization of Distributional Shift Using Shapley Values." NeurIPS 2022 Workshops: DistShift, 2022.](https://mlanthology.org/neuripsw/2022/li2022neuripsw-enabling/)

BibTeX

@inproceedings{li2022neuripsw-enabling,
  title     = {{Enabling the Visualization of Distributional Shift Using Shapley Values}},
  author    = {Li, Bin and Balestra, Chiara and Müller, Emmanuel},
  booktitle = {NeurIPS 2022 Workshops: DistShift},
  year      = {2022},
  url       = {https://mlanthology.org/neuripsw/2022/li2022neuripsw-enabling/}
}