Debiasing Global Workspace: A Cognitive Neural Framework for Learning Debiased and Interpretable Representations

Abstract

Deep Neural Networks (DNNs) often make predictions based on "spurious" attributes when trained on biased datasets, in which most samples contain features spuriously correlated with the target labels. This is problematic when irrelevant features are easier for the model to learn than the truly relevant ones. Existing debiasing methods require predefined bias labels and incur additional computational cost from auxiliary networks. We propose an alternative approach inspired by cognitive science, called Debiasing Global Workspace (DGW). DGW consists of specialized modules and a shared workspace, allowing for increased modularity and improved debiasing performance. Additionally, our method enhances the transparency of decision-making processes through attention masks. We validate DGW across various biased datasets, demonstrating improved debiasing performance.
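The abstract describes specialist modules communicating through a shared workspace, with attention masks exposing which module each decision draws on. The sketch below is a minimal, hypothetical illustration of that idea using cross-attention in NumPy; the sizes, variable names, and the single-step read are assumptions for illustration, not the paper's actual DGW architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# Hypothetical sizes: 4 specialist modules, feature dim 8, 2 workspace slots.
n_modules, d, n_slots = 4, 8, 2
module_features = rng.normal(size=(n_modules, d))  # outputs of specialist modules
slots = rng.normal(size=(n_slots, d))              # shared-workspace slots (queries)

# Cross-attention: each workspace slot reads from all modules. The softmax
# weights act as an interpretable attention mask showing which module each
# slot attends to.
scores = slots @ module_features.T / np.sqrt(d)    # (n_slots, n_modules)
mask = softmax(scores, axis=-1)                    # rows sum to 1
workspace = mask @ module_features                 # updated workspace contents

print(mask.shape, workspace.shape)
```

Inspecting `mask` row by row is what makes the communication bottleneck interpretable: a slot that places most of its weight on one module is effectively ignoring the others.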

Cite

Text

Hong et al. "Debiasing Global Workspace: A Cognitive Neural Framework for Learning Debiased and Interpretable Representations." NeurIPS 2024 Workshops: Behavioral_ML, 2024.

Markdown

[Hong et al. "Debiasing Global Workspace: A Cognitive Neural Framework for Learning Debiased and Interpretable Representations." NeurIPS 2024 Workshops: Behavioral_ML, 2024.](https://mlanthology.org/neuripsw/2024/hong2024neuripsw-debiasing/)

BibTeX

@inproceedings{hong2024neuripsw-debiasing,
  title     = {{Debiasing Global Workspace: A Cognitive Neural Framework for Learning Debiased and Interpretable Representations}},
  author    = {Hong, Jinyung and Jeon, Eun Som and Kim, Changhoon and Park, Keun Hee and Nath, Utkarsh and Yang, Yezhou and Turaga, Pavan K. and Pavlic, Theodore P.},
  booktitle = {NeurIPS 2024 Workshops: Behavioral_ML},
  year      = {2024},
  url       = {https://mlanthology.org/neuripsw/2024/hong2024neuripsw-debiasing/}
}