Interpretable Debiasing of Vectorized Language Representations with Iterative Orthogonalization

Aboagye, Prince Osei; Zheng, Yan; Shunn, Jack; Yeh, Chin-Chia Michael; Wang, Junpeng; Zhuang, Zhongfang; Chen, Huiyuan; Wang, Liang; Zhang, Wei; Phillips, Jeff

ICLR 2023

Abstract

We propose a new mechanism to augment a word vector embedding representation that offers improved bias removal while retaining the key information, resulting in improved interpretability of the representation. Rather than removing the information associated with a concept that may induce bias, our proposed method identifies two concept subspaces and makes them orthogonal. In the resulting representation, these two concepts are uncorrelated. Moreover, because the subspaces are orthogonal, one can simply apply a rotation to the basis of the representation so that each subspace aligns with coordinates. This explicit encoding of concepts as coordinates works because the subspaces have been made fully orthogonal, which previous approaches do not achieve. Furthermore, we show that this can be extended to multiple subspaces. As a result, one can choose a subset of concepts to be represented transparently and explicitly, while the others are retained in the mixed but extremely expressive format of the representation.
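The rotation step mentioned in the abstract can be illustrated with a minimal NumPy sketch. It covers only the last stage: given two concept directions that are already orthogonal, it builds an orthogonal change of basis that places them on the first two coordinate axes. The function name `align_to_axes`, the use of single direction vectors to stand in for concept subspaces, and the random toy data are all assumptions made for illustration. The iterative orthogonalization procedure itself is not reproduced here, and the Gram-Schmidt line below only manufactures orthogonal toy inputs; it is not the paper's method.

```python
import numpy as np

def align_to_axes(u1, u2, tol=1e-8):
    """Return an orthogonal matrix Q with Q @ u1 = e1 and Q @ u2 = e2.

    Hypothetical helper: assumes u1 and u2 are nonzero, orthogonal
    concept directions in R^d with d >= 2.
    """
    d = u1.shape[0]
    u1 = u1 / np.linalg.norm(u1)
    u2 = u2 / np.linalg.norm(u2)
    if abs(u1 @ u2) > tol:
        raise ValueError("concept directions must be orthogonal")

    # Complete {u1, u2} to an orthonormal basis of R^d using QR on
    # [u1, u2, random filler columns].
    rng = np.random.default_rng(0)
    M = np.column_stack([u1, u2, rng.standard_normal((d, d - 2))])
    B, _ = np.linalg.qr(M)

    # QR fixes columns only up to sign; restore the original orientation.
    B[:, 0] *= np.sign(B[:, 0] @ u1)
    B[:, 1] *= np.sign(B[:, 1] @ u2)

    Q = B.T  # rows of Q are the new basis vectors
    # Make Q a proper rotation (det = +1) when a spare axis is available.
    if d > 2 and np.linalg.det(Q) < 0:
        Q[-1] *= -1
    return Q

# Toy data: random vectors stand in for word embeddings (assumption).
rng = np.random.default_rng(1)
d = 6
a = rng.standard_normal(d)
b = rng.standard_normal(d)
b -= (b @ a) / (a @ a) * a  # toy input only: force b orthogonal to a

Q = align_to_axes(a, b)
X = rng.standard_normal((5, d))  # 5 toy "embeddings"
X_rot = X @ Q.T                  # same vectors expressed in the new basis

# Coordinate 0 of each rotated vector is now its component along a,
# and coordinate 1 is its component along b.
```

Because `Q` is orthogonal, all pairwise distances and inner products among the rows of `X` are unchanged by the rotation; only the basis in which they are written differs, which is why the first two coordinates can be read directly as the two concepts.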

Cite

Text

Aboagye et al. "Interpretable Debiasing of Vectorized Language Representations with Iterative Orthogonalization." International Conference on Learning Representations, 2023.

BibTeX

@inproceedings{aboagye2023iclr-interpretable,
  title     = {{Interpretable Debiasing of Vectorized Language Representations with Iterative Orthogonalization}},
  author    = {Aboagye, Prince Osei and Zheng, Yan and Shunn, Jack and Yeh, Chin-Chia Michael and Wang, Junpeng and Zhuang, Zhongfang and Chen, Huiyuan and Wang, Liang and Zhang, Wei and Phillips, Jeff},
  booktitle = {International Conference on Learning Representations},
  year      = {2023},
  url       = {https://mlanthology.org/iclr/2023/aboagye2023iclr-interpretable/}
}