Properties of Alternative Data for Fairer Credit Risk Predictions
Abstract
In the consumer lending market, women tend to have lower access to credit than men, despite evidence suggesting that women are better at repaying their debts. This study explores how leveraging alternative data, which financial institutions have not traditionally used, affects credit risk predictions for men and women. Using unique data on individuals’ credit card default behaviors and their purchase behaviors at a supermarket, we simulate a credit card issuer’s credit scoring process. In the absence of supermarket data, the algorithm’s predictive accuracy for women is about 2.3% lower than that for men. We then integrate data from each of the 410 product markets within the supermarket into the algorithm and measure the changes in the gender gap in predictive accuracy. We find wide variation in both the direction and the magnitude of the incremental gender gap, ranging from -142% to 70% compared to the baseline. These findings highlight that leveraging alternative data from a non-financial domain can lead to fairer credit outcomes, but only under certain conditions. We characterize these conditions by identifying two data properties: the capacity to proxy gender and the relative amount of creditworthiness signal that the data provide for each gender.
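As a rough illustration of the measurement described in the abstract, the Python sketch below computes a gender gap in predictive accuracy and its change relative to a baseline. It is not the authors' procedure: it assumes plain classification accuracy as the metric and defines the relative change as (new gap - baseline gap) / |baseline gap|, neither of which the abstract specifies. All function names and the toy data are hypothetical.

```python
def group_accuracy(y_true, y_pred, groups, group):
    """Share of correct predictions among observations in one group."""
    pairs = [(t, p) for t, p, g in zip(y_true, y_pred, groups) if g == group]
    if not pairs:
        raise ValueError(f"no observations for group {group!r}")
    return sum(t == p for t, p in pairs) / len(pairs)


def gender_gap(y_true, y_pred, groups):
    """Accuracy for men minus accuracy for women (assumed gap definition)."""
    return (group_accuracy(y_true, y_pred, groups, "M")
            - group_accuracy(y_true, y_pred, groups, "F"))


def relative_gap_change(baseline_gap, new_gap):
    """Percent change in the gap relative to the baseline (assumed formula)."""
    if baseline_gap == 0:
        raise ValueError("baseline gap is zero; relative change is undefined")
    return 100.0 * (new_gap - baseline_gap) / abs(baseline_gap)


# Toy data, invented for illustration only: 1 = default, 0 = no default.
y_true = [1, 0, 1, 0, 1, 0, 1, 0]
groups = ["M", "M", "M", "M", "F", "F", "F", "F"]
baseline_pred = [1, 0, 1, 0, 1, 0, 0, 1]   # model without supermarket data
augmented_pred = [1, 0, 1, 0, 1, 0, 1, 1]  # model with one product market added

baseline = gender_gap(y_true, baseline_pred, groups)
augmented = gender_gap(y_true, augmented_pred, groups)
print(baseline, augmented, relative_gap_change(baseline, augmented))
```

Under this convention, a negative relative change means the added data narrowed the gap, and a value below -100% means the gap reversed sign.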
Cite
Text
Lee and Yang. "Properties of Alternative Data for Fairer Credit Risk Predictions." Data-centric Machine Learning Research, 2024.
Markdown
[Lee and Yang. "Properties of Alternative Data for Fairer Credit Risk Predictions." Data-centric Machine Learning Research, 2024.](https://mlanthology.org/dmlr/2024/lee2024dmlr-properties/)
BibTeX
@article{lee2024dmlr-properties,
title = {{Properties of Alternative Data for Fairer Credit Risk Predictions}},
author = {Lee, Jung Youn and Yang, Joonhyuk},
journal = {Data-centric Machine Learning Research},
year = {2024},
pages = {1-27},
volume = {2},
url = {https://mlanthology.org/dmlr/2024/lee2024dmlr-properties/}
}