Estimating Car Insurance Premia: A Case Study in High-Dimensional Data Inference

Abstract

Estimating insurance premia from data is a difficult regression problem for several reasons: the large number of variables, many of which are discrete, and the very peculiar shape of the noise distribution, which is asymmetric with fat tails, with a large majority of zeros and a few unreliable and very large values. We compare several machine learning methods for estimating insurance premia and test them on a large database of car insurance policies. We find that function approximation methods that do not optimize a squared loss, such as Support Vector Machine regression, do not work well in this context. The methods compared include decision trees and generalized linear models. The best results are obtained with a mixture of experts, which better identifies the least and most risky contracts and makes it possible to reduce the median premium by charging more to the most risky customers.
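The point about squared loss can be illustrated with a small simulation. The sketch below is not taken from the paper: the 90% share of zero claims, the Pareto shape of 2.5, and the scale of 1000 are illustrative assumptions, and `simulate_claims` is a hypothetical helper. It relies only on the standard facts that the constant minimizing squared loss is the sample mean, while the constant minimizing absolute loss is the sample median.

```python
import random
import statistics


def simulate_claims(n, zero_share=0.9, shape=2.5, scale=1000.0, seed=0):
    """Draw n synthetic claim amounts (all parameters are assumed values).

    Most policies produce no claim; the rest produce a heavy-tailed,
    strictly positive amount drawn from a Pareto distribution.
    """
    rng = random.Random(seed)
    claims = []
    for _ in range(n):
        if rng.random() < zero_share:
            claims.append(0.0)  # no claim filed on this policy
        else:
            claims.append(scale * rng.paretovariate(shape))  # rare, large claim
    return claims


claims = simulate_claims(100_000)

# Constant predictor that minimizes squared loss: the sample mean.
mean_estimate = statistics.fmean(claims)

# Constant predictor that minimizes absolute loss: the sample median.
median_estimate = statistics.median(claims)

print(f"squared-loss estimate (mean):    {mean_estimate:.2f}")
print(f"absolute-loss estimate (median): {median_estimate:.2f}")
```

Under these assumptions the median is zero because most policies have no claim, so a premium set from it would collect nothing, whereas the mean tracks the expected claim amount that a premium has to cover. This is one plausible reading of why losses other than squared loss fare poorly on such data.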

Cite

Text

Chapados et al. "Estimating Car Insurance Premia: A Case Study in High-Dimensional Data Inference." Neural Information Processing Systems, 2001.

Markdown

[Chapados et al. "Estimating Car Insurance Premia: A Case Study in High-Dimensional Data Inference." Neural Information Processing Systems, 2001.](https://mlanthology.org/neurips/2001/chapados2001neurips-estimating/)

BibTeX

@inproceedings{chapados2001neurips-estimating,
  title     = {{Estimating Car Insurance Premia: A Case Study in High-Dimensional Data Inference}},
  author    = {Chapados, Nicolas and Bengio, Yoshua and Vincent, Pascal and Ghosn, Joumana and Dugas, Charles and Takeuchi, Ichiro and Meng, Linyan},
  booktitle = {Neural Information Processing Systems},
  year      = {2001},
  pages     = {1369-1376},
  url       = {https://mlanthology.org/neurips/2001/chapados2001neurips-estimating/}
}