MoAT: Meta-Evaluation of Anti-Malware Trustworthiness
Abstract
Many studies have proposed methods for the automated detection of malware. The benchmarks used for evaluating these methods often vary, hindering a trustworthy comparative analysis of models. We analyzed the evaluation criteria of over 100 malware detection methods from 2018-2022 in order to understand the current state of malware detection. From our study, we devised several criteria for evaluating future malware detection methods. Our findings indicate that a finer-grained class balance in datasets is necessary to ensure the robustness of models. In addition, a metric robust to distribution shifts, e.g. PR-AUC, should be used in future studies to prevent the inflation of results in unrealistic distribution regimes. The composition of datasets should also be disclosed in order to ensure a fair comparison of models. To our knowledge, this study is the first to assess the trustworthiness of evaluations from multi-domain malware detection methods.
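The abstract's point about distribution shifts can be illustrated with a short calculation (a sketch, not code from the paper; the TPR/FPR values are hypothetical): ROC-based quantities are per-class rates and stay fixed as the class balance changes, whereas precision, and hence PR-AUC, drops sharply at realistic malware prevalences, so a benchmark evaluated only at a 50/50 split can overstate real-world performance.

```python
# Sketch (not from the paper): why precision-based metrics such as PR-AUC
# react to class prevalence while ROC-based metrics (TPR, FPR) do not.
# The detector's TPR=0.9 and FPR=0.1 are hypothetical values held fixed.

def precision(tpr: float, fpr: float, prevalence: float) -> float:
    """Precision of a detector with the given TPR/FPR when `prevalence`
    is the fraction of positive (malware) samples in the test set."""
    tp = tpr * prevalence          # expected true-positive mass
    fp = fpr * (1.0 - prevalence)  # expected false-positive mass
    return tp / (tp + fp)

# The same detector evaluated at two class balances:
balanced = precision(0.9, 0.1, 0.50)   # 50/50 benchmark split -> 0.9
realistic = precision(0.9, 0.1, 0.01)  # ~1% malware "in the wild" -> ~0.083

print(f"precision at 50% prevalence: {balanced:.3f}")
print(f"precision at  1% prevalence: {realistic:.3f}")
```

TPR and FPR (and therefore ROC-AUC) are identical in both rows; only the precision, the quantity PR-AUC aggregates, reveals how much harder the realistic regime is.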
Cite
Text
Lin and Paar. "MoAT: Meta-Evaluation of Anti-Malware Trustworthiness." NeurIPS 2022 Workshops: MLSW, 2022.
Markdown
[Lin and Paar. "MoAT: Meta-Evaluation of Anti-Malware Trustworthiness." NeurIPS 2022 Workshops: MLSW, 2022.](https://mlanthology.org/neuripsw/2022/lin2022neuripsw-moat/)
BibTeX
@inproceedings{lin2022neuripsw-moat,
title = {{MoAT: Meta-Evaluation of Anti-Malware Trustworthiness}},
author = {Lin, Sharon and Paar, Christof},
booktitle = {NeurIPS 2022 Workshops: MLSW},
year = {2022},
url = {https://mlanthology.org/neuripsw/2022/lin2022neuripsw-moat/}
}