MoAT: Meta-Evaluation of Anti-Malware Trustworthiness

Abstract

Many studies have proposed methods for the automated detection of malware. The benchmarks used to evaluate these methods often vary, hindering trustworthy comparative analysis of models. We analyzed the evaluation criteria of over 100 malware detection methods published from 2018 to 2022 in order to understand the current state of malware detection. From our study, we devised several criteria for evaluating future malware detection methods. Our findings indicate that a finer-grained class balance in datasets is necessary to ensure the robustness of models. In addition, a metric robust to distribution shifts, e.g., PR-AUC, should be used in future studies to prevent the inflation of results under unrealistic distribution regimes. The composition of datasets should also be disclosed in order to ensure a fair comparison of models. To our knowledge, this study is the first to assess the trustworthiness of evaluations of multi-domain malware detection methods.
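The abstract's point about metric choice can be illustrated with a small, self-contained sketch (not from the paper itself): on a class-imbalanced dataset resembling realistic malware-to-benign ratios, ROC-AUC can look strong while PR-AUC (average precision) reveals poor precision on the rare positive class. The data and helper functions below are hypothetical, written in plain Python to avoid external dependencies.

```python
def roc_auc(y_true, scores):
    """ROC-AUC as the probability a random positive outranks a random negative."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def average_precision(y_true, scores):
    """PR-AUC via average precision: mean of precision@k over positive hits."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    hits, total = 0, 0.0
    for rank, i in enumerate(order, 1):
        if y_true[order[rank - 1]] == 1:
            hits += 1
            total += hits / rank
    return total / hits

# Hypothetical imbalanced benchmark: 5 malware samples among 95 benign ones.
# The detector ranks 5 benign files above every malware sample.
y = [1] * 5 + [0] * 95
s = [0.70, 0.69, 0.68, 0.67, 0.66]            # malware scores
s += [0.80, 0.79, 0.78, 0.77, 0.76]           # 5 high-scoring benign files
s += [i / 1000 for i in range(1, 91)]         # 90 low-scoring benign files

print(f"ROC-AUC: {roc_auc(y, s):.3f}")          # looks strong
print(f"PR-AUC:  {average_precision(y, s):.3f}")  # exposes weak precision
```

Here ROC-AUC is about 0.947 while average precision is about 0.354: the few false positives barely dent ROC-AUC because negatives are abundant, but they dominate the precision-recall view, which is the regime a deployed detector actually operates in.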

Cite

Text

Lin and Paar. "MoAT: Meta-Evaluation of Anti-Malware Trustworthiness." NeurIPS 2022 Workshops: MLSW, 2022.

Markdown

[Lin and Paar. "MoAT: Meta-Evaluation of Anti-Malware Trustworthiness." NeurIPS 2022 Workshops: MLSW, 2022.](https://mlanthology.org/neuripsw/2022/lin2022neuripsw-moat/)

BibTeX

@inproceedings{lin2022neuripsw-moat,
  title     = {{MoAT: Meta-Evaluation of Anti-Malware Trustworthiness}},
  author    = {Lin, Sharon and Paar, Christof},
  booktitle = {NeurIPS 2022 Workshops: MLSW},
  year      = {2022},
  url       = {https://mlanthology.org/neuripsw/2022/lin2022neuripsw-moat/}
}