Toward More Generalized Malicious URL Detection Models
Abstract
This paper reveals a data bias issue that can profoundly hinder the performance of machine learning models in malicious URL detection. We describe how such bias can be diagnosed using interpretable machine learning techniques, and we further argue that such biases naturally exist in the real-world security data used to train a classification model. To counteract these challenges, we propose a debiased training strategy that can be applied to most deep-learning-based models to alleviate the negative effects of biased features. The solution builds on adversarial training to teach deep neural networks to learn invariant embeddings from biased data. Through extensive experimentation, we show that our strategy yields superior generalization across both CNN-based and RNN-based detection models. The findings presented in this work not only expose a latent issue in the field but also provide an actionable remedy, marking a significant step toward more reliable and robust malicious URL detection.
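The core idea the abstract describes, adversarial training that forces an encoder to produce embeddings uninformative about a biased feature, is often implemented with a gradient-reversal step. The following is a minimal, self-contained sketch of that mechanism using plain Python scalars; the encoder, heads, learning rates, and toy data are all illustrative assumptions, not the paper's actual CNN/RNN architecture.

```python
# Toy sketch of adversarial debiasing via gradient reversal.
# All names and values here are illustrative, not from the paper.

def grad_reverse(g, lam=1.0):
    """Gradient reversal: identity on the forward pass, -lam * g on the backward pass."""
    return -lam * g

# Linear "encoder" z = w * x, with a task head (u) and a bias head (v).
w, u, v = 0.5, 0.1, 0.1
lr, lam = 0.02, 0.5

# (x, y_task, y_bias): y_bias is a spurious attribute (e.g. a biased URL token).
data = [(1.0, 1.0, 0.0), (-1.0, 0.0, 1.0), (0.5, 1.0, 1.0)]

for _ in range(50):
    for x, y, b in data:
        z = w * x
        task_err = u * z - y        # task head: predict the true label
        bias_err = v * z - b        # bias head: predict the biased attribute
        # Manual backward pass: d(loss)/dz for each head (squared loss).
        g_task = task_err * u
        g_bias = bias_err * v
        # Gradient reversal: the encoder ascends the bias loss,
        # pushing z toward being uninformative about b.
        g_enc = g_task + grad_reverse(g_bias, lam)
        u -= lr * task_err * z      # task head descends its loss
        v -= lr * bias_err * z      # bias head still descends its own loss
        w -= lr * g_enc * x         # encoder: descend task, ascend bias
```

The two heads play a minimax game: the bias head keeps trying to recover the spurious attribute from the embedding, while the reversed gradient drives the encoder toward representations where that recovery fails, which is the invariance property the debiased training strategy targets.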
Cite
Text
Tsai et al. "Toward More Generalized Malicious URL Detection Models." AAAI Conference on Artificial Intelligence, 2024. doi:10.1609/AAAI.V38I19.30161
Markdown
[Tsai et al. "Toward More Generalized Malicious URL Detection Models." AAAI Conference on Artificial Intelligence, 2024.](https://mlanthology.org/aaai/2024/tsai2024aaai-more/) doi:10.1609/AAAI.V38I19.30161
BibTeX
@inproceedings{tsai2024aaai-more,
title = {{Toward More Generalized Malicious URL Detection Models}},
author = {Tsai, Yun-Da and Liow, Cayon and Siang, Yin Sheng and Lin, Shou-De},
booktitle = {AAAI Conference on Artificial Intelligence},
year = {2024},
pages = {21628-21636},
doi = {10.1609/AAAI.V38I19.30161},
url = {https://mlanthology.org/aaai/2024/tsai2024aaai-more/}
}