Accelerated Combinatorial Search for Outlier Detection with Provable Bound on Sub-Optimality
Abstract
Outliers negatively affect the accuracy of data analysis. In this paper we are concerned with their influence on the accuracy of Principal Component Analysis (PCA). Algorithms that attempt to detect outliers and remove them from the data prior to applying PCA are sometimes called Robust PCA, or Robust Subspace Recovery algorithms. We propose a new algorithm for outlier detection that combines two ideas. The first is "chunk recursive elimination" that was used effectively to accelerate feature selection, and the second is combinatorial search, in a setting similar to A*. Our main result is showing how to combine these two ideas. One variant of our algorithm is guaranteed to compute an optimal solution according to some natural criteria, but its running time makes it impractical for large datasets. Other variants are much faster and come with provable bounds on sub-optimality. Experimental results show the effectiveness of the proposed approach.
Cite
Text
Wan and Schweitzer. "Accelerated Combinatorial Search for Outlier Detection with Provable Bound on Sub-Optimality." AAAI Conference on Artificial Intelligence, 2021. doi:10.1609/AAAI.V35I14.17475Markdown
[Wan and Schweitzer. "Accelerated Combinatorial Search for Outlier Detection with Provable Bound on Sub-Optimality." AAAI Conference on Artificial Intelligence, 2021.](https://mlanthology.org/aaai/2021/wan2021aaai-accelerated/) doi:10.1609/AAAI.V35I14.17475BibTeX
@inproceedings{wan2021aaai-accelerated,
title = {{Accelerated Combinatorial Search for Outlier Detection with Provable Bound on Sub-Optimality}},
author = {Wan, Guihong and Schweitzer, Haim},
booktitle = {AAAI Conference on Artificial Intelligence},
year = {2021},
pages = {12436-12444},
doi = {10.1609/AAAI.V35I14.17475},
url = {https://mlanthology.org/aaai/2021/wan2021aaai-accelerated/}
}