INCOLLECTION

Partial K-means with M outliers: Mathematical programs and complexity results

Communications in Computer and Information Science | pages 287--303, 2023

Author

Dupin, Nicolas and Nielsen, Frank

Abstract

A well-known bottleneck of Min-Sum-of-Square Clustering (MSSC, the celebrated k-means problem) is to tackle the presence of outliers. In this paper, we propose a Partial clustering variant termed PMSSC which considers a fixed number of outliers to remove. We solve PMSSC by Integer Programming formulations and complexity results extending the ones from MSSC are studied. PMSSC is NP-hard in Euclidean space when the dimension or the number of clusters is greater than 2. Finally, one-dimensional cases are studied: Unweighted PMSSC is polynomial in that case and solved with a dynamic programming algorithm, extending the optimality property of MSSC with interval clustering. This result holds also for unweighted k-medoids with outliers. A weaker optimality property holds for weighted PMSSC, but NP-hardness or not remains an open question in dimension one.

Related Members