ARTICLE

Leveraging k-NN for generic classification boosting

Neurocomputing | Vol. 80, pages 3-9, March 2012

Authors

Paolo Piro, Richard Nock, Frank Nielsen, Michel Barlaud

Abstract

Voting rules relying on k-nearest neighbors (k-NN) are an effective tool in countless machine learning techniques. Thanks to its simplicity, k-NN classification is very attractive to practitioners, as it yields very good performance in many practical applications. However, it suffers from various drawbacks, such as sensitivity to "noisy" instances and poor generalization properties when dealing with sparse high-dimensional data.

In this paper, we tackle the k-NN classification problem at its core by providing a novel k-NN boosting approach. Namely, we propose a supervised learning algorithm, called Universal Nearest Neighbors (UNN), that induces a leveraged k-NN rule by globally minimizing a surrogate risk upper bounding the empirical misclassification rate over the training data. Interestingly, this surrogate risk can be arbitrarily chosen from a class of Bregman loss functions, including the familiar exponential, logistic and squared losses. Furthermore, we show that UNN can efficiently filter a dataset of instances by keeping only a small fraction of the data.

Experimental results on the synthetic Ripley's dataset show that such a filtering strategy is able to reject "noisy" examples and yields a classification error close to the optimal Bayes error. Experiments on standard UCI datasets show significant improvements over the current state of the art.
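To make the leveraged rule concrete, here is a minimal Python sketch of a UNN-style boosting loop for binary labels in {-1, +1} under the exponential surrogate loss. It is an illustration under stated assumptions, not the authors' implementation: the function names (fit_unn_exp, predict, knn_indices), the greedy example-selection heuristic, and the AdaBoost-like closed-form leveraging coefficient are ours; the paper's actual algorithm covers a whole class of Bregman losses.

import numpy as np

def knn_indices(X, x, k):
    # Indices of the k nearest training points to query x (Euclidean).
    return np.argsort(np.linalg.norm(X - x, axis=1))[:k]

def fit_unn_exp(X, y, k=5, T=100):
    # UNN-style boosting sketch with the exponential surrogate loss.
    # Each training example j acts as a weak classifier that votes on
    # the queries whose k-NN sets contain j; its leveraging coefficient
    # alpha[j] is set in closed form, as in discrete AdaBoost.
    n = len(X)
    # Reciprocal neighborhoods: R[j] lists the i such that j is a k-NN
    # of X[i] (self-neighbors excluded).
    R = [[] for _ in range(n)]
    for i in range(n):
        for j in knn_indices(X, X[i], k + 1):
            if j != i:
                R[j].append(i)
    w = np.ones(n) / n      # weights over training examples
    alpha = np.zeros(n)     # leveraging coefficients
    eps = 1e-12
    for _ in range(T):
        # Heuristic: pick the example whose neighborhood currently
        # carries the most weight (the paper's rule may differ).
        j = max(range(n), key=lambda m: w[R[m]].sum())
        idx = np.array(R[j])
        agree = w[idx[y[idx] == y[j]]].sum()
        differ = w[idx[y[idx] != y[j]]].sum()
        a = 0.5 * np.log((agree + eps) / (differ + eps))
        alpha[j] += a
        # Multiplicative update on the examples j votes on, then renormalize.
        w[idx] *= np.exp(-a * y[j] * y[idx])
        w /= w.sum()
    return alpha

def predict(X, y, alpha, x, k=5):
    # Leveraged k-NN rule: sign of the alpha-weighted votes of x's neighbors.
    idx = knn_indices(X, x, k)
    return np.sign(np.sum(alpha[idx] * y[idx]))

In this sketch, after T rounds only the selected prototypes have nonzero alpha, so discarding the rest gives a filtered training set, which corresponds to the filtering behavior described in the abstract.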
