INCOLLECTION

Fast approximate optimization in high dimensions with core-sets and fast dimension reduction

Introduction to HPC with MPI for Data Science | pages 231-244, 2016

Author

Nielsen, Frank

Abstract

Often one is interested to solve optimization problems on very large size data-sets. It can be computationally interesting not to solve for the exact optimal solution (or one of the optimal solutions when several such optimal solutions exist) but rather seek for a guaranteed approximation in faster time. For example, consider the k-means clustering problem: we seek to minimize the k-means cost function (i.e., weighted sum of cluster variances, see Chap. 7). It makes sense to think that under some stability hypothesis to point perturbations of the optimal clustering, an approximation of the minimization of the cost function (instead of the regular k-means objective function) will end up with a good (if not an optimal) clustering.

Related Members