

Talks and Poster Presentations (without Proceedings-Entry):

P. Filzmoser, S. Brodinova, T. Ortner, C. Breiteneder, M. Rohm:
"Robust k-means-based clustering for high-dimensional data";
Talk: International Conference on Robust Statistics (ICORS 2019), Guayaquil (invited); 2019-05-28 - 2019-05-31.



English abstract:
We introduce a robust k-means-based clustering method for high-dimensional data in which not only outliers but also a large number of noise variables are likely to be present. Although Kondo et al. [2] have already addressed such an application scenario, our approach goes further. First, the method is designed to identify clusters, informative variables, and outliers simultaneously. Second, the proposed clustering technique additionally aims to optimize its required parameters, e.g. the number of clusters, which is a major advantage over most existing methods. Robustness is achieved through a robust initialization [3] and a proposed weighting function based on the Local Outlier Factor [1]. This weighting function provides valuable information about the outlyingness of each observation, which can be used in a subsequent outlier detection step. To reveal both the clusters and the informative variables, the approach uses a lasso-type penalty [4]. The method has been thoroughly tested on simulated as well as real high-dimensional datasets, and the experiments demonstrated its strong ability to identify clusters, outliers, and informative variables.
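Two ingredients named in the abstract can be illustrated in isolation: LOF-based observation weights and a lasso-type (L1) penalty on variable weights. The following is a minimal pure-Python sketch under stated assumptions, not the authors' method. The names `lof_scores`, `lof_weights`, and `sparse_feature_weights`, the weighting rule with its `cutoff` parameter, the fixed threshold `delta`, and the toy data are all hypothetical. The variable-weight step follows the soft-thresholding update of sparse k-means (Witten and Tibshirani, 2010) rather than anything confirmed for this work, and the cluster labels are taken as given instead of being estimated.

```python
import math

def euclid(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def lof_scores(X, k=3):
    """Local Outlier Factor of each row of X, using exactly k neighbours.

    Simplification: ties at the k-distance are broken by index order instead
    of enlarging the neighbourhood. O(n^2) distances, fine for toy data.
    """
    n = len(X)
    D = [[euclid(X[i], X[j]) for j in range(n)] for i in range(n)]
    knn = []
    for i in range(n):
        others = sorted((j for j in range(n) if j != i), key=lambda j: D[i][j])
        knn.append(others[:k])
    kdist = [D[i][knn[i][-1]] for i in range(n)]  # distance to k-th neighbour
    lrd = []  # local reachability density
    for i in range(n):
        reach = sum(max(kdist[j], D[i][j]) for j in knn[i])
        lrd.append(k / max(reach, 1e-12))  # guard against duplicate points
    return [sum(lrd[j] for j in knn[i]) / (k * lrd[i]) for i in range(n)]

def lof_weights(scores, cutoff=1.5):
    """HYPOTHETICAL weighting rule: weight 1 up to `cutoff`, then cutoff / LOF."""
    return [1.0 if s <= cutoff else cutoff / s for s in scores]

def sparse_feature_weights(X, labels, obs_w, delta):
    """Variable weights from weighted between-cluster sums of squares (BCSS).

    Soft-thresholds each BCSS by `delta` and rescales to unit L2 norm, as in
    the weight update of sparse k-means; `delta` is fixed here for brevity
    instead of being derived from an L1 bound.
    """
    p = len(X[0])
    total_w = sum(obs_w)
    raw = []
    for j in range(p):
        grand_mean = sum(w * x[j] for w, x in zip(obs_w, X)) / total_w
        bcss = 0.0
        for c in set(labels):
            members = [(w, x[j]) for w, x, lab in zip(obs_w, X, labels) if lab == c]
            wc = sum(w for w, _ in members)
            mean_c = sum(w * v for w, v in members) / wc
            bcss += wc * (mean_c - grand_mean) ** 2
        raw.append(max(bcss - delta, 0.0))  # soft-threshold; BCSS is >= 0
    norm = math.sqrt(sum(a * a for a in raw))
    return [a / norm for a in raw] if norm > 0 else raw

# Hypothetical toy data: two clusters separated in variables 0 and 1,
# variable 2 is noise, and the last row is a gross outlier.
# The labels are assumed, not estimated by a clustering step.
X = [
    [0.0, 0.0, 0.1], [0.2, 0.1, -0.1], [0.1, 0.3, 0.0], [0.3, 0.2, 0.2],
    [10.0, 10.0, 0.0], [10.2, 9.9, 0.1], [9.9, 10.1, -0.2], [10.1, 10.3, 0.1],
    [5.0, 5.0, 8.0],
]
labels = [0, 0, 0, 0, 1, 1, 1, 1, 0]

scores = lof_scores(X, k=3)
obs_w = lof_weights(scores)
feat_w = sparse_feature_weights(X, labels, obs_w, delta=1.0)
print([round(s, 2) for s in scores])
print([round(v, 3) for v in feat_w])
```

In this toy example the outlier receives by far the largest LOF and therefore a small weight, and the noise variable (index 2) is thresholded to a weight of zero. If all observation weights are instead set to 1, the outlier's large value in variable 2 inflates that variable's between-cluster sum of squares enough to survive the same threshold, which illustrates why down-weighting outliers matters before the penalized variable selection.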

Keywords:
k-means, Outliers, High-dimensional data


Electronic version of the publication:
https://publik.tuwien.ac.at/files/publik_282590.pdf


Created from the Publication Database of the Vienna University of Technology.