

Talks and Poster Presentations (with Proceedings-Entry):

S. Bandyapadhyay, F. Fomin, P. Golovach, K. Simonov:
"Parameterized Complexity of Feature Selection for Categorical Data Clustering";
Talk: International Symposium on Mathematical Foundations of Computer Science (MFCS), Tallinn (Estonia); 2021-08-23 - 2021-08-27; in: "46th International Symposium on Mathematical Foundations of Computer Science (MFCS 2021)", LIPIcs, 202 (2021), ISBN: 978-3-95977-201-3; 1 - 14.



English abstract:
We develop new algorithmic methods with provable guarantees for feature selection in regard to
categorical data clustering. While feature selection is one of the most common approaches to reduce
dimensionality in practice, most of the known feature selection methods are heuristics. We study
the following mathematical model. We assume that there are some inadvertent (or undesirable)
features of the input data that unnecessarily increase the cost of clustering. Consequently, we want
to select a subset of the original features from the data such that there is a small-cost clustering
on the selected features. More precisely, for given integers ℓ (the number of irrelevant features)
and k (the number of clusters), budget B, and a set of n categorical data points (represented by
m-dimensional vectors whose elements belong to a finite set of values Σ), we want to select m − ℓ
relevant features such that the cost of any optimal k-clustering on these features does not exceed
B. Here the cost of a cluster is the sum of Hamming distances (ℓ_0-distances) between the selected
features of the elements of the cluster and its center. The clustering cost is the total sum of the
costs of the clusters.
We use the framework of parameterized complexity to identify how the complexity of the problem
depends on parameters k, B, and |Σ|. Our main result is an algorithm that solves the Feature
Selection problem in time f(k, B, |Σ|) · m^{g(k,|Σ|)} · n^2 for some functions f and g. In other words,
the problem is fixed-parameter tractable parameterized by B when |Σ| and k are constants. Our
algorithm for Feature Selection is based on a solution to a more general problem, Constrained
Clustering with Outliers. In this problem, we want to delete a certain number of outliers such
that the remaining points could be clustered around centers satisfying specific constraints. One
interesting fact about Constrained Clustering with Outliers is that besides Feature Selection, it
encompasses many other fundamental problems regarding categorical data such as Robust Clustering,
Binary and Boolean Low-rank Matrix Approximation with Outliers, and Binary Robust Projective
Clustering. Thus as a byproduct of our theorem, we obtain algorithms for all these problems. We
also complement our algorithmic findings with complexity lower bounds.
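
Illustrative sketch (not part of the abstract, and not the algorithm of the paper): the cost model described above can be made concrete with a small exhaustive search. The function names and the toy data are hypothetical. The sketch assumes data points are equal-length sequences over Σ and relies on the fact that, for a fixed cluster, a center minimizing the sum of Hamming distances takes a most frequent value in every coordinate. The search enumerates all feature subsets and all assignments, so it is exponential and only meant for tiny inputs.

```python
from collections import Counter
from itertools import combinations, product


def cluster_cost(points, features):
    """Hamming cost of one cluster restricted to the selected features.

    The best center takes a most frequent value in each selected
    coordinate, so each coordinate contributes the number of points
    that disagree with that value.
    """
    if not points:
        return 0
    cost = 0
    for j in features:
        counts = Counter(p[j] for p in points)
        cost += len(points) - max(counts.values())
    return cost


def clustering_cost(points, labels, k, features):
    """Total cost: the sum of the costs of the k clusters."""
    clusters = [[] for _ in range(k)]
    for p, c in zip(points, labels):
        clusters[c].append(p)
    return sum(cluster_cost(c, features) for c in clusters)


def brute_force_feature_selection(points, k, ell, budget):
    """Exhaustive search over feature subsets and assignments.

    Returns (features, labels, cost) for some choice of m - ell features
    whose optimal k-clustering costs at most `budget`, or None if no
    such choice exists.
    """
    m = len(points[0])
    n = len(points)
    for features in combinations(range(m), m - ell):
        best_cost, best_labels = min(
            (clustering_cost(points, labels, k, features), labels)
            for labels in product(range(k), repeat=n)
        )
        if best_cost <= budget:
            return features, best_labels, best_cost
    return None


# Toy instance over the alphabet {a, b, x, y}: the third feature is noise.
points = ["aax", "aay", "bbx", "bby"]
result = brute_force_feature_selection(points, k=2, ell=1, budget=0)
```

On this toy instance, dropping the third feature (ℓ = 1) allows a 2-clustering of cost 0, whereas keeping all three features (ℓ = 0) does not, since the best 2-clustering then has cost 2.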


"Official" electronic version of the publication (accessed through its Digital Object Identifier - DOI)
http://dx.doi.org/10.4230/LIPIcs.MFCS.2021.14

Electronic version of the publication:
https://publik.tuwien.ac.at/files/publik_300323.pdf


Created from the Publication Database of the Vienna University of Technology.