

Talks and Poster Presentations (with Proceedings-Entry):

P. Filzmoser, P. Brito, A. Pedro Duarte Silva:
"Outlier detection in interval data";
Talk: COMPSTAT 2014 - 21st International Conference on Computational Statistics hosting the 5th IASC World Conference, Geneva; 2014-08-19 - 2014-08-22; in: "Proceedings of COMPSTAT 2014", M. Gilli, G. Gonzalez-Rodriguez, A. Nieto-Reyes (eds.); (2014), ISBN: 978-2-8399-1347-8; 11.



English abstract:
The aim is to identify outliers in multivariate observations that consist of interval data. Interval data occur in many
situations, e.g., when describing ranges of variable values, such as daily stock prices, or from the aggregation of large databases, when
real values describing the individual observations result in intervals in the description of the aggregated data. Each variable of the
multivariate interval data can be represented by a mid-point and a range. Parametric models have been proposed which
rely on multivariate normal or skew-normal distributions for the mid-points and log-ranges of the interval-valued variables. Different
parameterizations of the joint variance-covariance matrix allow taking into account the relation that might or might not exist between
mid-points and log-ranges of the same or different variables. Here we use the estimates for the joint mean and covariance for
multivariate outlier detection. The Mahalanobis distances based on these estimates provide information on how far individual
multivariate interval observations lie from the mean with respect to the overall covariance structure. A critical value based on the chi-square
distribution allows distinguishing outliers from regular observations. The outlier diagnostics are particularly interesting when
the covariance between the mid-points and the log-ranges is restricted to be zero. Then, Mahalanobis distances can be computed
separately for mid-points and log-ranges, and the resulting distance-distance plot identifies outliers that can be due to deviations with
respect to the mid-point, or with respect to the range of the interval data, or both.
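
The restricted case described above (zero covariance between mid-points and log-ranges) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: it uses classical sample means and covariances in place of the normal or skew-normal model estimates, it is limited to two interval-valued variables so that the 2x2 covariance inverse and the chi-square quantile with 2 degrees of freedom have closed forms, and the 0.975 quantile is an assumed cutoff level. All function names and the synthetic data are hypothetical.

```python
import math


def to_mid_logrange(intervals):
    """Split one observation [(lo, hi), (lo, hi)] into mid-points and log-ranges."""
    mids = tuple((lo + hi) / 2.0 for lo, hi in intervals)
    logs = tuple(math.log(hi - lo) for lo, hi in intervals)
    return mids, logs


def mean_cov_2d(points):
    """Classical sample mean and 2x2 sample covariance, returned as (sxx, sxy, syy)."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    return (mx, my), (sxx, sxy, syy)


def mahalanobis_sq_2d(point, mean, cov):
    """Squared Mahalanobis distance using the closed-form 2x2 inverse."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy * sxy
    dx, dy = point[0] - mean[0], point[1] - mean[1]
    return (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det


def chi2_quantile_2df(q):
    """Quantile of the chi-square distribution with 2 degrees of freedom."""
    return -2.0 * math.log(1.0 - q)


def distance_distance(data, q=0.975):
    """Squared distances for mid-points and log-ranges, computed separately."""
    mids, logs = zip(*(to_mid_logrange(obs) for obs in data))
    cutoff = chi2_quantile_2df(q)  # assumed cutoff level, not from the abstract
    dists = []
    for part in (mids, logs):
        mean, cov = mean_cov_2d(part)
        dists.append([mahalanobis_sq_2d(p, mean, cov) for p in part])
    flags = [(dm > cutoff, dl > cutoff) for dm, dl in zip(*dists)]
    return dists[0], dists[1], flags, cutoff


def demo_data():
    """Synthetic data: 20 regular observations plus one mid-point outlier."""
    data = []
    for i in range(20):
        m1, m2 = 10.0 + i % 5, 20.0 + i // 5
        r1, r2 = 1.0 + 0.1 * (i % 4), 2.0 + 0.1 * (i % 3)
        data.append([(m1 - r1 / 2, m1 + r1 / 2), (m2 - r2 / 2, m2 + r2 / 2)])
    # Last observation: ordinary ranges, but mid-points far from the rest.
    data.append([(99.45, 100.55), (198.95, 201.05)])
    return data
```

Calling `distance_distance(demo_data())` returns the two lists of squared distances, a pair of flags per observation (mid-point outlier, log-range outlier), and the cutoff. Plotting one list against the other, with the cutoff drawn on both axes, gives a simple version of the distance-distance plot. Because classical estimates are themselves affected by outliers, a robust or model-based estimator would normally be substituted in practice.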


Electronic version of the publication:
http://compstat2014.org/program-schedule04.html


Created from the Publication Database of the Vienna University of Technology.