zbMATH — the first resource for mathematics

Outlier detection for compositional data using robust methods. (English) Zbl 1135.62040
Summary: Outlier detection based on the Mahalanobis distance (MD) requires an appropriate transformation in the case of compositional data. For the family of log-ratio transformations (additive, centered and isometric log-ratio transformations) it is shown that the MDs based on classical estimates are invariant to these transformations, and that the MDs based on affine equivariant estimators of location and covariance are the same for additive and isometric log-ratio transformations. Moreover, for three-dimensional compositions the data structure can be visualized by contour lines. In higher dimension the MDs of closed and opened data give an impression of the multivariate data behavior.
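
The invariance claim for classical estimates can be checked numerically. The following is a minimal sketch in Python with NumPy, not the authors' code (the record lists R and robustbase as the software used). The simulated data, the function names, and the particular orthonormal basis chosen for the isometric log-ratio transformation are assumptions made for illustration. Because the covariance matrix of centered log-ratio data is singular, the sketch uses a Moore-Penrose pseudo-inverse there. Robust estimators are not covered.

```python
import numpy as np


def closure(x):
    """Rescale each row so that its parts sum to 1."""
    return x / x.sum(axis=1, keepdims=True)


def alr(x):
    """Additive log-ratio: log of the first D-1 parts divided by the last part."""
    return np.log(x[:, :-1] / x[:, -1:])


def clr(x):
    """Centered log-ratio: log parts minus the row mean of the log parts."""
    logx = np.log(x)
    return logx - logx.mean(axis=1, keepdims=True)


def ilr(x):
    """Isometric log-ratio using one possible orthonormal (Helmert-type) basis."""
    d = x.shape[1]
    basis = np.zeros((d, d - 1))
    for j in range(1, d):
        basis[:j, j - 1] = 1.0 / j
        basis[j, j - 1] = -1.0
        basis[:, j - 1] *= np.sqrt(j / (j + 1.0))
    return clr(x) @ basis


def mahalanobis_sq(z):
    """Squared Mahalanobis distances from the classical mean and covariance.

    The pseudo-inverse equals the ordinary inverse for full-rank covariance
    (alr, ilr) and handles the singular clr covariance.
    """
    centered = z - z.mean(axis=0)
    cov_pinv = np.linalg.pinv(np.cov(z, rowvar=False), rcond=1e-10)
    return np.einsum("ij,jk,ik->i", centered, cov_pinv, centered)


# Hypothetical data: 50 simulated four-part compositions.
rng = np.random.default_rng(0)
compositions = closure(rng.lognormal(size=(50, 4)))

md_alr = mahalanobis_sq(alr(compositions))
md_clr = mahalanobis_sq(clr(compositions))
md_ilr = mahalanobis_sq(ilr(compositions))

print(np.allclose(md_alr, md_clr), np.allclose(md_alr, md_ilr))
```

The distances agree because the three transformations differ only by linear maps, and Mahalanobis distances computed from the sample mean and sample covariance do not change under such maps.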

MSC:
62H05 Characterization and structure theory for multivariate probability distributions; copulas
62F35 Robustness and adaptive procedures (parametric inference)
86A32 Geostatistics
Software:
R; robustbase
References:
[1] Aitchison J (1986) The statistical analysis of compositional data. Monographs on statistics and applied probability. Chapman & Hall, London, 416 p · Zbl 0688.62004
[2] Aitchison J (1992) On criteria for measures of compositional difference. Math Geol 24(4):365–379 · Zbl 0970.86531 · doi:10.1007/BF00891269
[3] Aitchison J, Egozcue JJ (2005) Compositional data analysis: where are we and where should we be heading? Math Geol 37(7):829–850 · Zbl 1177.86017 · doi:10.1007/s11004-005-7383-7
[4] Barceló C, Pawlowsky V, Grunsky E (1996) Some aspects of transformations of compositional data and the identification of outliers. Math Geol 28(4):501–518 · doi:10.1007/BF02083658
[5] Barceló-Vidal CB, Martín-Fernández JA, Pawlowsky-Glahn V (1999) Comment on “Singularity and nonnormality in the classification of compositional data” by Bohling GC, Davis JC, Olea RA, Harff J (Letter to the editor). Math Geol 31(5):581–585 · doi:10.1023/A:1007520124870
[6] Bohling GC, Davis JC, Olea RA, Harff J (1998) Singularity and nonnormality in the classification of compositional data. Math Geol 30(1):5–20 · doi:10.1023/A:1021705120065
[7] Coakley JP, Rust BR (1968) Sedimentation in an Arctic lake. J Sed Pet 38(4):1290–1300. Quoted in Aitchison J (1986) The statistical analysis of compositional data. Chapman & Hall, London, 416 p
[8] Egozcue JJ, Pawlowsky-Glahn V, Mateu-Figueras G, Barceló-Vidal C (2003) Isometric logratio transformations for compositional data analysis. Math Geol 35(3):279–300 · Zbl 1302.86024 · doi:10.1023/A:1023818214614
[9] Filzmoser P, Garrett RG, Reimann C (2005) Multivariate outlier detection in exploration geochemistry. Comput Geosci 31:579–587 · doi:10.1016/j.cageo.2004.11.013
[10] Gnanadesikan R, Kettenring JR (1972) Robust estimates, residuals, and outlier detection with multiresponse data. Biometrics 28:81–124 · doi:10.2307/2528963
[11] Hardin J, Rocke DM (2005) The distribution of robust distances. J Comput Graph Stat 14:928–946 · doi:10.1198/106186005X77685
[12] Harville DA (1997) Matrix algebra from a statistician’s perspective. Springer, New York, 630 p · Zbl 0881.15001
[13] Maronna R, Zamar R (2002) Robust estimates of location and dispersion for high-dimensional data sets. Technometrics 44(4):307–317 · doi:10.1198/004017002188618509
[14] Maronna R, Martin RD, Yohai VJ (2006) Robust statistics: theory and methods. Wiley, New York, 436 p · Zbl 1094.62040
[15] Martín-Fernández JA, Barceló-Vidal C, Pawlowsky-Glahn V (2003) Dealing with zeros and missing values in compositional data sets using nonparametric imputation. Math Geol 35(3):253–278 · Zbl 1302.86027 · doi:10.1023/A:1023866030544
[16] Peña D, Prieto F (2001) Multivariate outlier detection and robust covariance matrix estimation. Technometrics 43(3):286–310 · doi:10.1198/004017001316975899
[17] R Development Core Team (2006) R: a language and environment for statistical computing. Vienna. http://www.r-project.org
[18] Reimann C, Äyräs M, Chekushin V, Bogatyrev I, Boyd R, de Caritat P, Dutter R, Finne T, Halleraker J, Jæger O, Kashulina G, Lehto O, Niskavaara H, Pavlov V, Räisänen M, Strand T, Volden T (1998) Environmental geochemical atlas of the Central Barents Region: Geological Survey of Norway (NGU), Geological Survey of Finland (GTK), and Central Kola Expedition (CKE), Special Publication, Trondheim, Espoo, Monchegorsk, 745 p
[19] Rousseeuw PJ, Leroy AM (2003) Robust regression and outlier detection. Wiley, New York, 360 p
[20] Rousseeuw P, Van Driessen K (1999) A fast algorithm for the minimum covariance determinant estimator. Technometrics 41:212–223 · doi:10.2307/1270566
[21] Rousseeuw PJ, Van Zomeren BC (1990) Unmasking multivariate outliers and leverage points. J Am Stat Assoc 85(411):633–651 · doi:10.2307/2289995
[22] Thompson RN, Esson J, Duncan AC (1972) Major element chemical variation in the Eocene lavas of the Isle of Skye, Scotland. J Petrol 13(2):219–253. Quoted in Aitchison J (1986) The statistical analysis of compositional data. Chapman & Hall, London, 416 p
[23] Visuri S, Koivunen V, Oja H (2000) Sign and rank covariance matrices. J Stat Plan Inference 91:557–575 · Zbl 0965.62049 · doi:10.1016/S0378-3758(00)00199-3
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.