
zbMATH — the first resource for mathematics

Discriminant analysis for compositional data and robust parameter estimation. (English) Zbl 1304.65033
Summary: Compositional data, i.e. data carrying only relative information, need to be transformed before applying standard discriminant analysis methods, which are designed for Euclidean space. The paper investigates, for linear, quadratic, and Fisher discriminant analysis, which transformations lead to invariance of the resulting discriminant rules. Moreover, it is shown that robust parameter estimation requires not only an appropriate transformation but also affine equivariant estimators of location and covariance. An example and simulated data demonstrate the effects of working in an inappropriate space for discriminant analysis.
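The invariance idea in the summary can be illustrated with a small numerical sketch. It is not taken from the paper: the function names, the Helmert-type basis, the reordered second basis, and the simulated Dirichlet data are all assumptions chosen for illustration. It uses the standard facts that two isometric logratio (ilr) coordinate systems differ by an orthogonal rotation and that Mahalanobis distances, on which linear and quadratic discriminant rules are built, do not change under such a rotation when location and covariance are estimated by the (affine equivariant) sample mean and sample covariance.

```python
import numpy as np

def clr(x):
    """Centered logratio: log of each part minus the row-wise mean of the logs."""
    logx = np.log(x)
    return logx - logx.mean(axis=1, keepdims=True)

def helmert_basis(d):
    """One orthonormal basis (d x (d-1)) of the sum-zero hyperplane (illustrative choice)."""
    v = np.zeros((d, d - 1))
    for j in range(1, d):
        v[:j, j - 1] = 1.0 / j          # first j parts get equal positive weight
        v[j, j - 1] = -1.0              # contrasted against part j
        v[:, j - 1] *= np.sqrt(j / (j + 1.0))  # scale the column to unit length
    return v

def ilr(x, basis):
    """Isometric logratio coordinates of compositions x with respect to `basis`."""
    return clr(x) @ basis

def mahalanobis_sq(z, center, cov):
    """Squared Mahalanobis distance of each row of z to `center` under `cov`."""
    diff = z - center
    return np.einsum("ij,jk,ik->i", diff, np.linalg.inv(cov), diff)

# Simulated 4-part compositions for two hypothetical groups (not data from the paper).
rng = np.random.default_rng(0)
group_a = rng.dirichlet([2.0, 3.0, 4.0, 5.0], size=60)
group_b = rng.dirichlet([5.0, 4.0, 3.0, 2.0], size=60)
x = np.vstack([group_a, group_b])

basis_1 = helmert_basis(4)
basis_2 = basis_1[[2, 0, 3, 1], :]  # second orthonormal basis: same contrasts, parts reordered

# Distance of every observation to group A, computed in each coordinate system.
d2 = []
for basis in (basis_1, basis_2):
    z = ilr(x, basis)
    z_a = z[:60]
    d2.append(mahalanobis_sq(z, z_a.mean(axis=0), np.cov(z_a, rowvar=False)))

# The two ilr coordinate systems are related by this orthogonal matrix.
rotation = basis_1.T @ basis_2
print(np.allclose(rotation @ rotation.T, np.eye(3)))
print(np.allclose(d2[0], d2[1]))
```

Both checks should hold up to floating-point tolerance. Replacing the sample mean and covariance with robust estimators would preserve this agreement only if those estimators are themselves affine equivariant, which is the requirement the summary points to.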

MSC:
65C60 Computational problems in statistics (MSC2010)
Software:
robustbase