zbMATH — the first resource for mathematics

Advances in principal balances for compositional data. (English) Zbl 1407.62219
Summary: Compositional data analysis requires selecting an orthonormal basis with which to work on coordinates. In most cases this selection is based on a data-driven criterion. Principal component analysis provides bases that are, in general, functions of all the original parts, each with a different weight, which hinders their interpretation. For interpretative purposes, it would be better to have each basis component as a ratio or balance of the geometric means of two groups of parts, leaving irrelevant parts with a zero weight. This is the role of principal balances, defined as a sequence of orthonormal balances which successively maximize the explained variance in a data set. The new algorithm to compute principal balances requires an exhaustive search over all possible sets of orthonormal balances. To reduce computational time, the sets of possible partitions for up to 15 parts are stored. Two other suboptimal, but feasible, algorithms are also introduced: (i) a new search for balances following a constrained principal component approach and (ii) the hierarchical cluster analysis of variables. The latter is a new approach based on the relation between the variation matrix and the Aitchison distance. The properties and performance of these three algorithms are illustrated using a typical data set of geochemical compositions and a simulation exercise.
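To make the notion of a balance concrete, the following is a minimal Python sketch, not the algorithm of the paper. It assumes the usual normalisation of a balance between a group of r parts and a group of s parts, sqrt(rs/(r+s)) times the log-ratio of the two geometric means, and it brute-forces only the first principal balance, i.e. the single balance with the largest variance. The function names `balance` and `first_principal_balance` and the toy data are hypothetical and introduced here for illustration only.

```python
import math
from itertools import product
from statistics import pvariance


def balance(x, signs):
    """Balance of composition x for a sign vector with entries +1, -1 or 0.

    Parts marked +1 form the numerator group, parts marked -1 the
    denominator group, and parts marked 0 get zero weight.
    """
    plus = [math.log(v) for v, s in zip(x, signs) if s > 0]
    minus = [math.log(v) for v, s in zip(x, signs) if s < 0]
    r, s = len(plus), len(minus)
    if r == 0 or s == 0:
        raise ValueError("both groups of parts must be non-empty")
    # Log of the ratio of geometric means equals the difference of mean logs.
    log_ratio = sum(plus) / r - sum(minus) / s
    return math.sqrt(r * s / (r + s)) * log_ratio


def first_principal_balance(data):
    """Return (signs, variance) of the balance with the largest variance.

    Brute force over all sign vectors; feasible only for a few parts.
    """
    n_parts = len(data[0])
    best_signs, best_var = None, -1.0
    for signs in product((1, -1, 0), repeat=n_parts):
        if 1 not in signs or -1 not in signs:
            continue  # a balance needs two non-empty groups
        if next(s for s in signs if s != 0) < 0:
            continue  # skip mirror images, which have the same variance
        var = pvariance([balance(x, signs) for x in data])
        if var > best_var:
            best_signs, best_var = signs, var
    return best_signs, best_var


# Made-up toy data (not from the paper): parts A and B move in opposite
# directions while part C stays fixed. Balances are scale invariant, so the
# rows are not closed to a constant sum.
levels = [-1.0, -0.3, 0.4, 0.9]
toy_data = [[math.exp(t), math.exp(-t), 1.0] for t in levels]
best_signs, best_variance = first_principal_balance(toy_data)
print(best_signs, round(best_variance, 4))
```

In this sketch the number of candidate sign vectors grows exponentially with the number of parts, and a complete set of principal balances additionally requires the later balances to be orthogonal to the earlier ones. That is consistent with the summary's remark that the exhaustive search is costly and that suboptimal alternatives are of interest.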

MSC:
62H25 Factor analysis and principal components; correspondence analysis
62H11 Directional data; spatial statistics
86A32 Geostatistics
Software:
PMA; R; zCompositions