Exploratory data analysis for interval compositional data. (English) Zbl 1414.62211

Summary: Compositional data are data in which the relative contributions of parts to a whole, conveyed by (log-)ratios between them, are essential for the analysis. In Symbolic Data Analysis (SDA), we are in the framework of interval data when elements are characterized by variables whose values are intervals on $\mathbb{R}$ representing inherent variability. In this paper, we address the special problem of analyzing interval compositions, i.e., interval data obtained by aggregating compositions. It is assumed that the interval information is represented by the respective midpoints and ranges, and both sources of information are treated as compositions. In this context, we introduce the representation of interval data as three-way data. Within the log-ratio approach to compositional data analysis, it is outlined how interval compositions can be treated in an exploratory context. The goal of the analysis is to represent the compositions by coordinates that are interpretable in terms of the original compositional parts. This is achieved by summarizing all relative information (log-ratios) about each part into one coordinate of the coordinate system. Based on an example from the European Union Statistics on Income and Living Conditions (EU-SILC), several possibilities for an exploratory data analysis of interval compositions are outlined and investigated.
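
The two ingredients named in the summary, the midpoint/range representation of intervals and a coordinate that collects all log-ratios of one part, can be illustrated with a minimal sketch. The summary does not state the exact formulas used in the paper, so the following is an assumption: the coordinate shown is the pivot coordinate commonly used in the compositional data literature, the function names `midpoints_and_ranges` and `pivot_coordinate` are hypothetical, and the numbers are made up for illustration (they are not EU-SILC data).

```python
import math

def midpoints_and_ranges(lower, upper):
    """Split interval-valued parts into midpoints and ranges.

    `lower` and `upper` hold the interval bounds of each part.
    Assumption: every interval has positive width and positive bounds,
    so both outputs can later be treated as compositions.
    """
    mids = [(lo + up) / 2 for lo, up in zip(lower, upper)]
    rngs = [up - lo for lo, up in zip(lower, upper)]
    return mids, rngs

def pivot_coordinate(x, part=0):
    """Coordinate summarizing all log-ratios of x[part] to the other parts.

    Computes sqrt((D-1)/D) * ln(x[part] / geometric mean of the rest),
    a standard pivot coordinate (assumed here, not taken from the paper).
    """
    D = len(x)
    others = [v for j, v in enumerate(x) if j != part]
    log_gmean = sum(math.log(v) for v in others) / (D - 1)
    return math.sqrt((D - 1) / D) * (math.log(x[part]) - log_gmean)

# Illustrative (invented) interval composition with three parts.
lower = [0.2, 0.3, 0.1]
upper = [0.4, 0.5, 0.3]
mids, rngs = midpoints_and_ranges(lower, upper)

# One coordinate per source of information, here for the first part.
z_mid = pivot_coordinate(mids, part=0)
z_rng = pivot_coordinate(rngs, part=0)
```

Because the coordinate depends only on ratios between parts, rescaling a composition (for example from proportions to percentages) leaves it unchanged, which is the property that makes the log-ratio approach suitable for relative information.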

MSC:

 62H25 Factor analysis and principal components; correspondence analysis
 62H99 Multivariate analysis

Software:

laeken; SODAS
