Principal component analysis for histogram-valued data. (English) Zbl 1414.62213

Summary: This paper introduces a principal component methodology for analysing histogram-valued data within the symbolic data framework. Currently, no comparable method exists for this type of data. The proposed method uses a symbolic covariance matrix to determine the principal component space. The projected observations in principal component space are presented as polytopes for visualization. A numerical representation of these polytopes as histogram-valued output is also presented. The necessary algorithms are included. The technique is illustrated on a weather data set.


62H25 Factor analysis and principal components; correspondence analysis
60-08 Computational methods for problems pertaining to probability theory



