## Dimension-reduced clustering of functional data via subspace separation. (English) Zbl 1373.62319

Summary: We propose a new method for simultaneously finding an optimal cluster structure of functions and an optimal subspace for clustering. The proposed method minimizes a distance between functional objects and their projections, subject to clustering penalties. It includes existing approaches to functional cluster analysis and dimension reduction, such as functional principal component $$k$$-means [the first author, Adv. Data Anal. Classif., ADAC 6, No. 3, 219–247 (2012; Zbl 1254.62077)] and functional factorial $$k$$-means [the first author and Y. Terada, Comput. Stat. Data Anal. 79, 133–148 (2014; doi:10.1016/j.csda.2014.05.010)], as special cases. We show that these existing methods can perform poorly when a disturbing structure exists and that the proposed method can overcome this drawback by using subspace separation. A novel model selection procedure is also proposed, which can be applied to other joint analyses of dimension reduction and clustering. We apply the proposed method to artificial and real data to demonstrate its performance compared with the extant approaches.
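To make the idea of clustering and subspace estimation in one loop concrete, the following is a minimal sketch only. It is not the method proposed in the paper and includes no subspace separation. It assumes the curves have been discretized on a common grid and alternates between a subspace update and a cluster assignment in the spirit of reduced $$k$$-means (reference [5]). The function name `reduced_kmeans`, its parameters, and the simulated curves are all hypothetical choices made for illustration.

```python
import numpy as np

def reduced_kmeans(X, n_clusters, n_dims, n_iter=50, seed=0):
    """Illustrative alternating scheme on discretized curves (rows of X).

    Assumption: this mimics a reduced k-means style criterion,
    sum_i ||x_i - A c_k(i)||^2 with orthonormal A, not the paper's method.
    """
    rng = np.random.default_rng(seed)
    n, _ = X.shape
    X = X - X.mean(axis=0)                      # centre the curves
    labels = rng.integers(n_clusters, size=n)   # random initial partition
    for _ in range(n_iter):
        # Cluster means in the full space; an empty cluster is reseeded
        # with a randomly chosen curve.
        means = np.vstack([
            X[labels == k].mean(axis=0) if np.any(labels == k)
            else X[rng.integers(n)]
            for k in range(n_clusters)
        ])
        # Subspace update: leading eigenvectors of the size-weighted
        # between-cluster scatter matrix.
        counts = np.bincount(labels, minlength=n_clusters)
        scatter = (means * counts[:, None]).T @ means
        _, vecs = np.linalg.eigh(scatter)       # eigenvalues in ascending order
        A = vecs[:, -n_dims:]
        # Assignment update: nearest projected centroid.
        scores = X @ A
        centroids = means @ A
        dist = ((scores[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        new_labels = dist.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return labels, A

# Hypothetical example data: two groups of noisy curves on a common grid.
t = np.linspace(0.0, 1.0, 30)
rng = np.random.default_rng(1)
group_a = np.sin(2 * np.pi * t) + 0.1 * rng.standard_normal((20, t.size))
group_b = np.cos(2 * np.pi * t) + 0.1 * rng.standard_normal((20, t.size))
curves = np.vstack([group_a, group_b])
labels, A = reduced_kmeans(curves, n_clusters=2, n_dims=1)
```

Like any alternating scheme with a random start, this sketch can stop at a local optimum, so in practice one would likely run it from several seeds and keep the partition with the smallest criterion value.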

### MSC:

62H30 Classification and discrimination; cluster analysis (statistical aspects)

62H25 Factor analysis and principal components; correspondence analysis

65C60 Computational problems in statistics (MSC2010)

### Software:

glmnet; sparsenet; fda (R); AS 136; funHDDC

### References:

[1] ARABIE, P; HUBERT, L; Bagozzi, RP (ed.), Cluster analysis in marketing research, 160-189, (1994), Oxford

[2] BESSE, PC; RAMSAY, JO, Principal components analysis of sampled functions, Psychometrika, 51, 285-311, (1986) · Zbl 0623.62048

[3] BOUVEYRON, C; JACQUES, J, Model-based clustering of time series in group-specific functional subspaces, Advances in Data Analysis and Classification, 5, 281-300, (2011) · Zbl 1274.62416

[4] CALIŃSKI, T; HARABASZ, J, A dendrite method for cluster analysis, Communications in Statistics, 3, 1-27, (1974) · Zbl 0273.62010

[5] DE SOETE, G; CARROLL, JD; Diday, E (ed.); Lechevallier, Y (ed.); Schader, M (ed.); Bertrand, P (ed.); Burtschy, B (ed.), K-means clustering in a low-dimensional Euclidean space, 212-219, (1994), Heidelberg

[6] DUNFORD, N., and SCHWARTZ, J.T. (1988), Linear Operators, Spectral Theory, Self Adjoint Operators in Hilbert Space, Part 2, New York: Interscience. · Zbl 0128.34803

[7] FERRATY, F., and VIEU, P. (2006), Nonparametric Functional Data Analysis, New York: Springer. · Zbl 1119.62046

[8] FRIEDMAN, J; HASTIE, T; TIBSHIRANI, R, Regularization paths for generalized linear models via coordinate descent, Journal of Statistical Software, 33, 1-22, (2010)

[9] GATTONE, SA; ROCCI, R, Clustering curves on a reduced subspace, Journal of Computational and Graphical Statistics, 21, 361-379, (2012)

[10] GREEN, P.J., and SILVERMAN, B.W. (1994), Nonparametric Regression and Generalized Linear Models: A Roughness Penalty Approach, London: Chapman and Hall. · Zbl 0832.62032

[11] HARTIGAN, JA; WONG, MA, Algorithm AS 136: A K-means clustering algorithm, Journal of the Royal Statistical Society, Series C, 28, 100-108, (1979) · Zbl 0447.62062

[12] HASTIE, T; BUJA, A; TIBSHIRANI, R, Penalized discriminant analysis, The Annals of Statistics, 23, 73-102, (1995) · Zbl 0821.62031

[13] HUBERT, L; ARABIE, P, Comparing partitions, Journal of Classification, 2, 193-218, (1985) · Zbl 0587.62128

[14] ILLIAN, JB; PROSSER, JI; BAKER, KL; RANGEL-CASTRO, JI, Functional principal component data analysis: A new method for analysing microbial community fingerprints, Journal of Microbiological Methods, 79, 89-95, (2009)

[15] JENNRICH, RI, A simple general procedure for orthogonal rotation, Psychometrika, 66, 289-306, (2001) · Zbl 1293.62247

[16] JENNRICH, RI, A simple general procedure for oblique rotation, Psychometrika, 67, 7-20, (2002) · Zbl 1297.62232

[17] LLOYD, S, Least squares quantization in PCM, IEEE Transactions on Information Theory, 28, 128-137, (1982) · Zbl 0504.94015

[18] MACQUEEN, J; Cam, LM (ed.); Neyman, J (ed.), Some methods of classification and analysis of multivariate observations, 281-297, (1967), Berkeley, CA

[19] MAZUMDER, R; FRIEDMAN, J; HASTIE, T, Sparsenet: coordinate descent with nonconvex penalties, Journal of the American Statistical Association, 106, 1125-1138, (2011) · Zbl 1229.62091

[20] OCAÑA, FA; AGUILERA, AM; VALDERRAMA, MJ, Functional principal components analysis by choice of norm, Journal of Multivariate Analysis, 71, 262-276, (1999) · Zbl 0944.62059

[21] RAMSAY, J.O., and SILVERMAN, B.W. (2005), Functional Data Analysis (2nd ed.), New York: Springer-Verlag. · Zbl 1079.62006

[22] REISS, PT; OGDEN, RT, Functional principal component regression and functional partial least squares, Journal of the American Statistical Association, 102, 984-996, (2007) · Zbl 1469.62237

[23] SILVERMAN, BW, Smoothed functional principal components analysis by choice of norm, The Annals of Statistics, 24, 1-24, (1996) · Zbl 0853.62044

[24] SUYUNDYKOV, R., PUECHMOREL, S., and FERRE, L. (2010), “Multivariate Functional Data Clusterization by PCA in Sobolev Space Using Wavelets”, Hyper Articles en Ligne (https://hal.archives-ouvertes.fr/): inria-00494702.

[25] TIBSHIRANI, R; WALTHER, G; HASTIE, T, Estimating the number of clusters in a data set via the gap statistic, Journal of the Royal Statistical Society, Series B, 63, 411-423, (2001) · Zbl 0979.62046

[26] TIMMERMAN, ME; CEULEMANS, E; KIERS, H; VICHI, M, Factorial and reduced K-means reconsidered, Computational Statistics and Data Analysis, 54, 1858-1871, (2010) · Zbl 1284.62396

[27] VICHI, M; KIERS, HAL, Factorial K-means analysis for two-way data, Computational Statistics and Data Analysis, 37, 49-64, (2001) · Zbl 1051.62056

[28] VICHI, M; ROCCI, R; KIERS, HAL, Simultaneous component and clustering methods for three-way data: within and between approaches, Journal of Classification, 24, 71-98, (2007) · Zbl 1144.62045

[29] WANG, J, Consistent selection of the number of clusters via crossvalidation, Biometrika, 97, 893-904, (2010) · Zbl 1204.62104

[30] YAMAMOTO, M, Clustering of functional data in a low-dimensional subspace, Advances in Data Analysis and Classification, 6, 219-247, (2012) · Zbl 1254.62077

[31] YAMAMOTO, M; HWANG, H, A general formulation of cluster analysis with dimension reduction and subspace separation, Behaviormetrika, 41, 115-129, (2014)

[32] YAMAMOTO, M; TERADA, Y, Functional factorial $$k$$-means analysis, Computational Statistics and Data Analysis, 79, 133-148, (2014) · Zbl 06984060

This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.