
Robust clustering for functional data based on trimming and constraints. (English) Zbl 1474.62166

Summary: Many clustering algorithms for data that are curves or functions have recently been proposed. However, contamination in the sample of curves can degrade the performance of most of them. In this work we propose a robust, model-based clustering method that relies on an approximation to the "density function" for functional data. The robustness comes from the joint application of data-driven trimming, which reduces the effect of contaminated observations, and constraints on the variances, which prevent spurious clusters in the solution. The algorithm performs clustering and outlier detection simultaneously by maximizing a trimmed "pseudo" likelihood. The proposed method is evaluated and compared with other existing methods in a simulation study, where it performs better when a fraction of contaminating curves is added to a non-contaminated sample. Finally, the method is applied to a real data set previously considered in the literature.
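The combination of a pseudo-density, trimming and variance constraints can be illustrated with a small Python sketch. It is not the authors' algorithm: the simplifications are our own assumptions. Curves are taken as rows of a matrix X sampled on a common grid, a single set of principal components is computed from all curves, the "pseudo" density of a group is a diagonal Gaussian on the first q scores, assignments are hard (classification likelihood), and the variance constraint is enforced by a crude clipping rule (max/min ratio at most c) rather than an optimal constrained update. The names trimmed_clustering, alpha, c, q and n_init are hypothetical.

```python
import numpy as np


def fpca_scores(X, q):
    """Project discretized curves (rows of X) onto their first q principal components."""
    Xc = X - X.mean(axis=0)
    # Rows of Vt are the principal directions of the centered curves.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:q].T


def constrain_variances(var, c, floor=1e-8):
    """Crude stand-in for a variance constraint: clip so that max/min <= c over all groups."""
    var = np.maximum(var, floor)
    return np.clip(var, var.max() / c, None)


def log_dens(Z, mu, var, w):
    """log(w_j) + log diagonal-Gaussian pseudo-density of the scores, per curve and group."""
    out = np.empty((Z.shape[0], mu.shape[0]))
    for j in range(mu.shape[0]):
        r2 = (Z - mu[j]) ** 2 / var[j]
        out[:, j] = np.log(w[j]) - 0.5 * np.sum(np.log(2 * np.pi * var[j]) + r2, axis=1)
    return out


def trimmed_clustering(X, k, q=3, alpha=0.1, c=50.0, n_init=20, n_iter=100, seed=0):
    """Hypothetical sketch. Returns labels (-1 = trimmed) and the trimmed pseudo log-likelihood."""
    rng = np.random.default_rng(seed)
    Z = fpca_scores(X, q)
    n = Z.shape[0]
    n_keep = int(np.floor(n * (1 - alpha)))  # curves retained after trimming
    best_labels, best_obj = None, -np.inf
    for _ in range(n_init):  # several random starts, keep the best objective
        mu = Z[rng.choice(n, size=k, replace=False)].copy()
        var = np.tile(Z.var(axis=0), (k, 1))
        w = np.full(k, 1.0 / k)
        labels = np.full(n, -2)
        for _ in range(n_iter):
            ld = log_dens(Z, mu, var, w)
            assign = ld.argmax(axis=1)
            # Trimming: keep the n_keep curves with the largest contributions.
            keep = np.argsort(ld.max(axis=1))[::-1][:n_keep]
            new_labels = np.full(n, -1)
            new_labels[keep] = assign[keep]
            if np.array_equal(new_labels, labels):
                break
            labels = new_labels
            for j in range(k):
                members = Z[labels == j]
                if len(members) >= 2:  # keep old parameters for (near-)empty groups
                    mu[j] = members.mean(axis=0)
                    var[j] = members.var(axis=0)
            counts = np.maximum(np.bincount(labels[labels >= 0], minlength=k), 1)
            w = counts / counts.sum()
            var = constrain_variances(var, c)
        obj = log_dens(Z, mu, var, w).max(axis=1)[labels >= 0].sum()
        if obj > best_obj:
            best_labels, best_obj = labels.copy(), obj
    return best_labels, best_obj
```

With alpha set near the expected contamination level, strongly atypical curves end up with label -1 while the remaining curves are clustered. The clipping rule only guarantees the ratio bound; it is chosen for brevity, not because it maximizes the constrained likelihood.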

MSC:

62G35 Nonparametric robustness
62H25 Factor analysis and principal components; correspondence analysis
62H30 Classification and discrimination; cluster analysis (statistical aspects)
62R10 Functional data analysis
68T10 Pattern recognition, speech recognition
