A reweighting approach to robust clustering. (English) Zbl 1384.62193

Summary: An iteratively reweighted approach for robust clustering is presented in this work. The method is initialized with a very robust clustering partition based on a high trimming level. The initial partition is then refined to reduce the number of wrongly discarded observations and substantially increase efficiency. Simulation studies and real data examples indicate that the final clustering solution has good properties in terms of both robustness and efficiency, and that it naturally adapts to the true underlying contamination level.
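
The two stages described in the summary (a heavily trimmed initial partition, followed by iterative reweighting that readmits wrongly discarded observations) can be illustrated with a minimal sketch. This is not the authors' algorithm: it assumes plain trimmed k-means for the initial step instead of TCLUST, a chi-square cutoff on squared Mahalanobis distances for the reweighting step, and no consistency correction of the trimmed covariance estimates. All function names, parameters (alpha0, level), and the synthetic data are hypothetical and chosen only for illustration.

```python
from statistics import NormalDist

import numpy as np


def chi2_quantile(q, dof):
    """Wilson-Hilferty approximation to the chi-square quantile (no SciPy needed)."""
    z = NormalDist().inv_cdf(q)
    c = 2.0 / (9.0 * dof)
    return dof * (1.0 - c + z * c ** 0.5) ** 3


def trimmed_kmeans(X, k, alpha, n_starts=20, n_iter=50, seed=0):
    """Trimmed k-means: keep the ceil(n * (1 - alpha)) points closest to a centre."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    n_keep = int(np.ceil(n * (1.0 - alpha)))
    best = None
    for _ in range(n_starts):
        centers = X[rng.choice(n, size=k, replace=False)].copy()
        for _ in range(n_iter):
            # Squared Euclidean distance of every point to every centre.
            d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
            labels, dmin = d.argmin(axis=1), d.min(axis=1)
            keep = np.zeros(n, dtype=bool)
            keep[np.argsort(dmin)[:n_keep]] = True
            for j in range(k):
                members = keep & (labels == j)
                if members.any():
                    centers[j] = X[members].mean(axis=0)
        objective = dmin[keep].sum()  # trimmed within-cluster sum of squares
        if best is None or objective < best[0]:
            best = (objective, labels, keep)
    return best[1], best[2]


def reweighted_clustering(X, k, alpha0=0.5, level=0.975, max_iter=20):
    """Start from a heavily trimmed partition, then readmit points by reweighting."""
    n, p = X.shape
    labels, keep = trimmed_kmeans(X, k, alpha0)
    cutoff = chi2_quantile(level, p)
    for _ in range(max_iter):
        d = np.empty((n, k))
        for j in range(k):
            members = X[keep & (labels == j)]
            if members.shape[0] <= p:
                raise ValueError("a cluster has too few retained points")
            diff = X - members.mean(axis=0)
            cov = np.atleast_2d(np.cov(members, rowvar=False))
            # Squared Mahalanobis distance of every point to cluster j.
            d[:, j] = np.einsum("ij,ij->i", diff @ np.linalg.inv(cov), diff)
        labels = d.argmin(axis=1)
        new_keep = d.min(axis=1) <= cutoff  # 0/1 weights for the next pass
        if np.array_equal(new_keep, keep):
            break
        keep = new_keep
    return labels, keep


# Hypothetical data: two Gaussian clusters plus 20 far-away outliers on a ring.
rng = np.random.default_rng(1)
clean = np.vstack([rng.normal([-5, 0], 1.0, (100, 2)),
                   rng.normal([5, 0], 1.0, (100, 2))])
angle = rng.uniform(0, 2 * np.pi, 20)
radius = rng.uniform(25, 35, 20)
outliers = np.column_stack([radius * np.cos(angle), radius * np.sin(angle)])
X = np.vstack([clean, outliers])

labels, keep = reweighted_clustering(X, k=2)
print("clean points retained:", keep[:200].mean())
print("outliers retained:", int(keep[200:].sum()))
```

In this sketch the initial step keeps only about half of the sample, and the reweighting loop then grows the retained set until the 0/1 weights stop changing. Because the covariance estimates are not rescaled for trimming, the final retained share of clean points is expected to fall somewhat below the nominal level; a consistency factor would be needed to correct this.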

MSC:

62H30 Classification and discrimination; cluster analysis (statistical aspects)
62F35 Robustness and adaptive procedures (parametric inference)

Software:

TCLUST; otrimle

References:

[1] Ballard, T.J., Kepple, A.W., Cafiero, C.: The food insecurity experience scale: developing a global standard for monitoring hunger worldwide. Technical report, Food and Agriculture Organization of the United Nations, Rome (2013)
[2] Butler, R.W., Davies, P.L., Jhun, M.: Asymptotics for the Minimum Covariance Determinant estimator. Ann. Stat. 21, 1385-1400 (1993) · Zbl 0797.62044 · doi:10.1214/aos/1176349264
[3] Cafiero, C., Melgar-Quinonez, H.R., Ballard, T.J., Kepple, A.W.: Validity and reliability of food security measures. Ann. N. Y. Acad. Sci. 1331, 230-248 (2014) · doi:10.1111/nyas.12594
[4] Cafiero, C., Nord, M., Viviani, S., del Grossi, M.E., Ballard, T.J., Kepple, A.W., Miller, M., Nwosu, C.: Methods for estimating comparable rates of food insecurity experienced by adults throughout the world. Technical report, Food and Agriculture Organization of the United Nations, Rome (2016)
[5] Cerioli, A.: Multivariate outlier detection with high-breakdown estimators. J. Am. Stat. Assoc. 105, 147-156 (2010) · Zbl 1397.62167 · doi:10.1198/jasa.2009.tm09147
[6] Cerioli, A., Farcomeni, A.: Error rates for multivariate outlier detection. Comput. Stat. Data Anal. 55, 544-553 (2011) · Zbl 1247.62192 · doi:10.1016/j.csda.2010.05.021
[7] Cerioli, A., Farcomeni, A., Riani, M.: Strong consistency and robustness of the forward search estimator of multivariate location and scatter. J. Multivar. Anal. 126, 167-183 (2014) · Zbl 1281.62135 · doi:10.1016/j.jmva.2013.12.010
[8] Coretto, P., Hennig, C.: Robust improper maximum likelihood: tuning, computation, and a comparison with other methods for robust Gaussian clustering. J. Am. Stat. Assoc. 111, 1648-1659 (2016) · doi:10.1080/01621459.2015.1100996
[9] Cuesta-Albertos, J.A., Gordaliza, A., Matrán, C.: Trimmed \(k\)-means: an attempt to robustify quantizers. Ann. Stat. 25, 553-576 (1997) · Zbl 0878.62045 · doi:10.1214/aos/1031833664
[10] Cuesta-Albertos, J.A., Matrán, C., Mayo-Iscar, A.: Robust estimation in the normal mixture model based on robust clustering. J. R. Stat. Soc. Ser. B (Stat. Methodol.) 70, 779-802 (2008) · Zbl 05563369 · doi:10.1111/j.1467-9868.2008.00657.x
[11] Farcomeni, A., Greco, L.: Robust Methods for Data Reduction. CRC Press, Boca Raton (2015) · Zbl 1311.62006 · doi:10.1201/b18358
[12] Flury, B., Riedwyl, H.: Multivariate Statistics. A Practical Approach. Chapman and Hall, London (1988) · Zbl 0495.62057 · doi:10.1007/978-94-009-1217-5
[13] Fritz, H., García-Escudero, L.A., Mayo-Iscar, A.: A fast algorithm for robust constrained clustering. Comput. Stat. Data Anal. 61, 124-136 (2013) · Zbl 1349.62264 · doi:10.1016/j.csda.2012.11.018
[14] Gallegos, M.T., Ritter, G.: A robust method for cluster analysis. Ann. Stat. 33, 347-380 (2005) · Zbl 1064.62074 · doi:10.1214/009053604000000940
[15] Gallup: Worldwide Research Methodology and Codebook. Gallup Inc, Washington (2015)
[16] García-Escudero, L.A., Gordaliza, A.: The importance of the scales in heterogeneous robust clustering. Comput. Stat. Data Anal. 51, 4403-4412 (2007) · Zbl 1162.62379 · doi:10.1016/j.csda.2006.06.014
[17] García-Escudero, L.A., Gordaliza, A., Matrán, C., Mayo-Iscar, A.: A general trimming approach to robust cluster analysis. Ann. Stat. 36, 1324-1345 (2008) · Zbl 1360.62328 · doi:10.1214/07-AOS515
[18] García-Escudero, L.A., Gordaliza, A., Matrán, C., Mayo-Iscar, A.: A review of robust clustering methods. Adv. Data Anal. Classif. 4, 89-109 (2010) · Zbl 1284.62375 · doi:10.1007/s11634-010-0064-5
[19] García-Escudero, L.A., Gordaliza, A., Matrán, C., Mayo-Iscar, A.: Exploring the number of groups in robust model-based clustering. Stat. Comput. 21, 585-599 (2011) · Zbl 1221.62093 · doi:10.1007/s11222-010-9194-z
[20] Godfray, H.C.J., Beddington, J.R., Crute, I.R., Haddad, K., Lawrence, D., Muir, J.F., Pretty, J., Robinson, S., Thomas, S.M., Toulmin, C.: Food security: the challenge of feeding 9 billion people. Science 327, 812-818 (2010) · doi:10.1126/science.1185383
[21] Hardin, J., Rocke, D.M.: Outlier detection in the multiple cluster setting using the Minimum Covariance Determinant estimator. Comput. Stat. Data Anal. 44, 625-638 (2004) · Zbl 1430.62133 · doi:10.1016/S0167-9473(02)00280-3
[22] Hardin, J., Rocke, D.M.: The distribution of robust distances. J. Comput. Graph. Stat. 14, 928-946 (2005) · doi:10.1198/106186005X77685
[23] Hennig, C.: Breakdown points for maximum likelihood-estimators of location-scale mixtures. Ann. Stat. 32, 1313-1340 (2004) · Zbl 1047.62063 · doi:10.1214/009053604000000571
[24] Hennig, C.: Fuzzy and crisp Mahalanobis fixed point clusters. In: Baier, D., Decker, R., Schmidt-Thieme, L. (eds.), pp. 47-56. Heidelberg (2005) · doi:10.1007/3-540-28397-8_6
[25] Hennig, C.: Dissolution point and isolation robustness: robustness criteria for general cluster analysis methods. J. Multivar. Anal. 99, 1154-1176 (2008) · Zbl 1141.62052
[26] Jones, A.D., Ngure, F.M., Pelto, G., Young, S.L.: What are we assessing when we measure food security? A compendium and review of current metrics. Adv. Nutr. 4, 481-505 (2013) · doi:10.3945/an.113.004119
[27] Liu, R.Y., Parelius, J.M., Singh, K.: Multivariate analysis by data depth: descriptive statistics, graphics and inference. Ann. Stat. 27, 783-858 (1999) · Zbl 0984.62037
[28] Lopuhaa, H.P.: Asymptotics of reweighted estimators of multivariate location and scatter. Ann. Stat. 27, 1638-1665 (1999) · Zbl 0957.62017 · doi:10.1214/aos/1017939145
[29] Neykov, N., Filzmoser, P., Dimova, R., Neytchev, P.: Robust fitting of mixtures using the trimmed likelihood estimator. Comput. Stat. Data Anal. 52, 299-308 (2007) · Zbl 1328.62033 · doi:10.1016/j.csda.2006.12.024
[30] Riani, M., Atkinson, A., Cerioli, A.: Finding an unknown number of multivariate outliers. J. R. Stat. Soc. Ser. B 71, 447-466 (2009) · Zbl 1248.62091 · doi:10.1111/j.1467-9868.2008.00692.x
[31] Ritter, G.: Robust Cluster Analysis and Variable Selection. CRC Press, Boca Raton (2014) · Zbl 1341.62037
[32] Rousseeuw, P.J.: Multivariate estimation with high breakdown point. Math. Stat. Appl. 8, 283-297 (1985) · Zbl 0609.62054 · doi:10.1007/978-94-009-5438-0_20
[33] Rousseeuw, P.J., Leroy, A.M.: Robust Regression and Outlier Detection. Wiley-Interscience, New York (1987) · Zbl 0711.62030 · doi:10.1002/0471725382
[34] Rousseeuw, P.J., van Driessen, K.: A fast algorithm for the minimum covariance determinant estimator. Technometrics 41, 212-223 (1999) · doi:10.1080/00401706.1999.10485670
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.