
The \(\delta \)-machine: classification based on distances towards prototypes. (English) Zbl 1436.62296

Summary: We introduce the \(\delta \)-machine, a statistical learning tool for classification based on (dis)similarities between the profiles of the observations and the profiles of a representation set consisting of prototypes. In this article, we discuss the properties of the \(\delta \)-machine, propose an automatic decision rule for choosing the number of clusters for the \(K\)-means method from a predictive perspective, and derive variable importance measures and partial dependence plots for the machine. We performed five simulation studies to investigate the properties of the \(\delta \)-machine. The first three simulation studies investigated the selection of prototypes, different (dis)similarity functions, and the definition of the representation set. The results indicate that the Lasso is the best method for selecting prototypes, that the Euclidean distance is a good dissimilarity function, and that a small representation set of prototypes gives sparse but competitive results. The remaining two simulation studies investigated the performance of the \(\delta \)-machine with imbalanced classes and with unequal covariance matrices for the two classes. The results show that the \(\delta \)-machine is robust to class imbalance, and that the four (dis)similarity functions performed equally well regardless of the covariance matrices. We also compare the classification performance of the \(\delta \)-machine with that of three other classification methods on ten real datasets from the UCI database, and discuss two empirical examples in detail.
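The summary compresses the whole pipeline into a few sentences; a minimal sketch may make the steps concrete. The sketch below uses scikit-learn in Python as a stand-in for the authors' implementation: \(K\)-means centroids serve as prototypes, each observation is recoded as its Euclidean distances to those prototypes, and a Lasso-penalized logistic regression selects prototypes by shrinking coefficients to zero. The synthetic data, the fixed \(K\) (in place of the paper's automatic decision rule), and the `partial_dependence` helper are illustrative assumptions, not the paper's exact choices.

```python
# A minimal sketch of the delta-machine pipeline described in the summary.
# Assumptions (not from the paper): scikit-learn stand-in, synthetic data,
# a fixed K instead of the paper's automatic decision rule, and an
# illustrative partial_dependence helper.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegressionCV
from sklearn.metrics.pairwise import euclidean_distances
from sklearn.model_selection import train_test_split

# Toy two-class data standing in for a real dataset.
X, y = make_classification(n_samples=400, n_features=8, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Step 1: candidate prototypes from K-means on the training profiles
# (the paper proposes an automatic rule for choosing K; K is fixed here).
K = 10
prototypes = KMeans(n_clusters=K, n_init=10, random_state=0).fit(X_tr).cluster_centers_

# Step 2: represent each observation by its Euclidean distances to the
# prototypes, i.e., move from the predictor space to the dissimilarity space.
D_tr = euclidean_distances(X_tr, prototypes)
D_te = euclidean_distances(X_te, prototypes)

# Step 3: Lasso-penalized logistic regression on the distance features;
# prototypes whose coefficients shrink to zero are deselected.
clf = LogisticRegressionCV(penalty="l1", solver="liblinear", Cs=10, cv=5)
clf.fit(D_tr, y_tr)
print("test accuracy:", clf.score(D_te, y_te))
print("retained prototypes:", np.flatnonzero(clf.coef_[0]))


def partial_dependence(model, X_ref, protos, j, grid):
    """Friedman-style partial dependence for original variable j:
    sweep j over `grid`, recompute the distance representation,
    and average the predicted class-1 probability."""
    curve = []
    for v in grid:
        Xv = X_ref.copy()
        Xv[:, j] = v
        curve.append(model.predict_proba(euclidean_distances(Xv, protos))[:, 1].mean())
    return np.array(curve)


pd_curve = partial_dependence(clf, X_tr, prototypes, j=0,
                              grid=np.linspace(X_tr[:, 0].min(), X_tr[:, 0].max(), 25))
```

Because every original predictor enters the classifier only through the distances, partial dependence (and, analogously, variable importance via permuting a variable and measuring the drop in accuracy) must be computed through the distance transform, as the helper above does; this follows the generic Friedman-style construction and is in the spirit of, though not necessarily identical to, the measures derived in the paper.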

MSC:

62H30 Classification and discrimination; cluster analysis (statistical aspects)
62J07 Ridge regression; shrinkage estimators (Lasso)
62R20 Statistics on metric spaces
68T05 Learning and adaptive systems in artificial intelligence
