New and efficient DCA based algorithms for minimum sum-of-squares clustering. (English) Zbl 1326.68225

Summary: The purpose of this paper is to develop new efficient approaches based on DC (Difference of Convex functions) programming and DCA (DC Algorithm) to perform clustering via minimum sum-of-squares Euclidean distance. We consider the two most widely used models for the so-called Minimum Sum-of-Squares Clustering (MSSC in short) that are a bilevel programming problem and a mixed integer program. Firstly, the mixed integer formulation of MSSC is carefully studied and is reformulated as a continuous optimization problem via a new result on exact penalty technique in DC programming. DCA is then investigated to the resulting problem. Secondly, we introduce a Gaussian kernel version of the bilevel programming formulation of MSSC, named GKMSSC. The GKMSSC problem is formulated as a DC program for which a simple and efficient DCA scheme is developed. A regularization technique is investigated for exploiting the nice effect of DC decomposition and a simple procedure for finding good starting points of DCA is developed. The proposed DCA schemes are original and very inexpensive because they amount to computing, at each iteration, the projection of points onto a simplex and/or onto a ball, and/or onto a box, which are all determined in the explicit form. Numerical results on real word datasets show the efficiency, the scalability of DCA and its great superiority with respect to k-means and kernel k-means, standard methods for clustering.


68T05 Learning and adaptive systems in artificial intelligence
90C90 Applications of mathematical programming
Full Text: DOI


[1] Aizerman, M.; Braverman, E.; Rozonoer, L., Theoretical foundations of the potential function method in pattern recognition learning, Automation and Remote Control, 25, 821, (1964) · Zbl 0151.24701
[2] D. Aloise, A. Deshpande, P. Hansen, P. Popat, Np-Hardness of Euclidean Sum-of-Squares Clustering, Cahiers du GERAD G-2008-33, 2008. · Zbl 1378.68047
[3] S. Arora, R. Kannan, Learning mixtures of arbitrary Gaussians, in: Proceedings of the 33rd Annual ACM Symposium on Theory of Computing, 2001, pp. 247-257. · Zbl 1323.68440
[4] Bagirov, A. M., Modified global k-means algorithm for minimum sum-of-squares clustering problems, Pattern Recognition, 41, 10, 3192-3199, (2008) · Zbl 1147.68669
[5] Bharath K. Sriperumbudur, David A. Torres, Gert R.G. Lanckriet, Sparse eigen methods by D.C. programming, in: Proceedings of the 25th International Conference on Machine Learning, 2007. · Zbl 1237.65060
[6] B.S. Bradley, O.L. Mangasarian, Feature selection via concave minimization and support vector machines, in: Machine Learning Proceedings of the 15th International Conferences (ICML’98), San Francisco, CA, Morgan Kaufmann, 1998, pp. 82-90.
[7] J.C. Bezdek, Fuzzy Mathematics in Pattern Classification, Ph.D. Dissertation, Cornell University, Ithaca, NY, 1973.
[8] Brusco, M. J., A repetitive branch-and-bound procedure for minimum within-cluster sum of squares partitioning, Psychometrika, 71, 347-363, (2006) · Zbl 1306.62387
[9] Cortes, C.; Vapnik, V., Support vector networks, Machine Learning, 20, 273-297, (1995) · Zbl 0831.68098
[10] Dempster, A. P.; Laird, N. M.; Rubin, D. B., Maximum likelihood from incomplete data via the EM algorithm, Journal of the Royal Statistical Society: Series B, 39, 1-38, (1977) · Zbl 0364.62022
[11] Dhilon, I. S.; Korgan, J.; Nicholas, C., Feature selection and document clustering, (Berry, M. W., A Comprehensive Survey of Text Mining, (2003), Springer-Verlag), 73-100
[12] Fisher, D., Knowledge acquisition via incremental conceptual clustering, Machine Learning, 2, 139-172, (1987)
[13] Duda, R. O.; Hart, P. E., Pattern classification and scene analysis, (1972), Wiley
[14] T. Feder, D. Greene, Optimal algorithms for approximate clustering, in: Proceedings of STOC, 1988.
[15] Filippone, M.; Camastra, F.; Masulli, F.; Rovetta, S., A survey of kernel and spectral methods for clustering, Pattern Recognition, 41, 176-190, (2008) · Zbl 1122.68530
[16] Forgy, E., Cluster analysis of multivariate dateefficiency vs. interpretability of classifications, Biometrics, 21, 768, (1965)
[17] Jancey, R. C., Multidimensional group analysis, Australian Journal of Botany, 14, 127-130, (1996)
[18] Jain, A. K.; Murty, M. N.; Flynn, P. J., Data clusteringa review, ACM Computing Surveys, 31, 3, 264-323, (1999)
[19] Joaquim, J. J.; Raydan, M.; Rosa, S. S.; Santos, S. A., On the solution of the symmetric eigenvalue complementarity problem by the spectral projected gradient algorithm, Numerical Algorithms, 47, 4, 391-407, (2008) · Zbl 1144.65042
[20] Hansen, P.; Jaumard, B., Cluster analysis and mathematical programming, Mathematical Programming, 79, 191-215, (1997) · Zbl 0887.90182
[21] Herbrich, Learning kernel classifiers, (2002), MIT Press · Zbl 1063.62092
[22] N. Krause, Y. Singer, Leveraging the margin more carefully, in: International Conference on Machine Learning, ICML, 2004.
[23] Le Thi Hoai An, Contribution à l’optimisation non convexe et l’optimisation globale: Théorie, Algoritmes et Applications, Habilitation à Diriger des Recherches, Université de Rouen, 1997.
[24] Le Thi Hoai An, DC Programming and DCA, 〈http://lita.sciences.univ-metz.fr/ lethi〉.
[25] An, Le Thi Hoai; Tao, Pham Dinh, Solving a class of linearly constrained indefinite quadratic problems by DC algorithms, Journal of Global Optimization, 11, 3, 253-285, (1997) · Zbl 0905.90131
[26] An, Le Thi Hoai; Tao, Pham Dinh, A branch-and-bound method via D.C. optimization algorithm and ellipsoidal technique for box constrained nonconvex quadratic programming problems, Journal of Global Optimization, 13, 171-206, (1998) · Zbl 0912.90233
[27] An, Le Thi Hoai; Tao, Pham Dinh, The DC (difference of convex functions) programming and DCA revisited with DC models of real world nonconvex optimization problems, Annals of Operations Research, 133, 23-46, (2005) · Zbl 1116.90122
[28] Le Thi Hoai An, Huynh Van Ngai, Pham Dinh Tao, Exact penalty and error bounds in DC programming, Journal of Global Optimization, dedicated to Reiner Horst, 52 (3) (2012) 509-535
[29] An, Le Thi Hoai; Minh, Le Hoai; Tao, Pham Dinh, Optimization based DC programming and DCA for hierarchical clustering, European Journal of Operational Research, 183, 1067-1085, (2006) · Zbl 1149.90117
[30] An, Le Thi Hoai; Belghiti, T.; Tao, Pham Dinh, A new efficient algorithm based on DC programming and DCA for clustering, Journal of Global Optimization, 37, 593-608, (2007) · Zbl 1198.90327
[31] An, Le Thi Hoai; Minh, Le Hoai; Tao, Pham Dinh, Fuzzy clustering based on nonconvex optimisation approaches using difference of convex (DC) functions algorithms, Journal of Advances in Data Analysis and Classification, 2, 1-20, (2007) · Zbl 1301.90072
[32] An, Le Thi Hoai; Minh, Le Hoai; Vinh, Nguyen Van; Tao, Pham Dinh, A DC programming approach for feature selection in support vector machines learning, Journal of Advances in Data Analysis and Classification, 2, 3, 259-278, (2008) · Zbl 1284.90057
[33] Le Thi Hoai An, Pham Dinh Tao, Minimum sum-of-squares clustering by DC programming and DCA, in: ICIC’09 Proceedings of the Intelligent Computing 5th International Conference on Emerging Intelligent Computing Technology and Applications, 2009, pp. 327-340.
[34] An, Le Thi Hoai; Minh, Le Hoai; Tao, Pham Dinh; Ngai, Huynh Van, Binary classification via spherical separator by DC programming and DCA, Journal of Global Optimization, 23, 1-15, (2012) · Zbl 1322.90072
[35] Le Thi Hoai An, Le Hoai Minh, Pham Dinh Tao, Huynh Van Ngai, Block clustering based on DC programming and DCA, Neural Computation, http://dx.doi.org/10. 1162/NECO_a_00490, in press.
[36] Le Thi Hoai An, Pham Dinh Tao, Nguyen Canh Nam, Le Hoai Minh, DC programming and DCA for diversity data mining, Optimisation, in press.
[37] Liu, Y.; Shen, X.; Doss, H., Multicategory \(\psi \operatorname{-} \operatorname{learning}\) and support vector machine, computational tools, Journal of Computational and Graphical Statistics, 14, 219-236, (2005)
[38] Liu, Y.; Shen, X., Multicategory \(\psi \operatorname{-} \operatorname{learning}\), Journal of the American Statistical Association, 101, 500-509, (2006)
[39] Mangasarian, O. L., Mathematical programming in data mining, Data Mining and Knowledge Discovery, 1, 183-201, (1997)
[40] J.B. MacQueen, Some methods for classification and analysis of multivariate observations, in: Proceedings of 5th Berkeley Symposium on Mathematical Statistics and Probability, vol. 1, Berkeley, University of California Press, 1967, pp. 281-297. · Zbl 0214.46201
[41] Du Merle, O.; Hansen, P.; Jaumard, B.; Mladenovi’c, N., An interior point algorithm for minimum sum of squares clustering, SIAM Journal on Scientific Computing, 21, 4, 1485-1505, (2000) · Zbl 1049.90129
[42] Mladenović, N.; Hansen, P., Variable neighborhood search, Computers & Operations Research, 24, 1097-1100, (1997) · Zbl 0889.90119
[43] J. Neumann, C. Schnörr, G. Steidl, SVM-based feature selection by direct objective minimisation, pattern recognition, in: Proceedings of 26th DAGM Symposium, 2004, pp. 212-219.
[44] J. Peng, Y. Xiay, A cutting algorithm for the minimum sum-of-squared error clustering, in: Proceedings of the SIAM International Data Mining Conference, 2005.
[45] Tao, Pham Dinh; An, Le Thi Hoai, DC optimization algorithms for solving the trust region subproblem, SIAM Journal of Optimization, 8, 476-505, (1998) · Zbl 0913.65054
[46] C. Ronan, S. Fabian, W. Jason, B. Léon, Trading convexity for scalability, in: International Conference on Machine Learning ICML, 2006.
[47] Shen, X.; Tseng, G. C.; Zhang, X.; Wong, W. H., \(\psi \operatorname{-} \operatorname{learning}\), Journal of American Statistical Association, 98, 724-734, (2003)
[48] Xavier, A. E.; Xavier, V. L., Solving the minimum sum-of-squares clustering problem by hyperbolic smoothing and partition into boundary and gravitational regions, Pattern Recognition, 44, 70-77, (2011) · Zbl 1207.68326
[49] Sherali, H. D.; Desai, J., A global optimization RLT-based approach for solving the hard clustering problem, Journal of Global Optimization, 32, 281-306, (2005) · Zbl 1123.62045
[50] A.L. Yuille, A. Rangarajan, The convex concave procedure (CCCP), in: Advances in Neural Information Processing System, MIT Press, Cambridge MA, 2002, p. 14. · Zbl 1022.68112
[51] Vinod, H. D., Integer programming and the theory of grouping, Journal of the American Statistical Association, 64, 506-519, (1969) · Zbl 0272.90050
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.