
Linear discriminant analysis guided by unsupervised ensemble learning. (English) Zbl 1443.68145

Summary: The high dimensionality and sparsity of data often increase the complexity of clustering, and these factors occur simultaneously in unsupervised learning. Clustering and linear discriminant analysis (LDA) are both methods for reducing the dimensionality and sparsity of data. In this study, the similarity between clustering and LDA is investigated through their objective functions. These objective functions are then integrated, and a model of LDA guided by unsupervised ensemble learning (LDA-UEL) is proposed. To construct the model, a fuzziness measure \(F\) is designed to quantify the confidence of the unsupervised learning, and the inference of the model is illustrated, together with a corresponding algorithm. Finally, extensive experiments are conducted, and the results demonstrate the effectiveness and high performance of the LDA-UEL model.
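The summary outlines an alternating scheme: pseudo-labels obtained by clustering guide LDA, and the resulting discriminant projection in turn refines the clustering, while a fuzziness measure scores the confidence of the unsupervised step. The following minimal Python sketch illustrates this general idea (in the spirit of Ding and Li's adaptive dimension reduction); the k-means base clusterer, the scikit-learn API, and the normalized-entropy stand-in for \(F\) are assumptions for illustration, not the authors' implementation.

import numpy as np
from sklearn.cluster import KMeans
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def fuzziness(memberships):
    # Normalized entropy in [0, 1]; higher values mean less confident
    # (fuzzier) cluster assignments. A stand-in for the paper's F.
    eps = 1e-12
    h = -np.sum(memberships * np.log(memberships + eps), axis=1)
    return float(np.mean(h) / np.log(memberships.shape[1]))

def lda_guided_clustering(X, n_clusters, n_iter=10):
    # Initial pseudo-labels from plain k-means on the raw features.
    km = KMeans(n_clusters=n_clusters, n_init=10).fit(X)
    labels = km.labels_
    Z = X
    for _ in range(n_iter):
        # LDA guided by the current pseudo-labels
        # (at most k - 1 discriminant components).
        lda = LinearDiscriminantAnalysis(
            n_components=min(n_clusters - 1, X.shape[1]))
        Z = lda.fit_transform(X, labels)
        # Re-cluster in the discriminant subspace.
        km = KMeans(n_clusters=n_clusters, n_init=10).fit(Z)
        if np.array_equal(km.labels_, labels):
            break  # the partition has stabilized
        labels = km.labels_
    # Soft memberships from distances to the centroids, used only to
    # score the confidence of the final partition.
    s = -km.transform(Z)               # negative distances to centers
    s -= s.max(axis=1, keepdims=True)  # stabilized softmax
    m = np.exp(s) / np.exp(s).sum(axis=1, keepdims=True)
    return labels, Z, fuzziness(m)

On a dataset X, calling lda_guided_clustering(X, n_clusters=3) returns the final labels, the low-dimensional discriminant embedding, and a fuzziness score; a lower score indicates a more confident unsupervised partition.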

MSC:

68T05 Learning and adaptive systems in artificial intelligence
62H30 Classification and discrimination; cluster analysis (statistical aspects)

Software:

APCluster; apcluster
