×

On the equivalence between non-negative matrix factorization and probabilistic latent semantic indexing. (English) Zbl 1452.62053

Summary: Non-negative Matrix Factorization (NMF) and Probabilistic Latent Semantic Indexing (PLSI) have been successfully applied to document clustering recently. In this paper, we show that PLSI and NMF (with the I-divergence objective function) optimize the same objective function, although PLSI and NMF are different algorithms as verified by experiments. This provides a theoretical basis for a new hybrid method that runs PLSI and NMF alternatively, each jumping out of the local minima of the other method successively, thus achieving a better final solution. Extensive experiments on five real-life datasets show relations between NMF and PLSI, and indicate that the hybrid method leads to significant improvements over NMF-only or PLSI-only methods. We also show that at first-order approximation, NMF is identical to the \(\chi ^{2}\)-statistic.

MSC:

62-08 Computational methods for problems pertaining to statistics

Software:

WebACE
PDF BibTeX XML Cite
Full Text: DOI Link

References:

[1] Blei, D.; Ng, A.; Jordan, M.I., Latent Dirichlet allocation, Journal of machine learning research, 3, 993-1022, (2003) · Zbl 1112.68379
[2] Ding, C., He, X., Simon, H.D., On the equivalence of nonnegative matrix factorization and spectral clustering. In: Proc. SIAM Data Mining Conf, 2005
[3] Ding, C., Li, T., Peng, W., Nonnegative matrix factorization and probabilistic latent semant ic indexing: Equivalence, chi-square statistic, and a hybrid method. In: Proc. of National Conf. on Artificial Intelligence, AAAI-06, 2006
[4] Gaussier, Eric; Goutte, Cyril, Relation between plsa and nmf and implications, (), 601-602
[5] Milligan, G.W.; Cooper, M.C., A study of the comparability of external criteria for hierarchical cluster analysis, Multivar. behav. res., 21, 846-850, (1986)
[6] Han, Eui-Hong; Boley, Daniel; Gini, Maria; Gross, Robert; Hastings, Kyle; Karypis, George; Kumar, Vipin; Mobasher, Bamshad; Moore, Jerome, Webace: A web agent for document categorization and exploration, ()
[7] Hofmann, Thomas, Probabilistic latent semantic analysis, (), 289-296 · Zbl 0970.68130
[8] Lee, D.D.; Seung, H.S., Learning the parts of objects by non-negative matrix factorization, Nature, 401, 788-791, (1999) · Zbl 1369.68285
[9] Lee, D.D.; Seung, H.S., Algorithms for non-negative matrix factorization, ()
[10] Li, Tao, A general model for clustering binary data, (), 188-197
[11] Pauca, V.P., Shahnaz, F., Berry, M.W., Plemmons, R.J., 2004. Text mining using non-negative matrix factorization. In: Proc. SIAM Int’l conf on Data Mining, SDM 2004, pp. 452-456
[12] Rand, W.M., Objective criteria for the evaluation of clustering methods, Journal of American statistical association, 66, 846-850, (1971)
[13] Xu, W., Liu, X., Gong, Y., Document clustering based on non-negative matrix factorization. In: Proc. ACM Conf. Research development in IR(SIRGIR), pp. 267-273, 2003
[14] Zhao, Ying; Karypis, George, Empirical and theoretical comparisons of selected criterion functions for document clustering, Machine learning, 55, 3, 311-331, (2004) · Zbl 1089.68615
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.