
Model-based classification using latent Gaussian mixture models. (English) Zbl 1181.62095

Summary: A novel model-based classification technique is introduced based on parsimonious Gaussian mixture models (PGMMs). PGMMs, which were introduced recently as a model-based clustering technique [P. D. McNicholas and T. B. Murphy, Stat. Comput. 18, No. 3, 285–296 (2008)], arise from a generalization of the mixtures of factor analyzers model and are based on a latent Gaussian mixture model. In this paper, this mixture modelling structure is used for model-based classification, with food authenticity as the particular area of application. Model-based classification is performed by jointly modelling data with known and unknown group memberships within a likelihood framework and then estimating the parameters, including the unknown group memberships, within an alternating expectation-conditional maximization (AECM) framework. Model selection is carried out using the Bayesian information criterion (BIC) and the quality of the maximum a posteriori classifications is summarized using the misclassification rate and the adjusted Rand index [W. M. Rand, J. Am. Stat. Assoc. 66, 846–850 (1971)]. This new model-based classification technique gives excellent classification performance when applied to real food authenticity data on the chemical properties of olive oils from nine areas of Italy.
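The joint modelling of labelled and unlabelled observations can be made concrete with the following sketch of the likelihood (the notation is assumed here for illustration rather than quoted from the paper). With \(n\) labelled observations \(\mathbf{x}_1,\dots,\mathbf{x}_n\) whose group memberships are encoded by indicators \(z_{ig}\in\{0,1\}\), \(m\) unlabelled observations \(\mathbf{x}_{n+1},\dots,\mathbf{x}_{n+m}\), and a \(G\)-component latent Gaussian mixture with factor-analyzer covariance structure,
\[
L(\boldsymbol{\vartheta}) = \prod_{i=1}^{n}\prod_{g=1}^{G}\bigl[\pi_g\,\phi\bigl(\mathbf{x}_i \mid \boldsymbol{\mu}_g,\ \boldsymbol{\Lambda}_g\boldsymbol{\Lambda}_g' + \boldsymbol{\Psi}_g\bigr)\bigr]^{z_{ig}} \prod_{j=n+1}^{n+m}\sum_{h=1}^{G}\pi_h\,\phi\bigl(\mathbf{x}_j \mid \boldsymbol{\mu}_h,\ \boldsymbol{\Lambda}_h\boldsymbol{\Lambda}_h' + \boldsymbol{\Psi}_h\bigr),
\]
where \(\pi_g\) are mixing proportions, \(\phi\) denotes the multivariate Gaussian density, \(\boldsymbol{\Lambda}_g\) is a \(p\times q\) factor loading matrix and \(\boldsymbol{\Psi}_g\) is a diagonal noise matrix; the members of the PGMM family arise from constraining \(\boldsymbol{\Lambda}_g\) and \(\boldsymbol{\Psi}_g\) across components. The unknown group memberships enter through the second product and are estimated alongside the parameters by AECM, with the BIC used to select the covariance constraints and the number of latent factors \(q\).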

MSC:

62H30 Classification and discrimination; cluster analysis (statistical aspects)
62F15 Bayesian inference
62P99 Applications of statistics

Citations:

Zbl 1490.62155

Software:

GGobi; PGMM; mclust

References:

[1] Banfield, J. D.; Raftery, A. E., Model-based Gaussian and non-Gaussian clustering, Biometrics, 49, 3, 803-821 (1993) · Zbl 0794.62034
[2] Bensmail, H.; Celeux, G., Regularized Gaussian discriminant analysis through eigenvalue decomposition, Journal of the American Statistical Association, 91, 1743-1748 (1996) · Zbl 0885.62068
[3] Böhning, D.; Dietz, E.; Schaub, R.; Schlattmann, P.; Lindsay, B., The distribution of the likelihood ratio for mixtures of densities from the one-parameter exponential family, Annals of the Institute of Statistical Mathematics, 46, 373-388 (1994) · Zbl 0802.62017
[4] Celeux, G.; Govaert, G., Gaussian parsimonious clustering models, Pattern Recognition, 28, 781-793 (1995)
[5] Cook, D.; Swayne, D. F., Interactive and Dynamic Graphics for Data Analysis with R and GGobi (2007), Springer: Springer New York · Zbl 1154.62006
[6] Dasgupta, A.; Raftery, A. E., Detecting features in spatial point processes with clutter via model-based clustering, Journal of the American Statistical Association, 93, 294-302 (1998) · Zbl 0906.62105
[7] Dean, N.; Murphy, T. B.; Downey, G., Using unlabelled data to update classification rules with applications in food authenticity studies, Journal of the Royal Statistical Society. Series C, 55, 1, 1-14 (2006) · Zbl 1490.62155
[8] Dempster, A. P.; Laird, N. M.; Rubin, D. B., Maximum likelihood from incomplete data via the EM algorithm, Journal of the Royal Statistical Society. Series B, 39, 1, 1-38 (1977) · Zbl 0364.62022
[9] Forina, M.; Armanino, C.; Lanteri, S.; Tiscornia, E., Classification of olive oils from their fatty acid composition, (Martens, H.; Russwurm, H., Food Research and Data Analysis (1983), Applied Science Publishers: Applied Science Publishers London), 189-214
[10] Forina, M.; Tiscornia, E., Pattern recognition methods in the prediction of Italian olive oil origin by their fatty acid content, Annali di Chimica, 72, 143-155 (1982)
[11] Fraley, C.; Raftery, A. E., How many clusters? Which clustering methods? Answers via model-based cluster analysis, The Computer Journal, 41, 8, 578-588 (1998) · Zbl 0920.68038
[12] Fraley, C.; Raftery, A. E., Model-based clustering, discriminant analysis, and density estimation, Journal of the American Statistical Association, 97, 611-631 (2002) · Zbl 1073.62545
[13] Fraley, C.; Raftery, A. E., Enhanced software for model-based clustering, density estimation, and discriminant analysis: MCLUST, Journal of Classification, 20, 263-286 (2003) · Zbl 1055.62071
[14] Fraley, C., Raftery, A.E., 2006. MCLUST version 3 for R: Normal mixture modeling and model-based clustering. Technical Report 504, Department of Statistics, University of Washington. Minor revisions January 2007 and November 2007.
[15] Frühwirth-Schnatter, S., Finite Mixture and Markov Switching Models (2006), Springer: Springer New York · Zbl 1108.62002
[16] Ghahramani, Z., Hinton, G.E., 1997. The EM algorithm for factor analyzers. Technical Report CRG-TR-96-1, University of Toronto.
[17] Hastie, T.; Tibshirani, R., Discriminant analysis by Gaussian mixtures, Journal of the Royal Statistical Society. Series B, 58, 1, 155-176 (1996) · Zbl 0850.62476
[18] Hubert, L.; Arabie, P., Comparing partitions, Journal of Classification, 2, 193-218 (1985)
[19] Kass, R. E.; Raftery, A. E., Bayes factors, Journal of the American Statistical Association, 90, 773-795 (1995) · Zbl 0846.62028
[20] Lindsay, B.G., 1995. Mixture models: theory, geometry and applications. In: NSF-CBMS Regional Conference Series in Probability and Statistics, vol. 5. Institute of Mathematical Statistics, Hayward, California. · Zbl 1163.62326
[21] Lopes, H. F.; West, M., Bayesian model assessment in factor analysis, Statistica Sinica, 14, 41-67 (2004) · Zbl 1035.62060
[22] McLachlan, G. J.; Basford, K. E., Mixture Models: Inference and Applications to Clustering (1988), Marcel Dekker Inc.: Marcel Dekker Inc. New York · Zbl 0697.62050
[23] McLachlan, G. J.; Peel, D., Mixtures of factor analyzers, (Langley, P., Seventeenth International Conference on Machine Learning (2000), Morgan Kaufmann: Morgan Kaufmann San Francisco), 599-606 · Zbl 1256.62036
[24] McLachlan, G. J.; Peel, D.; Bean, R. W., Modelling high-dimensional data by mixtures of factor analyzers, Computational Statistics and Data Analysis, 41, 3-4, 379-388 (2003) · Zbl 1256.62036
[25] McNicholas, P.D., Murphy, T.B., 2005. Parsimonious Gaussian mixture models. Technical Report 05/11, Department of Statistics, Trinity College Dublin.
[26] McNicholas, P. D.; Murphy, T. B., Parsimonious Gaussian mixture models, Statistics and Computing, 18, 3, 285-296 (2008)
[27] McNicholas, P. D.; Murphy, T. B.; McDaid, A. F.; Frost, D., Serial and parallel implementations of model-based clustering via parsimonious Gaussian mixture models, Computational Statistics and Data Analysis, 54, 711-723 (2010) · Zbl 1464.62131
[28] Meng, X. L.; Rubin, D. B., Maximum likelihood estimation via the ECM algorithm: a general framework, Biometrika, 80, 267-278 (1993) · Zbl 0778.62022
[29] Meng, X. L.; van Dyk, D., The EM algorithm—an old folk-song sung to a fast new tune (with discussion), Journal of the Royal Statistical Society. Series B, 59, 511-567 (1997) · Zbl 1090.62518
[30] Raftery, A. E.; Dean, N., Variable selection for model-based clustering, Journal of the American Statistical Association, 101, 473, 168-178 (2006) · Zbl 1118.62339
[31] Rand, W. M., Objective criteria for the evaluation of clustering methods, Journal of the American Statistical Association, 66, 846-850 (1971)
[32] Schwarz, G., Estimating the dimension of a model, Annals of Statistics, 6, 461-464 (1978)
[33] Spearman, C., The proof and measurement of association between two things, American Journal of Psychology, 15, 72-101 (1904)
[34] Swayne, D., Cook, D., Buja, A., Lang, D., Wickham, H., Lawrence, M., 2006. GGobi Manual. Sourced from http://www.ggobi.org/docs/manual.pdf.
[35] Tipping, M. E.; Bishop, C. M., Mixtures of probabilistic principal component analysers, Neural Computation, 11, 2, 443-482 (1999)