A moment-distance hybrid method for estimating a mixture of two symmetric densities. (English) Zbl 1392.62062

Summary: In clustering of high-dimensional data, variable selection is commonly applied to obtain an accurate grouping of the samples. For two-class problems, this selection may be carried out by fitting a mixture distribution to each variable. We propose a hybrid method for estimating a parametric mixture of two symmetric densities. The estimator combines the method of moments with the minimum distance approach. An evaluation study, including both extensive simulations and gene expression data from acute leukemia patients, shows that the hybrid method outperforms a maximum-likelihood estimator in model-based clustering. The hybrid estimator is flexible and also performs well under imprecise model assumptions, suggesting that it is robust and suited to real problems.
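
As a rough illustration of how a moment step and a minimum-distance step can be combined, the sketch below fits a two-component normal mixture with a common standard deviation. It is not the estimator from the paper: the choice of normal components, the equal-variance restriction, the Cramér-von Mises type distance, the Nelder-Mead optimizer, and all function names (`moment_constrained_params`, `cvm_distance`, `hybrid_fit`) are assumptions made for this example. The first two sample moments pin down the component means and the common variance once the mixing proportion `p` and the mean separation `delta` are given, so the distance only has to be minimized over those two remaining parameters.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

PENALTY = 1e6  # finite penalty returned for infeasible parameter values


def moment_constrained_params(p, delta, mean, var):
    """Moment step (assumed form): given p and delta = mu2 - mu1, choose
    mu1, mu2 and sigma^2 so the mixture matches the sample mean and variance."""
    mu1 = mean - (1.0 - p) * delta
    mu2 = mean + p * delta
    sigma2 = var - p * (1.0 - p) * delta ** 2
    return mu1, mu2, sigma2


def cvm_distance(theta, x_sorted, mean, var):
    """Distance step (assumed form): mean squared gap between the model CDF
    and the empirical CDF, evaluated at the sorted observations."""
    p, delta = theta
    if not (0.0 < p < 1.0) or delta <= 0.0:
        return PENALTY
    mu1, mu2, sigma2 = moment_constrained_params(p, delta, mean, var)
    if sigma2 <= 0.0:
        return PENALTY
    sigma = np.sqrt(sigma2)
    n = x_sorted.size
    model_cdf = (p * norm.cdf(x_sorted, mu1, sigma)
                 + (1.0 - p) * norm.cdf(x_sorted, mu2, sigma))
    ecdf_mid = (np.arange(1, n + 1) - 0.5) / n
    return float(np.mean((model_cdf - ecdf_mid) ** 2))


def hybrid_fit(x):
    """Minimize the distance over (p, delta) from several feasible starts
    and return the moment-constrained parameter estimates."""
    x_sorted = np.sort(np.asarray(x, dtype=float))
    mean, var = x_sorted.mean(), x_sorted.var()  # population variance (ddof=0)
    sd = np.sqrt(var)
    best = None
    for p0 in (0.2, 0.5, 0.8):
        for d0 in (0.5 * sd, 1.0 * sd, 1.5 * sd):
            res = minimize(cvm_distance, x0=[p0, d0],
                           args=(x_sorted, mean, var), method="Nelder-Mead")
            if best is None or res.fun < best.fun:
                best = res
    p, delta = best.x
    mu1, mu2, sigma2 = moment_constrained_params(p, delta, mean, var)
    return {"p": float(p), "mu1": float(mu1), "mu2": float(mu2),
            "sigma": float(np.sqrt(sigma2)), "distance": float(best.fun)}
```

Calling `hybrid_fit(x)` on a one-dimensional sample returns a dictionary of estimates, with `p` the weight of the component with the smaller mean. Requiring `delta > 0` removes the label-switching ambiguity, and reducing the search to two parameters is the practical benefit of imposing the moment equations first.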

MSC:

62F07 Statistical ranking and selection procedures
62F10 Point estimation
62F35 Robustness and adaptive procedures (parametric inference)
62P10 Applications of statistics to biology and medical sciences; meta analysis
92D10 Genetics and epigenetics
