
Multiple hypothesis testing and clustering with mixtures of non-central \(t\)-distributions applied in microarray data analysis. (English) Zbl 1242.62079

Summary: Multiple testing and clustering methodologies are commonly applied in microarray data analysis. A combination of both methods is proposed to handle multiple comparisons among groups obtained from microarray gene expression data. Assuming normal data, a statistic is defined that depends on the sample means and sample variances and follows a non-central \(t\)-distribution. Because multiple comparisons among groups are considered, a mixture of non-central \(t\)-distributions is derived. The mixture components are estimated via a Bayesian approach, and the model is applied to a multiple comparison problem from a microarray experiment on gorilla, bonobo and human cultured fibroblasts.
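
The summary does not give the exact form of the statistic, so the following is only a minimal sketch under stated assumptions. It uses the classical pooled two-sample \(t\) statistic, which for normal data with a common variance follows a non-central \(t\)-distribution, and it draws from a finite mixture of such distributions by simulation. All function names, the equal-variance assumption and the example weights are illustrative and not taken from the paper; the Bayesian estimation of the mixture components is not reproduced here.

```python
import math
import random
from statistics import mean, variance


def pooled_t_statistic(x, y):
    """Two-sample t statistic with pooled variance (assumed form, not the paper's)."""
    n1, n2 = len(x), len(y)
    pooled_var = ((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / (n1 + n2 - 2)
    return (mean(x) - mean(y)) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))


def noncentrality(mu_diff, sigma, n1, n2):
    """Non-centrality parameter of the pooled statistic for a true mean difference."""
    return mu_diff / (sigma * math.sqrt(1 / n1 + 1 / n2))


def nct_mean(df, delta):
    """Mean of a non-central t-distribution (defined for df > 1)."""
    return delta * math.sqrt(df / 2) * math.gamma((df - 1) / 2) / math.gamma(df / 2)


def sample_nct(df, delta, rng):
    """Draw one non-central t variate as (Z + delta) / sqrt(V / df)."""
    z = rng.gauss(0.0, 1.0)
    v = rng.gammavariate(df / 2, 2.0)  # chi-square with df degrees of freedom
    return (z + delta) / math.sqrt(v / df)


def sample_mixture(weights, deltas, df, size, rng):
    """Draw from a finite mixture of non-central t components sharing one df."""
    components = rng.choices(range(len(weights)), weights=weights, k=size)
    return [sample_nct(df, deltas[k], rng) for k in components]


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical setting: 70% of genes show no difference, 30% are shifted.
    draws = sample_mixture([0.7, 0.3], [0.0, 3.0], df=10, size=5000, rng=rng)
    print("simulated mixture mean:", sum(draws) / len(draws))
    print("theoretical mixture mean:", 0.3 * nct_mean(10, 3.0))
```

In this sketch a component with non-centrality zero plays the role of a null hypothesis (no difference between groups), while non-zero components correspond to clusters of differentially expressed genes.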

MSC:

62J15 Paired and multiple comparisons; multiple testing
62P10 Applications of statistics to biology and medical sciences; meta analysis
92C40 Biochemistry, molecular biology
65C40 Numerical analysis or methods applied to Markov chains
62F15 Bayesian inference

Software:

MADE4; JAGS; Bioconductor

References:

[1] Celeux, G.; Forbes, F.; Robert, C. P.; Titterington, D. M., Deviance information criteria for missing data models, Bayesian Analysis, 1, 4, 651-674 (2006) · Zbl 1331.62329
[2] Choy, S. T. B.; Smith, A. F. M., Hierarchical models with scale mixtures of normal distributions, TEST, 6, 1, 205-221 (1997) · Zbl 0891.62016
[3] Culhane, A. C.; Thioulouse, J.; Perriere, G.; Higgins, D. G., MADE4: an R package for multivariate analysis of gene expression data, Bioinformatics, 21, 11, 2789-2790 (2005)
[4] Dahl, D. B.; Mo, Q.; Vannucci, M., Simultaneous inference for multiple testing and clustering via a Dirichlet process mixture model, Statistical Modelling, 8, 1, 23-39 (2008) · Zbl 07257860
[5] Dahl, D. B.; Newton, M. A., Multiple hypothesis testing by clustering treatment effects, Journal of the American Statistical Association, 102, 478, 517-526 (2007) · Zbl 1172.62316
[6] Dudoit, S.; Shaffer, J. P.; Boldrick, J. C., Multiple hypothesis testing in microarray experiments, Statistical Science, 18, 1, 71-103 (2003) · Zbl 1048.62099
[7] Gentleman, R.; Carey, V.; Bates, D.; Bolstad, B.; Dettling, M.; Dudoit, S.; Ellis, B.; Gautier, L.; Ge, Y.; Gentry, J.; Hornik, K.; Hothorn, T.; Huber, W.; Iacus, S.; Irizarry, R.; Leisch, F.; Li, C.; Maechler, M.; Rossini, A.; Sawitzki, G.; Smith, C.; Smyth, G.; Tierney, L.; Yang, J.; Zhang, J., Bioconductor: open software development for computational biology and bioinformatics, Genome Biology, 5, R80 (2004)
[8] Gordon, A.; Chen, L.; Glazko, G.; Yakovlev, A., Balancing type one and two errors in multiple testing for differential expression of genes, Computational Statistics & Data Analysis, 53, 5, 1622-1629 (2009) · Zbl 1298.62188
[9] Green, P., Reversible jump MCMC computation and Bayesian model determination, Biometrika, 82, 711-732 (1995) · Zbl 0861.62023
[10] Hoshino, T., Bayesian significance testing and multiple comparisons from MCMC outputs, Computational Statistics & Data Analysis, 52, 7, 3543-3559 (2008) · Zbl 1452.62556
[11] Ji, Y.; Lu, Y.; Mills, G. B., Bayesian models based on test statistics for multiple hypothesis testing problems, Bioinformatics, 24, 7, 943-949 (2008)
[12] Johnson, N. L.; Kotz, S.; Balakrishnan, N., Continuous Univariate Distributions, Volume 2 (1995), Wiley & Sons, New York · Zbl 0821.62001
[13] Karaman, M. W.; Houck, M. L.; Chemnick, L. G.; Nagpal, S.; Chawannakul, D.; Sudano, D.; Pike, B. L.; Ho, V. V.; Ryder, O. A.; Hacia, J. G., Comparative analysis of gene-expression patterns in human and African great ape cultured fibroblasts, Genome Research, 13, 1619-1630 (2003)
[14] Liu, X.; Lee, S.-C.; Casella, G.; Peter, G. F., Assessing agreement of clustering methods with gene expression microarray data, Computational Statistics & Data Analysis, 52, 12, 5356-5366 (2008) · Zbl 1452.62825
[15] Marin, J. M.; Robert, C. P., Bayesian Core: A Practical Approach to Computational Bayesian Statistics (2007), Springer, New York · Zbl 1137.62013
[16] McLachlan, G. J.; Bean, R. W.; Ben-Tovim Jones, L., A simple implementation of a normal mixture approach to differential gene expression in multiclass microarrays, Bioinformatics, 22, 1608-1615 (2006)
[17] McLachlan, G. J.; Bean, R. W.; Peel, D., A mixture model-based approach to the clustering of microarray expression data, Bioinformatics, 18, 3, 413-422 (2002)
[18] Medvedovic, M.; Sivaganesan, S., Bayesian infinite mixture model based clustering of gene expression profiles, Bioinformatics, 18, 9, 1194-1206 (2002)
[19] Newton, M. A.; Noueiry, A.; Sarkar, D.; Ahlquist, P., Detecting differential gene expression with a semiparametric hierarchical mixture method, Biostatistics, 5, 2, 155-176 (2004) · Zbl 1096.62124
[20] Ng, S. K.; McLachlan, G. J.; Wang, K.; Ben-Tovim Jones, L.; Ng, S. W., A mixture model with random-effects components for clustering correlated gene-expression profiles, Bioinformatics, 22, 1745-1752 (2006)
[21] Plummer, M., JAGS: a program for analysis of Bayesian graphical models using Gibbs sampling, DSC 2003 Working Papers (2003), http://www-fis.iarc.fr/~martyn/software/jags/
[22] Richardson, S., Discussion of Spiegelhalter et al., Journal of the Royal Statistical Society: Series B, 631 (2002)
[23] Richardson, S.; Green, P., On Bayesian analysis of mixtures with an unknown number of components, Journal of the Royal Statistical Society: Series B, 59, 731-792 (1997) · Zbl 0891.62020
[24] Tsionas, E. G., Bayesian inference in the noncentral Student-\(t\) model, Journal of Computational and Graphical Statistics, 11, 1, 208-221 (2002)
[25] van der Laan, M.; Pollard, K., A new algorithm for hybrid hierarchical clustering with visualization and the bootstrap, Journal of Statistical Planning and Inference, 117, 275-303 (2003) · Zbl 1021.62048
[26] Wu, H.-M., On biological validity indices for soft clustering algorithms for gene expression data, Computational Statistics & Data Analysis, 55, 5, 1969-1979 (2011) · Zbl 1328.62392
[27] Yuan, M.; Kendziorski, C., A unified approach for simultaneous gene clustering and differential expression identification, Biometrics, 62, 1089-1098 (2006) · Zbl 1114.62130