
More power via graph-structured tests for differential expression of gene networks. (English) Zbl 1243.62080

Summary: We consider multivariate two-sample tests of means, where the location shift between the two populations is expected to be related to a known graph structure. An important application of such tests is the detection of differentially expressed genes between two patient populations, as shifts in expression levels are expected to be coherent with the structure of graphs reflecting gene properties such as biological process, molecular function, regulation or metabolism. For a fixed graph of interest, we demonstrate that accounting for graph structure can yield more powerful tests under the assumption of smooth distribution shift on the graph. We also investigate the identification of nonhomogeneous subgraphs of a given large graph, which poses both computational and multiple hypothesis testing problems. The relevance and benefits of the proposed approach are illustrated on synthetic data and on breast and bladder cancer gene expression data analyzed in the context of KEGG and NCI pathways.
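The summary does not spell out the test statistic. The sketch below shows one way to exploit a smooth shift. It assumes, as a modelling choice rather than the authors' exact procedure, that the mean difference lies mostly in the span of the k lowest-frequency eigenvectors of the combinatorial graph Laplacian L = D - A. Both samples are projected onto those eigenvectors, and a classical two-sample Hotelling T² test is run in the resulting k-dimensional space. The function names `laplacian` and `graph_fourier_hotelling`, the choice of k, and the toy path graph are illustrative assumptions.

```python
import numpy as np


def laplacian(adjacency):
    """Combinatorial graph Laplacian L = D - A for a symmetric adjacency matrix."""
    adjacency = np.asarray(adjacency, dtype=float)
    return np.diag(adjacency.sum(axis=1)) - adjacency


def graph_fourier_hotelling(x1, x2, adjacency, k):
    """Hotelling T^2 on the k lowest-frequency graph Fourier coefficients.

    x1: (n1, p) samples from population 1; x2: (n2, p) from population 2.
    Columns index the p nodes of the graph. Returns (T2, F, df1, df2), where
    F ~ F(df1, df2) under a Gaussian null with a common covariance matrix.
    """
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    n1, n2, p = x1.shape[0], x2.shape[0], x1.shape[1]
    df1, df2 = k, n1 + n2 - k - 1
    if not 1 <= k <= p or df2 <= 0:
        raise ValueError("need 1 <= k <= p and k < n1 + n2 - 1")
    # eigh returns eigenvalues in ascending order, so the first k eigenvectors
    # are the smoothest ("low-frequency") signals on the graph.
    _, eigvecs = np.linalg.eigh(laplacian(adjacency))
    u_k = eigvecs[:, :k]
    # Project every sample onto the low-frequency basis.
    z1, z2 = x1 @ u_k, x2 @ u_k
    diff = z1.mean(axis=0) - z2.mean(axis=0)
    # Pooled covariance estimate in the projected k-dimensional space.
    s1 = np.atleast_2d(np.cov(z1, rowvar=False))
    s2 = np.atleast_2d(np.cov(z2, rowvar=False))
    pooled = ((n1 - 1) * s1 + (n2 - 1) * s2) / (n1 + n2 - 2)
    t2 = (n1 * n2 / (n1 + n2)) * (diff @ np.linalg.solve(pooled, diff))
    # Standard conversion of two-sample T^2 to an F statistic.
    f_stat = t2 * df2 / (df1 * (n1 + n2 - 2))
    return float(t2), float(f_stat), df1, df2


# Toy usage: a path graph on 10 nodes and a constant (maximally smooth) shift.
rng = np.random.default_rng(0)
p = 10
adj = np.zeros((p, p))
for i in range(p - 1):
    adj[i, i + 1] = adj[i + 1, i] = 1.0
x1 = rng.normal(size=(20, p))
x2 = rng.normal(size=(20, p)) + 1.0
t2, f_stat, df1, df2 = graph_fourier_hotelling(x1, x2, adj, k=3)
# Same test when the second sample has no shift at all.
_, f_null, _, _ = graph_fourier_hotelling(x1, rng.normal(size=(20, p)), adj, k=3)
```

Reducing the test from p to k dimensions is where the potential power gain comes from. Hotelling's T² pays a price for every added dimension that carries no signal (compare Das Gupta and Perlman [9] in the reference list). A p-value is the upper tail of F(df1, df2) at `f_stat`, for example via `scipy.stats.f.sf(f_stat, df1, df2)`.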

MSC:

62H15 Hypothesis testing in multivariate analysis
05C90 Applications of graph theory
92C42 Systems biology, networks
92C40 Biochemistry, molecular biology
62P10 Applications of statistics to biology and medical sciences; meta analysis
65C60 Computational problems in statistics (MSC2010)

Software:

NCIgraph; HotNet; GOstat; KEGG
Full Text: DOI arXiv Euclid

References:

[1] Andersson, J., Larsson, L., Klaar, S., Holmberg, L., Nilsson, J., Inganäs, M., Carlsson, G., Ohd, J., Rudenstam, C.-M., Gustavsson, B. and Bergh, J. (2005). Worse survival for TP53 (p53)-mutated breast cancer patients receiving adjuvant CMF. Ann. Oncol. 16 743-748.
[2] Bai, Z. and Saranadasa, H. (1996). Effect of high dimension: By an example of a two sample problem. Statist. Sinica 6 311-329. · Zbl 0848.62030
[3] Bakkar, A. A., Wallerand, H., Radvanyi, F., Lahaye, J.-B., Pissard, S., Lecerf, L., Kouyoumdjian, J. C., Abbou, C. C., Pairon, J.-C., Jaurand, M.-C., Thiery, J.-P., Chopin, D. K. and de Medina, S. G. D. (2003). FGFR3 and TP53 gene mutations define two distinct pathways in urothelial cell carcinoma of the bladder. Cancer Res. 63 8108-8112.
[4] Barnes, D. M. (1997). Cyclin D1 in mammary carcinoma. J. Pathol. 181 267-269.
[5] Beissbarth, T. and Speed, T. P. (2004). GOstat: Find statistically overrepresented Gene Ontologies within a group of genes. Bioinformatics 20 1464-1465.
[6] Benjamini, Y. and Hochberg, Y. (1995). Controlling the false discovery rate: A practical and powerful approach to multiple testing. J. Roy. Statist. Soc. Ser. B 57 289-300. · Zbl 0809.62014
[7] Chen, S. X. and Qin, Y.-L. (2010). A two-sample test for high-dimensional data with applications to gene-set testing. Ann. Statist. 38 808-835. · Zbl 1183.62095 · doi:10.1214/09-AOS716
[8] Cunningham, J. M., Vierkant, R. A., Sellers, T. A., Phelan, C., Rider, D. N., Liebow, M., Schildkraut, J., Berchuck, A., Couch, F. J., Wang, X., Fridley, B. L., Ovarian Cancer Association Consortium, Gentry-Maharaj, A., Menon, U., Hogdall, E., Kjaer, S., Whittemore, A., DiCioccio, R., Song, H., Gayther, S. A., Ramus, S. J., Pharoah, P. D. P. and Goode, E. L. (2009). Cell cycle genes and ovarian cancer susceptibility: A tagSNP analysis. Br. J. Cancer 101 1461-1468.
[9] Das Gupta, S. and Perlman, M. D. (1974). Power of the noncentral \(F\) test: Effect of additional variates on Hotelling’s \(T^{2}\)-test. J. Amer. Statist. Assoc. 69 174-180. · Zbl 0285.62027 · doi:10.2307/2285519
[10] Davis, C. and Kahan, W. M. (1969). Some new bounds on perturbation of subspaces. Bull. Amer. Math. Soc. 75 863-868. · Zbl 0175.43204 · doi:10.1090/S0002-9904-1969-12330-X
[11] Dudoit, S. and van der Laan, M. J. (2008). Multiple Testing Procedures with Applications to Genomics. Springer, New York. · Zbl 1261.62014
[12] Dunn, S. E., Kari, F. W., French, J., Leininger, J. R., Travlos, G., Wilson, R. and Barrett, J. C. (1997). Dietary restriction reduces insulin-like growth factor I levels, which modulates apoptosis, cell proliferation, and tumor progression in p53-deficient mice. Cancer Res. 57 4667-4672.
[13] Ein-Dor, L., Kela, I., Getz, G., Givol, D. and Domany, E. (2005). Outcome signature genes in breast cancer: Is there a unique set? Bioinformatics 21 171-178.
[14] Eswarakumar, V. P., Lax, I. and Schlessinger, J. (2005). Cellular signaling by fibroblast growth factor receptors. Cytokine Growth Factor Rev. 16 139-149.
[15] Evans, L. C. (1998). Partial Differential Equations. Graduate Studies in Mathematics 19. Amer. Math. Soc., Providence, RI. · Zbl 0902.35002
[16] Fan, J. and Lin, S.-K. (1998). Test of significance when data are curves. J. Amer. Statist. Assoc. 93 1007-1021. · Zbl 1064.62525 · doi:10.2307/2669845
[17] Fernandez-Cuesta, L., Anaganti, S., Hainaut, P. and Olivier, M. (2010). p53 status influences response to tamoxifen but not to fulvestrant in breast cancer cell lines. Int. J. Cancer 128 1813-1821.
[18] Goeman, J. J. and Bühlmann, P. (2007). Analyzing gene expression data in terms of gene sets: Methodological issues. Bioinformatics 23 980-987.
[19] Goldberg, A. B. (2007). Dissimilarity in graph-based semisupervised classification. In Eleventh International Conference on Artificial Intelligence and Statistics (AISTATS).
[20] Gutierrez, M. C., Detre, S., Johnston, S., Mohsin, S. K., Shou, J., Allred, D. C., Schiff, R., Osborne, C. K. and Dowsett, M. (2005). Molecular changes in tamoxifen-resistant breast cancer: Relationship between estrogen receptor, HER-2, and p38 mitogen-activated protein kinase. J. Clin. Oncol. 23 2469-2476.
[21] Hammond, D. K., Vandergheynst, P. and Gribonval, R. (2009). Wavelets on graphs via spectral graph theory. Available at arXiv:0912.3848. · Zbl 1213.42091 · doi:10.1016/j.acha.2010.04.005
[22] Haury, A. C., Gestraud, P. and Vert, J. P. (2011). The influence of feature selection methods on accuracy, stability and interpretability of molecular signatures. Preprint. Available at arXiv:1101.5008.
[23] Haury, A. C., Jacob, L. and Vert, J. P. (2010). Increasing stability and interpretability of gene expression signatures. ArXiv E-prints.
[24] He, Z. and Yu, W. (2010). Stable feature selection for biomarker discovery. Available at arXiv:1001.0887. · Zbl 1403.92068
[25] Hernández, S., López-Knowles, E., Lloreta, J., Kogevinas, M., Jaramillo, R., Amorós, A., Tardón, A., García-Closas, R., Serra, C., Carrato, A., Malats, N. and Real, F. X. (2005). FGFR3 and Tp53 mutations in T1G3 transitional bladder carcinomas: Independent distribution and lack of association with prognosis. Clin. Cancer Res. 11 5444-5450.
[26] Herynk, M. H., Beyer, A. R., Cui, Y., Weiss, H., Anderson, E., Green, T. P. and Fuqua, S. A. W. (2006). Cooperative action of tamoxifen and c-Src inhibition in preventing the growth of estrogen receptor-positive human breast cancer cells. Mol. Cancer Ther. 5 3023-3031.
[27] Hung, T.-T., Wang, H., Kingsley, E. A., Risbridger, G. P. and Russell, P. J. (2008). Molecular profiling of bladder cancer: Involvement of the TGF-beta pathway in bladder cancer progression. Cancer Lett. 265 27-38.
[28] Ideker, T., Ozier, O., Schwikowski, B. and Siegel, A. F. (2002). Discovering regulatory and signalling circuits in molecular interaction networks. In ISMB 233-240.
[29] Ipsen, I. C. F. (2010). The eigenproblem and invariant subspaces: Perturbation theory. In G. W. Stewart: Selected Works with Commentaries (M. E. Kilmer and D. P. O’Leary, eds.) 71-93. Birkhäuser, Basel.
[30] Jacob, L. (2011). NCIgraph: Pathways from the NCI Pathways Database. R package version 1.0.0.
[31] Jacob, L., Neuvial, P. and Dudoit, S. (2011a). Supplement A to “More power via graph-structured tests for differential expression of gene networks.” · Zbl 1243.62080
[32] Jacob, L., Neuvial, P. and Dudoit, S. (2011b). Supplement B to “More power via graph-structured tests for differential expression of gene networks.” · Zbl 1243.62080
[33] Jacob, L., Neuvial, P. and Dudoit, S. (2011c). Supplement C to “More power via graph-structured tests for differential expression of gene networks.” · Zbl 1243.62080
[34] Jacob, L., Obozinski, G. and Vert, J.-P. (2009). Group lasso with overlap and graph lasso. In ICML ’09: Proceedings of the 26th Annual International Conference on Machine Learning 433-440. ACM, New York.
[35] Jenatton, R., Audibert, J. Y. and Bach, F. (2009). Structured variable selection with sparsity-inducing norms. Research report, WILLOW-INRIA. · Zbl 1280.68170
[36] Johnson, N., Bentley, J., Wang, L.-Z., Newell, D. R., Robson, C. N., Shapiro, G. I. and Curtin, N. J. (2010). Pre-clinical evaluation of cyclin-dependent kinase 2 and 1 inhibition in anti-estrogen-sensitive and resistant breast cancer cells. Br. J. Cancer 102 342-350.
[37] Knowles, M. A. (2006). Molecular subtypes of bladder cancer: Jekyll and Hyde or chalk and cheese? Carcinogenesis 27 361-373.
[38] Land, A. H. and Doig, A. G. (1960). An automatic method of solving discrete programming problems. Econometrica 28 497-520. · Zbl 0101.37004 · doi:10.2307/1910129
[39] Levidou, G., Saetta, A. A., Karlou, M., Thymara, I., Pratsinis, H., Pavlopoulos, P., Isaiadis, D., Diamantopoulou, K., Patsouris, E. and Korkolopoulou, P. (2010). D-type cyclins in superficial and muscle-invasive bladder urothelial carcinoma: Correlation with clinicopathological data and prognostic significance. J. Cancer Res. Clin. Oncol. 136 1563-1571.
[40] Loi, S., Haibe-Kains, B., Desmedt, C., Wirapati, P., Lallemand, F., Tutt, A. M., Gillet, C., Ellis, P., Ryder, K., Reid, J. F. et al. (2008). Predicting prognosis using molecular profiling in estrogen receptor-positive breast cancer treated with tamoxifen. BMC Genomics 9 239.
[41] Lönnstedt, I. and Speed, T. (2002). Replicated microarray data. Statist. Sinica 12 31-46. · Zbl 1004.62086
[42] Lopes, M. E., Jacob, L. and Wainwright, M. J. (2011). A more powerful two-sample test in high dimensions using random projection. Technical report. Available at arXiv:1108.2401.
[43] Louie, M. C., McClellan, A., Siewit, C. and Kawabata, L. (2010). Estrogen receptor regulates E2F1 expression to mediate tamoxifen resistance. Mol. Cancer Res. 8 343-352.
[44] Lu, Y., Liu, P.-Y., Xiao, P. and Deng, H.-W. (2005). Hotelling’s \(T^{2}\) multivariate profiling for detecting differential expression in microarrays. Bioinformatics 21 3105-3113.
[45] Lücke, C. D., Philpott, A., Metcalfe, J. C., Thompson, A. M., Hughes-Davies, L., Kemp, P. R. and Hesketh, R. (2001). Inhibiting mutations in the transforming growth factor beta type 2 receptor in recurrent human breast cancer. Cancer Res. 61 482-485.
[46] Ma, S. and Kosorok, M. R. (2009). Identification of differential gene pathways with principal component analysis. Bioinformatics 25 882-889.
[47] Man, Y.-G. (2010). Aberrant leukocyte infiltration: A direct trigger for breast tumor invasion and metastasis. Int. J. Biol. Sci. 6 129-132.
[48] McGlynn, L. M., Kirkegaard, T., Edwards, J., Tovey, S., Cameron, D., Twelves, C., Bartlett, J. M. S. and Cooke, T. G. (2009). Ras/Raf-1/MAPK pathway mediates response to tamoxifen but not chemotherapy in breast cancer patients. Clin. Cancer Res. 15 1487-1495.
[49] Mellon, J. K., Lunec, J., Wright, C., Horne, C. H., Kelly, P. and Neal, D. E. (1996). C-erbB-2 in bladder cancer: Molecular biology, correlation with epidermal growth factor receptors and prognostic value. J. Urol. 155 321-326.
[50] Mitra, A. P., Pagliarulo, V., Yang, D., Waldman, F. M., Datar, R. H., Skinner, D. G., Groshen, S. and Cote, R. J. (2009). Generation of a concise gene panel for outcome prediction in urinary bladder cancer. J. Clin. Oncol. 27 3929-3937.
[51] Musgrove, E. A. and Sutherland, R. L. (2009). Biological determinants of endocrine resistance in breast cancer. Nat. Rev. Cancer 9 631-643.
[52] Nacu, S., Critchley-Thorne, R., Lee, P. and Holmes, S. (2007). Gene expression network analysis and applications to immunology. Bioinformatics 23 850.
[53] Obozinski, G., Jacob, L. and Vert, J. P. (2011). Group Lasso with overlaps: The latent group Lasso approach. Technical report. arXiv.
[54] Ohtake, F., Baba, A., Takada, I., Okada, M., Iwasaki, K., Miki, H., Takahashi, S., Kouzmenko, A., Nohara, K., Chiba, T., Fujii-Kuriyama, Y. and Kato, S. (2007). Dioxin receptor is a ligand-dependent E3 ubiquitin ligase. Nature 446 562-566.
[55] Paik, S., Shak, S., Tang, G., Kim, C., Baker, J., Cronin, M., Baehner, F. L., Walker, M. G., Watson, D., Park, T., Hiller, W., Fisher, E. R., Wickerham, D. L., Bryant, J. and Wolmark, N. (2004). A multigene assay to predict recurrence of tamoxifen-treated, node-negative breast cancer. N. Engl. J. Med. 351 2817-2826.
[56] Perou, C. M., Sørlie, T., Eisen, M. B., van de Rijn, M., Jeffrey, S. S., Rees, C. A., Pollack, J. R., Ross, D. T., Johnsen, H., Akslen, L. A., Fluge, O., Pergamenschikov, A., Williams, C., Zhu, S. X., Lønning, P. E., Børresen-Dale, A. L., Brown, P. O. and Botstein, D. (2000). Molecular portraits of human breast tumours. Nature 406 747-752.
[57] Rakha, E. A., Boyce, R. W. G., El-Rehim, D. A., Kurien, T., Green, A. R., Paish, E. C., Robertson, J. F. R. and Ellis, I. O. (2005). Expression of mucins (MUC1, MUC2, MUC3, MUC4, MUC5AC and MUC6) and their prognostic significance in human breast cancer. Mod. Pathol. 18 1295-1304.
[58] Rapaport, F., Zinovyev, A., Dutreix, M., Barillot, E. and Vert, J.-P. (2007). Classification of microarray data using gene networks. BMC Bioinformatics 8 35.
[59] Roy, D., Sarkar, S. and Felty, Q. (2006). Levels of IL-1 beta control stimulatory/inhibitory growth of cancer cells. Front. Biosci. 11 889-898.
[60] Sanchez-Carbayo, M., Socci, N. D., Lozano, J., Saint, F. and Cordon-Cardo, C. (2006). Defining molecular profiles of poor outcome in patients with invasive bladder cancer using oligonucleotide microarrays. J. Clin. Oncol. 24 778-789.
[61] Sandler, T., Blitzer, J., Talukdar, P. and Pereira, F. (2009). Regularized learning with networks of features. In Neural Information Processing Systems. MIT Press, Cambridge, MA.
[62] Spruck, C. H., Ohneseit, P. F., Gonzalez-Zulueta, M., Esrig, D., Miyao, N., Tsai, Y. C., Lerner, S. P., Schmütte, C., Yang, A. S. and Cote, R. (1994). Two molecular pathways to transitional cell carcinoma of the bladder. Cancer Res. 54 784-788.
[63] Srinivasan, S., Zafar, S., Nawaz, Z. and Loggie, B. W. (2007). Transcriptional regulation of MUC2 by estrogen. 2007 Gastrointestinal Cancers Symposium.
[64] Srivastava, M. S. (2009). A test for the mean vector with fewer observations than the dimension under nonnormality. J. Multivariate Anal. 100 518-532. · Zbl 1154.62046 · doi:10.1016/j.jmva.2008.06.006
[65] Srivastava, M. S. and Du, M. (2008). A test for the mean vector with fewer observations than the dimension. J. Multivariate Anal. 99 386-402. · Zbl 1148.62042 · doi:10.1016/j.jmva.2006.11.002
[66] Stewart, G. W. and Sun, J. G. (1990). Matrix Perturbation Theory. Academic Press, Boston, MA. · Zbl 0706.65013
[67] Stransky, N., Vallot, C., Reyal, F., Bernard-Pierrot, I., Diez de Medina, S. G., Segraves, R., de Rycke, Y., Elvin, P., Cassidy, A., Spraggon, C., Graham, A., Southgate, J., Asselain, B., Allory, Y., Abbou, C. C., Albertson, D. G., Thiery, J. P., Chopin, D. K., Pinkel, D. and Radvanyi, F. (2006). Regional copy number-independent deregulation of transcription in cancer. Nat. Genet. 38 1386-1396.
[68] Subramanian, A., Tamayo, P., Mootha, V. K., Mukherjee, S., Ebert, B. L., Gillette, M. A., Paulovich, A., Pomeroy, S. L., Golub, T. R., Lander, E. S. and Mesirov, J. P. (2005). Gene set enrichment analysis: A knowledge-based approach for interpreting genome-wide expression profiles. Proc. Natl. Acad. Sci. USA 102 15545-15550.
[69] Sutherland, R. L. and Musgrove, E. A. (2009). CDK inhibitors as potential breast cancer therapeutics: New evidence for enhanced efficacy in ER+ disease. Breast Cancer Res. 11 112.
[70] Tai, Y. C. and Speed, T. P. (2009). On gene ranking using replicated microarray time course data. Biometrics 65 40-51. · Zbl 1159.62081 · doi:10.1111/j.1541-0420.2008.01057.x
[71] Turner, N., Pearson, A., Sharpe, R., Lambros, M., Geyer, F., Lopez-Garcia, M. A., Natrajan, R., Marchio, C., Iorns, E., Mackay, A., Gillett, C., Grigoriadis, A., Tutt, A., Reis-Filho, J. S. and Ashworth, A. (2010). FGFR1 amplification drives endocrine therapy resistance and is a therapeutic target in breast cancer. Cancer Res. 70 2085-2094.
[72] van Rhijn, B. W. G., van der Kwast, T. H., Vis, A. N., Kirkels, W. J., Boevé, E. R., Jöbsis, A. C. and Zwarthoff, E. C. (2004). FGFR3 and P53 characterize alternative genetic pathways in the pathogenesis of urothelial cell carcinoma. Cancer Res. 64 1911-1914.
[73] Vandin, F., Upfal, E. and Raphael, B. J. (2010). Algorithms for detecting significantly mutated pathways in cancer. In RECOMB (B. Berger, ed.). Lecture Notes in Computer Science 6044 506-521. Springer, Berlin. · doi:10.1089/cmb.2010.0265
[74] Vaske, C., Benz, S., Sanborn, Z., Earl, D., Szeto, C., Zhu, J., Haussler, D. and Stuart, J. (2010). Inference of patient-specific pathway activities from multi-dimensional cancer genomics data using PARADIGM. In ISMB.
[75] Walsh, M. D., McGuckin, M. A., Devine, P. L., Hohn, B. G. and Wright, R. G. (1993). Expression of MUC2 epithelial mucin in breast carcinoma. J. Clin. Pathol. 46 922-925.
[76] Wu, W., Pew, T., Zou, M., Pang, D. and Conzen, S. D. (2005). Glucocorticoid receptor-induced MAPK phosphatase-1 (MPK-1) expression inhibits paclitaxel-associated MAPK activation and contributes to breast cancer cell survival. J. Biol. Chem. 280 4117-4124.
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases, these data have been complemented or enhanced with data from zbMATH Open. The list attempts to reflect the references in the original paper as accurately as possible, without claiming completeness or a perfect matching.