
Regularized multivariate regression for identifying master predictors with application to integrative genomics study of breast cancer. (English) Zbl 1189.62174

Summary: We propose a new method, remMap (REgularized multivariate regression for identifying MAster Predictors), for fitting multivariate response regression models in the high-dimension, low-sample-size setting. remMap is motivated by the study of regulatory relationships among different biological molecules based on multiple types of high-dimensional genomic data. In particular, we are interested in the influence of DNA copy number alterations on RNA transcript levels. To this end, we model the dependence of RNA expression levels on DNA copy numbers through multivariate linear regressions and use suitable regularization both to handle the high dimensionality and to incorporate the desired network structures. Criteria for selecting the tuning parameters are also discussed.
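As a rough illustration of how a regularized multivariate regression can single out master predictors, the sketch below assumes an objective of the form 0.5*||Y - XB||_F^2 + lam1 * sum_jk |b_jk| + lam2 * sum_j ||b_j||_2, where row b_j of B holds the coefficients of copy-number predictor j across all expression responses. This penalty form is an assumption for illustration only: the summary does not state remMap's exact penalty, and the solver here is plain proximal gradient, not necessarily the authors' algorithm. The names `prox_row` and `fit_rowsparse` are hypothetical.

```python
import math


def prox_row(row, t1, t2):
    """Proximal map of t1*||b||_1 + t2*||b||_2 for one predictor's coefficient row.

    Soft-threshold each entry (l1 part), then shrink the whole row toward zero
    (row-wise l2 part); a row shrunk to zero drops the predictor for every response.
    """
    soft = [math.copysign(max(abs(v) - t1, 0.0), v) for v in row]
    norm = math.sqrt(sum(v * v for v in soft))
    if norm <= t2:
        return [0.0] * len(row)
    scale = 1.0 - t2 / norm
    return [scale * v for v in soft]


def fit_rowsparse(X, Y, lam1, lam2, step, iters=500):
    """Proximal gradient for the assumed objective
    0.5*||Y - XB||_F^2 + lam1*sum|b_jk| + lam2*sum_j ||b_j||_2.

    X is n x p and Y is n x q (lists of rows); returns B as a p x q list of rows.
    For convergence, step should be at most 1 / (largest eigenvalue of X^T X).
    """
    n, p, q = len(X), len(X[0]), len(Y[0])
    B = [[0.0] * q for _ in range(p)]
    for _ in range(iters):
        # Residual R = XB - Y  (n x q)
        R = [[sum(X[i][j] * B[j][k] for j in range(p)) - Y[i][k] for k in range(q)]
             for i in range(n)]
        # Gradient of the squared-error term: G = X^T R  (p x q)
        G = [[sum(X[i][j] * R[i][k] for i in range(n)) for k in range(q)]
             for j in range(p)]
        # Gradient step followed by the row-wise proximal map
        B = [prox_row([B[j][k] - step * G[j][k] for k in range(q)],
                      step * lam1, step * lam2)
             for j in range(p)]
    return B
```

With orthonormal columns in X and step = 1, the iteration reaches the proximal map of X^T Y after one step, which makes the effect easy to see: a response row such as (1.2, 0) survives the l1 threshold lam1 = 0.5 on its own (0.7 remains) but is zeroed by the row penalty lam2 = 1, so that predictor is removed for all responses at once. The l2 term is what encourages whole predictors to be selected or discarded, while the l1 term keeps each selected predictor's set of targets sparse.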
The performance of the proposed method is illustrated through extensive simulation studies. Finally, remMap is applied to a breast cancer study in which genome-wide RNA transcript levels and DNA copy numbers were measured for 172 tumor samples. We identify a trans-hub region in cytoband 17q12-q21 whose amplification influences the RNA expression levels of more than 30 unlinked genes. These findings may lead to a better understanding of breast cancer pathology.
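To make the notion of a trans-hub concrete: once a coefficient matrix B has been estimated (rows = copy-number predictors, columns = expression responses), a predictor can be screened as a candidate master predictor by counting its nonzero coefficients on unlinked (trans) responses. The helper `trans_hubs`, the tolerance `tol`, and the representation of cis (linked) pairs as a set of (predictor, response) index pairs are hypothetical; how remMap actually defines "unlinked" should be checked in the paper.

```python
def trans_hubs(B, cis_pairs, min_targets, tol=1e-8):
    """Return {predictor index: number of trans targets} for every predictor whose
    row of B has more than min_targets nonzero entries outside cis_pairs.

    B: p x q coefficient matrix as a list of rows.
    cis_pairs: set of (predictor, response) index pairs treated as linked (cis)
    and therefore not counted.
    """
    hubs = {}
    for j, row in enumerate(B):
        # Count responses with a nonzero coefficient that are not linked to predictor j
        targets = sum(1 for k, b in enumerate(row)
                      if abs(b) > tol and (j, k) not in cis_pairs)
        if targets > min_targets:
            hubs[j] = targets
    return hubs
```

Under these assumptions, the "more than 30 unlinked genes" criterion in the summary corresponds to a call such as `trans_hubs(B, cis_pairs, min_targets=30)`.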

MSC:

62P10 Applications of statistics to biology and medical sciences; meta analysis
62J05 Linear regression; mixed models
92C40 Biochemistry, molecular biology
62H99 Multivariate analysis
65C60 Computational problems in statistics (MSC2010)
92C50 Medical applications (general)

Software:

glmnet
