Structured analysis of the high-dimensional FMR model. (English) Zbl 1504.62167

Summary: The finite mixture of regression (FMR) model is a popular tool for accommodating data heterogeneity. In the analysis of FMR models with high-dimensional covariates, it is necessary to conduct regularized estimation and to distinguish important covariates from noise. The literature has paid little attention to the differences among important covariates, which can reveal the underlying structure of covariate effects. Specifically, important covariates can be classified into two types: those that behave the same in different subpopulations and those that behave differently. It is of interest to conduct structured analysis to identify such structures, which will enable researchers to better understand covariates and their associations with outcomes. To this end, the FMR model with high-dimensional covariates is considered. A structured penalization approach is developed for regularized estimation, selection of important variables, and, equally importantly, identification of the underlying covariate effect structure. The proposed approach can be implemented effectively, and its statistical properties are rigorously established. Simulation demonstrates its superiority over alternatives. In the analysis of cancer gene expression data, the approach identifies interesting models/structures missed by existing analyses.
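
The summary does not give the form of the penalty, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' method. It assumes Gaussian components, and it assumes a hypothetical penalty that combines a lasso term (for sparsity) with a fused-type term on between-component coefficient differences (to encourage covariates to share a common effect across subpopulations). All function names, `lam1`, `lam2`, and `tol` are illustrative.

```python
import numpy as np


def fmr_log_likelihood(y, X, pis, betas, sigmas):
    """Log-likelihood of a K-component Gaussian mixture of linear regressions.

    y: (n,) responses; X: (n, p) covariates; pis: (K,) mixing proportions;
    betas: (K, p) component coefficients; sigmas: (K,) component std. devs.
    """
    resid = y[:, None] - X @ betas.T                      # (n, K) residuals
    log_dens = (-0.5 * np.log(2.0 * np.pi * sigmas ** 2)
                - 0.5 * (resid / sigmas) ** 2)            # (n, K) log densities
    weighted = np.log(pis) + log_dens
    m = weighted.max(axis=1, keepdims=True)               # log-sum-exp for stability
    return float(np.sum(m[:, 0] + np.log(np.exp(weighted - m).sum(axis=1))))


def structured_penalty(betas, lam1, lam2):
    """ASSUMED penalty (not taken from the paper): lasso plus pairwise fusion."""
    K = betas.shape[0]
    sparsity = np.abs(betas).sum()
    fusion = sum(np.abs(betas[k] - betas[l]).sum()
                 for k in range(K) for l in range(k + 1, K))
    return lam1 * sparsity + lam2 * fusion


def classify_covariates(betas, tol=1e-8):
    """Label each covariate as noise, common, or differential across components."""
    labels = []
    for j in range(betas.shape[1]):
        col = betas[:, j]
        if np.all(np.abs(col) <= tol):
            labels.append("noise")          # zero effect in every subpopulation
        elif col.max() - col.min() <= tol:
            labels.append("common")         # same nonzero effect everywhere
        else:
            labels.append("differential")   # effect differs across subpopulations
    return labels
```

In this sketch, an estimator would maximize `fmr_log_likelihood` minus `structured_penalty` (for instance within an EM-type algorithm), and `classify_covariates` would then read the effect structure off the fitted coefficient matrix.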

MSC:

62P10 Applications of statistics to biology and medical sciences; meta analysis
62J99 Linear inference, regression

Software:

ElemStatLearn
