
Doubly penalized Buckley-James method for survival data with high-dimensional covariates. (English) Zbl 1139.62063

Summary: Recent interest in cancer research focuses on predicting patients’ survival from gene expression profiles obtained by microarray analysis. We propose a doubly penalized version of the method of J. Buckley and I. James [Biometrika 66, 429–436 (1979; Zbl 0425.62051)] for the semiparametric accelerated failure time model to relate high-dimensional genomic data to censored survival outcomes. The method uses the elastic-net penalty, which is a mixture of \(L_{1}\)- and \(L_{2}\)-norm penalties. Similar to the elastic-net method for a linear regression model with uncensored data, the proposed method performs automatic gene selection and parameter estimation, and highly correlated genes can be selected (or removed) together. The two-dimensional tuning parameter is determined by generalized cross-validation. The proposed method is evaluated by simulations and applied to the Michigan squamous cell lung carcinoma study.
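The idea in the summary can be illustrated with a minimal Python sketch. It is not the authors' algorithm: it simply alternates a Buckley-James imputation of censored log-times (using a Kaplan-Meier estimate of the residual distribution) with an elastic-net fit by coordinate descent, minimizing \(\frac{1}{2n}\|y^{*}-X\beta\|^{2}+\lambda_{1}\|\beta\|_{1}+\frac{\lambda_{2}}{2}\|\beta\|^{2}\). All function names (`km_jumps`, `bj_impute`, `elastic_net`, `penalized_bj`), the simulated data, the fixed iteration counts, and the convention of treating the largest residual as uncensored are assumptions made for illustration. The tuning parameters are fixed here rather than chosen by generalized cross-validation.

```python
import numpy as np

def km_jumps(resid, delta):
    """Kaplan-Meier probability mass at each residual (zero at censored ones)."""
    n = len(resid)
    order = np.argsort(resid, kind="stable")
    d = delta[order].astype(float)
    d[-1] = 1.0                       # assumption: largest residual treated as uncensored
    at_risk = n - np.arange(n)        # number at risk at each sorted residual
    surv = np.cumprod(1.0 - d / at_risk)
    prev = np.concatenate(([1.0], surv[:-1]))
    mass = np.empty(n)
    mass[order] = prev - surv         # jump sizes of the estimated distribution
    return mass

def bj_impute(y, delta, X, beta):
    """Replace each censored y_i by x_i'beta + E[e | e > e_i] under the KM estimate."""
    resid = y - X @ beta
    mass = km_jumps(resid, delta)
    y_star = y.astype(float).copy()
    for i in np.where(delta == 0)[0]:
        later = resid > resid[i]
        tail = mass[later].sum()
        if tail > 0:
            y_star[i] = X[i] @ beta + (mass[later] * resid[later]).sum() / tail
    return y_star

def soft(z, t):
    """Soft-thresholding operator used by the L1 part of the penalty."""
    return np.sign(z) * max(abs(z) - t, 0.0)

def elastic_net(X, y, lam1, lam2, beta0=None, n_sweeps=50):
    """Coordinate descent for (1/2n)||y - Xb||^2 + lam1*||b||_1 + (lam2/2)*||b||^2."""
    n, p = X.shape
    beta = np.zeros(p) if beta0 is None else beta0.copy()
    col_sq = (X ** 2).sum(axis=0) / n
    r = y - X @ beta
    for _ in range(n_sweeps):
        for j in range(p):
            r += X[:, j] * beta[j]                    # remove coordinate j from the fit
            rho = X[:, j] @ r / n
            beta[j] = soft(rho, lam1) / (col_sq[j] + lam2)
            r -= X[:, j] * beta[j]                    # add the updated coordinate back
    return beta

def penalized_bj(X, y, delta, lam1, lam2, n_iter=10):
    """Alternate imputation and elastic-net steps for a fixed number of rounds."""
    Xc = X - X.mean(axis=0)           # intercept is absorbed into the residuals
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        y_star = bj_impute(y, delta, Xc, beta)
        beta = elastic_net(Xc, y_star - y_star.mean(), lam1, lam2, beta)
    return beta

# Simulated example with more covariates than samples and two correlated "genes".
rng = np.random.default_rng(0)
n, p = 40, 60
X = rng.normal(size=(n, p))
X[:, 1] = X[:, 0] + 0.05 * rng.normal(size=n)
log_t = 1.5 * X[:, 0] + 1.5 * X[:, 1] + 0.5 * rng.normal(size=n)
log_c = rng.normal(loc=2.0, scale=2.0, size=n)        # log censoring times
y = np.minimum(log_t, log_c)
delta = (log_t <= log_c).astype(int)                   # 1 = event observed
beta_hat = penalized_bj(X, y, delta, lam1=0.3, lam2=0.5)
print("indices of selected covariates:", np.flatnonzero(beta_hat))
```

The \(L_{1}\) term sets many coefficients exactly to zero, while the \(L_{2}\) term shrinks the coefficients of correlated covariates toward each other, which is the grouping behaviour the summary describes. The Buckley-James iteration is not guaranteed to converge, so the sketch simply runs a fixed number of rounds.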

MSC:

62P10 Applications of statistics to biology and medical sciences; meta analysis
62N02 Estimation in survival analysis and censored data
62N01 Censored data models

Citations:

Zbl 0425.62051

References:

[1] Beer, Gene-expression profiles predict survival of patients with lung adenocarcinoma, Nature Medicine 8 pp 816– (2002)
[2] Buckley, Linear regression with censored data, Biometrika 66 pp 429– (1979) · Zbl 0425.62051 · doi:10.1093/biomet/66.3.429
[3] Cox, Regression models and life-tables, Journal of the Royal Statistical Society, Series B 34 pp 187– (1972)
[4] De Langhe, Levels of mesenchymal FGFR2 signaling modulate smooth muscle progenitor cell commitment in the lung, Developmental Biology 299 pp 52– (2006)
[5] Efron, Least angle regression, Annals of Statistics 32 pp 407– (2004) · Zbl 1091.62054 · doi:10.1214/009053604000000067
[6] Fang, Number-Theoretic Methods in Statistics (1994) · doi:10.1007/978-1-4899-3095-8
[7] Gharib, Proteomic analysis of cytokeratin isoforms associated with survival in lung adenocarcinoma, Neoplasia 4 pp 440– (2002) · doi:10.1038/sj.neo.7900257
[8] Gharib, Genomic and proteomic analyses of VEGF and IGFBP3 in lung adenocarcinomas, Clinical Lung Cancer 5 pp 307– (2004) · doi:10.3816/CLC.2004.n.011
[9] Gui, Penalized Cox regression analysis in the high-dimensional and low-sample size settings, with applications to microarray gene expression data, Bioinformatics 21 pp 3001– (2005) · doi:10.1093/bioinformatics/bti422
[10] Heller, A comparison of estimators for regression with a censored response variable, Biometrika 77 pp 515– (1990) · doi:10.1093/biomet/77.3.515
[11] Huang, Penalized partial likelihood regression for right censored data with bootstrap selection of the penalty parameter, Biometrics 58 pp 781– (2002) · Zbl 1210.62042 · doi:10.1111/j.0006-341X.2002.00781.x
[12] Huang, Iterative partial least squares with right-censored data analysis: A comparison to other dimension reduction techniques, Biometrics 61 pp 17– (2005) · Zbl 1077.62109 · doi:10.1111/j.0006-341X.2005.040304.x
[13] Huang, Regularized estimation in the accelerated failure time model with high dimensional covariates, Biometrics 62 pp 813– (2006) · Zbl 1111.62090 · doi:10.1111/j.1541-0420.2006.00562.x
[14] Kalbfleisch, The Statistical Analysis of Failure Time Data (2002) · Zbl 1012.62104 · doi:10.1002/9781118032985
[15] Koul, Regression analysis with randomly right-censored data, Annals of Statistics 9 pp 1276– (1981) · Zbl 0477.62046 · doi:10.1214/aos/1176345644
[16] Lai, Large sample theory of a modified Buckley-James estimator for regression analysis with censored data, Annals of Statistics 19 pp 1370– (1991) · Zbl 0742.62043 · doi:10.1214/aos/1176348253
[17] Li, Kernel Cox regression models for linking gene expression profiles to censored survival data, Pacific Symposium on Biocomputing 8 pp 65– (2003)
[18] Li, Partial Cox regression analysis for high-dimensional microarray gene expression data, Bioinformatics 20 pp i208– (2004) · doi:10.1093/bioinformatics/bth900
[19] Li, Dimension reduction methods for microarrays with application to censored survival data, Bioinformatics 20 pp 3406– (2004) · doi:10.1093/bioinformatics/bth415
[20] Little, Statistical Analysis with Missing Data (2002) · doi:10.1002/9781119013563
[21] Ma, Additive risk models for survival data with high-dimensional covariates, Biometrics 62 pp 202– (2006) · Zbl 1091.62124 · doi:10.1111/j.1541-0420.2005.00405.x
[22] Miller, Least squares regression with censored data, Biometrika 63 pp 449– (1976) · Zbl 0344.62058 · doi:10.1093/biomet/63.3.449
[23] Nan, A varying-coefficient Cox model for the effect of age at a marker event on age at menopause, Biometrics 61 pp 576– (2005) · doi:10.1111/j.1541-0420.2005.030905.x
[24] O’Sullivan, Nonparametric estimation of relative risk using splines and cross-validation, SIAM Journal on Scientific and Statistical Computing 9 pp 531– (1988) · Zbl 0688.65084 · doi:10.1137/0909035
[25] Park, An L1 regularization-path algorithm for generalized linear models (2006)
[26] Raponi, Gene expression signatures for predicting prognosis of squamous cell and adenocarcinomas of the lung, Cancer Research 66 pp 7466– (2006) · doi:10.1158/0008-5472.CAN-06-1191
[27] Ritov, Estimation in a linear regression model with censored data, Annals of Statistics 18 pp 303– (1990) · Zbl 0713.62045 · doi:10.1214/aos/1176347502
[28] Rosset, Boosting as a regularized path to a maximum margin classifier, Journal of Machine Learning Research 5 pp 941– (2004) · Zbl 1222.68290
[29] Schneider, Estimation in linear models with censored data, Biometrika 73 pp 741– (1986) · Zbl 0655.62072 · doi:10.1093/biomet/73.3.741
[30] Susarla, Large sample theory for an estimator of the mean survival time from censored samples, Annals of Statistics 8 pp 1002– (1980) · Zbl 0455.62030 · doi:10.1214/aos/1176345138
[31] Susarla, A Buckley-James-type estimator for the mean with censored data, Biometrika 71 pp 624– (1984) · doi:10.1093/biomet/71.3.624
[32] Tibshirani, Regression shrinkage and selection via the Lasso, Journal of the Royal Statistical Society, Series B 58 pp 267– (1996) · Zbl 0850.62538
[33] Tibshirani, The Lasso method for variable selection in the Cox model, Statistics in Medicine 16 pp 385– (1997) · doi:10.1002/(SICI)1097-0258(19970228)16:4<385::AID-SIM380>3.0.CO;2-3
[34] Tsiatis, Estimating regression parameters using linear rank tests for censored data, Annals of Statistics 18 pp 354– (1990) · Zbl 0701.62051 · doi:10.1214/aos/1176347504
[35] Wei, The accelerated failure time model: A useful alternative to the Cox regression model in survival analysis, Statistics in Medicine 11 pp 1871– (1992) · doi:10.1002/sim.4780111409
[36] Wei, Linear regression analysis of censored survival data based on rank tests, Biometrika 77 pp 845– (1990) · doi:10.1093/biomet/77.4.845
[37] Ying, A large sample study of rank estimation for censored regression data, The Annals of Statistics 21 pp 76– (1993) · Zbl 0773.62048 · doi:10.1214/aos/1176349016
[38] Yu, A hybrid Newton-type method for censored survival data using double weights in linear models, Lifetime Data Analysis 12 pp 345– (2006) · Zbl 1356.62193 · doi:10.1007/s10985-006-9014-0
[39] Zou, Regularization and variable selection via the elastic net, Journal of the Royal Statistical Society, Series B 67 pp 301– (2005) · Zbl 1069.62054 · doi:10.1111/j.1467-9868.2005.00503.x
[40] Zou, On the degrees of freedom of the Lasso (2005)
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.