
A partially linear tree-based regression model for multivariate outcomes. (English) Zbl 1187.62182

Summary: In genetic studies of complex traits, especially behavior-related ones such as smoking and alcoholism, several phenotypic measurements are usually obtained to describe a complex trait. However, because the underlying etiology is poorly understood, no single measurement can fully quantify the complicated characteristics of the symptom. If those phenotypes share a common genetic mechanism, it is more advantageous to analyze them jointly as a multivariate trait than to study each phenotype separately, since the joint analysis enhances the power to identify associated genes.
We propose a multilocus association test for the study of multivariate traits. The test is derived from a partially linear tree-based regression model for multiple outcomes. This novel tree-based model provides a formal statistical testing framework for the evaluation of the association between a multivariate outcome and a set of candidate predictors, such as markers within a gene or pathway, while accommodating adjustment for other covariates. Through simulation studies we show that the proposed method has an acceptable type I error rate and improves power over the univariate outcome analysis, which studies each component of the complex trait separately with multiple-comparison adjustment. A candidate gene association study of multiple smoking-related phenotypes is used to demonstrate the application and advantages of this new method. The proposed method is general enough to be used for the assessment of the joint effect of a set of multiple risk factors on a multivariate outcome in other biomedical research settings.
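The summary does not give the model equations, so the following is only a rough numpy sketch of the general idea under stated assumptions. It is not the authors' procedure. It assumes the covariates enter linearly and the candidate markers enter through a tree. It then simplifies in three ways: the full tree is replaced by a single split (a stump) chosen by the multivariate within-node sum-of-squares criterion, the linear and tree parts are fitted sequentially rather than jointly, and the p-value comes from Freedman-Lane residual permutation. All function names (`residualize`, `best_split_gain`, `stump_permutation_test`) and the simulated data are hypothetical.

```python
# Hypothetical sketch: NOT the published procedure, only an illustration of
# "covariates enter linearly, markers enter through a tree, multivariate outcome".
import numpy as np


def residualize(Y, X):
    """Remove the linear effect of the covariates X (plus an intercept)
    from every column of the outcome matrix Y by least squares."""
    design = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(design, Y, rcond=None)
    return Y - design @ beta


def best_split_gain(R, G):
    """Largest drop in the total within-node sum of squares of the
    multivariate residual matrix R over all single binary splits
    'marker <= cut' on the columns of G (a one-split tree, or stump)."""
    total = ((R - R.mean(axis=0)) ** 2).sum()
    best = 0.0
    for j in range(G.shape[1]):
        # Every observed value except the largest is a candidate cut,
        # so both child nodes are always non-empty.
        for cut in np.unique(G[:, j])[:-1]:
            left = G[:, j] <= cut
            within = sum(
                ((R[m] - R[m].mean(axis=0)) ** 2).sum() for m in (left, ~left)
            )
            best = max(best, total - within)
    return best


def stump_permutation_test(Y, X, G, n_perm=199, seed=0):
    """Permutation p-value for H0: the markers G carry no information about
    the multivariate outcome Y after linear adjustment for X.
    Null data are generated Freedman-Lane style: permute the rows of the
    covariate-adjusted residuals, then re-adjust them for X."""
    rng = np.random.default_rng(seed)
    R = residualize(Y, X)
    observed = best_split_gain(R, G)
    exceed = 0
    for _ in range(n_perm):
        R_null = residualize(R[rng.permutation(len(R))], X)
        if best_split_gain(R_null, G) >= observed:
            exceed += 1
    # Add one to numerator and denominator so the p-value is never zero.
    return observed, (exceed + 1) / (n_perm + 1)


# Usage on simulated (hypothetical) data: two correlated phenotypes that both
# depend on one covariate and on whether the first SNP has genotype code 2.
rng = np.random.default_rng(1)
n = 200
X = rng.normal(size=(n, 1))                         # one covariate, e.g. age
G = rng.integers(0, 3, size=(n, 2)).astype(float)   # two SNPs coded 0/1/2
signal = 2.0 * (G[:, 0] == 2)
Y = np.column_stack([0.5 * X[:, 0] + signal + rng.normal(size=n) for _ in range(2)])
gain, p = stump_permutation_test(Y, X, G, n_perm=99)
```

Taking the maximum gain over every marker and candidate cut lets a single statistic cover the whole marker set and all outcome components at once. Because the permutation null repeats that maximization, it is one way to avoid a separate multiple-comparison correction for each phenotype.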

MSC:

62P10 Applications of statistics to biology and medical sciences; meta analysis
92C50 Medical applications (general)
62J05 Linear regression; mixed models
65C60 Computational problems in statistics (MSC2010)
62N03 Testing in survival analysis and censored data
62H99 Multivariate analysis

Software:

rpart; geepack
