
Methods for analyzing multivariate phenotypes in genetic association studies. (English) Zbl 1263.62135

Summary: Multivariate phenotypes are frequently encountered in genetic association studies. The purpose of analyzing multivariate phenotypes usually includes discovering novel genetic variants with pleiotropic effects, that is, variants affecting multiple phenotypes, with the ultimate goal of uncovering the underlying genetic mechanism. In recent years, new methods have been developed and existing statistical methods have been applied to such phenotypes.
We review the available methods for analyzing association between a single marker and a multivariate phenotype consisting of components of the same type (e.g., all continuous or all categorical) or of different types (e.g., some continuous and others categorical). We also review causal inference methods designed to test whether a detected association with the multivariate phenotype reflects true pleiotropy or whether the genetic marker exerts its effects on some phenotypes by affecting others.

MSC:

62P10 Applications of statistics to biology and medical sciences; meta analysis
92D10 Genetics and epigenetics
92D15 Problems related to evolution

Software:

PBAT

References:

[1] International HapMap Consortium, K. A. Frazer, D. G. Ballinger et al., “A second generation human haplotype map of over 3.1 million SNPs,” Nature, vol. 449, pp. 851-861, 2007.
[2] 1000 Genomes Project Consortium, “A map of human genome variation from population-scale sequencing,” Nature, vol. 467, pp. 1061-1073, 2010. · doi:10.1038/nature09534
[3] L. A. Hindorff, H. A. Junkins, P. N. Hall, J. P. Mehta, and T. A. Manolio, “A catalog of published genome-wide association studies,” National Human Genome Research Institute, 2011, http://www.genome.gov/gwastudies/.
[4] T. A. Manolio, F. S. Collins, N. J. Cox et al., “Finding the missing heritability of complex diseases,” Nature, vol. 461, no. 7265, pp. 747-753, 2009. · doi:10.1038/nature08494
[5] E. E. Eichler, J. Flint, G. Gibson et al., “Missing heritability and strategies for finding the underlying causes of complex disease,” Nature Reviews Genetics, vol. 11, no. 6, pp. 446-450, 2010. · doi:10.1038/nrg2809
[6] N. M. Laird and J. H. Ware, “Random-effects models for longitudinal data,” Biometrics, vol. 38, no. 4, pp. 963-974, 1982. · Zbl 0512.62107 · doi:10.2307/2529876
[7] G. M. Fitzmaurice and N. M. Laird, “A likelihood-based method for analysing longitudinal binary responses,” Biometrika, vol. 80, no. 1, pp. 141-151, 1993. · Zbl 0775.62296 · doi:10.1093/biomet/80.1.141
[8] H. D. Patterson and R. Thompson, “Recovery of inter-block information when block sizes are unequal,” Biometrika, vol. 58, pp. 545-554, 1971. · Zbl 0228.62046 · doi:10.1093/biomet/58.3.545
[9] D. A. Harville, “Maximum likelihood approaches to variance component estimation and to related problems,” Journal of the American Statistical Association, vol. 72, no. 358, pp. 320-340, 1977. · Zbl 0373.62040 · doi:10.2307/2286796
[10] N. E. Breslow and D. G. Clayton, “Approximate inference in generalized linear mixed models,” Journal of the American Statistical Association, vol. 88, pp. 9-25, 1993. · Zbl 0775.62195 · doi:10.2307/2290687
[11] D. M. Bates and S. DebRoy, “Linear mixed models and penalized least squares,” Journal of Multivariate Analysis, vol. 91, no. 1, pp. 1-17, 2004. · Zbl 1051.62063 · doi:10.1016/j.jmva.2004.04.013
[12] A. T. Kraja, D. Vaidya, J. S. Pankow et al., “A bivariate genome-wide approach to metabolic syndrome: STAMPEED Consortium,” Diabetes, vol. 60, no. 4, pp. 1329-1339, 2011. · doi:10.2337/db10-1011
[13] T. M. Therneau, P. M. Grambsch, and V. S. Pankratz, “Penalized survival models and frailty,” Journal of Computational and Graphical Statistics, vol. 12, no. 1, pp. 156-175, 2003. · doi:10.1198/1061860031365
[14] R. M. Pfeiffer, A. Hildesheim, M. H. Gail et al., “Robustness of inference on measured covariates to misspecification of genetic random effects in family studies,” Genetic Epidemiology, vol. 24, no. 1, pp. 14-23, 2003. · doi:10.1002/gepi.10191
[15] M. H. Chen, X. Liu, F. Wei et al., “A comparison of strategies for analyzing dichotomous outcomes in genome-wide association studies with general pedigrees,” Genetic Epidemiology, vol. 35, no. 7, pp. 650-657, 2011. · doi:10.1002/gepi.20614
[16] K. Y. Liang and S. L. Zeger, “Longitudinal data analysis using generalized linear models,” Biometrika, vol. 73, no. 1, pp. 13-22, 1986. · Zbl 0595.62110 · doi:10.1093/biomet/73.1.13
[17] L. A. Cupples, H. T. Arruda, E. J. Benjamin et al., “The Framingham Heart Study 100K SNP genome-wide association study resource: overview of 17 phenotype working group reports,” BMC Medical Genetics, vol. 8, supplement 1, 2007. · doi:10.1186/1471-2350-8-S1-S1
[18] J. Ott and D. Rabinowitz, “A principal-components approach based on heritability for combining phenotype information,” Human Heredity, vol. 49, no. 2, pp. 106-111, 1999. · doi:10.1159/000022854
[19] Y. Wang, Y. Fang, and M. Jin, “A ridge penalized principal-components approach based on heritability for high-dimensional data,” Human Heredity, vol. 64, no. 3, pp. 182-191, 2007. · doi:10.1159/000102991
[20] Y. Wang, Y. Fang, and S. Wang, “Clustering and principal-components approach based on heritability for mapping multiple gene expressions,” BMC Proceedings, vol. 1, supplement 1, p. S121, 2007. · doi:10.1186/1753-6561-1-s1-s121
[21] C. Lange, K. van Steen, T. Andrew et al., “A family-based association test for repeatedly measured quantitative traits adjusting for unknown environmental and/or polygenic effects,” Statistical Applications in Genetics and Molecular Biology, vol. 3, no. 1, pp. 1544-6115, 2004. · Zbl 1166.62352 · doi:10.2202/1544-6115.1067
[22] C. Lange, D. L. DeMeo, and N. M. Laird, “Power and design considerations for a general class of family-based association tests: quantitative traits,” American Journal of Human Genetics, vol. 71, no. 6, pp. 1330-1341, 2002. · doi:10.1086/344696
[23] L. Klei, D. Luca, B. Devlin, and K. Roeder, “Pleiotropy and principal components of heritability combine to increase power for association analysis,” Genetic Epidemiology, vol. 32, no. 1, pp. 9-19, 2008. · doi:10.1002/gepi.20257
[24] M. A. R. Ferreira and S. M. Purcell, “A multivariate test of association,” Bioinformatics, vol. 25, no. 1, pp. 132-133, 2009. · Zbl 05743696 · doi:10.1093/bioinformatics/btn563
[25] C. Lange, E. K. Silverman, X. Xu, S. T. Weiss, and N. M. Laird, “A multivariate family-based association test using generalized estimating equations: FBAT-GEE,” Biostatistics, vol. 4, no. 2, pp. 195-206, 2003. · Zbl 1139.62317 · doi:10.1093/biostatistics/4.2.195
[26] Y. S. Aulchenko, D. J. de Koning, and C. Haley, “Genomewide rapid association using mixed model and regression: a fast and simple method for genomewide pedigree-based quantitative trait loci association analysis,” Genetics, vol. 177, no. 1, pp. 577-585, 2007. · doi:10.1534/genetics.107.075614
[27] K. E. Muller and B. L. Peterson, “Practical methods for computing power in testing the multivariate general linear hypothesis,” Computational Statistics and Data Analysis, vol. 2, no. 2, pp. 143-158, 1984. · Zbl 0571.65119 · doi:10.1016/0167-9473(84)90002-1
[28] H. Wu, Methods for genetic association studies using longitudinal and multivariate phenotypes in families [Ph.D. thesis], Boston University, Boston, Mass, USA, 2009.
[29] G. M. Fitzmaurice and N. M. Laird, “Regression models for mixed discrete and continuous responses with potentially missing values,” Biometrics, vol. 53, no. 1, pp. 110-122, 1997. · Zbl 0904.62082 · doi:10.2307/2533101
[30] J. Liu, Y. Pei, C. J. Papasian, and H. W. Deng, “Bivariate association analyses for the mixture of continuous and binary traits with the use of extended generalized estimating equations,” Genetic Epidemiology, vol. 33, no. 3, pp. 217-227, 2009. · doi:10.1002/gepi.20372
[31] P. C. O’Brien, “Procedures for comparing samples with multiple endpoints,” Biometrics, vol. 40, no. 4, pp. 1079-1087, 1984. · doi:10.2307/2531158
[32] X. Xu, L. Tian, and L. J. Wei, “Combining dependent tests for linkage or association across multiple phenotypic traits,” Biostatistics, vol. 4, no. 2, pp. 223-229, 2003. · Zbl 1141.62355 · doi:10.1093/biostatistics/4.2.223
[33] L. J. Wei and W. E. Johnson, “Combining dependent tests with incomplete repeated measurements,” Biometrika, vol. 72, no. 2, pp. 359-364, 1985. · Zbl 0573.62039 · doi:10.1093/biomet/72.2.359
[34] Q. Yang, H. Wu, C. Y. Guo, and C. S. Fox, “Analyze multivariate phenotypes in genetic association studies by combining univariate association tests,” Genetic Epidemiology, vol. 34, no. 5, pp. 444-454, 2010. · doi:10.1002/gepi.20497
[35] W. Pan, “Asymptotic tests of association with multiple SNPs in linkage disequilibrium,” Genetic Epidemiology, vol. 33, no. 6, pp. 497-507, 2009. · doi:10.1002/gepi.20402
[36] J.-T. Zhang, “Approximate and asymptotic distributions of chi-squared-type mixtures with applications,” Journal of the American Statistical Association, vol. 100, no. 469, pp. 273-285, 2005. · Zbl 1117.62460 · doi:10.1198/016214504000000575
[37] X. Liu and Q. Yang, “CUMP: an R package for analyzing multivariate phenotypes in genetic association studies”.
[38] S. Vansteelandt, S. Goetgeluk, S. Lutz et al., “On the adjustment for covariates in genetic association analysis: a novel, simple principle to infer direct causal effects,” Genetic Epidemiology, vol. 33, no. 5, pp. 394-405, 2009. · doi:10.1002/gepi.20393
[39] J. Pearl, “Causal diagrams for empirical research,” Biometrika, vol. 82, no. 4, pp. 669-688, 1995. · Zbl 0860.62045 · doi:10.1093/biomet/82.4.669
[40] J. M. Robins, “Data, design, and background knowledge in etiologic inference,” Epidemiology, vol. 12, no. 3, pp. 313-320, 2001. · doi:10.1097/00001648-200105000-00011
[41] P. J. Lipman, K. Y. Liu, J. D. Muehlschlegel, S. Body, and C. Lange, “Inferring genetic causal effects on survival data with associated endo-phenotypes,” Genetic Epidemiology, vol. 35, no. 2, pp. 119-124, 2011. · doi:10.1002/gepi.20557
[42] S. Vansteelandt, “Estimation of controlled direct effects on a dichotomous outcome using logistic structural direct effect models,” Biometrika, vol. 97, no. 4, pp. 921-934, 2010. · Zbl 1204.62183 · doi:10.1093/biomet/asq053
[43] G. D. Smith and S. Ebrahim, “‘Mendelian randomization’: can genetic epidemiology contribute to understanding environmental determinants of disease?” International Journal of Epidemiology, vol. 32, no. 1, pp. 1-22, 2003. · doi:10.1093/ije/dyg070
[44] D. A. Lawlor, R. M. Harbord, J. A. C. Sterne, N. Timpson, and G. D. Smith, “Mendelian randomization: using genes as instruments for making causal inferences in epidemiology,” Statistics in Medicine, vol. 27, no. 8, pp. 1133-1163, 2008. · doi:10.1002/sim.3034
[45] P. M. McKeigue, H. Campbell, S. Wild et al., “Bayesian methods for instrumental variable analysis with genetic instruments (“Mendelian randomization”): example with urate transporter SLC2A9 as an instrumental variable for effect of urate levels on metabolic syndrome,” International Journal of Epidemiology, vol. 39, no. 3, pp. 907-918, 2010. · doi:10.1093/ije/dyp397
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.