
The impact of measurement error on principal component analysis. (English) Zbl 1305.62222
Summary: We investigate the effect of measurement error on principal component analysis in the high-dimensional setting. The effects of random, additive errors are characterized by the expectation and variance of the changes in the eigenvalues and eigenvectors. The results show that the impact of uncorrelated measurement error on the principal component scores is mainly increased variability rather than bias. In practice, the error-induced increase in variability is small compared with the original variability for the components corresponding to the largest eigenvalues. This suggests that the impact will be negligible when these component scores are used in classification and regression or for visualizing data. However, the measurement error will contribute large variability to the component loadings relative to the loading values themselves, so that interpretation based on the loadings can be difficult. The results are illustrated by simulating additive Gaussian measurement error in microarray expression data from cancer tumours and control tissues.
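
The kind of simulation described in the summary can be sketched as follows. This is a minimal illustration, not the authors' procedure: it assumes synthetic low-rank data in place of the microarray expression data, and the names `pca` and `simulate`, the factor scales, and all default parameter values are hypothetical choices made for the example. It adds independent Gaussian error to a fixed error-free matrix many times and compares the error-induced spread of the scores and of the loadings with their typical magnitudes.

```python
import numpy as np


def pca(X, k):
    """First k loadings (p x k) and scores (n x k) of column-centered X, via SVD."""
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Vt[:k].T, U[:, :k] * s[:k]


def simulate(n=40, p=500, k=2, sigma=1.0, reps=100, seed=0):
    """Relative error-induced variability of PCA scores and loadings.

    All defaults are illustrative assumptions (n samples, p >> n variables).
    """
    rng = np.random.default_rng(seed)

    # Hypothetical error-free data: k latent factors with decreasing strength.
    factors = rng.normal(size=(n, k)) * np.linspace(2.0, 1.0, k)
    weights = rng.normal(size=(k, p))
    X = factors @ weights
    V0, S0 = pca(X, k)

    loadings = np.empty((reps, p, k))
    scores = np.empty((reps, n, k))
    for r in range(reps):
        # Additive, uncorrelated Gaussian measurement error.
        W = X + rng.normal(scale=sigma, size=X.shape)
        V, S = pca(W, k)
        # Eigenvectors are defined up to sign; align with the error-free ones.
        sign = np.sign((V * V0).sum(axis=0))
        loadings[r] = V * sign
        scores[r] = S * sign

    # Spread across error draws, relative to the spread of the error-free scores.
    score_ratio = scores.std(axis=0).mean(axis=0) / S0.std(axis=0)
    # Spread across error draws, relative to the mean absolute error-free loading.
    loading_ratio = loadings.std(axis=0).mean(axis=0) / np.abs(V0).mean(axis=0)
    return score_ratio, loading_ratio


if __name__ == "__main__":
    score_ratio, loading_ratio = simulate()
    print("relative score variability:  ", score_ratio)
    print("relative loading variability:", loading_ratio)
```

Under these assumptions the relative variability of the loadings should come out several times larger than that of the scores for the leading component, which is the qualitative pattern the summary reports. The exact ratios depend on the chosen `sigma`, `n`, `p`, and factor scales.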

MSC: 62H25 Factor analysis and principal components; correspondence analysis
Software: BGX; Bioconductor
[1] Buonaccorsi, Measurement error: models, methods, and applications (2009) · Zbl 1277.62014
[2] Carroll, Measurement error in nonlinear models: a modern perspective (2006) · Zbl 1119.62063 · doi:10.1201/9781420010138
[3] Faber, Standard errors in the eigenvalues of a cross-product matrix: Theory and applications, J. Chemom. 7 pp 495– (1993) · doi:10.1002/cem.1180070605
[4] Faber, Random error bias in principal component analysis. Part I. Derivation of theoretical predictions, Anal. Chim. Acta 304 pp 257– (1995) · doi:10.1016/0003-2670(94)00585-A
[5] Fan, Sparse high dimensional models in economics, Annu. Rev. Econ. 3 pp 291– (2011) · doi:10.1146/annurev-economics-061109-080451
[6] Ferré, Selection of components in principal component analysis: a comparison of methods, Comput. Stat. Data Anal. 19 pp 669– (1995) · Zbl 0875.62253 · doi:10.1016/0167-9473(94)00020-J
[7] Hein, BGX: a fully Bayesian integrated approach to the analysis of Affymetrix GeneChip data, Biostatistics 6 pp 349– (2005) · Zbl 1070.62103 · doi:10.1093/biostatistics/kxi016
[8] Johnstone, On the distribution of the largest eigenvalue in principal components analysis, Ann. Stat. 29 pp 295– (2001) · Zbl 1016.62078 · doi:10.1214/aos/1009210544
[9] Johnstone, On consistency and sparsity for principal components analysis in high dimensions, J. Am. Stat. Assoc. 104 pp 682– (2009) · Zbl 1388.62174 · doi:10.1198/jasa.2009.0121
[10] Jolliffe, Principal component analysis (2002) · Zbl 1011.62064
[11] Kadane, Testing overidentifying restrictions when the disturbances are small, J. Am. Stat. Assoc. 65 pp 182– (1970) · doi:10.1080/01621459.1970.10481072
[12] Karakach, Methods for estimating and mitigating errors in spotted, dual-color DNA microarrays, Omics: J. Integr. Biol. 11 pp 186– (2007) · doi:10.1089/omi.2007.0008
[13] Kritchman, Determining the number of components in a factor model from limited noisy data, Chemom. Intell. Lab. Syst. 94 (1) pp 19– (2008) · doi:10.1016/j.chemolab.2008.06.002
[14] Li, High-dimensional data analysis in cancer research (2008)
[15] Nadler, Finite sample approximation results for principal component analysis: A matrix perturbation approach, Ann. Stat. 36 pp 2791– (2008) · Zbl 1168.62058 · doi:10.1214/08-AOS618
[16] Rao, Statistical eigen-inference from large Wishart matrices, Ann. Stat. 36 pp 2850– (2008) · Zbl 1168.62056 · doi:10.1214/07-AOS583
[17] Rocke, A model for measurement error for gene expression arrays, J. Comput. Biol. 8 pp 557– (2001) · doi:10.1089/106652701753307485
[18] Sanguinetti, Accounting for probe-level noise in principal component analysis of microarray data, Bioinformatics 21 pp 3748– (2005) · doi:10.1093/bioinformatics/bti617
[19] Stewart, Stochastic perturbation theory, SIAM Rev. 32 pp 579– (1990) · Zbl 0722.15002 · doi:10.1137/1032121
[20] Stewart, Matrix perturbation theory (1990) · Zbl 0706.65013
[21] Turro, BGX: a Bioconductor package for the Bayesian integrated analysis of Affymetrix GeneChips, BMC Bioinformatics 8 pp 439– (2007) · Zbl 05326156 · doi:10.1186/1471-2105-8-439
[22] Wentzell, Exploratory data analysis with noisy measurements, J. Chemom. 26 pp 264– (2012) · doi:10.1002/cem.2428
[23] Wentzell, Maximum likelihood principal component analysis, J. Chemom. 11 pp 339– (1997) · doi:10.1002/(SICI)1099-128X(199707)11:4<339::AID-CEM476>3.0.CO;2-L
[24] Wilkinson, The algebraic eigenvalue problem (1965) · Zbl 0258.65037
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.