
Regularised PCA to denoise and visualise data. (English) Zbl 1331.62298

Summary: Principal component analysis (PCA) is a well-established dimensionality reduction method commonly used to denoise and visualise data. A classical PCA model is the fixed-effect model, in which data are generated as a fixed low-rank structure corrupted by noise. Under this model, PCA does not provide the best recovery of the underlying signal in terms of mean squared error. Following the same principle as ridge regression, we suggest a regularised version of PCA that essentially selects a certain number of dimensions and shrinks the corresponding singular values. Each singular value is multiplied by a term which can be seen as the ratio of the signal variance to the total variance of the associated dimension. The regularisation term is derived analytically using asymptotic results and can also be justified by a Bayesian treatment of the model. Regularised PCA gives promising results in terms of recovery of the true signal and of graphical outputs, compared with classical PCA and with a soft-thresholding estimation strategy. The distinction between PCA and regularised PCA becomes especially important in the case of very noisy data.
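The shrinkage idea described above can be sketched in a few lines of NumPy. This is a hedged illustration, not the authors' implementation: the function name `regularised_pca`, the centring convention, and in particular the crude noise-variance estimate taken from the trailing singular values are our simplifying assumptions (the paper derives the regularisation term from asymptotic results, with a proper degrees-of-freedom correction).

```python
import numpy as np

def regularised_pca(X, S):
    """Illustrative sketch of regularised PCA.

    Keeps the first S dimensions and shrinks each retained singular
    value by an estimate of (signal variance) / (total variance) for
    that dimension. The noise-variance estimate below is a crude
    residual-variance proxy, used here only for illustration.
    """
    n, p = X.shape
    mean = X.mean(axis=0)
    Xc = X - mean                                   # centre columns, as in standard PCA
    U, d, Vt = np.linalg.svd(Xc, full_matrices=False)
    # Crude noise-variance estimate from the discarded dimensions
    sigma2 = np.sum(d[S:] ** 2) / ((n - 1) * (p - S))
    # Per-dimension total variance of the retained dimensions
    lam = d[:S] ** 2 / (n - 1)
    # Shrinkage factor: signal variance over total variance, floored at 0
    phi = np.clip((lam - sigma2) / lam, 0.0, None)
    d_shrunk = d[:S] * phi
    # Low-rank reconstruction with shrunken singular values
    return (U[:, :S] * d_shrunk) @ Vt[:S, :] + mean
```

With little noise, `phi` is close to 1 and the estimate coincides with a plain rank-`S` truncated SVD; with very noisy data, `phi` drops towards 0 and the reconstruction is shrunk heavily, which is exactly the regime where the summary says the two methods differ most.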

MSC:

62H25 Factor analysis and principal components; correspondence analysis
62J07 Ridge regression; shrinkage estimators (Lasso)
65C60 Computational problems in statistics (MSC2010)
