
Sparse principal component analysis subject to prespecified cardinality of loadings. (English) Zbl 1348.65014

Summary: Most existing procedures for sparse principal component analysis (PCA) use a penalty function to obtain a sparse matrix of weights by which a data matrix is post-multiplied to produce PC scores. In this paper, we propose a new sparse PCA procedure that differs from the existing ones in two ways. First, the new procedure does not sparsify the weight matrix; instead, it sparsifies the so-called loading matrix, by which the score matrix is post-multiplied to approximate the data matrix. Second, the cardinality of the loading matrix, i.e., the total number of nonzero loadings, is prespecified as an integer without the use of penalty functions. The procedure is called unpenalized sparse loading PCA (USLPCA). A desirable property of USLPCA is that the indices for the percentages of explained variance can be defined in the same form as in standard PCA. We develop an alternating least squares algorithm for USLPCA, which uses the fact that the PCA loss function can be decomposed into the sum of a term that does not depend on the loadings and a term that is easily minimized under cardinality constraints. A procedure is also presented for selecting the best cardinality using information criteria. The procedures are assessed in a simulation study and illustrated with real data examples.
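The alternating least squares idea in the summary can be illustrated with a minimal NumPy sketch. It rests on assumptions the summary does not state: the loss is ||X - FA'||^2 for an n × p column-centered data matrix X, the n × m score matrix satisfies F'F = nI_m, and the cardinality c counts the nonzero entries of the whole p × m loading matrix A. Under F'F = nI_m, with B = X'F/n, the loss splits as ||X - FB'||^2 + n||B - A||^2. The first term does not involve A, so the constrained A-step keeps the c largest-magnitude entries of B and zeroes the rest; the F-step is the orthogonal Procrustes update. Because the kept entries of A coincide with those of B, the loss then reduces to ||X||^2 - n||A||^2, so n||a_k||^2/||X||^2 has the same form as the standard PCA index. This is our reading of the summary's claim, not the paper's stated definition. The function names are hypothetical, and the information-criterion step for choosing c is not sketched.

```python
import numpy as np


def hard_threshold(B, c):
    """Keep the c entries of B with the largest absolute values; zero the rest."""
    A = np.zeros_like(B)
    keep = np.argsort(np.abs(B), axis=None)[-c:]  # flat indices in C order
    A.flat[keep] = B.flat[keep]
    return A


def uslpca(X, m, c, n_iter=1000, tol=1e-12, seed=0):
    """Hypothetical ALS sketch: minimize ||X - F A'||^2 subject to
    F'F = n I_m and exactly c nonzero entries in the p x m matrix A.

    Assumes X is an n x p column-centered data matrix with n >= m.
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    if not 1 <= c <= p * m:
        raise ValueError("cardinality c must lie in [1, p*m]")
    total = np.sum(X ** 2)
    rng = np.random.default_rng(seed)
    # Random start with orthonormal columns, rescaled so that F'F = n I.
    F = np.sqrt(n) * np.linalg.qr(rng.standard_normal((n, m)))[0]
    prev = np.inf
    for _ in range(n_iter):
        # A-step: loss = (term free of A) + n * ||X'F/n - A||^2, so the
        # cardinality-constrained minimizer hard-thresholds X'F/n.
        A = hard_threshold(X.T @ F / n, c)
        loss = np.sum((X - F @ A.T) ** 2)
        if prev - loss <= tol * total:
            break
        prev = loss
        # F-step: orthogonal Procrustes, F = sqrt(n) U V' where XA = U S V'.
        U, _, Vt = np.linalg.svd(X @ A, full_matrices=False)
        F = np.sqrt(n) * U @ Vt
    else:
        # Iteration budget exhausted: refresh A so that it matches the final F.
        A = hard_threshold(X.T @ F / n, c)
        loss = np.sum((X - F @ A.T) ** 2)
    return F, A, loss


def explained_variance(X, A):
    """Per-component index n * ||a_k||^2 / ||X||^2 (standard PCA form)."""
    n = X.shape[0]
    return n * np.sum(A ** 2, axis=0) / np.sum(X ** 2)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.standard_normal((100, 8))
    X -= X.mean(axis=0)  # column-center the data
    F, A, loss = uslpca(X, m=2, c=6)
    pev = explained_variance(X, A)
    # The two printed totals should agree (see the identity in the lead-in).
    print(np.count_nonzero(A), pev.sum(), 1 - loss / np.sum(X ** 2))
```

Each step can only lower the loss, so the loop stops once the decrease falls below `tol` times ||X||^2. Like any alternating scheme it may stop at a local minimum, so in practice one would try several random starts (the `seed` parameter).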

MSC:

65C60 Computational problems in statistics (MSC2010)
62H25 Factor analysis and principal components; correspondence analysis

Software:

PMA
