# zbMATH — the first resource for mathematics

On consistency and sparsity for principal components analysis in high dimensions. (English) Zbl 1388.62174
Summary: Principal components analysis (PCA) is a classic method for the reduction of dimensionality of data in the form of $$n$$ observations (or cases) of a vector with $$p$$ variables. Contemporary datasets often have $$p$$ comparable with or even much larger than $$n$$. Our main assertions, in such settings, are (a) that some initial reduction in dimensionality is desirable before applying any PCA-type search for principal modes, and (b) the initial reduction in dimensionality is best achieved by working in a basis in which the signals have a sparse representation. We describe a simple asymptotic model in which the estimate of the leading principal component vector via standard PCA is consistent if and only if $$p(n)/n \to 0$$. We provide a simple algorithm for selecting a subset of coordinates with largest sample variances, and show that if PCA is done on the selected subset, then consistency is recovered, even if $$p(n)\gg n$$.
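
The selection step described in the summary can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' exact procedure: the function name `sparse_leading_pc`, the fixed subset size `k`, and the use of NumPy are all hypothetical choices. The summary does not say how the subset size is chosen, and the data are assumed to be already expressed in a basis in which the signal is sparse.

```python
import numpy as np


def sparse_leading_pc(X, k):
    """Estimate the leading principal component from the k highest-variance coordinates.

    X : (n, p) array with n observations of p variables.
    k : number of coordinates to keep (hypothetical tuning parameter;
        the summary does not specify how it is chosen).
    Returns (v, keep): a length-p unit vector that is zero outside the
    selected coordinates, and the sorted indices of those coordinates.
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape

    # Center each variable and compute its sample variance.
    Xc = X - X.mean(axis=0)
    variances = Xc.var(axis=0, ddof=1)

    # Keep the k coordinates with the largest sample variances.
    keep = np.sort(np.argsort(variances)[-k:])

    # Standard PCA on the reduced data: the first right singular vector of the
    # centered submatrix is the leading eigenvector of its sample covariance.
    _, _, vt = np.linalg.svd(Xc[:, keep], full_matrices=False)

    # Embed the reduced eigenvector back into p dimensions.
    v = np.zeros(p)
    v[keep] = vt[0]
    return v, keep
```

Calling `sparse_leading_pc(X, k)` on an `(n, p)` data matrix returns an estimate whose entries outside the selected coordinates are exactly zero. The sign of the returned vector is arbitrary, as with any eigenvector.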

##### MSC:
 62H25 Factor analysis and principal components; correspondence analysis