On the distribution of the largest eigenvalue in principal components analysis. (English) Zbl 1016.62078
Summary: Let $$x_{(1)}$$ denote the square of the largest singular value of an $$n\times p$$ matrix $$X$$, all of whose entries are independent standard Gaussian variates. Equivalently, $$x_{(1)}$$ is the largest principal component variance of the covariance matrix $$X'X$$, or the largest eigenvalue of a $$p$$-variate Wishart distribution with $$n$$ degrees of freedom and identity covariance. Consider the limit of large $$p$$ and $$n$$ with $$n/p=\gamma\geq 1$$. When centered by $$\mu_p=(\sqrt{n-1}+\sqrt p)^2$$ and scaled by $$\sigma_p=(\sqrt{n-1}+\sqrt p)(1/\sqrt{n-1}+1/\sqrt p)^{1/3}$$, the distribution of $$x_{(1)}$$ approaches the Tracy-Widom law [C.A. Tracy and H. Widom, J. Stat. Phys. 92, No. 5-6, 809-835 (1998; Zbl 0942.60099)] of order 1, which is defined in terms of a Painlevé II differential equation and can be numerically evaluated and tabulated by software.
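The centering and scaling constants above can be computed directly. The following is a minimal sketch, assuming $$n\geq 2$$ so that $$\sqrt{n-1}>0$$; the helper names `tracy_widom_center_scale` and `standardize` are hypothetical and not taken from the paper. It also illustrates that $$\sigma_p/\mu_p$$ shrinks like $$p^{-2/3}$$ when $$n=p$$, i.e. the fluctuations of $$x_{(1)}$$ are small relative to its location.

```python
import math


def tracy_widom_center_scale(n: int, p: int) -> tuple[float, float]:
    """Return (mu_p, sigma_p) as defined in the summary (real Wishart case).

    Hypothetical helper; assumes n >= 2 so that sqrt(n - 1) > 0.
    """
    a = math.sqrt(n - 1)
    b = math.sqrt(p)
    mu = (a + b) ** 2
    sigma = (a + b) * (1.0 / a + 1.0 / b) ** (1.0 / 3.0)
    return mu, sigma


def standardize(x1: float, n: int, p: int) -> float:
    """Map the largest eigenvalue x_(1) of X'X onto the Tracy-Widom (order 1) scale."""
    mu, sigma = tracy_widom_center_scale(n, p)
    return (x1 - mu) / sigma


if __name__ == "__main__":
    for p in (10, 100, 1000, 10000):
        mu, sigma = tracy_widom_center_scale(p, p)
        # With n = p, sigma/mu behaves like 2^(-2/3) * p^(-2/3) for large p.
        print(p, round(mu, 2), round(sigma, 3), round(sigma / mu * p ** (2 / 3), 4))
```

An observed $$x_{(1)}$$ would then be compared with Tracy-Widom quantiles via `standardize(x1, n, p)`.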
Simulations show the approximation to be informative for $$n$$ and $$p$$ as small as 5. The limit is derived via a corresponding result for complex Wishart matrices using methods from random matrix theory. The result suggests that some aspects of large $$p$$ multivariate distribution theory may be easier to apply in practice than their fixed $$p$$ counterparts.
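The small-sample claim can be probed with a quick Monte Carlo experiment. This is a sketch under stated assumptions, not the paper's own simulation: it draws $$X$$ with independent $$N(0,1)$$ entries using Python's `random` module, finds the largest eigenvalue of $$X'X$$ by plain power iteration (a swapped-in numerical method chosen to stay dependency-free), standardizes with $$\mu_p$$ and $$\sigma_p$$, and compares the sample mean and standard deviation with approximate Tracy-Widom order-1 values (mean about $$-1.2065$$, standard deviation about $$1.268$$). The function names are hypothetical, and the constants helper is repeated so the block runs on its own.

```python
import math
import random


def center_scale(n: int, p: int) -> tuple[float, float]:
    """mu_p and sigma_p from the summary; assumes n >= 2."""
    a, b = math.sqrt(n - 1), math.sqrt(p)
    return (a + b) ** 2, (a + b) * (1 / a + 1 / b) ** (1 / 3)


def largest_eigenvalue_xtx(n: int, p: int, rng: random.Random,
                           max_iter: int = 1000, tol: float = 1e-10) -> float:
    """Largest eigenvalue of X'X for an n x p matrix X of iid N(0, 1) entries.

    Power iteration: A = X'X is symmetric positive semidefinite, so ||A v||
    with unit v converges to its largest eigenvalue.
    """
    x = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
    # A[j][k] = sum over rows i of X[i][j] * X[i][k]
    a = [[sum(row[j] * row[k] for row in x) for k in range(p)] for j in range(p)]
    v = [1.0 / math.sqrt(p)] * p
    lam = 0.0
    for _ in range(max_iter):
        w = [sum(a[j][k] * v[k] for k in range(p)) for j in range(p)]
        norm = math.sqrt(sum(t * t for t in w))
        v = [t / norm for t in w]
        if abs(norm - lam) <= tol * norm:
            return norm
        lam = norm
    return lam


def simulate(n: int, p: int, reps: int, seed: int = 0) -> list[float]:
    """Standardized largest eigenvalues (x_(1) - mu_p) / sigma_p over reps draws."""
    rng = random.Random(seed)
    mu, sigma = center_scale(n, p)
    return [(largest_eigenvalue_xtx(n, p, rng) - mu) / sigma for _ in range(reps)]


# Approximate mean and standard deviation of the Tracy-Widom law of order 1.
TW1_MEAN, TW1_SD = -1.2065, 1.268

if __name__ == "__main__":
    z = simulate(5, 5, reps=2000)
    mean = sum(z) / len(z)
    sd = math.sqrt(sum((t - mean) ** 2 for t in z) / (len(z) - 1))
    print(f"n=p=5: mean {mean:.3f} (TW1 ~ {TW1_MEAN}), sd {sd:.3f} (TW1 ~ {TW1_SD})")
```

Matching two moments is only a rough check; comparing empirical quantiles with tabulated Tracy-Widom quantiles would be the more informative test.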

##### MSC:
- 62H25 Factor analysis and principal components; correspondence analysis
- 62H10 Multivariate distribution of statistics
- 15B52 Random matrices (algebraic aspects)
- 33E17 Painlevé-type functions
- 33C45 Orthogonal polynomials and functions of hypergeometric type (Jacobi, Laguerre, Hermite, Askey scheme, etc.)
- 60F05 Central limit and other weak theorems
