## High-dimensional covariance estimation by minimizing $$\ell _{1}$$-penalized log-determinant divergence. (English) Zbl 1274.62190

Summary: Given i.i.d. observations of a random vector $$X\in \mathbb R^{p}$$, we study the problem of estimating both its covariance matrix $$\Sigma ^{*}$$, and its inverse covariance or concentration matrix $$\Theta ^{*}=(\Sigma ^{*})^{ - 1}$$. When $$X$$ is multivariate Gaussian, the non-zero structure of $$\Theta ^{*}$$ is specified by the graph of an associated Gaussian Markov random field; and a popular estimator for such sparse $$\Theta ^{*}$$ is the $$\ell _{1}$$-regularized Gaussian MLE. This estimator is sensible even for non-Gaussian $$X$$, since it corresponds to minimizing an $$\ell _{1}$$-penalized log-determinant Bregman divergence. We analyze its performance under high-dimensional scaling, in which the number of nodes in the graph $$p$$, the number of edges $$s$$, and the maximum node degree $$d$$, are allowed to grow as a function of the sample size $$n$$. In addition to the parameters $$(p,s,d)$$, our analysis identifies other key quantities that control rates: (a) the $$\ell _{\infty }$$-operator norm of the true covariance matrix $$\Sigma ^{*}$$; (b) the $$\ell _{\infty }$$-operator norm of the sub-matrix $$\Gamma ^{*}_{SS}$$, where $$S$$ indexes the graph edges, and $$\Gamma ^{*}=(\Theta ^{*})^{ - 1}\otimes (\Theta ^{*})^{ - 1}$$; (c) a mutual incoherence or irrepresentability measure on the matrix $$\Gamma ^{*}$$; and (d) the rate of decay $$1/f(n,\delta )$$ on the probabilities $$\{|\hat \Sigma^{n}_{ij}-\Sigma^{*}_{ij}|>\delta\}$$, where $$\hat \Sigma^{n}$$ is the sample covariance based on $$n$$ samples. Our first result establishes consistency of our estimate $$\hat \Theta$$ in the elementwise maximum-norm. This in turn allows us to derive convergence rates in Frobenius and spectral norms, with improvements upon existing results for graphs with maximum node degrees $$d=o(\sqrt{s})$$.
In our second result, we show that with probability converging to one, the estimate $$\hat \Theta$$ correctly specifies the zero pattern of the concentration matrix $$\Theta^{*}$$. We illustrate our theoretical results via simulations for various graphs and problem parameters, showing good correspondences between the theoretical predictions and behavior in simulations.
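
To make the estimator in the summary concrete, the following is a minimal sketch of the objective it minimizes, written under the assumption that the $$\ell _{1}$$ penalty is applied to the off-diagonal entries only: $$\operatorname{tr}(\Theta \hat \Sigma^{n})-\log \det \Theta +\lambda \sum_{i\neq j}|\Theta _{ij}|$$ over positive definite $$\Theta$$. The function names `cholesky_logdet` and `penalized_logdet_objective`, the regularization weight `lam`, and the toy $$2\times 2$$ matrices are illustrative assumptions, not taken from the paper. The sketch only evaluates the objective; it does not implement a solver.

```python
import math


def cholesky_logdet(theta):
    """Return log det(theta) for a symmetric positive definite matrix.

    theta is a list of lists; the log-determinant is read off the
    diagonal of its Cholesky factor.
    """
    p = len(theta)
    chol = [[0.0] * p for _ in range(p)]
    logdet = 0.0
    for i in range(p):
        for j in range(i + 1):
            s = theta[i][j] - sum(chol[i][k] * chol[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    raise ValueError("matrix is not positive definite")
                chol[i][i] = math.sqrt(s)
                logdet += 2.0 * math.log(chol[i][i])
            else:
                chol[i][j] = s / chol[j][j]
    return logdet


def penalized_logdet_objective(theta, sigma_hat, lam):
    """Evaluate tr(theta @ sigma_hat) - log det(theta) + lam * ||theta||_{1,off}.

    Assumption: only off-diagonal entries of theta are penalized.
    """
    p = len(theta)
    trace_term = sum(
        theta[i][j] * sigma_hat[j][i] for i in range(p) for j in range(p)
    )
    l1_off = sum(
        abs(theta[i][j]) for i in range(p) for j in range(p) if i != j
    )
    return trace_term - cholesky_logdet(theta) + lam * l1_off


if __name__ == "__main__":
    # Toy sample covariance and two candidate concentration matrices.
    sigma_hat = [[1.0, 0.5], [0.5, 1.0]]
    dense = [[4 / 3, -2 / 3], [-2 / 3, 4 / 3]]  # exact inverse of sigma_hat
    sparse = [[1.0, 0.0], [0.0, 1.0]]           # no edge between the two nodes

    for lam in (0.0, 1.0):
        print(
            lam,
            penalized_logdet_objective(dense, sigma_hat, lam),
            penalized_logdet_objective(sparse, sigma_hat, lam),
        )
```

With `lam = 0.0` the exact inverse of the sample covariance attains the smaller objective value, while with a sufficiently large `lam` the candidate with a zero off-diagonal entry is preferred. This is the mechanism by which the penalty produces sparse estimates of $$\Theta ^{*}$$.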

### MSC:

 62F12 Asymptotic properties of parametric estimators

 62F30 Parametric inference under constraints

glasso

