zbMATH — the first resource for mathematics

Confidence intervals and hypothesis testing for high-dimensional regression. (English) Zbl 1319.62145
Summary: Fitting high-dimensional statistical models often requires the use of non-linear parameter estimation procedures. As a consequence, it is generally impossible to obtain an exact characterization of the probability distribution of the parameter estimates. This in turn implies that it is extremely challenging to quantify the uncertainty associated with a certain parameter estimate. Concretely, no commonly accepted procedure exists for computing classical measures of uncertainty and statistical significance, such as confidence intervals or \(p\)-values, for these models. We consider here the high-dimensional linear regression problem, and propose an efficient algorithm for constructing confidence intervals and \(p\)-values. The resulting confidence intervals have nearly optimal size. When testing the null hypothesis that a certain parameter vanishes, our method has nearly optimal power. Our approach is based on constructing a ‘de-biased’ version of regularized M-estimators. The new construction improves on recent work in the field in that it does not assume a special structure on the design matrix. We test our method on synthetic data and on a high-throughput genomic data set about riboflavin production rate, made publicly available by Bühlmann et al.
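
For orientation, the de-biasing step named in the summary is commonly written as below. This is a sketch under assumed notation, none of which appears in the summary itself: \(X \in \mathbb{R}^{n \times p}\) is the design matrix, \(Y \in \mathbb{R}^{n}\) the response, \(\widehat{\theta}\) a regularized estimate such as the Lasso, \(\widehat{\Sigma} = X^{\mathsf T} X / n\) the sample covariance, \(M\) an approximate inverse of \(\widehat{\Sigma}\), \(\widehat{\sigma}\) an estimate of the noise level, and \(\Phi\) the standard normal distribution function. The exact construction of \(M\) and the conditions under which these formulas hold should be taken from the paper.

\[
\widehat{\theta}^{u} = \widehat{\theta} + \frac{1}{n}\, M X^{\mathsf T} \bigl( Y - X \widehat{\theta} \bigr)
\]

\[
\widehat{\theta}^{u}_{i} \;\pm\; \Phi^{-1}\!\bigl(1 - \tfrac{\alpha}{2}\bigr)\, \frac{\widehat{\sigma}}{\sqrt{n}}\, \bigl[ M \widehat{\Sigma} M^{\mathsf T} \bigr]_{ii}^{1/2},
\qquad
P_{i} = 2 \left( 1 - \Phi\!\left( \frac{\sqrt{n}\, \bigl| \widehat{\theta}^{u}_{i} \bigr|}{\widehat{\sigma}\, \bigl[ M \widehat{\Sigma} M^{\mathsf T} \bigr]_{ii}^{1/2}} \right) \right)
\]

The first display adds a correction term to the regularized estimate in order to offset the shrinkage bias; the second gives the resulting two-sided confidence interval at level \(1 - \alpha\) and the \(p\)-value for the null hypothesis \(\theta_{i} = 0\), both based on an approximately Gaussian distribution for \(\widehat{\theta}^{u}_{i}\).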

62J07 Ridge regression; shrinkage estimators (Lasso)
62F12 Asymptotic properties of parametric estimators
62F25 Parametric tolerance and confidence regions
62P10 Applications of statistics to biology and medical sciences; meta analysis
92D10 Genetics and epigenetics
Full Text: Link arXiv