Estimation and confidence sets for sparse normal mixtures.

*(English)* Zbl 1360.62113

Summary: For high-dimensional statistical models, researchers have begun to focus on situations that can be described as having relatively few moderately large coefficients. Such situations lead to some very subtle statistical problems. In particular, Ingster and Donoho and Jin have considered a sparse normal means testing problem and described its precise demarcation, or detection boundary. Meinshausen and Rice have shown that it is even possible to consistently estimate the fraction of nonzero coordinates on a subset of the detectable region, but left open the question of exactly which parts of the detectable region permit consistent estimation.
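The detection boundary mentioned above has a closed form under the calibration commonly used for this problem, which we assume here: $n$ observations $X_i \sim (1-\varepsilon_n)\,N(0,1) + \varepsilon_n\,N(\mu_n,1)$ with $\varepsilon_n = n^{-\beta}$, $1/2 < \beta < 1$, and $\mu_n = \sqrt{2r\log n}$. Detection is asymptotically possible when $r > \rho^*(\beta)$, where $\rho^*(\beta) = \beta - 1/2$ for $1/2 < \beta \le 3/4$ and $\rho^*(\beta) = (1-\sqrt{1-\beta})^2$ for $3/4 < \beta < 1$. The sketch below simply evaluates this boundary; `detection_boundary` and `is_detectable` are hypothetical helper names, not taken from the paper.

```python
import math


def detection_boundary(beta: float) -> float:
    """rho*(beta), the sparse normal means detection boundary, for 1/2 < beta < 1."""
    if not 0.5 < beta < 1.0:
        raise ValueError("beta must lie in the open interval (1/2, 1)")
    if beta <= 0.75:
        return beta - 0.5  # branch for 1/2 < beta <= 3/4
    return (1.0 - math.sqrt(1.0 - beta)) ** 2  # branch for 3/4 < beta < 1


def is_detectable(beta: float, r: float) -> bool:
    """Hypothetical helper: True when (beta, r) lies strictly above the boundary."""
    return r > detection_boundary(beta)


for beta in (0.6, 0.75, 0.9):
    print(beta, round(detection_boundary(beta), 4))
```

For example, with $\beta = 0.6$ (about $n^{0.4}$ nonzero means), detection requires $r > 0.1$, that is, $\mu_n$ slightly above $\sqrt{0.2\log n}$. The two branches agree at $\beta = 3/4$, where both equal $1/4$.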

In the present paper we develop a new approach for estimating the fraction of nonzero means for problems where the nonzero means are moderately large. We show that the detection region described by Ingster and by Donoho and Jin turns out to be the region where it is possible to consistently estimate the expected fraction of nonzero coordinates. We develop this theory further and derive minimax rates of convergence, and we construct a procedure that attains the optimal rate of convergence in this setting. The procedure also provides an honest lower confidence bound while minimizing the expected length of the resulting interval. Simulations are used to compare the procedure with that of Meinshausen and Rice, who give a procedure but do not discuss rates of convergence. Extensions to more general Gaussian mixture models are also given.
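To make the estimation target concrete, the sketch below simulates a sparse normal mixture and computes a deliberately naive lower bound for the fraction $\varepsilon$. It is not the procedure constructed in the paper; it only illustrates the one-sided idea behind a lower confidence bound. Our assumptions: a known $N(0,1)$ null, unit noise variance, and a fixed margin constant `c`. The bound rests on the identity $P(X > t) - \bar\Phi(t) = \varepsilon\,[P(\mu + Z > t) - \bar\Phi(t)] \le \varepsilon\,\Phi(t)$, which holds for every threshold $t$ whatever the distribution of the nonzero means. Replacing $P(X > t)$ by the empirical tail fraction and subtracting a fluctuation margin gives a bound at each $t$, and the sketch takes the largest over a grid. The margin $c\sqrt{\bar\Phi(t)\Phi(t)/n}$ is only a pointwise heuristic; a rigorous bound would need a margin that controls the empirical process uniformly over thresholds.

```python
import bisect
import math
import random
from typing import List, Optional, Sequence


def normal_sf(t: float) -> float:
    """Standard normal upper tail P(Z > t), written Phi_bar(t) above."""
    return 0.5 * math.erfc(t / math.sqrt(2.0))


def simulate_sparse_mixture(n: int, eps: float, mu: float, seed: int = 0) -> List[float]:
    """Hypothetical helper: draw n values from (1 - eps) N(0, 1) + eps N(mu, 1)."""
    rng = random.Random(seed)
    return [rng.gauss(mu if rng.random() < eps else 0.0, 1.0) for _ in range(n)]


def lower_bound_fraction(x: Sequence[float], c: float = 2.0,
                         grid: Optional[Sequence[float]] = None) -> float:
    """Naive lower bound for eps (illustrative only, not the paper's procedure).

    Uses P(X > t) - Phi_bar(t) <= eps * Phi(t), valid for every t, with the
    empirical tail in place of P(X > t) and an assumed noise margin
    c * sqrt(Phi_bar(t) * Phi(t) / n).
    """
    n = len(x)
    xs = sorted(x)
    if grid is None:
        grid = [0.1 * k for k in range(61)]  # thresholds 0.0, 0.1, ..., 6.0
    best = 0.0
    for t in grid:
        tail = (n - bisect.bisect_right(xs, t)) / n  # fraction of X_i > t
        sf = normal_sf(t)
        cdf = 1.0 - sf  # Phi(t)
        bound = (tail - sf - c * math.sqrt(sf * cdf / n)) / cdf
        best = max(best, bound)
    return min(best, 1.0)  # eps is a fraction, so keep the bound in [0, 1]


x = simulate_sparse_mixture(10_000, eps=0.1, mu=3.0, seed=2)
print(round(lower_bound_fraction(x), 3))
```

With $n = 10{,}000$, $\varepsilon = 0.1$ and $\mu = 3$, one would expect the bound to fall somewhat below the true value $0.1$, because moderately large means are not fully separated from the null at any single threshold.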

##### MSC:

62G05 | Nonparametric estimation

62G20 | Asymptotic properties of nonparametric inference

62G32 | Statistics of extreme values; tail inference

##### Keywords:

confidence lower bound; estimating fraction; higher criticism; minimax estimation; optimally adaptive; sparse normal mixture

T. T. Cai et al., Ann. Stat. 35, No. 6, 2421-2449 (2007; Zbl 1360.62113)

##### References:

[1] Benjamini, Y. and Hochberg, Y. (1995). Controlling the false discovery rate: A practical and powerful approach to multiple testing. J. Roy. Statist. Soc. Ser. B 57 289-300. · Zbl 0809.62014

[2] Cai, T., Jin, J. and Low, M. G. (2006). Estimation and confidence sets for sparse normal mixtures. Technical report, Dept. Statistics, The Wharton School, Univ. Pennsylvania. Available at www.arxiv.org/abs/math/0612623. · Zbl 1360.62113

[3] Cai, T. and Low, M. G. (2004). An adaptation theory for nonparametric confidence intervals. Ann. Statist. 32 1805-1840. · Zbl 1056.62060

[4] Donoho, D. (1988). One-sided inference about functionals of a density. Ann. Statist. 16 1390-1420. · Zbl 0665.62040

[5] Donoho, D. and Jin, J. (2004). Higher criticism for detecting sparse heterogeneous mixtures. Ann. Statist. 32 962-994. · Zbl 1092.62051

[6] Efron, B. (2004). Large-scale simultaneous hypothesis testing: The choice of a null hypothesis. J. Amer. Statist. Assoc. 99 96-104. · Zbl 1089.62502

[7] Genovese, C. and Wasserman, L. (2004). A stochastic process approach to false discovery control. Ann. Statist. 32 1035-1061. · Zbl 1092.62065

[8] Ingster, Y. I. (1999). Minimax detection of a signal for \(l^p_n\)-balls. Math. Methods Statist. 7 401-428. · Zbl 1103.62312

[9] Jin, J. (2004). Detecting a target in very noisy data from multiple looks. In A Festschrift for Herman Rubin (A. DasGupta, ed.) 255-286. IMS, Beachwood, OH. · Zbl 1268.94013

[10] Jin, J. (2006). Proportion of nonzero normal means: Universal oracle equivalences and uniformly consistent estimations. Technical report, Dept. Statistics, Purdue Univ.

[11] Jin, J., Peng, J. and Wang, P. (2007). Estimating the proportion of non-null effects, with applications to CGH lung cancer data. Working manuscript.

[12] Le Cam, L. and Yang, G. L. (1990). Asymptotics in Statistics: Some Basic Concepts. Springer, New York. · Zbl 0719.62003

[13] Maraganore, D. M., de Andrade, M. et al. (2005). High-resolution whole-genome association study of Parkinson disease. Amer. J. Human Genetics 77 685-693.

[14] Meinshausen, N. and Rice, J. (2006). Estimating the proportion of false null hypotheses among a large number of independently tested hypotheses. Ann. Statist. 34 373-393. · Zbl 1091.62059

[15] Shorack, G. R. and Wellner, J. A. (1986). Empirical Processes with Applications to Statistics. Wiley, New York. · Zbl 1170.62365