Neyman’s truncation test for two-sample means under high dimensional setting. (English) Zbl 07477298

Summary: Multivariate two-sample testing problems often arise in the statistical analysis of scientific data, especially bioinformatics data. To detect components that differ between two mean vectors, the classical approach is a Sum-of-Squares type test such as Hotelling’s \(T^2\)-test. Such a test is unsuitable in high dimensional settings, however, because the sample covariance matrix is singular and estimation errors accumulate across components. Many tests for high dimensional data have since been developed, mainly of two types: Sum-of-Squares type and Max type. Sum-of-Squares type statistics perform poorly against sparse alternatives, while Max type statistics lack power against non-sparse (dense) alternatives. In this paper, we propose a Max-Partial-Sum type statistic, named Neyman’s Truncation test, which is built from the maximum of partial sums of marginal test statistics. Neyman’s Truncation test has high power against both dense and sparse alternatives. The asymptotic distribution of the test statistic under the null hypothesis is obtained and the power of the test is analyzed. Because the asymptotic distribution converges slowly, we implement the method with bootstrap procedures. Simulation studies and an analysis of a leukemia dataset are carried out to assess its numerical performance.
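
The summary does not give the exact form of the statistic, so the following is only a minimal sketch of a Max-Partial-Sum test calibrated by bootstrap, under stated assumptions. It assumes Welch-type marginal \(t\)-statistics, the normalization \(\max_{1\le m\le p}(2m)^{-1/2}\sum_{j\le m}(t_{(j)}^2-1)\) in the spirit of Fan (1996), squared statistics sorted in decreasing order, and a bootstrap that resamples mean-centered observations within each group. All function names (`marginal_t`, `max_partial_sum`, `bootstrap_pvalue`) are hypothetical and are not taken from the paper.

```python
import math
import random
from statistics import mean, variance


def marginal_t(x, y):
    """Welch-type t statistic per coordinate; x and y are lists of rows."""
    stats = []
    for j in range(len(x[0])):
        xj = [row[j] for row in x]
        yj = [row[j] for row in y]
        se = math.sqrt(variance(xj) / len(xj) + variance(yj) / len(yj))
        # Guard against a degenerate resample with zero variance.
        stats.append(0.0 if se == 0.0 else (mean(xj) - mean(yj)) / se)
    return stats


def max_partial_sum(t_stats):
    """Maximum over m of sum_{j<=m}(t_(j)^2 - 1) / sqrt(2m).

    Assumption: squared statistics are sorted in decreasing order first.
    """
    squares = sorted((t * t for t in t_stats), reverse=True)
    best, running = -math.inf, 0.0
    for m, s in enumerate(squares, start=1):
        running += s - 1.0
        best = max(best, running / math.sqrt(2.0 * m))
    return best


def center(sample):
    """Subtract the coordinate-wise mean so the sample satisfies the null."""
    p = len(sample[0])
    means = [mean(row[j] for row in sample) for j in range(p)]
    return [[row[j] - means[j] for j in range(p)] for row in sample]


def bootstrap_pvalue(x, y, n_boot=200, seed=0):
    """Return (observed statistic, bootstrap p-value)."""
    rng = random.Random(seed)
    observed = max_partial_sum(marginal_t(x, y))
    x0, y0 = center(x), center(y)
    exceed = 0
    for _ in range(n_boot):
        # Resample whole rows with replacement within each group.
        xb = rng.choices(x0, k=len(x0))
        yb = rng.choices(y0, k=len(y0))
        if max_partial_sum(marginal_t(xb, yb)) >= observed:
            exceed += 1
    # Add-one correction keeps the p-value strictly positive.
    return observed, (exceed + 1) / (n_boot + 1)
```

Resampling whole rows preserves the dependence between coordinates within each observation, which is one reason a bootstrap calibration can be preferable to a slowly converging asymptotic limit. The ordering and normalization used here are illustrative choices and may differ from those in the paper.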

MSC:

62-XX Statistics
65-XX Numerical analysis

Software:

gcrma; Bioconductor; R; glasso

References:

[1] Bai, Z. and Saranadasa, H. (1996). Effect of high dimension: By an example of a two sample problem. Statistica Sinica 6, 311-329. · Zbl 0848.62030
[2] Barigozzi, M. and Hallin, M. (2017). A network analysis of the volatility of high dimensional financial series. Journal of the Royal Statistical Society Series C 66, 581-605. · doi:10.1111/rssc.12177
[3] Barnard, G. A. (1963). Contribution to the discussion of Professor Bartlett’s paper. Journal of the Royal Statistical Society, Series B 25, 294-296.
[4] Cai, T., Liu, W. and Luo, X. (2011). A constrained \(\ell_1\) minimization approach to sparse precision matrix estimation. Journal of the American Statistical Association 106, 594-607. · Zbl 1232.62087 · doi:10.1198/jasa.2011.tm10155
[5] Cai, T., Liu, W. and Xia, Y. (2014). Two-sample test of high dimensional means under dependence. Journal of the Royal Statistical Society, Series B, Statistical Methodology 76, 349-372. · Zbl 07555454 · doi:10.1111/rssb.12034
[6] Chen, S. X., Li, J. and Zhong, P. S. (2019). Two-sample and ANOVA tests for high dimensional means. The Annals of Statistics 47, 1443-1474. · Zbl 1417.62147 · doi:10.1214/18-AOS1720
[7] Chen, S. X. and Qin, Y. L. (2010). A two-sample test for high-dimensional data with applications to gene-set testing. The Annals of Statistics 38, 808-835. · Zbl 1183.62095 · doi:10.1214/09-AOS716
[8] Dempster, A. P. (1958). A high dimensional two sample significance test. The Annals of Mathematical Statistics 29, 995-1010. · Zbl 0226.62014 · doi:10.1214/aoms/1177706437
[9] Dempster, A. P. (1960). A significance test for the separation of two highly multivariate small samples. Biometrics 16, 41-50. · Zbl 0218.62065 · doi:10.2307/2527954
[10] Dong, P., Lin, L. and Song, Y. (2018). Significance test of clustering under high dimensional setting with applications to cancer data. Journal of Statistical Computation and Simulation 88, 3349-3378. · Zbl 07192721 · doi:10.1080/00949655.2018.1518448
[11] Efron, B. and Tibshirani, R. J. (1994). An Introduction to the Bootstrap. Florida: Chapman and Hall/CRC. · doi:10.1007/978-1-4899-4541-9
[12] Eubank, R. L. and LaRiccia, V. N. (1992). Asymptotic comparison of Cramér-von Mises and nonparametric function estimation techniques for testing goodness-of-fit. The Annals of Statistics 20, 2071-2086. · Zbl 0769.62033
[13] Fan, J. (1996). Test of significance based on wavelet thresholding and Neyman’s truncation. Journal of the American Statistical Association 91, 674-688. · Zbl 0869.62032 · doi:10.2307/2291663
[14] Fan, J., Hall, P. and Yao, Q. (2007). To how many simultaneous hypothesis tests can normal, Student’s t or bootstrap calibration be applied? Journal of the American Statistical Association 102, 1282-1288. · Zbl 1332.62063 · doi:10.1198/016214507000000969
[15] Friedman, J., Hastie, T. and Tibshirani, R. (2008). Sparse inverse covariance estimation with the graphical lasso. Biostatistics 9, 432-441. · Zbl 1143.62076
[16] Fujikoshi, Y., Ulyanov, V. V. and Shimizu, R. (2011). Multivariate Statistics: High-Dimensional and Large-Sample Approximations. New York, NY: Wiley. · Zbl 1304.62016 · doi:10.1002/9780470539873
[17] Gentleman, R., Carey, V., Huber, W., Irizarry, R. and Dudoit, S. (2006). Bioinformatics and Computational Biology Solutions Using R and Bioconductor. Berlin: Springer. · Zbl 1142.62100 · doi:10.1007/0-387-29362-0
[18] Golub, T. R., Slonim, D. K., Tamayo, P., Huard, C., Gaasenbeek, M., Mesirov, J. P., Bloomfield, C. D., et al. (1999). Molecular classification of cancer: Class discovery and class prediction by gene expression monitoring. Science 286, 531-537.
[19] Hope, A. C. (1968). A simplified Monte Carlo significance test procedure. Journal of the Royal Statistical Society, Series B 30, 582-598. · Zbl 0187.15901
[20] Hotelling, H. (1931). The generalization of Student’s ratio. The Annals of Mathematical Statistics 2, 360-378. · JFM 57.0633.01
[21] Khuyen, T. L., Caroline, C., Frédéric, J. P. and Richard, E. G. (2020). An adapted linear discriminant analysis with variable selection for the classification in high-dimension, and an application to medical data. Computational Statistics & Data Analysis 152, 107031. · Zbl 1510.62279
[22] Kosorok, M. R. and Ma, S. (2007). Marginal asymptotics for the ‘large p, small n’ paradigm: With applications to microarray data. The Annals of Statistics 35, 1456-1486. · Zbl 1123.62005 · doi:10.1214/009053606000001433
[23] Ledwina, T. (1994). Data-driven version of Neyman’s smooth test of fit. Journal of the American Statistical Association 89, 1000-1005. · Zbl 0805.62022
[24] Nettleton, D., Recknor, J. and Reecy, J. M. (2008). Identification of differentially expressed gene categories in microarray studies using nonparametric multivariate analysis. Bioinformatics 24, 192-201.
[25] Oodaira, H. (1976). Some limit theorems for the maximum of normalized sums of weakly dependent random variables. In Proceedings of the Third Japan-USSR Symposium on Probability Theory. Lecture Notes in Mathematics 550, 467-474. · Zbl 0355.60022
[26] Park, J. and Ayyala, D. N. (2013). A test for the mean vector in large dimension and small samples. Journal of Statistical Planning and Inference 143, 929-943. · Zbl 1428.62251 · doi:10.1016/j.jspi.2012.11.001
[27] Philipp, W. and Stout, W. F. (1975). Almost Sure Invariance Principles for Partial Sums of Weakly Dependent Random Variables. New York: American Mathematical Society. · Zbl 0361.60007 · doi:10.1090/memo/0161
[28] Srivastava, M. S. (2009). A test for the mean vector with fewer observations than the dimension under non-normality. Journal of Multivariate Analysis 100, 518-532. · Zbl 1154.62046 · doi:10.1016/j.jmva.2008.06.006
[29] Srivastava, M. S. and Du, M. (2008). A test for the mean vector with fewer observations than the dimension. Journal of Multivariate Analysis 99, 386-402. · Zbl 1148.62042 · doi:10.1016/j.jmva.2006.11.002
[30] Yang, X. and Nie, K. (2008). Hypothesis testing in functional linear regression models with Neyman’s truncation and wavelet thresholding for longitudinal data. Statistics in Medicine 27, 845-863. · doi:10.1002/sim.2952
[31] Zhao, T. and Liu, H. (2014). Calibrated precision matrix estimation for high-dimensional elliptical distributions. IEEE Transactions on Information Theory 60, 7874-7887. · Zbl 1359.62194 · doi:10.1109/TIT.2014.2360980
[32] Zhong, P. S. and Chen, S. X. (2011). Tests for high-dimensional regression coefficients with factorial designs. Journal of the American Statistical Association 106, 260-274. · Zbl 1396.62110 · doi:10.1198/jasa.2011.tm10284
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.