Distance metrics for measuring joint dependence with application to causal inference. (English) Zbl 1428.62337

Summary: Many statistical applications require the quantification of joint dependence among more than two random vectors. In this work, we generalize the notion of distance covariance to quantify joint dependence among \(d\geq2\) random vectors. We introduce the high-order distance covariance to measure the so-called Lancaster interaction dependence. The joint distance covariance is then defined as a linear combination of pairwise distance covariances and their higher-order counterparts, which together completely characterize mutual independence. We further introduce related concepts, including the distance cumulant, the distance characteristic function, and the rank-based distance covariance. Empirical estimators are constructed from certain Euclidean distances between sample elements. We study the large-sample properties of the estimators and propose a bootstrap procedure to approximate their sampling distributions. The asymptotic validity of the bootstrap procedure is justified under both the null and the alternative hypotheses. The new metrics are employed to perform model selection in causal inference, based on testing the joint independence of the residuals from the fitted structural equation models. The effectiveness of the method is illustrated on both simulated and real datasets. Supplementary materials for this article are available online.
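The summary says the empirical estimators are built from Euclidean distances between sample elements. The Python sketch below illustrates one way such estimators can be computed; it is not the authors' code and not the API of the energy package. It assumes (i) the centered distance U_i(k,l) = row mean + column mean - |X_k - X_l| - grand mean for each vector, (ii) the V-statistic form (1/n^2) sum_{k,l} prod_i U_i(k,l) for both the pairwise and the higher-order terms, and (iii) a weight c^(|S|-2) on each subset S of the vectors in the joint combination. The function names and the constant c are hypothetical.

```python
import itertools
import math
from typing import List, Sequence

Vector = Sequence[float]
Matrix = List[List[float]]


def euclid(x: Vector, y: Vector) -> float:
    """Euclidean distance between two observations of the same random vector."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def centered_distances(sample: List[Vector]) -> Matrix:
    """Assumed form: U(k, l) = row mean + column mean - |X_k - X_l| - grand mean."""
    n = len(sample)
    dist = [[euclid(sample[k], sample[l]) for l in range(n)] for k in range(n)]
    row = [sum(r) / n for r in dist]
    col = [sum(dist[k][l] for k in range(n)) / n for l in range(n)]
    grand = sum(row) / n
    return [[row[k] + col[l] - dist[k][l] - grand for l in range(n)]
            for k in range(n)]


def dcov2(mats: List[Matrix]) -> float:
    """V-statistic (1/n^2) * sum_{k,l} prod_i U_i(k, l) over the given vectors.

    Two matrices give the usual squared sample distance covariance;
    three or more give a higher-order (Lancaster-type) term.
    """
    n = len(mats[0])
    total = 0.0
    for k in range(n):
        for l in range(n):
            total += math.prod(m[k][l] for m in mats)
    return total / n ** 2


def jdcov2(samples: List[List[Vector]], c: float = 1.0) -> float:
    """Hypothetical joint statistic: sum over subsets S with |S| >= 2 of
    c**(|S| - 2) * dcov2(S). The weighting scheme is an assumption."""
    mats = [centered_distances(s) for s in samples]
    d = len(mats)
    total = 0.0
    for size in range(2, d + 1):
        for subset in itertools.combinations(range(d), size):
            total += c ** (size - 2) * dcov2([mats[i] for i in subset])
    return total


# Usage: three 1-D samples of size 4 (observation k of every vector shares index k).
x = [(0.0,), (1.0,), (2.0,), (3.0,)]
y = [(0.0,), (1.0,), (4.0,), (9.0,)]
z = [(1.0,), (0.0,), (1.0,), (0.0,)]
print(jdcov2([x, y, z], c=0.5))
```

For two vectors the sign convention cancels in the product, so dcov2 reduces to the familiar squared sample distance covariance of Székely, Rizzo and Bakirov. Pure Python keeps the sketch dependency-free; the O(n^2) double loop is meant only for small samples, and the bootstrap calibration described in the summary is not shown.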

MSC:

62J10 Analysis of variance and covariance (ANOVA)
62H20 Measures of association (correlation, canonical correlation, etc.)

Software:

energy
