Composite likelihood methods for histogram-valued random variables. (English) Zbl 1452.62337

Summary: Symbolic data analysis has been proposed as a technique for summarising large and complex datasets into a much smaller, more tractable number of distributions (such as random rectangles or histograms), each describing a portion of the larger dataset. Recent work has developed likelihood-based methods that permit fitting models for the underlying data while observing only the distributional summaries. Although powerful, this approach rapidly becomes computationally intractable for random histograms as the dimension of the underlying data increases. We introduce a composite-likelihood variation of this likelihood-based approach for the analysis of random histograms in \(K\) dimensions, through the construction of lower-dimensional marginal histograms. The performance of this approach is examined through analyses of simulated and real data using max-stable models for spatial extremes, with millions of observed datapoints in more than \(K=100\) dimensions. Large computational savings are available compared to existing model fitting approaches.
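
The idea of replacing one \(K\)-dimensional histogram with lower-dimensional marginal histograms can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes fixed bin edges shared by all coordinates, i.i.d. underlying observations (so that bin counts are multinomial), and, purely to stay self-contained, a placeholder model of independent normal margins in place of the max-stable models used in the paper. The function names (`pairwise_histograms`, `bin_probs_independent_normal`, `pairwise_composite_loglik`) are hypothetical.

```python
import itertools
import math

import numpy as np


def pairwise_histograms(x, edges):
    """Bivariate marginal histograms (bin counts) for every pair of columns of x."""
    k = x.shape[1]
    return {
        (i, j): np.histogram2d(x[:, i], x[:, j], bins=[edges, edges])[0]
        for i, j in itertools.combinations(range(k), 2)
    }


def normal_cdf(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def bin_probs_independent_normal(edges, mu, sigma):
    """Bin probabilities for a pair of independent N(mu, sigma^2) variables.

    Placeholder model (an assumption of this sketch): a real application
    would integrate the bivariate density of the chosen model over each bin.
    """
    cdf = np.array([normal_cdf((e - mu) / sigma) for e in edges])
    p = np.diff(cdf)
    return np.outer(p, p)


def pairwise_composite_loglik(hists, bin_probs):
    """Sum over pairs of multinomial log-likelihoods (up to an additive constant)."""
    total = 0.0
    for counts in hists.values():
        mask = counts > 0  # empty bins contribute nothing to the sum
        total += float(np.sum(counts[mask] * np.log(bin_probs[mask])))
    return total


# Usage: 2000 simulated observations in K = 4 dimensions.
rng = np.random.default_rng(0)
x = rng.standard_normal((2000, 4))
edges = np.linspace(-6.0, 6.0, 13)

hists = pairwise_histograms(x, edges)  # K(K-1)/2 = 6 bivariate histograms
ll_true = pairwise_composite_loglik(
    hists, bin_probs_independent_normal(edges, 0.0, 1.0)
)
ll_off = pairwise_composite_loglik(
    hists, bin_probs_independent_normal(edges, 2.0, 1.0)
)
print(len(hists), ll_true > ll_off)
```

With \(B\) bins per coordinate, the full histogram has \(B^K\) cells, whereas the pairwise construction stores \(K(K-1)/2\) tables of \(B^2\) cells each, which is where the computational saving in this sketch comes from.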


62G32 Statistics of extreme values; tail inference
62H11 Directional data; spatial statistics
62R07 Statistical aspects of big data and data science
62P12 Applications of statistics to environmental and related topics
86A08 Climate science and climate modeling



