Multiscale clustering of nonparametric regression curves. (English) Zbl 1456.62078

Summary: In a wide range of modern applications, one observes a large number of time series rather than only a single one. It is often natural to suppose that there is some group structure in the observed time series. When each time series is modeled by a nonparametric regression equation, one may in particular assume that the observed time series can be partitioned into a small number of groups whose members share the same nonparametric regression function. We develop a bandwidth-free clustering method to estimate the unknown group structure from the data. More precisely, we construct multiscale estimators of the unknown groups and their unknown number that are free of classical bandwidth or smoothing parameters. In the theoretical part of the paper, we analyze the statistical properties of our estimators. Our theoretical results are derived under general conditions that allow the data to be dependent both in the time series direction and across different time series. The technical analysis of the paper is complemented by simulated and real-data examples.
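The summary does not spell out the estimator. One way to make the idea of a bandwidth-free clustering concrete is to compare two series through a statistic maximised over many locations and window widths, then group series by agglomerative clustering that stops once every remaining pair of clusters is farther apart than a threshold. The Python sketch below does this. It is not the authors' procedure. Every part of it is a simplifying assumption made here: the rectangular windows, the scale correction sqrt(2 log(1/(2h))) (loosely in the spirit of Dümbgen and Spokoiny's multiscale tests), known independent Gaussian errors with common standard deviation sigma, complete linkage, and a fixed threshold. The names multiscale_distance, cluster and simulate_panel are hypothetical.

```python
import math
import random


def multiscale_distance(y_i, y_j, sigma, bandwidths):
    """Illustrative multiscale dissimilarity between two series of equal length.

    For every bandwidth h (a fraction of the sample, 0 < h < 1/2) and every
    window of about 2*h*T consecutive time points, the local sum of y_i - y_j
    is standardised and a scale correction sqrt(2 log(1/(2h))) is subtracted.
    The distance is the maximum over all windows and bandwidths.
    """
    T = len(y_i)
    diff = [a - b for a, b in zip(y_i, y_j)]
    best = -math.inf
    for h in bandwidths:
        width = max(1, int(2 * h * T))
        correction = math.sqrt(2 * math.log(1 / (2 * h)))
        for start in range(T - width + 1):
            window_sum = sum(diff[start:start + width])
            # Assumption: equal curves and independent N(0, sigma^2) errors in
            # each series make this standardised sum exactly N(0, 1).
            stat = abs(window_sum) / (sigma * math.sqrt(2 * width))
            best = max(best, stat - correction)
    return best


def cluster(series, sigma, bandwidths, threshold):
    """Complete-linkage agglomerative clustering with a threshold stopping rule."""
    n = len(series)
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d[i][j] = d[j][i] = multiscale_distance(series[i], series[j], sigma, bandwidths)
    clusters = [[i] for i in range(n)]
    while len(clusters) > 1:
        # Complete linkage: two clusters are as far apart as their farthest members.
        best_pair, best_val = None, math.inf
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                val = max(d[i][j] for i in clusters[a] for j in clusters[b])
                if val < best_val:
                    best_pair, best_val = (a, b), val
        if best_val > threshold:
            break  # every remaining pair of clusters looks genuinely different
        a, b = best_pair
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
    return clusters


def simulate_panel(T=200, sigma=0.5, seed=0):
    """Hypothetical data: three series share m(u) = 0, three share m(u) = 2 sin(2 pi u)."""
    rng = random.Random(seed)
    curves = [lambda u: 0.0] * 3 + [lambda u: 2 * math.sin(2 * math.pi * u)] * 3
    return [[m(t / T) + rng.gauss(0, sigma) for t in range(1, T + 1)] for m in curves]


if __name__ == "__main__":
    groups = cluster(simulate_panel(), sigma=0.5, bandwidths=[0.05, 0.1, 0.2], threshold=5.0)
    print(sorted(sorted(g) for g in groups))
```

Because the distance is maximised over all window widths, no single bandwidth has to be chosen. The estimated number of groups is simply how many clusters remain when no pair is closer than the threshold. The sketch ignores the dependence over time and across series that the paper's setting allows, and it does not reproduce how the paper calibrates its threshold.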


62G08 Nonparametric regression and quantile regression
62H30 Classification and discrimination; cluster analysis (statistical aspects)
62M10 Time series, auto-correlation, regression, etc. in statistics (GARCH)
62P20 Applications of statistics to economics




[1] Abraham, C.; Cornillon, P. A.; Matzner-Løber, E.; Molinari, N., Unsupervised curve clustering using B-splines, Scand. J. Stat., 30, 581-595 (2003) · Zbl 1039.91067
[2] Andrews, D. W.K., Heteroskedasticity and autocorrelation consistent covariance matrix estimation, Econometrica, 59, 817-858 (1991) · Zbl 0732.62052
[3] Armstrong, T. B.; Chan, H. P., Multiscale adaptive inference on conditional moment inequalities, J. Econometrics, 194, 24-43 (2016) · Zbl 1431.62181
[4] Boneva, L.; Linton, O.; Vogt, M., A semiparametric model for heterogeneous panel data with fixed effects, J. Econometrics, 188, 327-345 (2015) · Zbl 1337.62242
[5] Boneva, L.; Linton, O.; Vogt, M., The effect of fragmentation in trading on market quality in the UK equity market, J. Appl. Econometrics, 31, 192-213 (2016)
[6] Bonhomme, S.; Manresa, E., Grouped patterns of heterogeneity in panel data, Econometrica, 83, 1147-1184 (2015) · Zbl 1410.62100
[7] Box, G. E.P.; Hamming, W. J.; Tiao, G. C., A statistical analysis of the Los Angeles ambient carbon monoxide data, J. Air Pollut. Control Assoc., 25, 1129-1136 (1975)
[8] Chaudhuri, P.; Marron, J., SiZer for exploration of structures in curves, J. Amer. Statist. Assoc., 94, 807-823 (1999) · Zbl 1072.62556
[9] Chaudhuri, P.; Marron, J., Scale space view of curve estimation, Ann. Statist., 28, 408-428 (2000) · Zbl 1106.62318
[10] Chiou, J.-M.; Li, P.-L., Functional clustering and identifying substructures of longitudinal data, J. R. Stat. Soc. Ser. B Stat. Methodol., 69, 679-699 (2007)
[11] Dahlhaus, R., Fitting time series models to nonstationary processes, Ann. Statist., 25, 1-37 (1997) · Zbl 0871.62080
[12] Degras, D.; Xu, Z.; Zhang, T.; Wu, W. B., Testing for parallelism among trends in multiple time series, IEEE Trans. Signal Process., 60, 1087-1097 (2012) · Zbl 1391.62164
[13] Degryse, H.; De Jong, F.; Van Kervel, V., The impact of dark trading and visible fragmentation on market quality, Rev. Finance, 1-36 (2014)
[14] Dümbgen, L.; Spokoiny, V. G., Multiscale testing of qualitative hypotheses, Ann. Statist., 29, 124-152 (2001) · Zbl 1029.62070
[15] Eckle, K.; Bissantz, N.; Dette, H., Multiscale inference for multivariate deconvolution, Electron. J. Stat., 11, 4179-4219 (2017) · Zbl 1380.62143
[16] Hansen, B., Uniform convergence rates for kernel estimation with dependent data, Econometric Theory, 24, 726-748 (2008) · Zbl 1284.62252
[17] Hastie, T.; Tibshirani, R.; Friedman, J., The Elements of Statistical Learning (2009), Springer
[18] Horowitz, J. L.; Spokoiny, V. G., An adaptive, rate-optimal test of a parametric mean-regression model against a nonparametric alternative, Econometrica, 69, 599-631 (2001) · Zbl 1017.62012
[19] Jacques, J.; Preda, C., Functional data clustering: a survey, Adv. Data Anal. Classif., 8, 231-255 (2014) · Zbl 1414.62018
[20] James, G. M.; Sugar, C. A., Clustering for sparsely sampled functional data, J. Amer. Statist. Assoc., 98, 397-408 (2003) · Zbl 1041.62052
[21] de Jong, R. M.; Davidson, J., Consistency of kernel estimators of heteroscedastic and autocorrelated covariance matrices, Econometrica, 68, 407-423 (2000) · Zbl 1016.62030
[22] Luan, Y.; Li, H., Clustering of time-course gene expression data using a mixed-effects model with B-splines, Bioinformatics, 19, 474-482 (2003)
[23] Niu, X.; Tiao, G. C., Modeling satellite ozone data, J. Amer. Statist. Assoc., 90, 969-983 (1995) · Zbl 0843.62105
[24] O’Hara, M.; Ye, M., Is fragmentation harming market quality?, J. Financ. Econ., 100, 459-474 (2009)
[25] Proksch, K.; Werner, F.; Munk, A., Multiscale scanning in inverse problems, Ann. Statist., 46, 3569-3602 (2018) · Zbl 1410.62064
[26] Reinsel, G. C.; Tiao, G. C.; DeLuisi, J. J.; Basu, S.; Carriere, K., Trend analysis of aerosol-corrected Umkehr ozone profile data through 1987, J. Geophys. Res. Atmospheres, 94, 16373-16386 (1989)
[27] Robinson, P. M., Nonparametric estimation of time-varying parameters, (Hackl, P., Statistical Analysis and Forecasting of Economic Structural Change (1989), Springer), 253-264
[28] Sacks, J.; Ylvisaker, D., Designs for regression problems with correlated errors. III, Ann. Math. Stat., 41, 2057-2074 (1970) · Zbl 0234.62025
[29] Schmidt-Hieber, J.; Munk, A.; Dümbgen, L., Multiscale methods for shape constraints in deconvolution: confidence statements for qualitative features, Ann. Statist., 41, 1299-1328 (2013) · Zbl 1293.62104
[30] Su, L.; Ju, G., Identifying latent grouped patterns in panel data models with interactive fixed effects, J. Econometrics, 206, 554-573 (2018) · Zbl 1452.62960
[31] Su, L.; Shi, Z.; Phillips, P. C.B., Identifying latent structures in panel data, Econometrica, 84, 2215-2264 (2016) · Zbl 1410.62110
[32] Tarpey, T., Linear transformations and the \(k\)-means clustering algorithm, Amer. Statist., 61, 34-40 (2007)
[33] Tarpey, T.; Kinateder, K. K.J., Clustering functional data, J. Classification, 20, 93-114 (2003) · Zbl 1112.62327
[34] Vogt, M.; Linton, O., Nonparametric estimation of a periodic sequence in the presence of a smooth trend, Biometrika, 101, 121-140 (2014) · Zbl 1285.62047
[35] Vogt, M.; Linton, O., Classification of non-parametric regression functions in longitudinal data models, J. R. Stat. Soc. Ser. B Stat. Methodol., 79, 5-27 (2017) · Zbl 1414.62282
[36] Wang, W.; Phillips, P. C.B.; Su, L., Homogeneity pursuit in panel data models: theory and application, J. Appl. Econometrics, 33, 797-815 (2018)
[37] Ward, J. H., Hierarchical grouping to optimize an objective function, J. Amer. Statist. Assoc., 58, 236-244 (1963)