The importance of being clustered: uncluttering the trends of statistics from 1970 to 2015. (English) Zbl 1420.62266

Summary: In this paper, we retrace the recent history of statistics by analyzing all the papers published since 1970 in five prestigious statistical journals, namely The Annals of Statistics, Biometrika, Journal of the American Statistical Association, Journal of the Royal Statistical Society, Series B, and Statistical Science. The aim is to construct a kind of “taxonomy” of the statistical papers by organizing and clustering them into main themes. In this sense, being identified in a cluster means being important enough to be uncluttered in the vast and interconnected world of statistical research. Since the main statistical research topics are naturally born, evolve, or die over time, we also develop a dynamic clustering strategy, in which a group in one time period is allowed to migrate or to merge into different groups in the following one. Results show that statistics is a very dynamic and evolving science, stimulated by the rise of new research questions and types of data.

MSC:

62H30 Classification and discrimination; cluster analysis (statistical aspects)
62M10 Time series, auto-correlation, regression, etc. in statistics (GARCH)
62P99 Applications of statistics
