A new Dirichlet process for mining dynamic patterns in functional data. (English) Zbl 1429.68216

Summary: This paper proposes a novel model for mining patterns in functional data that simultaneously performs dynamic function clustering, segmentation, and forecasting of dependent-variable values. The proposed Dependent Dirichlet Process Piecewise Regression Mixture (DDPPRM) model handles the dynamic nature of the data by detecting evolving clusters at each time step. This evolution is reflected in three cluster states: newly created, existing, and transient. The model can also generate an unbounded number of clusters over time, and it learns the optimal number of clusters instead of relying on a fixed, predefined number. By contrast, other clustering methods such as the Fuzzy C-Regression Model and the Piecewise Regression Mixture technique support none of these capabilities. The proposed model can also reveal regime changes and segment functions in regression and time-series problems. A two-step Gibbs sampling method is used to assign data to clusters, and the Expectation-Maximization method is used to find optimal values of the parameters of the functions and probability distributions. The model is validated through numerical experiments by computing three validity indexes as well as the mean square error of the model. The results indicate that the proposed method outperforms other clustering, segmentation, and forecasting models in the literature.
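
The summary states that the model learns the number of clusters, can keep creating new ones, and assigns data to clusters by Gibbs sampling. The paper's two-step sampler is not reproduced here. The following is only a minimal sketch of the underlying idea for a static Dirichlet process mixture of one-dimensional Gaussians, assuming a known noise variance `sigma2` and a conjugate normal prior on cluster means with hypothetical hyperparameters `mu0` and `tau02`. It performs collapsed Gibbs sweeps over Chinese-restaurant-process assignments and omits the time dependence, regression functions, and EM step of DDPPRM.

```python
import math
import random


def normal_logpdf(x, mean, var):
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def predictive_logpdf(x, n, s, sigma2, mu0, tau02):
    """Log posterior-predictive density of x for a cluster holding n points that sum to s.

    With n = 0 this reduces to the prior predictive N(mu0, tau02 + sigma2) of a new cluster.
    """
    prec = 1.0 / tau02 + n / sigma2
    mean = (mu0 / tau02 + s / sigma2) / prec
    return normal_logpdf(x, mean, 1.0 / prec + sigma2)


def crp_gibbs(data, alpha=1.0, sigma2=1.0, mu0=0.0, tau02=10.0, sweeps=50, seed=0):
    """Hypothetical sketch: collapsed Gibbs sampling for a 1-D Gaussian DP mixture."""
    rng = random.Random(seed)
    z = [0] * len(data)               # start with all points in one cluster
    counts = {0: len(data)}           # cluster label -> number of points
    sums = {0: float(sum(data))}      # cluster label -> sum of its points
    next_label = 1
    for _ in range(sweeps):
        for i, x in enumerate(data):
            # Remove point i from its current cluster; drop the cluster if it empties.
            k = z[i]
            counts[k] -= 1
            sums[k] -= x
            if counts[k] == 0:
                del counts[k], sums[k]
            # CRP weights: existing clusters ~ size * predictive, new cluster ~ alpha * prior predictive.
            labels = list(counts)
            logw = [math.log(counts[c]) + predictive_logpdf(x, counts[c], sums[c], sigma2, mu0, tau02)
                    for c in labels]
            labels.append(next_label)
            logw.append(math.log(alpha) + predictive_logpdf(x, 0, 0.0, sigma2, mu0, tau02))
            # Sample a label with probability proportional to exp(logw), stabilised by the max.
            top = max(logw)
            c = rng.choices(labels, weights=[math.exp(w - top) for w in logw])[0]
            if c == next_label:
                next_label += 1
                counts[c], sums[c] = 0, 0.0
            z[i] = c
            counts[c] += 1
            sums[c] += x
    return z


if __name__ == "__main__":
    data = [-5.1, -4.9, -5.0, -5.2, 5.0, 4.8, 5.1, 5.3]
    labels = crp_gibbs(data)
    print(labels, "clusters:", len(set(labels)))
```

The concentration parameter `alpha` controls how readily a point opens a new cluster, which is what lets a Dirichlet process mixture infer the number of clusters from the data rather than fixing it in advance.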

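The segmentation capability mentioned in the summary can be illustrated, assuming piecewise linear regression with a known number of segments `k` and squared-error loss, by a standard dynamic-programming search for the optimal change points. This is a generic technique shown for intuition, not necessarily the procedure used in the paper; the function names `linear_sse` and `segment` and the `min_len` parameter are hypothetical.

```python
def linear_sse(xs, ys):
    """Residual sum of squares of an ordinary least-squares line fit."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx > 0 else 0.0
    return sum((y - my - slope * (x - mx)) ** 2 for x, y in zip(xs, ys))


def segment(xs, ys, k, min_len=2):
    """Split (xs, ys) into k contiguous segments minimising the total linear-fit SSE."""
    n = len(xs)
    inf = float("inf")
    # cost[i][j]: SSE of one line fitted to points i..j-1 (inf if the segment is too short).
    cost = [[inf] * (n + 1) for _ in range(n + 1)]
    for i in range(n):
        for j in range(i + min_len, n + 1):
            cost[i][j] = linear_sse(xs[i:j], ys[i:j])
    # best[m][j]: minimal SSE for the first j points using m segments; back stores the last cut.
    best = [[inf] * (n + 1) for _ in range(k + 1)]
    back = [[0] * (n + 1) for _ in range(k + 1)]
    best[0][0] = 0.0
    for m in range(1, k + 1):
        for j in range(1, n + 1):
            for i in range(j):
                c = best[m - 1][i] + cost[i][j]
                if c < best[m][j]:
                    best[m][j], back[m][j] = c, i
    if best[k][n] == inf:
        raise ValueError("no segmentation with k segments of at least min_len points")
    # Follow the back pointers from the end to recover (start, end) index pairs.
    bounds, j = [], n
    for m in range(k, 0, -1):
        i = back[m][j]
        bounds.append((i, j))
        j = i
    return best[k][n], bounds[::-1]


if __name__ == "__main__":
    xs = list(range(10))
    ys = [0, 1, 2, 3, 4, 10, 9, 8, 7, 6]   # a regime change after the fifth point
    print(segment(xs, ys, 2))
```

The search is exact but costs O(n^3) time in this naive form; a full model would also have to choose `k`, for example with a model-selection criterion.
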
MSC:

68T05 Learning and adaptive systems in artificial intelligence
62H30 Classification and discrimination; cluster analysis (statistical aspects)