## An application of Dirichlet process in clustering subjects via variance shift models: a course-evaluation study.
*(English)*
Zbl 07289489

Summary: In this article, the Dirichlet process (DP) is applied to cluster subjects with longitudinal observations. Subjects are clustered by their ability to adapt to new circumstances, measured by the time at which the variability of their responses changes. This is done by introducing a random change-point time into the variance structure of mixed-effects models. The DP is taken as the prior for the distribution of the random change point, and its discrete nature is exploited to cluster subjects according to their time of adaptation. The proposed model is useful for identifying groups of subjects with distinctive time-based progressions or declines. Transition mixed-effects models are also used to account for the serial correlation among observations over time, and a joint modelling approach is used to handle the bias that arises in these models. Parameter estimates are obtained by Gibbs sampling. The performance of the proposed method is evaluated in a simulation study, and its usefulness is assessed on a course-evaluation dataset.
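
The clustering idea in the summary can be illustrated with a minimal sketch. It is not the authors' model or sampler: it only shows how a draw from a DP is discrete, so that subjects sharing the same change-point time fall into one cluster. The truncated stick-breaking construction, the discrete uniform base measure over time points, and all names (`stick_breaking_weights`, `draw_change_points`, `cluster_by_change_point`, `simulate_subject`) and parameter values are assumptions made for illustration.

```python
import random
from collections import defaultdict


def stick_breaking_weights(alpha, n_atoms, rng):
    """Truncated stick-breaking weights for a DP with concentration alpha."""
    weights, remaining = [], 1.0
    for _ in range(n_atoms - 1):
        v = rng.betavariate(1.0, alpha)  # fraction broken off the remaining stick
        weights.append(remaining * v)
        remaining *= 1.0 - v
    weights.append(remaining)  # last atom takes the leftover mass
    return weights


def draw_change_points(n_subjects, n_times, alpha=1.0, n_atoms=20, seed=0):
    """Draw one change-point time per subject from a truncated DP realisation.

    Assumption: the base measure is uniform on the time points 2..n_times.
    """
    rng = random.Random(seed)
    weights = stick_breaking_weights(alpha, n_atoms, rng)
    atoms = [rng.randint(2, n_times) for _ in range(n_atoms)]
    return rng.choices(atoms, weights=weights, k=n_subjects)


def cluster_by_change_point(change_points):
    """Group subject indices that share the same change-point time."""
    clusters = defaultdict(list)
    for subject, tau in enumerate(change_points):
        clusters[tau].append(subject)
    return dict(clusters)


def simulate_subject(tau, n_times, sd_before, sd_after, rng):
    """Zero-mean responses whose standard deviation shifts at time tau."""
    return [
        rng.gauss(0.0, sd_before if t < tau else sd_after)
        for t in range(1, n_times + 1)
    ]


if __name__ == "__main__":
    taus = draw_change_points(n_subjects=30, n_times=10)
    for tau, members in sorted(cluster_by_change_point(taus).items()):
        print(f"change point at t={tau}: subjects {members}")
```

Because the DP realisation puts positive mass on a countable set of atoms, many subjects draw the same change-point time, which is what makes the change point usable as a cluster label. In the article itself the change points are latent and estimated by Gibbs sampling, not drawn directly as in this sketch.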

### MSC:

62-XX | Statistics |

### Keywords:

course-evaluation data; Dirichlet process prior; Markov chain Monte Carlo; model-based clustering; transition mixed-effects models; variance shift models
