
Learning from missing data with the binary latent block model. (English) Zbl 1477.62005

Summary: Missing data can be informative. Ignoring this information can lead to misleading conclusions when the data model does not allow information to be extracted from the missing data. We propose a co-clustering model, based on the binary Latent Block Model, that aims to take advantage of these nonignorable nonresponses, also known as Missing Not At Random (MNAR) data. A variational expectation-maximization algorithm is derived to perform inference, and a model selection criterion is presented. We assess the proposed approach in a simulation study before applying our model to the voting records of the lower house of the French Parliament, where our analysis brings out relevant groups of MPs and texts, together with a sensible interpretation of the behavior of non-voters.
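The variational EM scheme mentioned above can be illustrated for the plain binary Latent Block Model (without the MNAR mechanism the paper adds). The sketch below is a minimal illustration under assumed notation: `vem_binary_lbm`, the parameter names, and the update order are illustrative choices, not the authors' implementation.

```python
import numpy as np


def softmax_rows(logits):
    """Row-wise softmax, numerically stabilised."""
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def vem_binary_lbm(X, g, m, n_iter=50, seed=0, eps=1e-10):
    """Minimal variational EM sketch for the binary latent block model.

    X: (n, d) binary data matrix; g row clusters, m column clusters.
    Returns variational row/column membership probabilities and the
    block Bernoulli parameters alpha[k, l].
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    # Variational membership probabilities, randomly initialised.
    r = rng.dirichlet(np.ones(g), size=n)   # rows -> row clusters, (n, g)
    c = rng.dirichlet(np.ones(m), size=d)   # cols -> column clusters, (d, m)
    for _ in range(n_iter):
        # M step: mixing proportions and block Bernoulli parameters.
        pi = r.mean(axis=0)                  # row-cluster proportions
        rho = c.mean(axis=0)                 # column-cluster proportions
        num = r.T @ X @ c                    # expected count of ones per block
        den = np.outer(r.sum(0), c.sum(0))   # expected block sizes
        alpha = np.clip(num / den, eps, 1 - eps)
        la, l1a = np.log(alpha), np.log1p(-alpha)
        # Variational E steps: update rows given columns, then columns.
        log_r = np.log(pi + eps) + X @ c @ la.T + (1 - X) @ c @ l1a.T
        r = softmax_rows(log_r)
        log_c = np.log(rho + eps) + X.T @ r @ la + (1 - X).T @ r @ l1a
        c = softmax_rows(log_c)
    return r, c, alpha
```

The MNAR extension studied in the paper would add a model for the missingness indicators and restrict the Bernoulli updates to observed entries; the mean-field factorisation over row and column labels, however, stays the same.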

MSC:

62-08 Computational methods for problems pertaining to statistics
62D10 Missing data
62H30 Classification and discrimination; cluster analysis (statistical aspects)
