
Pointed subspace approach to incomplete data. (English) Zbl 07223588

Summary: Incomplete data are often represented as vectors whose missing attributes have been filled in, joined with flag vectors indicating which components were missing. In this paper, we generalize this approach and represent incomplete data as pointed affine subspaces. This makes it possible to apply various affine transformations to the data, such as whitening or dimensionality reduction. Moreover, this representation preserves the information about which coordinates were missing. To use our representation in practical classification tasks, we embed such generalized missing data into a vector space and define a scalar product on the embedding space. Our representation is easy to implement and can be used together with typical kernel methods. Experiments show that an SVM classifier applied to the proposed subspace representation achieves highly accurate results.
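
The idea of a pointed affine subspace can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `to_pointed_subspace`, `matvec`, and `affine_map` are hypothetical, and the sketch assumes that a pointed subspace is stored as a filled base point together with a list of direction vectors spanning the missing coordinates. It shows only why the representation survives an affine map x -> Ax + b. The embedding and scalar product mentioned in the summary are not reproduced here.

```python
from typing import List, Optional, Tuple

Vector = List[float]
Matrix = List[List[float]]


def to_pointed_subspace(
    x: List[Optional[float]], fill: Vector
) -> Tuple[Vector, List[Vector]]:
    """Turn an incomplete vector (None marks a missing entry) into
    (base point, basis of missing directions).

    `fill` supplies the values used for missing entries (for example
    column means); this choice is an assumption of the sketch.
    """
    n = len(x)
    point = [fill[i] if x[i] is None else x[i] for i in range(n)]
    # One canonical axis vector per missing coordinate.
    basis = [
        [1.0 if j == i else 0.0 for j in range(n)]
        for i in range(n)
        if x[i] is None
    ]
    return point, basis


def matvec(A: Matrix, v: Vector) -> Vector:
    """Plain matrix-vector product."""
    return [sum(a * b for a, b in zip(row, v)) for row in A]


def affine_map(
    A: Matrix, b: Vector, point: Vector, basis: List[Vector]
) -> Tuple[Vector, List[Vector]]:
    """Apply x -> A x + b to a pointed subspace.

    The base point is moved by the full affine map; the direction
    vectors are moved by the linear part A only.
    """
    new_point = [p + c for p, c in zip(matvec(A, point), b)]
    new_basis = [matvec(A, v) for v in basis]
    return new_point, new_basis


# Second coordinate is missing; fill it with 0.0 for illustration.
point, basis = to_pointed_subspace([1.0, None, 3.0], fill=[0.0, 0.0, 0.0])
A = [[1.0, 1.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 1.0]]
b = [1.0, 0.0, 0.0]
new_point, new_basis = affine_map(A, b, point, basis)
```

In this example the missing direction starts as the axis vector (0, 1, 0) and becomes (1, 2, 0) after the map. The transformed direction is no longer aligned with a coordinate axis, so a binary flag vector could not describe it, while a list of direction vectors still can.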

MSC:

62H30 Classification and discrimination; cluster analysis (statistical aspects)

Software:

MICE
