Categorical multiblock linear discriminant analysis. (English) Zbl 07479875

Summary: Credit scoring techniques have been developed in recent years to reduce the risk that banks and financial institutions take on the loans they grant. Credit scoring is a classification problem: each individual is assigned to one of two groups, defaulting borrowers or non-defaulting borrowers. The aim of this paper is to propose a new discrimination method for the case where the dependent variable is categorical and a large number of categorical explanatory variables are retained. This method, Categorical Multiblock Linear Discriminant Analysis, computes components that take into account both the relationships between the explanatory categorical variables and the canonical correlation between each explanatory categorical variable and the dependent variable. A comparison with three other techniques and an application to credit scoring data are provided.
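
Illustration (not from the paper): the summary refers to the canonical correlation between each explanatory categorical variable and the dependent variable. The sketch below shows one way to compute that single quantity in the two-group credit scoring setting. It is not the authors' algorithm and does not build the multiblock components. It relies on a standard fact: when the target has two groups, the indicator (one-hot) blocks of the two variables have a single non-trivial canonical correlation, whose square equals chi-squared divided by n for their contingency table. The function name and the toy variables `housing`, `job` and `default` are hypothetical.

```python
from collections import Counter
from math import sqrt


def canonical_correlation_with_binary_target(x_labels, y_labels):
    """Canonical correlation between a categorical variable and a two-group target.

    Assumption of this sketch: the target has exactly two groups, so the
    squared canonical correlation equals chi2 / n of the contingency table.
    """
    n = len(x_labels)
    if n == 0 or n != len(y_labels):
        raise ValueError("inputs must be non-empty and of equal length")
    if len(set(y_labels)) != 2:
        raise ValueError("this sketch assumes exactly two target groups")

    row_counts = Counter(x_labels)                  # margins of the explanatory variable
    col_counts = Counter(y_labels)                  # margins of the target
    cell_counts = Counter(zip(x_labels, y_labels))  # contingency table cells

    chi2 = 0.0
    for x_cat, row_total in row_counts.items():
        for y_cat, col_total in col_counts.items():
            expected = row_total * col_total / n
            observed = cell_counts.get((x_cat, y_cat), 0)
            chi2 += (observed - expected) ** 2 / expected
    return sqrt(chi2 / n)


# Hypothetical toy data: eight borrowers, two explanatory variables.
default = ["yes", "no", "yes", "no", "yes", "no", "yes", "no"]
housing = ["rent", "own", "rent", "own", "rent", "own", "rent", "own"]
job = ["a", "a", "b", "b", "a", "a", "b", "b"]

r_housing = canonical_correlation_with_binary_target(housing, default)
r_job = canonical_correlation_with_binary_target(job, default)
```

In this toy data `housing` separates the two groups perfectly and `job` is balanced across them, so the two values sit at the extremes of the range from 0 to 1. How such per-variable quantities are combined into components is specific to the paper and is not reproduced here.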


62Pxx Applications of statistics

