
The analysis of distance of grouped data with categorical variables: categorical canonical variate analysis. (English) Zbl 1360.62370

Summary: We use generalised biplots to develop the important special case in which (i) all variables are categorical and (ii) the samples fall into \(K\) recognised groups. We term this Categorical Canonical Variate Analysis (CatCVA), because it has similar characteristics to Rao’s Canonical Variate Analysis (CVA), especially its visual aspects. It allows centroids of groups to be exhibited in increasing numbers of dimensions, together with information on within-group sample variation. Variables are represented by category-level-points (CLPs), which are a counterpart of numerically calibrated biplot axes for quantitative variables. Mechanisms are provided for relating the samples to their category levels, for giving convex regions to help predict categories, and for adding new samples. Inter-sample distance may be measured by any Euclidean-embeddable distance. Computation is minimised by working in the \((K-1)\)-dimensional space containing the group centroids.
The methodology is illustrated by an example with three groups and 37 samples, but the sample size is not a serious limitation. The visualisation of group structure is the main focus of this paper; computational efficiency is a bonus.
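
The idea of working in the \(K-1\) dimensional space of group centroids can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a simple mismatch count as the squared inter-sample distance (one Euclidean-embeddable choice), integer-coded category levels, and an unweighted centring of the centroids; the function names `mismatch_sq_distances` and `centroid_coordinates` are hypothetical.

```python
import numpy as np


def mismatch_sq_distances(X):
    """Squared distance between two samples = number of variables on which
    their categories differ (assumed choice; it is Euclidean embeddable
    because it equals half the squared distance between indicator codings)."""
    X = np.asarray(X)
    return (X[:, None, :] != X[None, :, :]).sum(axis=2).astype(float)


def centroid_coordinates(X, groups):
    """Coordinates of the K group centroids in at most K-1 dimensions,
    obtained from the inter-sample distances alone."""
    groups = np.asarray(groups)
    labels, idx = np.unique(groups, return_inverse=True)
    K = labels.size
    G = np.eye(K)[idx]                          # n x K group indicator matrix
    D = -0.5 * mismatch_sq_distances(X)         # matrix of -1/2 squared distances
    N_inv = np.diag(1.0 / G.sum(axis=0))        # inverse group sizes
    D_bar = N_inv @ G.T @ D @ G @ N_inv         # K x K group-averaged matrix
    J = np.eye(K) - np.ones((K, K)) / K         # unweighted centring (assumption)
    B = J @ D_bar @ J                           # inner products of centred centroids
    evals, evecs = np.linalg.eigh(B)
    order = np.argsort(evals)[::-1][: K - 1]    # keep the K-1 leading dimensions
    lam = np.clip(evals[order], 0.0, None)      # guard against tiny negative values
    return labels, evecs[:, order] * np.sqrt(lam)


if __name__ == "__main__":
    # Toy data: 6 samples, 2 categorical variables (integer codes), 3 groups.
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1], [2, 1], [2, 0]])
    groups = np.array([0, 0, 1, 1, 2, 2])
    labels, Y = centroid_coordinates(X, groups)
    print(labels)
    print(Y)
```

Only a \(K \times K\) eigenproblem is solved here, whatever the number of samples, which is the sense in which the computation stays small. Placing individual samples, category-level points and prediction regions in the display requires further steps that this sketch does not attempt.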

MSC:

62H99 Multivariate analysis
62H25 Factor analysis and principal components; correspondence analysis
62A09 Graphical methods in statistics

References:

[1] Gardner, S.; Gower, J. C.; Le Roux, N. J., A synthesis of canonical variate analysis, generalised canonical correlation analysis and Procrustes analysis, Comput. Statist. Data Anal., 50, 107-134 (2006) · Zbl 1365.62223
[2] Gifi, A., Nonlinear Multivariate Analysis (1990), John Wiley & Sons, Ltd.: Chichester · Zbl 0697.62048
[3] Gower, J. C., Adding a point to vector diagrams in multivariate analysis, Biometrika, 55, 582-585 (1968) · Zbl 0167.17802
[4] Gower, J. C., Generalised biplots, Biometrika, 79, 475-493 (1992) · Zbl 0775.62002
[5] Gower, J. C.; Hand, D. J., Biplots (1996), Chapman and Hall: London · Zbl 0867.62053
[6] Gower, J. C.; Harding, S. A., Prediction regions for categorical variables, (Blasius, J.; Greenacre, M. J., Visualization of Categorical Data (1998), Academic Press: London), 405-419
[7] Gower, J. C.; Legendre, P., Metric and Euclidean properties of dissimilarity coefficients, J. Classification, 3, 5-48 (1986) · Zbl 0592.62048
[8] Gower, J. C.; Le Roux, N. J.; Lubbe, S., The canonical analysis of distance, J. Classification, 31, 107-128 (2014) · Zbl 1360.62305
[9] Gower, J. C.; Lubbe, S.; Le Roux, N. J., Understanding Biplots (2011), John Wiley & Sons, Ltd: Chichester
[10] Rao, C. R., Advanced Statistical Methods in Biometric Research (1952), John Wiley & Sons, Inc.: New York · Zbl 0047.38601