
Dimension reduction via principal variables. (English) Zbl 1452.62408

Summary: For many large-scale datasets it is necessary to reduce dimensionality to the point where further exploration and analysis can take place. Principal variables are a subset of the original variables and preserve, to some extent, the structure and information carried by the original variables. Dimension reduction using principal variables is considered and a novel algorithm for determining such principal variables is proposed. This method is tested and compared with 11 other variable selection methods from the literature in a simulation study and is shown to be highly effective. Extensions to this procedure are also developed, including a method to determine longitudinal principal variables for repeated measures data, and a technique for incorporating utilities in order to modify the selection process. The method is further illustrated with real datasets, including some larger UK data relating to patient outcome after total knee replacement.
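The summary does not describe the proposed algorithm, so the following is only a generic sketch of what selecting principal variables can look like, not the authors' method. It assumes a greedy forward rule: at each step, pick the variable that explains the most remaining total variance, then replace the covariance matrix by the partial covariance given that variable. The function name `select_principal_variables`, its parameters, and the toy correlation matrix are all hypothetical and chosen for illustration.

```python
def select_principal_variables(cov, k, tol=1e-12):
    """Greedy forward selection of up to k variables (illustrative only).

    cov : square covariance or correlation matrix as a list of lists.
    Returns (selected, explained), where explained[i] is the cumulative
    fraction of total variance accounted for after i + 1 selections.
    """
    p = len(cov)
    # S holds the partial covariance given the variables selected so far.
    S = [list(row) for row in cov]
    total = sum(cov[i][i] for i in range(p))
    selected, explained = [], []

    for _ in range(min(k, p)):
        best_j, best_gain = None, 0.0
        for j in range(p):
            # Skip variables already chosen or with no residual variance.
            if j in selected or S[j][j] <= tol:
                continue
            # Reduction in trace(S) obtained by conditioning on variable j.
            gain = sum(S[i][j] ** 2 for i in range(p)) / S[j][j]
            if gain > best_gain:
                best_j, best_gain = j, gain
        if best_j is None:
            break

        # Partial out the chosen variable: S <- S - s_j s_j^T / s_jj.
        col = [S[i][best_j] for i in range(p)]
        pivot = S[best_j][best_j]
        for a in range(p):
            for b in range(p):
                S[a][b] -= col[a] * col[b] / pivot

        selected.append(best_j)
        residual = sum(S[i][i] for i in range(p))
        explained.append(1.0 - residual / total)

    return selected, explained


# Hypothetical 4-variable correlation matrix with two correlated pairs.
toy_corr = [
    [1.0, 0.9, 0.2, 0.1],
    [0.9, 1.0, 0.3, 0.1],
    [0.2, 0.3, 1.0, 0.8],
    [0.1, 0.1, 0.8, 1.0],
]

if __name__ == "__main__":
    chosen, fractions = select_principal_variables(toy_corr, k=2)
    print("selected variable indices:", chosen)
    print("cumulative variance explained:", fractions)
```

On the toy matrix the rule picks one variable from each correlated pair, which is the behaviour one would hope for from a subset meant to preserve the structure of the full variable set. A greedy rule is cheap but not guaranteed to find the best subset of a given size.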

MSC:

62H25 Factor analysis and principal components; correspondence analysis
62-08 Computational methods for problems pertaining to statistics
62P10 Applications of statistics to biology and medical sciences; meta analysis

Software:

R; fda (R)
