zbMATH — the first resource for mathematics

Interpretable dimension reduction. (English) Zbl 1121.62347
Summary: The analysis of high-dimensional data often begins with the identification of lower-dimensional subspaces. Principal component analysis is a dimension reduction technique that identifies linear combinations of variables along which most variation occurs or which best "reconstruct" the original variables. For example, many temperature readings may be taken in a production process when in fact there are just a few underlying variables driving the process. A problem with principal components is that the linear combinations can seem quite arbitrary. To make them more interpretable, we introduce two classes of constraints. In the first, coefficients are constrained to equal a small number of values (homogeneity constraint). The second constraint attempts to set as many coefficients to zero as possible (sparsity constraint). The resultant interpretable directions are either calculated to be close to the original principal component directions, or calculated in a stepwise manner that may make the components more orthogonal. A small dataset on characteristics of cars is used to introduce the techniques. A more substantial data mining application is also given, illustrating the ability of the procedure to scale to a very large number of variables.
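The two kinds of constraint described in the summary can be illustrated with a minimal sketch. It assumes NumPy and synthetic data in which two hidden variables drive six readings. The helper names (`principal_directions`, `sparsify`, `homogenise`) and the parameters `keep` and `tol` are hypothetical, and the thresholding and sign-rounding steps are simple stand-ins chosen for illustration, not the procedure of the paper.

```python
import numpy as np

def principal_directions(X, k):
    """Return the first k principal directions of X as unit-norm columns."""
    Xc = X - X.mean(axis=0)  # centre each variable
    # Rows of Vt are right singular vectors, ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Vt[:k].T

def sparsify(v, keep):
    """Illustrative sparsity step: keep the `keep` largest-magnitude
    coefficients, set the rest to zero, and renormalise."""
    out = np.zeros_like(v)
    idx = np.argsort(np.abs(v))[-keep:]
    out[idx] = v[idx]
    return out / np.linalg.norm(out)

def homogenise(v, tol):
    """Illustrative homogeneity step: coefficients below `tol` in magnitude
    become 0, all others share one magnitude (values in {-c, 0, +c})."""
    signs = np.where(np.abs(v) > tol, np.sign(v), 0.0)
    norm = np.linalg.norm(signs)
    if norm == 0.0:
        raise ValueError("tol is larger than every coefficient")
    return signs / norm

# Synthetic data (an assumption for this sketch): two latent variables
# drive six observed readings, plus a little measurement noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 2)) * np.array([3.0, 1.0])
loadings = np.array([[1.0, 1.0, 1.0, 0.0, 0.0, 0.0],
                     [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]])
X = latent @ loadings + 0.1 * rng.normal(size=(500, 6))

v = principal_directions(X, k=1)[:, 0]  # first principal direction
v_sparse = sparsify(v, keep=3)          # at most three nonzero coefficients
v_homog = homogenise(v, tol=0.2)        # coefficients restricted to {-c, 0, +c}

print("original   :", np.round(v, 3))
print("sparse     :", np.round(v_sparse, 3))
print("homogeneous:", np.round(v_homog, 3))
print("|cos| between original and sparse:", abs(v @ v_sparse))
```

Comparing the cosine between the original and constrained directions gives a rough sense of how much fidelity is traded for interpretability. The sign of a principal direction is arbitrary, hence the absolute value.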

62-XX Statistics