×

A pure \(L_1\)-norm principal component analysis. (English) Zbl 1349.62247

Summary: The \(L_{1}\) norm has been applied in numerous variations of principal component analysis (PCA). An \(L_{1}\)-norm PCA is an attractive alternative to traditional \(L_{2}\)-based PCA because it can impart robustness in the presence of outliers and is indicated for models where standard Gaussian assumptions about the noise may not apply. Of all the previously-proposed PCA schemes that recast PCA as an optimization problem involving the \(L_{1}\) norm, none provide globally optimal solutions in polynomial time. This paper proposes an \(L_{1}\)-norm PCA procedure based on the efficient calculation of the optimal solution of the \(L_{1}\)-norm best-fit hyperplane problem. We present a procedure called \(L_1\)-PCA\(^\ast\) based on the application of this idea that fits data to subspaces of successively smaller dimension. The procedure is implemented and tested on a diverse problem suite. Our tests show that \(L_1\)-PCA\(^\ast\) is the indicated procedure in the presence of unbalanced outlier contamination.

MSC:

62H25 Factor analysis and principal components; correspondence analysis
62G35 Nonparametric robustness

Software:

pcaPP; pcaL1; R; CPLEX; LAPACK
PDFBibTeX XMLCite
Full Text: DOI

References:

[1] Agarwal, S.; Chandraker, M. K.; Kahl, F.; Kriegman, D.; Belongie, S., Practical global optimization for multiview geometry, Lecture Notes in Computer Science, 3951, 592-605 (2006)
[2] Anderson, E.; Bai, Z.; Bischof, C.; Blackford, S.; Demmel, J.; Dongarra, J.; Du Croz, J.; Greenbaum, A.; Hammarling, S.; McKenney, A.; Sorensen, D., LAPACK Users’ Guide (1999), Society for Industrial and Applied Mathematics: Society for Industrial and Applied Mathematics Philadelpha, PA · Zbl 0934.65030
[3] Appa, G.; Smith, C., On \(L_1\) and Chebyshev estimation, Mathematical Programming, 5, 73-87 (1973) · Zbl 0271.41021
[5] Brooks, J. P.; Dula, J. H., The \(L_1\)-norm best-fit hyperplane problem, Applied Mathematics Letters, 26, 51-55 (2013) · Zbl 1255.65111
[6] Charnes, A.; Cooper, W. W.; Ferguson, R. O., Optimal estimation of executive compensation by linear programming, Management Science, 1, 138-150 (1955) · Zbl 0995.90590
[7] Choulakian, V., \(L_1\)-norm projection pursuit principal component analysis, Computational Statistics and Data Analysis, 50, 1441-1451 (2006) · Zbl 1445.62130
[8] Croux, C.; Ruiz-Gazen, A., High breakdown estimators for prinicpal components: the projection-pursuit approach revisited, Journal of Multivariate Analysis, 95, 206-226 (2005) · Zbl 1065.62040
[9] Daudin, J. J.; Duby, C.; Trecourt, P., Stability of principal component analysis studied by the bootstrap method, Statistics, 19, 241-258 (1988) · Zbl 0643.62043
[12] Galpin, J. S.; Hawkins, D. M., Methods of \(L_1\) estimation of a covariance matrix, Computational Statistics and Data Analysis, 5, 305-319 (1987) · Zbl 0623.62043
[13] Gao, J., Robust L1 principal component analysis and its Bayesian variational inference, Neural Computation, 20, 555-572 (2008) · Zbl 1132.62048
[14] ILOG, ILOG CPLEX Division. 889 Alder Avenue (2009), Incline Village: Incline Village Nevada
[15] Jolliffe, I. T., Principal Component Analysis (2002), Springer · Zbl 1011.62064
[18] Kier, L. B.; Seybold, P. G.; Cheng, C.-K., Cellular Automata Modeling of Chemical Systems: A Textbook and Laboratory Manual (2005), Springer
[19] Kwak, N., Principal component analysis based on \(L 1\)-norm maximization, IEEE Transactions on Pattern Analysis and Machine Intelligence, 30, 1672-1680 (2008)
[20] Li, G.; Chen, Z., Projection-pursuit approach to robust distpersion matrices and principal components: primary theory and Monte Carlo, Journal of the American Statistical Association, 80, 759-766 (1985) · Zbl 0595.62060
[21] Mangasarian, O. L., Arbitrary-norm separating plane, Operations Research Letters, 24, 15-23 (1999) · Zbl 1028.90037
[22] Martini, H.; Schöbel, A., Median hyperplanes in normed spaces—a survey, Discrete Applied Mathematics, 89, 181-195 (1998) · Zbl 0921.90109
[23] McDonald, G. C.; Schwing, R. C., Instabilities of regression estimates relating air pollution to mortality, Technometrics, 15, 463-481 (1973)
[25] Späth, H.; Watson, G. A., On orthogonal linear \(l_1\) approximation, Numerische Mathematik, 51, 531-543 (1987) · Zbl 0633.65010
[26] Wagner, H. M., Linear programming techniques for regression analysis, Journal of the American Statistical Association, 54, 206-212 (1959) · Zbl 0088.35702
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.