Projection pursuit exploratory data analysis.

*(English)* Zbl 0875.62206

Summary: Posse (1990) presented a projection pursuit technique, based on a global optimization algorithm and on a chi-squared projection index, for finding the plane in which the data are most interesting. This paper extends and improves that algorithm, providing an exploratory data analysis by projection pursuit that has important advantages over its competitors. The global optimization algorithm, when combined with a structure removal procedure due to Friedman (1987), allows a sequential identification of interesting bidimensional views of decreasing importance. The modified chi-squared index satisfies the five basic demands for a projection index. It is (1) uniquely minimized at the bivariate normal distribution, (2) approximately affine invariant, (3) consistent, (4) resistant to features in the tail of the distribution, and (5) simple enough to permit quick computation even for large data sets. The paper gives simple rules for judging the significance of a structure found by this algorithm. These rules define a stopping criterion for the search process. They are based on theoretical (asymptotic) arguments and are well supported by simulations. The efficacy of the new algorithm is illustrated through several studies of real and simulated data.
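To make the idea of a chi-squared projection index concrete, here is a minimal sketch in Python. It is not the modified index of the paper, whose exact construction the summary does not give. It assumes a simplified variant: the data are sphered, projected onto a plane, and binned into polar cells that have equal probability under the standard bivariate normal distribution. The function names `sphere` and `chi2_index`, and the cell counts `n_rings` and `n_sectors`, are hypothetical choices made for illustration.

```python
import numpy as np

def sphere(X):
    """Center X and whiten it with the inverse square root of its covariance."""
    Xc = X - X.mean(axis=0)
    evals, evecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
    # Symmetric whitening: cov^(-1/2) = V diag(1/sqrt(lambda)) V^T
    return Xc @ (evecs / np.sqrt(evals)) @ evecs.T

def chi2_index(Z, a, b, n_rings=4, n_sectors=8):
    """Chi-squared distance between the projection of sphered data Z onto
    the plane spanned by (a, b) and the standard bivariate normal.
    Illustrative only: cells are equal-probability polar boxes."""
    # Orthonormalize the two directions (Gram-Schmidt).
    a = a / np.linalg.norm(a)
    b = b - (b @ a) * a
    b = b / np.linalg.norm(b)
    u, v = Z @ a, Z @ b
    r2 = u**2 + v**2
    theta = np.arctan2(v, u)
    # Under N(0, I_2), r2 is exponential with mean 2, so 1 - exp(-r2 / 2)
    # is uniform on [0, 1); the angle is uniform on [-pi, pi].
    ring = np.minimum((n_rings * (1.0 - np.exp(-r2 / 2.0))).astype(int),
                      n_rings - 1)
    sector = np.minimum(((theta + np.pi) / (2.0 * np.pi) * n_sectors).astype(int),
                        n_sectors - 1)
    n_cells = n_rings * n_sectors
    counts = np.bincount(ring * n_sectors + sector, minlength=n_cells)
    expected = len(Z) / n_cells  # every cell has probability 1 / n_cells
    return float(((counts - expected) ** 2 / expected).sum())

# Usage: a Gaussian cloud versus a two-cluster cloud in four dimensions.
rng = np.random.default_rng(0)
gauss = rng.standard_normal((2000, 4))
clustered = gauss.copy()
clustered[:, 0] += np.where(rng.random(2000) < 0.5, -3.0, 3.0)
e = np.eye(4)
index_gauss = chi2_index(sphere(gauss), e[0], e[1])
index_clustered = chi2_index(sphere(clustered), e[0], e[1])
```

In this sketch the clustered data should score much higher than the Gaussian data on the plane containing the cluster direction. The outermost ring is a single unbounded cell per sector, so distant points contribute only through a cell count, which is one simple way to limit the influence of the tails.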

##### MSC:

62H10 | Multivariate distribution of statistics |

62-07 | Data analysis (statistics) (MSC2010) |

##### Keywords:

Cluster analysis; Exploratory data analysis; Global optimization; Invariant chi-squared test statistics; p-values for projections; Projection pursuit; Two-dimensional projections

*C. Posse*, Comput. Stat. Data Anal. 20, No. 6, 669-687 (1995; Zbl 0875.62206)


##### References:

[1] | Asimov, D.: The grand tour: A tool for viewing multidimensional data. SIAM J. Sci. statist. Comput. 6, 128-143 (1985) · Zbl 0552.62052 |

[2] | Cook, D.: Grand tour and projection pursuit: exploring multivariate data using projections. Ph.D. dissertation (1993) |

[3] | De Bruijn, N. G.: Asymptotic methods in analysis. (1961) · Zbl 0098.26404 |

[4] | Diaconis, P.; Freedman, D.: Asymptotics of graphical projection pursuit. Ann. statist. 12, 793-815 (1984) · Zbl 0559.62002 |

[5] | Flury, B.; Riedwyl, H.: Multivariate statistics: A practical approach. (1988) · Zbl 0495.62057 |

[6] | Friedman, J. H.: Exploratory projection pursuit. J. amer. Statist. assoc. 82, 249-266 (1987) · Zbl 0664.62060 |

[7] | Friedman, J. H.; Tukey, J. W.: A projection pursuit algorithm for exploratory data analysis. IEEE trans. Comput. C 23, 881-889 (1974) · Zbl 0284.68079 |

[8] | Glick, N.: Consistency condition for probability estimators and integrals of density estimators. Utilitas math. 6, 61-74 (1974) · Zbl 0295.62045 |

[9] | Glover, D. M.; Hopke, P. K.: Exploration of multivariate chemical data by projection pursuit. Chemom. intell. Lab. syst. 16, 45-59 (1992) |

[10] | Hall, P.: Estimating the direction in which a data set is most interesting. Probab. theory related fields 80, 51-77 (1988) · Zbl 0637.62037 |

[11] | Hall, P.: On polynomial-based projection indices for exploratory projection pursuit. Ann. statist. 17, 589-605 (1989) · Zbl 0717.62051 |

[12] | Hampel, F. R.: The influence curve and its role in robust estimation. J. amer. Statist. assoc. 69, 383-393 (1974) · Zbl 0305.62031 |

[13] | Huber, P. J.: Projection pursuit (with discussion). Ann. statist. 13, 435-475 (1985) · Zbl 0595.62059 |

[14] | Huber, P. J.: Data analysis and projection pursuit. Technical report PJH-90-1 (1990) |

[15] | Huber, P. J.: Algorithms for projection pursuit. Technical report PJH-90-3 (1990) |

[16] | Hurley, C.; Buja, A.: Analyzing high-dimensional data with motion graphics. SIAM J. Sci. statist. Comput. 11, 1193-1211 (1990) · Zbl 0705.68099 |

[17] | Jones, M. C.; Sibson, R.: What is projection pursuit? (with discussion). J. roy. Statist. soc. A 150, 1-38 (1987) · Zbl 0632.62059 |

[18] | Malkovich, J. F.; Afifi, A. A.: On tests for multivariate normality. J. amer. Statist. assoc. 68, 176-179 (1973) |

[19] | Nason, G. P.; Sibson, R.: Measuring multimodality. Statist. comput. 2, 153-160 (1992) |

[20] | Posse, C.: An effective two-dimensional projection pursuit algorithm. Comm. statist. Simulation comput. 19, 1143-1164 (1990) · Zbl 0850.62482 |

[21] | Posse, C.: Tools for two-dimensional exploratory projection pursuit. J. comput. Graph. statist. 4, 83-100 (1995) |

[22] | Rao, R. Ranga: Relations between weak and uniform convergence of measures with applications. Ann. math. Statist. 33, 659-680 (1962) · Zbl 0117.28602 |

[23] | Sun, J.: Significance levels in exploratory projection pursuit. Biometrika 78, 759-769 (1991) · Zbl 0753.62067 |

[24] | Sun, J.: Tail probabilities of the maxima of Gaussian random fields. Ann. probab. 21, 34-71 (1993) · Zbl 0772.60038 |

[25] | Tukey, P. A.; Tukey, J. W.: Graphical display of data in three and higher dimensions. Interpreting multivariate data (1981) · Zbl 0527.62003 |

[26] | Woo, T. L.; Morant, G. M.: A biometric study of the "flatness" of the facial skeleton in man. Biometrika 26, 196-250 (1934) |

This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.