New robust dynamic plots for regression mixture detection. (English) Zbl 1306.62079

Summary: The forward search is a powerful general method for detecting multiple masked outliers and for determining their effect on inferences about models fitted to data. By monitoring a series of statistics based on subsets of data of increasing size, we obtain multiple views of any hidden structure. One of the problems of the forward search has always been the lack of an automatic link among the great variety of plots that are monitored. Interesting features often emerge unexpectedly during the progression of the forward search, and only when a specific combination of forward plots is inspected at the same time. The analyst should therefore be able to interact with the plots and to redefine or refine the links among them. Without dynamic linking and interaction tools, the analyst risks missing relevant hidden information. In this paper we fill this gap and provide the user with a set of new robust graphical tools, whose power is demonstrated on several regression problems. Through the analysis of real and simulated data we give a series of examples in which dynamic interaction with different “robust plots” is used to highlight the presence of groups of outliers and regression mixtures and to appraise the effect that these hidden groups exert on the fitted model.
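
The monitoring idea described in the summary can be illustrated with a minimal Python sketch. It is not the authors' implementation and does not reproduce their linked plots. The function name `forward_search` and the parameters `m0`, `n_starts` and `seed` are hypothetical. The starting subset comes from a crude random search that stands in for a proper least median of squares fit, and the monitored statistic is assumed to be the minimum deletion residual among the observations outside the current subset.

```python
import numpy as np

def forward_search(X, y, m0=None, n_starts=200, seed=0):
    """Simplified forward search for linear regression (illustrative only).

    X is an (n, p) design matrix that already contains an intercept column.
    Returns the subset sizes m and, for each m, the minimum deletion
    residual among the observations not in the fitting subset.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    if m0 is None:
        m0 = p + 1  # smallest size that leaves one residual degree of freedom

    # Assumption: approximate a robust start by trying random p-point fits
    # and keeping the one with the smallest median squared residual.
    best_med, best_beta = np.inf, None
    for _ in range(n_starts):
        idx = rng.choice(n, size=p, replace=False)
        beta, *_ = np.linalg.lstsq(X[idx], y[idx], rcond=None)
        med = np.median((y - X @ beta) ** 2)
        if med < best_med:
            best_med, best_beta = med, beta
    subset = np.argsort((y - X @ best_beta) ** 2)[:m0]

    sizes, rmin = [], []
    for m in range(m0, n):
        Xs, ys = X[subset], y[subset]
        beta, *_ = np.linalg.lstsq(Xs, ys, rcond=None)
        resid = y - X @ beta                      # residuals for all n units
        s2 = np.sum(resid[subset] ** 2) / (m - p)  # variance from the subset
        outside = np.setdiff1d(np.arange(n), subset)
        xtx_inv = np.linalg.pinv(Xs.T @ Xs)
        # Leverage-type term x_i' (Xs' Xs)^{-1} x_i for units outside the subset.
        h = np.einsum("ij,jk,ik->i", X[outside], xtx_inv, X[outside])
        r = np.abs(resid[outside]) / np.sqrt(s2 * (1.0 + h))
        sizes.append(m)
        rmin.append(r.min())
        # Next subset: the m + 1 units with the smallest squared residuals.
        subset = np.argsort(resid ** 2)[: m + 1]
    return np.array(sizes), np.array(rmin)
```

Plotting `rmin` against `sizes` gives one forward plot of the kind the summary refers to. In a sketch like this, a sharp peak would be expected just before a group of outliers starts to enter the subset.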


62F35 Robustness and adaptive procedures (parametric inference)
62J05 Linear regression; mixed models
62J20 Diagnostics, and linear inference and regression
62-07 Data analysis (statistics) (MSC2010)
62A09 Graphical methods in statistics




[1] Atkinson AC, Riani M (2000) Robust diagnostic regression analysis. Springer, New York · Zbl 0964.62063
[2] Atkinson AC, Riani M (2002) Forward search added variable t tests and the effect of masked outliers on model selection. Biometrika 89: 939–946 · Zbl 1034.62045
[3] Atkinson AC, Riani M, Cerioli A (2004) Exploring multivariate data with the forward search. Springer, New York · Zbl 1049.62057
[4] Buja A, Cook D, Asimov D, Hurley C (2009) Theory of dynamic projections in high-dimensional data visualization. Electron J Stat
[5] Chen C, Härdle W, Unwin A (eds) (2008) Handbook of data visualization, vol XIV of Springer handbooks of computational statistics. Springer, Berlin
[6] Friendly M (2005) Milestones in the history of data visualization: a case study in statistical historiography. In: Weihs C, Gaul W (eds) Classification: the ubiquitous challenge. Springer, New York, pp 34–52
[7] Martinez WL, Martinez AR (2004) Exploratory data analysis with MATLAB. Computer science and data analysis series. Chapman & Hall/CRC, London
[8] Perrotta D, Torti F (2009) Detecting price outliers in European trade data with the forward search. In: Data analysis and classification: from exploration to confirmation, studies in classification, data analysis, and knowledge organization. Springer, Berlin (forthcoming)
[9] Riani M, Atkinson AC (2007) Fast calibrations of the forward search for testing multiple outliers in regression. Adv Data Anal Classif 1: 123–141. doi: 10.1007/s11634-007-0007-y · Zbl 1301.62069
[10] Riani M, Atkinson AC, Cerioli A (2009) Finding an unknown number of multivariate outliers. J Royal Stat Soc Ser B 71: 201–221 · Zbl 1248.62091
[11] Riani M, Cerioli A, Atkinson A, Perrotta D, Torti F (2008) Fitting mixtures of regression lines with the forward search. In: Fogelman-Soulie F, Perrotta D, Piskorski J, Steinberger R (eds) Mining massive data sets for security. IOS Press, Amsterdam, pp 271–286
[12] Rousseeuw PJ (1984) Least median of squares regression. J Am Stat Assoc 79: 871–880 · Zbl 0547.62046
[13] Spence R (2001) Information visualization. Addison Wesley, California
[14] Tufte ER (1983) The visual display of quantitative information. Graphics Press, Cheshire
[15] Wilhelm A (2008) Linked views for visual exploration. In: Chen C, Härdle W, Unwin A (eds) Handbook of data visualization, vol XIV. Springer, Berlin, pp 199–215 · Zbl 1145.68570