Robust diagnostic regression analysis.

*(English)* Zbl 0964.62063
Springer Series in Statistics. New York, NY: Springer. xvi, 327 p. (2000).

This book presents highly informative graphical methods for understanding how a fitted regression model depends on individual observations and/or on groups of observations. The approach taken is a combination of robust methods, data diagnostics and computer graphics. The book is intended to be accessible to students and practitioners, and at the same time provides useful material for professional statisticians. An elementary outline of the necessary theory is provided where appropriate. The book is more or less self-contained. It appears to be suitable as a textbook for a second course in data analysis and regression modelling.

The main idea underlying the methods presented in the book is the following. Any given data set may be viewed as consisting of a subset of data that conforms more or less to the model being fitted, and the remaining data, which do not conform to the model and are outlying observations. Conventional model fitting uses all of the data in obtaining a fit. This masks the effects of individual data values on the fit, and the actual structure present in the data is often left unrevealed. Instead of the conventional approach, the authors propose a Forward Analysis or Forward Search, which consists of robustly fitting the model to an appropriately chosen subset of the data, calculating residuals for all the data values based on this fit, ordering the data according to the magnitudes of these residuals, and then enlarging the subset and refitting the model by successively including data points in the order determined by their residuals. The entire process is monitored graphically through plots of various key diagnostic statistics, and model inadequacies are identified by visual examination of these plots. Throughout, it is assumed that the response is univariate and that the errors are independently distributed.
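The forward search described above can be illustrated with a minimal sketch for ordinary least-squares regression. This is not the book's S-Plus library, and several details are assumptions made for illustration: the function name `forward_search`, its parameters `n_trials` and `seed`, and the choice of the starting subset (the best of a number of random p-point subsets, judged by the median squared residual over all data) are all hypothetical stand-ins for the "appropriately chosen subset" mentioned in the review.

```python
import numpy as np


def forward_search(X, y, n_trials=200, seed=0):
    """Simplified forward search for least-squares regression.

    X is an (n, p) design matrix, y an (n,) response vector.
    Returns one record per subset size m = p, ..., n.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape

    # Assumed starting rule: among random p-point subsets, keep the one
    # whose fit gives the smallest median squared residual over ALL data.
    best_subset, best_crit = None, np.inf
    for _ in range(n_trials):
        idx = rng.choice(n, size=p, replace=False)
        beta, *_ = np.linalg.lstsq(X[idx], y[idx], rcond=None)
        crit = np.median((y - X @ beta) ** 2)
        if crit < best_crit:
            best_subset, best_crit = idx, crit

    subset = np.sort(best_subset)
    steps = []
    while True:
        # Fit on the current subset, then compute residuals for every point.
        beta, *_ = np.linalg.lstsq(X[subset], y[subset], rcond=None)
        resid = y - X @ beta
        steps.append({"m": len(subset), "subset": subset.copy(),
                      "beta": beta, "resid": resid})
        if len(subset) == n:
            break
        # Enlarge the subset: take the m + 1 smallest squared residuals.
        order = np.argsort(resid ** 2)
        subset = np.sort(order[: len(subset) + 1])
    return steps
```

Plotting each observation's residual, or each coefficient in `beta`, against the subset size `m` gives a monitoring plot of the kind the review refers to; in this sketch, outlying observations would typically enter only in the last few steps and show up as abrupt changes at the right-hand end of such plots.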

The book has six chapters, an appendix containing most of the data sets used in the book, a bibliography section, an author index and a subject index. There are several exercises included in each chapter. At the end of each chapter, solutions to these exercises are given.

Chapter 1 introduces the idea of forward search in regression using three examples and shows how single and multiple outliers can be identified in a multiple regression analysis. Chapter 2 provides an outline of the theory of regression, develops regression diagnostic statistics, and describes the method of forward search and its properties. In Chapter 3, four more example data sets are analyzed in detail. The discussions include choice of a suitable linear model as well as transformation of the response variable. Chapter 4 is devoted to a discussion of transformation of the response variable to normality. This chapter includes both theory and illustrative examples. Transformation of both the dependent variables and the predictor variables is illustrated via an example. Applications of the forward search approach to nonlinear models and generalized linear models are the topics of Chapter 5 and Chapter 6, respectively. In particular, it is shown how the forward analysis may be applied to contingency tables and to binary data.

An S-Plus library of functions for implementing many of the methods presented in the book is available from http://stat.econ.unipr.it/riani/ar. This makes it very easy for anyone to apply the ideas in the book to conduct exploratory data analysis on any regression data set. By providing a systematic approach for carrying out the forward search analysis and the software needed to implement the methods, the authors have made an important contribution to the field of regression modelling and exploratory data analysis.

Reviewer: Hariharan Iyer (Fort Collins)

##### MSC:

62-01 | Introductory exposition (textbooks, tutorial papers, etc.) pertaining to statistics |

62J20 | Diagnostics, and linear inference and regression |

62-02 | Research exposition (monographs, survey articles) pertaining to statistics |

62-09 | Graphical methods in statistics (MSC2010) |

62-07 | Data analysis (statistics) (MSC2010) |