Practical analysis of extreme values. (English) Zbl 0888.62003

Leuven: Leuven University Press. vi, 170 p. (1996).
In the last half century the theory of extremes has developed into a self-contained discipline of high level, see, e.g., the reviewer, “The asymptotic theory of extreme order statistics”. 2nd ed. (1987; Zbl 0634.62044). However, the methodology for evaluating statistical data from distributions related to extremes remains far behind the theory. The understandable reason for this divergence between theory and practice lies in the nature of the subject: the theory of extremes is an asymptotic theory, while real data come from a variety of distributions for which the asymptotic theory becomes valid, in most cases, only at a very large sample size. Another problem, not present in other branches of statistics, is that only a relatively small part of the observations is relevant for the extremes, which is an additional reason for needing a very large sample size. When one ignores these requirements on the sample size and applies statistical methods nonetheless, one can arrive at conclusions from real data which are either contrary to what one can expect or simply unacceptable to practitioners.
The book under review is an attempt to close the gap between theory and practice. A number of techniques are presented for estimating critical parameters, and graphical techniques are developed for making judgements on the upper tail of the population distribution from empirical distributions. Several real data evaluations accompany the recommended methodology. The data are taken from the insurance industry and from recorded daily high winds at several locations in the USA. The authors do not aim at covering the existing literature, hence only a few references will be added here which are essential for placing the book in proper perspective.
First some notation. Let $$X_1,X_2,\dots,X_n$$ be independent observations on a random quantity $$X$$ with distribution function $$F(z)$$. The quantity $$X$$ itself is assumed to be an extreme of some other quantities, such as high winds, flood levels of a river, high insurance claims, and others. Hence, $$F(z)$$ is close to, but not equal to, a classical extreme value distribution function for the maximum, which is known from extreme value theory to be a member of the family $$H_c(u) = \exp[-(1+cu)^{-1/c}]$$ with some number $$c$$, where $$1+cu>0$$ and, for $$c=0$$, $$H_0(u)$$ is the limit of $$H_c(u)$$ as $$c\to 0$$ with $$c\neq 0$$. Hence, $$H_0(u)= \exp(-e^{-u})$$. In all distributions, one of course has the additional two parameters $$A$$ and $$B>0$$, with $$(u-A)/B$$ in place of $$u$$. One should note that the restriction $$1+cu>0$$ on the domain of $$H_c(u)$$ entails that $$X$$ is bounded above for $$c<0$$ and unbounded above in the other cases. Consequently, even though $$H_c$$ changes very little with small changes in $$c$$, the difference between $$H_0$$ and $$H_{-0.05}$$, say, is very significant. Yet no statistical method has so far proved sensitive enough to distinguish these two cases.
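The family $$H_c$$ and the role of the sign of $$c$$ can be illustrated with a minimal Python sketch. The function names `gev_cdf` and `upper_endpoint` are chosen here for illustration and do not come from the book; the location and scale parameters $$A$$ and $$B$$ are omitted (that is, $$A=0$$, $$B=1$$ is assumed).

```python
import math


def gev_cdf(u, c):
    """H_c(u) = exp(-(1 + c*u)^(-1/c)), with the Gumbel limit exp(-exp(-u)) at c = 0."""
    if c == 0.0:
        return math.exp(-math.exp(-u))
    t = 1.0 + c * u
    if t <= 0.0:
        # Outside the domain 1 + c*u > 0: below the lower endpoint when c > 0,
        # above the upper endpoint when c < 0.
        return 0.0 if c > 0 else 1.0
    return math.exp(-(t ** (-1.0 / c)))


def upper_endpoint(c):
    """Right endpoint of the support of H_c: finite (-1/c) only when c < 0."""
    return -1.0 / c if c < 0 else math.inf


if __name__ == "__main__":
    # In the body of the distribution H_0 and H_{-0.05} are close ...
    print(gev_cdf(2.0, 0.0), gev_cdf(2.0, -0.05))
    # ... but only H_{-0.05} has a finite upper endpoint.
    print(upper_endpoint(0.0), upper_endpoint(-0.05))
```

The two distribution functions differ little at moderate arguments, while the supports differ qualitatively, which is the difficulty described above.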
Now, from the observations $$X_j$$ with $$F$$ unknown, we would like to estimate $$c$$. From what we have said previously it follows that if we rearrange the $$X_j$$ into increasing order $$X_{1:n}\leq X_{2:n}\leq\dots\leq X_{n:n}$$, only the top observations $$X_{n-r:n}$$, $$0\leq r\leq r_0$$, can be utilized for estimating $$c$$. In particular, with $$r=pn$$, $$0<p<1$$, $$X_{n-r:n}$$ is asymptotically normal for most $$F$$, and thus it has no relevance either for $$F$$ or the appropriate $$H_c$$. In other words, whatever estimator one chooses, $$r_0/n$$ must tend to zero and $$n$$ must tend to infinity in order to recover $$c$$, at least in the limit. The book proposes several variants of the Hill estimator $$\widehat{c}=(1/r_0) \sum_{k=1}^{r_0}\log X_{n-k+1:n}-\log X_{n-r_0:n}$$ for estimating $$c$$, and several optimal rules are given to determine $$r_0$$. These optimal rules are usually related to fitting ‘the best line’ to the upper tail of the empirical distribution function drawn on probability paper, usually Pareto scaled. Following such a program, the authors come up with $$r_0=1795$$ in a sample of size $$n=6939$$ for estimating $$c$$ for a high wind data set (see pp. 82-88). The value obtained for $$c$$ becomes completely unreliable because of the high value of $$r_0$$, and indeed, it contradicts the values obtained by Simiu and Heckert [J. Struct. Eng. 122, No. 5, 539-547 (1996)].
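The Hill estimator in the form quoted above can be sketched in a few lines of Python. This is only an illustration of the basic formula, not of the variants or of the rules for choosing $$r_0$$ given in the book; the function name `hill_estimator` and the simulated Pareto sample are assumptions made here.

```python
import math
import random


def hill_estimator(sample, r0):
    """(1/r0) * sum_{k=1}^{r0} log X_{n-k+1:n}  -  log X_{n-r0:n}."""
    xs = sorted(sample)  # xs[j-1] is the order statistic X_{j:n}
    n = len(xs)
    if not 1 <= r0 < n:
        raise ValueError("r0 must satisfy 1 <= r0 < n")
    threshold = xs[n - r0 - 1]  # X_{n-r0:n}
    if threshold <= 0:
        raise ValueError("the order statistics used must be positive")
    top = xs[n - r0:]  # X_{n-r0+1:n}, ..., X_{n:n}
    return sum(math.log(x) for x in top) / r0 - math.log(threshold)


if __name__ == "__main__":
    # Illustrative data only: an exact Pareto sample with tail index 2,
    # for which the target value is c = 1/2.
    random.seed(0)
    sample = [random.paretovariate(2.0) for _ in range(5000)]
    for r0 in (50, 200, 1000):
        print(r0, hill_estimator(sample, r0))
```

For an exact Pareto sample the estimate is stable in $$r_0$$; for real data, where $$F$$ is only approximately of extreme value type in the tail, the estimate can drift as $$r_0$$ grows, which is why the choice of $$r_0$$ matters.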
A more reliable estimate can be obtained if one starts by fitting two straight lines to the upper tail of the empirical distribution function, one to the values of $$X_{n-r:n}$$, $$0\leq r\leq r_0/2$$, and the other to $$X_{n-r:n}$$, $$r_0/2\leq r\leq r_0$$. From the angle between these two lines one first determines whether $$c>0$$, $$c=0$$ or $$c<0$$, and then proceeds to the estimation of the parameter values. This method is known as the curvature method, developed by E. Castillo, the reviewer and J. M. Sarabia [Lect. Notes Stat. 51, 181-190 (1989; Zbl 0672.62035)], and presented in detail in the book by E. Castillo [Extreme value theory in engineering. (1988; Zbl 0657.62004)]; Chapter 6 is the most relevant part.
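The two-line idea can be sketched as follows. This is a rough illustration under assumptions made here, not a reproduction of the published curvature method: the tail is drawn on Gumbel probability paper with plotting positions $$i/(n+1)$$, both lines are fitted by least squares, the two slopes are compared through their ratio, and the tolerance in `classify_tail` is an arbitrary placeholder rather than a critical value of any test. All function names are hypothetical.

```python
import math


def gev_quantile(p, c):
    """Quantile of H_c at probability p (used only to build illustrative data)."""
    y = -math.log(-math.log(p))  # Gumbel reduced variate
    return y if c == 0.0 else (math.exp(c * y) - 1.0) / c


def _slope(ys, xs):
    """Least-squares slope of xs regressed on ys."""
    m = len(ys)
    my, mx = sum(ys) / m, sum(xs) / m
    sxy = sum((y - my) * (x - mx) for y, x in zip(ys, xs))
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / syy


def curvature_ratio(sample, r0):
    """Slope over X_{n-r:n}, 0 <= r <= r0/2, divided by slope over r0/2 <= r <= r0."""
    xs = sorted(sample)
    n = len(xs)
    if not 4 <= r0 < n:
        raise ValueError("r0 must satisfy 4 <= r0 < n")
    tail = xs[n - r0 - 1:]  # X_{n-r0:n}, ..., X_{n:n} in increasing order
    # Reduced variates at the assumed plotting positions i/(n+1), i = n-r0, ..., n.
    ys = [-math.log(-math.log(i / (n + 1))) for i in range(n - r0, n + 1)]
    half = len(tail) // 2
    lower = _slope(ys[:half + 1], tail[:half + 1])  # r0/2 <= r <= r0
    upper = _slope(ys[half:], tail[half:])          # 0 <= r <= r0/2
    if lower == 0.0:
        raise ValueError("lower segment is flat; slope ratio undefined")
    return upper / lower


def classify_tail(ratio, tol=0.1):
    """Sign of c suggested by the slope ratio; tol is an arbitrary placeholder."""
    if ratio > 1.0 + tol:
        return "c > 0"
    if ratio < 1.0 - tol:
        return "c < 0"
    return "c = 0"


if __name__ == "__main__":
    n = 500
    for c in (0.3, 0.0, -0.3):
        data = [gev_quantile(i / (n + 1), c) for i in range(1, n + 1)]
        ratio = curvature_ratio(data, 100)
        print(c, ratio, classify_tail(ratio))
```

On this paper a distribution $$H_c$$ plots as a curve that is convex for $$c>0$$, straight for $$c=0$$ and concave for $$c<0$$, so a slope ratio above, near or below one points to the corresponding case. With real data the decision would have to rest on the sampling variability of the ratio rather than on a fixed tolerance.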
In spite of the seemingly critical remark in the preceding paragraph, the reviewer finds the book a valuable contribution to practical extreme value evaluations. However, somewhat more care should be taken with the choice of $$r_0$$, and Castillo’s book is recommended for a combination of techniques. In particular, Castillo’s comments at the end of several of his examples are very valuable for understanding the reliability of the results he recommends.

MSC:

62-01 Introductory exposition (textbooks, tutorial papers, etc.) pertaining to statistics
62G30 Order statistics; empirical distribution functions
62-02 Research exposition (monographs, survey articles) pertaining to statistics
62F10 Point estimation
62F12 Asymptotic properties of parametric estimators
60G70 Extreme value theory; extremal stochastic processes