# zbMATH — the first resource for mathematics

Nonparametric estimation of a regression function. (English) Zbl 0744.62054
Let $$(X_1,Y_1),\dots,(X_n,Y_n)$$ be i.i.d. on $$I\times\mathbb{R}$$, where $$I$$ is a compact set in $$\mathbb{R}^p$$. Let $$m(x)=E(Y\mid X=x)$$. Let $$F$$ be the marginal distribution function (d.f.) of $$X$$ and let $$F_n$$ be its empirical d.f. It is assumed that $$\sup_x E(Y^{4s}\mid X=x)<\infty$$ for some integer $$s\geq 2$$. Let $$\{W_k:\;k=(k_1,\dots,k_p)\in D_n\}$$ be a sequence of weight functions (depending on $$F$$) on $$I\times I$$, where $$D_n$$ is an index set and $$\hbox{card}(D_n)=K_n$$ with $$K_n/n^s\to 0$$.
From the above sequence of weight functions the authors construct a sequence of estimates $\hat m_k(x)=\sum_j Y_jW_k(x,X_j,F_n)/n,\qquad k\in D_n.$ A data-dependent method is proposed for choosing the (smoothness) index $$k$$ that minimizes the prediction squared error. Since the resulting minimizer $$\tilde k$$ depends on the unknown distribution of $$(X,Y)$$, the authors heuristically motivate using instead $$\hat k$$, the minimizer of $\hat T_n(k)=n^{-2}\sum_j\hat\varepsilon^2_{kj}[1+2n^{-1}W_k(X_j,X_j,F_n)],$ where $$\hat\varepsilon_{kj}=Y_j-\hat m_k(X_j)$$. They then use $$\hat m_{\hat k}$$ as an estimate of the unknown regression function $$m(x)$$. This estimate can be specialized to piecewise polynomial, spline, orthogonal series, kernel and nearest neighbor methods. The main optimality result is that for all of these methods $$L_n(\hat k)/L_n(\tilde k)\to 1$$ in probability, where $L_n(k)=\int(\hat m_k(x)-m(x))^2\,dF(x).$ Further results of this kind and a numerical example are also given.
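The selection rule above can be illustrated concretely. The following is a minimal Python sketch, not taken from the paper under review: it instantiates the weights $$W_k$$ as Gaussian-kernel (Nadaraya-Watson) weights indexed by a bandwidth $$h$$, and picks the bandwidth on a grid by minimizing a criterion of the same form as $$\hat T_n(k)$$. All function names and the choice of kernel are illustrative assumptions.

```python
import numpy as np

def kernel_weights(x, X, h):
    # Gaussian-kernel weights normalized so that the estimate takes the
    # review's form  m_hat(x) = (1/n) * sum_j Y_j * W(x, X_j).
    # (Here the normalization plays the role of the dependence on F_n.)
    K = np.exp(-0.5 * ((x - X) / h) ** 2)
    return K / K.mean()          # W(x, X_j) = n * K_j / sum_l K_l

def m_hat(x, X, Y, h):
    # Nadaraya-Watson estimate at a single point x.
    return Y @ kernel_weights(x, X, h) / len(X)

def T_hat(X, Y, h):
    # Penalized residual criterion analogous to
    #   T_n(k) = n^{-2} * sum_j eps_j^2 * (1 + 2 n^{-1} W(X_j, X_j)).
    n = len(X)
    resid2 = np.array([(Y[j] - m_hat(X[j], X, Y, h)) ** 2 for j in range(n)])
    diag = np.array([kernel_weights(X[j], X, h)[j] for j in range(n)])
    return float((resid2 * (1 + 2 * diag / n)).sum() / n ** 2)

def select_bandwidth(X, Y, grid):
    # Data-dependent choice: the grid point minimizing T_hat,
    # playing the role of the index k-hat in the review.
    return grid[int(np.argmin([T_hat(X, Y, h) for h in grid]))]
```

The diagonal term $$W_k(X_j,X_j,F_n)$$ penalizes estimators that fit each observation with its own response, so pure interpolation does not automatically win the residual comparison.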

##### MSC:
- 62G07 Density estimation
- 62J02 General nonlinear regression