On projection pursuit regression.

*(English)* Zbl 0698.62041

A mathematical model of projection pursuit regression based on kernel estimation (of the "marginal density of errors in the projection direction") is proposed, and the mathematical machinery needed for the projection approximation of the target function \(G(X)=E(Y\mid X)\) is built up. The main results are explicit formulae for the bias and the stochastic error of the orientation estimates and curve estimates. They show that the error of the orientation (projection) estimate is predominantly bias.

The authors also prove that kernel-based projection pursuit regression estimates the corresponding projections with convergence rates identical to those known from one-dimensional estimation, namely \(O(h^{r})=O((nh)^{-1/2})\), where \(h\) is the bandwidth of the kernel estimator. The estimator of \(G(x)\) based on projection onto the direction \(\theta\), say \(\hat G_{\theta}(x)\), is required to minimize \[ \hat S(\theta)=n^{-1}\sum_{k=1}^{n}\bigl(Y_k-\hat G_{\theta}(X_k)\bigr)^2, \] where \((Y_k,X_k)_{k=1}^{n}\), \(Y_k\in\mathbb R\), \(X_k\in\mathbb R^{p}\), are the data. The main idea is to construct \(\hat G_{\theta}(x)\) via an estimate \(\hat\theta\) of \(\theta\) which minimizes \[ \tilde S(\theta)=n^{-1}\sum_{k=1}^{n}\bigl(Y_k-\hat G^{k}_{\theta}(X_k)\bigr)^2, \] where \(\hat G^{k}_{\theta}(X_k)\) is the nonparametric (kernel) estimate of \(G(x)\) based on all points \((X_i)_{i=1}^{n}\) except \(X_k\), i.e. a leave-one-out estimate. Finally, some alternative approaches (random window, a two-stage algorithm, etc.) are discussed.

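The leave-one-out construction above can be sketched numerically. The following is a minimal illustration, not the paper's procedure: it assumes a Gaussian Nadaraya-Watson kernel estimator for \(\hat G_{\theta}\), a fixed bandwidth, and a crude grid search over unit directions in \(\mathbb R^2\) in place of a proper optimizer; the function names (`nw_estimate`, `loo_score`, `fit_direction`) are illustrative only.

```python
import numpy as np

def nw_estimate(t0, t, y, h):
    """Nadaraya-Watson kernel estimate at t0 from projected points t (Gaussian kernel, bandwidth h)."""
    w = np.exp(-0.5 * ((t0 - t) / h) ** 2)
    return np.sum(w * y) / np.sum(w)

def loo_score(theta, X, Y, h):
    """S~(theta): leave-one-out cross-validation score along direction theta."""
    t = X @ theta          # projections theta' X_k
    n = len(Y)
    s = 0.0
    for k in range(n):
        mask = np.arange(n) != k  # all points except X_k
        s += (Y[k] - nw_estimate(t[k], t[mask], Y[mask], h)) ** 2
    return s / n

def fit_direction(X, Y, h, n_angles=60):
    """Pick the unit direction in R^2 minimizing the leave-one-out score (grid search, illustration only)."""
    best_theta, best_score = None, np.inf
    for a in np.linspace(0.0, np.pi, n_angles, endpoint=False):
        theta = np.array([np.cos(a), np.sin(a)])
        score = loo_score(theta, X, Y, h)
        if score < best_score:
            best_theta, best_score = theta, score
    return best_theta
```

On data generated as \(Y = g(\theta_0'X) + \text{noise}\), the minimizer of the leave-one-out score recovers \(\theta_0\) up to sign, consistent with the review's point that the orientation is estimated from the cross-validated residual sum of squares.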

Reviewer: J.Á.Víšek

##### MSC:

- 62G05 Nonparametric estimation
- 62H99 Multivariate analysis
- 62H05 Characterization and structure theory for multivariate probability distributions; copulas