Rates of convergence of estimates, Kolmogorov’s entropy and the dimensionality reduction principle in regression. (English) Zbl 0909.62063
Let $$(X_1,Y_1),\dots, (X_n,Y_n)$$ be a random sample of $$n$$ independent pairs, copies of $$(X,Y)$$. The random vector $$X$$ takes values in $$I=[0,1]^d$$. Conditionally on $$X_1= x_1,\dots, X_n=x_n$$, the r.v. $$Y_1,\dots, Y_n$$ are independent, each having density $$f(y\mid x_i, \theta(x_i))$$, $$i=1,\dots, n$$, of known form. The unknown function $$\theta$$ is an element of $$\Theta_{q,d}$$, the space of $$q$$-smooth real-valued functions on $$I$$. As in Y. G. Yatracos [Ann. Stat. 17, No. 4, 1597-1607 (1989; Zbl 0694.62018)], the statistical interpretation of $$\theta$$ is left unspecified; it may be, for example, a mean or a median.
In the present paper $$L_1$$-optimal estimates $$\widetilde{\theta}_n$$ of $$\theta$$ are constructed for models of the following two types, with or without interactions:
I. The additive supermodel, $\theta(x)= \sum_{j=1}^K \theta_{1j}(b_j^Tx)+ \sum_{j=1}^L \psi_j(x_{m_1},\dots, x_{m_{r_j}});$
II. The multiplicative supermodel, $\theta(x)= \prod_{j=1}^K \theta_{1j} (b_j^Tx)\cdot \prod_{j=1}^L \psi_j(x_{m_1},\dots, x_{m_{r_j}}).$
Here the $$b_j$$ are unit vectors in $$\mathbb{R}^d$$. The parameter $$r=\max_{1\leq j\leq L}r_j$$ is called the dimension of the model. For the supermodels without interactions the dimension is $$r=1$$.
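To make the two supermodels concrete, the following is a minimal sketch of how $$\theta(x)$$ could be evaluated from its components. It is not taken from the paper: the function names, the representation of each term as a pair, the zero-based coordinate indices, and the example components are all illustrative assumptions.

```python
import math

def additive_supermodel(x, ridge_terms, interaction_terms):
    """Model I: sum_j theta_1j(b_j^T x) + sum_j psi_j(x[m_1], ..., x[m_rj]).

    ridge_terms: list of (b, theta_1j) pairs, b a unit vector (hypothetical layout).
    interaction_terms: list of (indices, psi_j) pairs, indices zero-based.
    """
    ridge = sum(f(sum(bi * xi for bi, xi in zip(b, x))) for b, f in ridge_terms)
    inter = sum(psi(*(x[m] for m in idx)) for idx, psi in interaction_terms)
    return ridge + inter

def multiplicative_supermodel(x, ridge_terms, interaction_terms):
    """Model II: the same components combined by products instead of sums."""
    ridge = math.prod(f(sum(bi * xi for bi, xi in zip(b, x))) for b, f in ridge_terms)
    inter = math.prod(psi(*(x[m] for m in idx)) for idx, psi in interaction_terms)
    return ridge * inter

def model_dimension(interaction_terms):
    """r = max_j r_j; taken as 1 when there are no interaction terms."""
    return max((len(idx) for idx, _ in interaction_terms), default=1)

# Illustrative components only (assumptions, not from the paper): d = 3.
example_x = (0.5, 0.2, 0.8)
example_ridge = [((1.0, 0.0, 0.0), lambda t: 2.0 * t)]     # theta_11(b_1^T x)
example_inter = [((1, 2), lambda u, v: u * v)]             # psi_1(x_2, x_3)

print(additive_supermodel(example_x, example_ridge, example_inter))
print(multiplicative_supermodel(example_x, example_ridge, example_inter))
print(model_dimension(example_inter))
```

In this toy case the single interaction term involves two coordinates, so the model dimension is $$r=2$$ although $$d=3$$.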
Y. G. Yatracos [Ann. Stat. 13, 768-774 (1985; Zbl 0576.62057)] constructed $$L_1$$-estimates of a probability measure under the assumption of i.i.d. observations and related the $$L_1$$-rate of convergence of the estimates to Kolmogorov’s entropy of the parameter space. G. G. Roussas and Y. G. Yatracos [D. Pollard et al. (eds.), Festschrift for Lucien Le Cam: Research Papers in Probability and Statistics, 337-344 (1997; Zbl 0892.62023)] provided $$L_1$$-estimates of a probability measure on the basis of observations from a $$\varphi$$-mixing sequence of r.v. All these methods, as well as the method of the present paper, are close relatives of U. Grenander’s method of sieves [“Abstract inference.” (1981; Zbl 0505.62069)].
The rates of convergence obtained for $$\widetilde{\theta}_n$$ depend on Kolmogorov’s entropy of the assumed model and confirm C. J. Stone’s [Ann. Stat. 13, 689-705 (1985; Zbl 0605.62065)] heuristic dimensionality reduction principle that the optimal rate of convergence is $$n^{-q/(2q+r)}$$. The proof is based on the inequalities of W. Hoeffding [J. Am. Stat. Assoc. 58, 13-30 (1963; Zbl 0127.10602)]. Rates of convergence are also obtained for the error in estimating the derivatives of a regression-type function.
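The practical meaning of the rate $$n^{-q/(2q+r)}$$ can be illustrated by a small calculation. The sketch below is an illustration only: the function names are hypothetical, multiplicative constants are ignored, and the case $$r=d$$ is used as a stand-in for a model with no dimensionality reduction.

```python
def rate_exponent(q, r):
    """Exponent q / (2q + r) in the rate n^{-q/(2q+r)}."""
    return q / (2 * q + r)

def sample_size_for_error(eps, q, r):
    """Solve n^{-q/(2q+r)} = eps for n, ignoring constants (an assumption)."""
    return eps ** (-(2 * q + r) / q)

# Illustrative values (assumptions): smoothness q = 2, ambient dimension d = 10.
q, d = 2, 10
for r in (1, 2, d):
    print(r, rate_exponent(q, r), sample_size_for_error(0.1, q, r))
```

With $$q=2$$ the exponent is $$2/5$$ for $$r=1$$ but only $$1/7$$ for $$r=d=10$$, so reaching a given error level requires a far larger $$n$$ when the model dimension is not reduced.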

##### MSC:
62J02 General nonlinear regression
62G20 Asymptotic properties of nonparametric inference
62G05 Nonparametric estimation
62G30 Order statistics; empirical distribution functions
