# zbMATH — the first resource for mathematics

Selection of variables in two-group discriminant analysis by error rate and Akaike’s information criteria. (English) Zbl 0591.62053
The author considers two criteria for selecting the “best” subset of variables for the linear discriminant function in the case of two p-variate normal populations $$\Pi_1$$, $$\Pi_2$$ with different means and a common covariance matrix. The means and the covariance matrix are unknown and are estimated from random samples of unequal sizes $$N_1$$, $$N_2$$.
One criterion is based on minimizing G. J. McLachlan’s asymptotic unbiased estimate [Biometrics 36, 501-510 (1980; Zbl 0442.62046)] of the misclassification error rate, $M(j)=\Phi\left[-2^{-1}D_j+2^{-1}(k_j-1)(N_1^{-1}+N_2^{-1})/D_j+\{32(N_1+N_2-2)\}^{-1}D_j\{4(4k_j-1)-D_j^2\}\right],$ where $$D_j$$ is the sample Mahalanobis distance between $$\Pi_1$$ and $$\Pi_2$$ based on the j-th subset of variables, and $$k_j$$ is the dimension of this subset.
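As a rough illustration, the estimate $$M(j)$$ can be evaluated numerically. The sketch below is not taken from the paper: the function names are hypothetical, and it assumes that $$\Phi$$ denotes the standard normal distribution function and that $$D_j>0$$.

```python
import math


def std_normal_cdf(x):
    """Standard normal distribution function (assumed meaning of Phi)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def mclachlan_error_rate(d_j, k_j, n1, n2):
    """Evaluate M(j) from the review's formula.

    d_j : sample Mahalanobis distance D_j for the subset (not squared)
    k_j : number of variables in the subset
    n1, n2 : sample sizes N_1, N_2
    All argument names are illustrative, not from the original paper.
    """
    if d_j <= 0:
        raise ValueError("D_j must be positive")
    arg = (
        -0.5 * d_j
        + 0.5 * (k_j - 1) * (1.0 / n1 + 1.0 / n2) / d_j
        + d_j * (4.0 * (4 * k_j - 1) - d_j ** 2) / (32.0 * (n1 + n2 - 2))
    )
    return std_normal_cdf(arg)
```

For very large samples the correction terms vanish and the value approaches $$\Phi(-D_j/2)$$. For a fixed distance, a larger subset gives a larger estimate, which is what penalizes uninformative variables.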
The other selection criterion is based on a “no additional information” model and minimizes Akaike’s information criterion $A(j)=(N_1+N_2)\log\{1+(p-k_j)F(j)/(N_1+N_2-p-1)\}+2(k_j-p),$ where $F(j)=\{(N_1+N_2-p-1)/(p-k_j)\}\,(D^2-D_j^2)/\{(N_1+N_2-2)(N_1^{-1}+N_2^{-1})+D_j^2\},$ $$D$$ being the p-variate Mahalanobis distance. It is shown that the expected error rate is closely related to the no additional information model. The asymptotic distributions and error rate risks of both criteria are obtained and shown to be identical, so in this sense the two criteria are asymptotically equivalent.
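The criterion $$A(j)$$ can likewise be computed directly from the two squared distances. The following is a minimal sketch under stated assumptions: the function and argument names are hypothetical, the logarithm is taken to be natural, and the full set ($$k_j=p$$), where $$F(j)$$ is undefined, is assigned $$A(j)=0$$ because the logarithmic term and the penalty both vanish there.

```python
import math


def f_statistic(d2_full, d2_sub, p, k_j, n1, n2):
    """F(j) from the review's formula; requires k_j < p.

    d2_full : squared distance D^2 based on all p variables
    d2_sub  : squared distance D_j^2 based on the subset
    """
    n = n1 + n2
    denom = (n - 2) * (1.0 / n1 + 1.0 / n2) + d2_sub
    return ((n - p - 1) / (p - k_j)) * (d2_full - d2_sub) / denom


def aic_criterion(d2_full, d2_sub, p, k_j, n1, n2):
    """A(j) from the review's formula (names are illustrative only)."""
    n = n1 + n2
    if k_j == p:
        # Assumption: for the full set the log term is log(1) = 0
        # and the penalty 2(k_j - p) is 0.
        return 0.0
    f_j = f_statistic(d2_full, d2_sub, p, k_j, n1, n2)
    return n * math.log(1.0 + (p - k_j) * f_j / (n - p - 1)) + 2.0 * (k_j - p)
```

When the omitted variables add nothing to the distance ($$D_j^2=D^2$$), the logarithmic term is zero and $$A(j)=2(k_j-p)<0$$, so the smaller subset is preferred over the full set.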
Reviewer: V.Yu.Urbakh

##### MSC:
62H30 Classification and discrimination; cluster analysis (statistical aspects)
62E20 Asymptotic distribution theory in statistics
62F07 Statistical ranking and selection procedures
##### References:
Akaike, H, A new look at the statistical model identification, IEEE trans. automat. control., AC-19, 716-723, (1974) · Zbl 0314.62039
Eisenbeis, R.A; Gilbert, G.G; Avery, R.B, Investigating the relative importance of individual variables and variable subsets in discriminant analysis, Comm. statist., 2, 205-219, (1973) · Zbl 0322.62073
Fujikoshi, Y, A criterion for variable selection in multiple discriminant analysis, Hiroshima math. J., 13, 203-214, (1983) · Zbl 0531.62059
Hablema, J.D.F; Hermans, J, Selection of variables in discriminant by F-statistic and error rate, Technometrics, 19, 487-493, (1977) · Zbl 0369.62002
Krishnaiah, P.R, Selection of variables in discriminant analysis, (), 805-820 · Zbl 0506.62047
Lachenbruch, P; Mickey, M, Estimation of error rates in discriminant analysis, Technometrics, 10, 1-11, (1968)
Lachenbruch, P.A, ()
McLachlan, G.J, An asymptotic unbiased technique for estimating the error rates in discriminant analysis, Biometrics, 30, 239-249, (1974) · Zbl 0288.62027
McLachlan, G.J, A criterion for selecting variables for the linear discriminant function, Biometrics, 32, 529-534, (1976) · Zbl 0334.62023
McLachlan, G.J, On the relationship between the F test and the overall error rate for variable selection in two-group discriminant analysis, Biometrics, 36, 501-510, (1980) · Zbl 0442.62046
Okamoto, M, An asymptotic expansion for the distribution of the linear discriminant function, Ann. math. statist., 34, 1286-1301, (1963) · Zbl 0117.37101
Rao, C.R, Inference on discriminant function coefficients, (), 537-602
Rao, C.R, ()
Shibata, R, Selection of the order of an autoregressive model by Akaike’s information criterion, Biometrika, 63, 117-126, (1976) · Zbl 0358.62048
Spitzer, F, A combinatorial lemma and its application to probability theory, Trans. amer. math. soc., 82, 323-339, (1956) · Zbl 0071.13003
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.