
A concrete statistical realization of Kleinberg’s stochastic discrimination for pattern recognition. I: Two-class classification. (English) Zbl 1100.68609

Summary: The method of Stochastic Discrimination (SD) introduced by Kleinberg is a new method in statistical pattern recognition. It works by producing many weak classifiers and then combining them to form a strong classifier. However, the strict mathematical assumptions in [E. M. Kleinberg, Ann. Stat. 24, 2319–2349 (1996; Zbl 0877.68102)] are rarely met in practice. This paper provides a practicable way to realize the SD algorithm. We recast SD in a probability-space framework and present a concrete statistical realization of SD for two-class pattern recognition. We weaken Kleinberg’s theoretically strict assumptions of uniformity and indiscernibility by introducing near uniformity and weak indiscernibility. These weaker conditions are readily met in practical applications. We present a systematic resampling method for producing weak classifiers and then establish the corresponding SD classification rules. We analyze the performance of SD theoretically and explain why SD is overtraining-resistant and why it has a high convergence rate. Testing results on real and simulated data sets are also given.
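The weak-classifier-plus-combination scheme summarized above can be illustrated with a minimal sketch. This is not the authors' realization: it uses random axis-aligned boxes as weak models, a simple enrichment filter (the 0.05 capture-rate gap) standing in for the indiscernibility/uniformity conditions, and simulated Gaussian data; all names and thresholds here are illustrative assumptions. Each weak model contributes the Kleinberg-style normalized discriminant (m − r0)/(r1 − r0), whose average tends to 1 on class 1 and 0 on class 0, so thresholding at 0.5 gives the strong classifier.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two simulated 2-D Gaussian classes (illustrative data, not from the paper).
n = 200
X0 = rng.normal(loc=-1.0, scale=1.0, size=(n, 2))
X1 = rng.normal(loc=+1.0, scale=1.0, size=(n, 2))
X = np.vstack([X0, X1])
y = np.array([0] * n + [1] * n)

def random_box(X, rng):
    """Sample a random axis-aligned box -- a 'weak model' (subset of feature space)."""
    lo, hi = X.min(axis=0), X.max(axis=0)
    a = rng.uniform(lo, hi)
    b = rng.uniform(lo, hi)
    return np.minimum(a, b), np.maximum(a, b)

def membership(X, box):
    """Indicator of box membership for each sample (the weak classifier's output)."""
    lo, hi = box
    return np.all((X >= lo) & (X <= hi), axis=1).astype(float)

T = 2000           # number of weak models sampled
scores = np.zeros(len(X))
kept = 0
for _ in range(T):
    box = random_box(X, rng)
    m = membership(X, box)
    r0, r1 = m[y == 0].mean(), m[y == 1].mean()   # per-class capture rates
    if abs(r1 - r0) < 0.05:   # box barely discriminates; discard it
        continue
    # Normalized discriminant: its class-1 mean is 1 and class-0 mean is 0
    # regardless of the sign of (r1 - r0).
    scores += (m - r0) / (r1 - r0)
    kept += 1

# Strong classifier: average the discriminants and threshold at 1/2.
pred = (scores / max(kept, 1) > 0.5).astype(int)
accuracy = (pred == y).mean()
```

Averaging many such weak discriminants is what drives SD's overtraining resistance: each individual box may be barely better than chance, but the normalization keeps the per-class means pinned at 0 and 1 while the variance shrinks as models accumulate.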

MSC:

68T10 Pattern recognition, speech recognition
68T05 Learning and adaptive systems in artificial intelligence

Citations:

Zbl 0877.68102

References:

[1] Berlind, R. (1994). An alternative method of stochastic discrimination with applications to pattern recognition. Ph.D. dissertation, Dept. Mathematics, State Univ. New York, Buffalo.
[2] Breiman, L. (1996). Bagging predictors. Machine Learning 24 123–140. · Zbl 0858.68080
[3] Chen, D. (1998). Estimates of classification accuracies for Kleinberg’s method of stochastic discrimination in pattern recognition. Ph.D. dissertation, Dept. Mathematics, State Univ. New York, Buffalo.
[4] Duda, R. O., Hart, P. E. and Stork, D. G. (2001). Pattern Classification , 2nd ed. Wiley, New York. · Zbl 0968.68140
[5] Fisher, R. A. (1938). The statistical utilization of multiple measurements. Annals of Eugenics 8 376–386. · Zbl 0019.35703 · doi:10.1111/j.1469-1809.1938.tb02189.x
[6] Freund, Y. and Schapire, R. E. (1996). Experiments with a new boosting algorithm. In Proc. 13th International Conference on Machine Learning 148–156. Morgan Kaufmann, San Francisco.
[7] Freund, Y. and Schapire, R. E. (1997). A decision–theoretic generalization of on-line learning and an application to boosting. J. Comput. System Sci. 55 119–139. · Zbl 0880.68103 · doi:10.1006/jcss.1997.1504
[8] Friedman, J., Hastie, T. and Tibshirani, R. (2000). Additive logistic regression: A statistical view of boosting (with discussion). Ann. Statist. 28 337–407. · Zbl 1106.62323 · doi:10.1214/aos/1016218223
[9] Fukunaga, K. (1990). Introduction to Statistical Pattern Recognition , 2nd ed. Academic Press, New York. · Zbl 0711.62052
[10] Ho, T. K. (1995). Random decision forests. In Proc. Third International Conference on Document Analysis and Recognition (M. Kavanaugh and P. Storms, eds.) 1 278–282. IEEE Computer Society Press, New York.
[11] Ho, T. K. (1998). The random subspace method for constructing decision forests. IEEE Transactions on Pattern Analysis and Machine Intelligence 20 832–844.
[12] Ho, T. K. and Baird, H. S. (1998). Pattern classification with compact distribution maps. Computer Vision and Image Understanding 70 101–110.
[13] Ho, T. K. and Kleinberg, E. M. (1996). Building projectable classifiers of arbitrary complexity. In Proc. 13th International Conference on Pattern Recognition (M. E. Kavanaugh and B. Werner, eds.) 2 880–885. IEEE Computer Society Press, New York.
[14] Hoeffding, W. (1963). Probability inequalities for sums of bounded random variables. J. Amer. Statist. Assoc. 58 13–30. · Zbl 0127.10602 · doi:10.2307/2282952
[15] Kleinberg, E. M. (1990). Stochastic discrimination. Ann. Math. Artif. Intell. 1 207–239. · Zbl 0870.68071 · doi:10.1007/BF01531079
[16] Kleinberg, E. M. (1996). An overtraining-resistant stochastic modeling method for pattern recognition. Ann. Statist. 24 2319–2349. · Zbl 0877.68102 · doi:10.1214/aos/1032181157
[17] Kleinberg, E. M. (2000). On the algorithmic implementation of stochastic discrimination. IEEE Transactions on Pattern Analysis and Machine Intelligence 22 473–490.
[18] Kleinberg, E. M. and Ho, T. K. (1993). Pattern recognition by stochastic modeling. In Proc. Third International Workshop on Frontiers in Handwriting Recognition (M. Bosker and R. Casey, eds.) 175–183. Partners Press, Buffalo.
[19] McLachlan, G. J. (1992). Discriminant Analysis and Statistical Pattern Recognition. Wiley, New York. · Zbl 1108.62317
[20] Ripley, B. D. (1996). Pattern Recognition and Neural Networks. Cambridge Univ. Press. · Zbl 0853.62046
[21] Schapire, R. E. (1990). The strength of weak learnability. Machine Learning 5 197–227. · Zbl 1142.62372
[22] Smith, J., Everhart, J., Dickson, W., Knowler, W. and Johannes, R. (1988). Using the ADAP learning algorithm to forecast the onset of diabetes mellitus. In Proc. Symposium on Computer Applications and Medical Care 261–265. IEEE Computer Society Press, New York.
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.