
Bayesian multiscale smoothing in supervised and semi-supervised kernel discriminant analysis. (English) Zbl 1328.62177

Summary: In kernel discriminant analysis, it is common practice to select the smoothing parameter (bandwidth) based on the training data and then use it for classifying all unlabeled observations. However, selecting a single scale of smoothing in this way ignores the major issue of model uncertainty. Moreover, besides depending on the training sample, a good choice of bandwidth may also depend on the observation to be classified, and a fixed level of smoothing may not work well in all parts of the measurement space. So, instead of using a single smoothing parameter, it may be more useful in practice to study classification results for multiple scales of smoothing and judiciously aggregate them to arrive at the final decision. This paper adopts a Bayesian approach to carry out one such multiscale analysis in a probabilistic framework. This framework also allows the multiscale method to be extended to semi-supervised classification, where unlabeled test-set observations are used, in addition to the training sample, to form the decision rule. Some well-known benchmark data sets are analyzed to demonstrate the utility of the proposed methods.
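The aggregation idea described in the summary can be sketched in a few lines. The following is a minimal illustration under simplifying assumptions, not the paper's actual method: uniform weights over the bandwidth grid stand in for the Bayesian posterior weighting of scales, a Gaussian product kernel is assumed, and all function names are hypothetical.

```python
import numpy as np

def gaussian_kde_at(points, x, h):
    """Gaussian product-kernel density estimate at a single point x,
    using one common bandwidth h for all coordinates."""
    d = points.shape[1]
    z = (x - points) / h                        # (n, d) scaled differences
    k = np.exp(-0.5 * np.sum(z * z, axis=1))    # unnormalized kernel values
    norm = (np.sqrt(2.0 * np.pi) * h) ** d      # Gaussian normalizing constant
    return np.mean(k) / norm

def multiscale_classify(X, y, x, bandwidths):
    """Classify x by averaging kernel-discriminant class posteriors over
    several smoothing scales.  Uniform scale weights are a placeholder
    for the paper's Bayesian weighting of bandwidths."""
    classes = np.unique(y)
    priors = np.array([np.mean(y == c) for c in classes])
    post_sum = np.zeros(len(classes))
    for h in bandwidths:
        # Class-conditional density estimates at this scale of smoothing
        dens = np.array([gaussian_kde_at(X[y == c], x, h) for c in classes])
        post = priors * dens
        post_sum += post / post.sum()           # posterior P(class | x, h)
    return classes[np.argmax(post_sum)]         # aggregate over scales
```

For example, with two well-separated Gaussian classes, averaging the posteriors over a grid of bandwidths (say 0.3, 1.0, 3.0) assigns a query point to the class whose training cloud it sits in, even though no single bandwidth was tuned.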

MSC:

62F15 Bayesian inference
62H30 Classification and discrimination; cluster analysis (statistical aspects)
62G07 Density estimation

Software:

SiZer
