Singling out ill-fit items in a classification. Application to the taxonomy of Enterobacteriaceae. (English) Zbl 1348.92015

Summary: We address the problem of evaluating the quality of cluster assignment of individual items in a classification. The problem can also be viewed as outlier detection in classifications. We describe simple methods for this task based on the use of Naive Bayes classification. Applied to two existing classifications of 5313 strains of bacteria, the method indicated that one classification is far more robust than the other. The observations that fit their clusters badly are typically items whose classification is also suspect on other grounds. Removing these elements from the data set, performing clustering on the reduced data set, and adding the outliers back one by one yielded a clustering that has a higher likelihood than the previously accepted classifications. Investigation of this new clustering led to suggested changes in the classification of 69 strains in the material.
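
The idea in the summary can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes binary (present/absent) features, a Bernoulli Naive Bayes model with Laplace smoothing, and the rule that an item is ill-fit when some other cluster scores higher than its assigned one. The names `fit_bernoulli_nb`, `log_joint` and `ill_fit_items` are hypothetical.

```python
import math
from collections import defaultdict


def fit_bernoulli_nb(X, labels, alpha=1.0):
    """Estimate a log prior and smoothed feature probabilities per cluster.

    Assumption: every feature is binary (0 or 1).
    """
    n_features = len(X[0])
    counts = defaultdict(int)
    ones = defaultdict(lambda: [0] * n_features)
    for row, c in zip(X, labels):
        counts[c] += 1
        for j, v in enumerate(row):
            ones[c][j] += v
    n = len(X)
    model = {}
    for c, n_c in counts.items():
        # Laplace smoothing keeps every probability strictly inside (0, 1).
        theta = [(ones[c][j] + alpha) / (n_c + 2 * alpha)
                 for j in range(n_features)]
        model[c] = (math.log(n_c / n), theta)
    return model


def log_joint(row, log_prior, theta):
    """Log prior plus log-likelihood of one item under one cluster."""
    return log_prior + sum(
        math.log(t) if v else math.log(1.0 - t) for v, t in zip(row, theta)
    )


def ill_fit_items(X, labels, alpha=1.0):
    """Return (index, assigned, best, margin) for items that fit another
    cluster better than the one they are assigned to."""
    model = fit_bernoulli_nb(X, labels, alpha)
    flagged = []
    for i, (row, c) in enumerate(zip(X, labels)):
        scores = {k: log_joint(row, lp, th) for k, (lp, th) in model.items()}
        best = max(scores, key=scores.get)
        if best != c:
            flagged.append((i, c, best, scores[best] - scores[c]))
    return flagged
```

Here the model is fitted on all items, including the one being scored; a leave-one-out variant would be a natural refinement. The flagged items could then be set aside, the remainder re-clustered, and the flagged items added back one at a time, as the summary describes.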

MSC:

92B15 General biostatistics
62H30 Classification and discrimination; cluster analysis (statistical aspects)
62F15 Bayesian inference