
Outlier detection in contingency tables based on minimal patterns. (English) Zbl 1325.62117

Summary: A new technique for the detection of outliers in contingency tables is introduced, where outliers are unusual cell counts with respect to classical loglinear Poisson models. Subsets of cell counts called minimal patterns are defined, corresponding to non-singular design matrices and leading to potentially uncontaminated maximum-likelihood estimates of the model parameters and thereby the expected cell counts. A criterion to easily produce minimal patterns in the two-way case under independence is derived, based on the analysis of the positions of the chosen cells. A simulation study and a couple of real-data examples are presented to illustrate the performance of the newly developed outlier identification algorithm, and to compare it with other existing methods.
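The link between a subset of cells and a non-singular design matrix can be illustrated with a small check. The following is a minimal sketch, not the authors' algorithm or their position-based criterion. It assumes the two-way independence model with a baseline parametrization (intercept, row effects, column effects), and it assumes that a minimal pattern contains exactly as many cells as the model has parameters. The function names `design_row`, `rank` and `is_minimal_pattern` are hypothetical.

```python
from fractions import Fraction


def design_row(i, j, n_rows, n_cols):
    """Design-matrix row for cell (i, j) under independence.

    Assumed parametrization: intercept, then effects for rows 1..n_rows-1,
    then effects for columns 1..n_cols-1 (row 0 and column 0 are baselines).
    """
    row = [0] * (n_rows + n_cols - 1)
    row[0] = 1                      # intercept
    if i > 0:
        row[i] = 1                  # row effect
    if j > 0:
        row[n_rows - 1 + j] = 1     # column effect
    return row


def rank(matrix):
    """Matrix rank by exact Gaussian elimination over the rationals."""
    m = [[Fraction(x) for x in r] for r in matrix]
    r = 0
    width = len(m[0]) if m else 0
    for c in range(width):
        pivot = next((k for k in range(r, len(m)) if m[k][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for k in range(r + 1, len(m)):
            factor = m[k][c] / m[r][c]
            if factor != 0:
                m[k] = [a - factor * b for a, b in zip(m[k], m[r])]
        r += 1
    return r


def is_minimal_pattern(cells, n_rows, n_cols):
    """True if the chosen cells give a square, non-singular design submatrix."""
    cells = list(cells)
    n_params = n_rows + n_cols - 1
    if len(cells) != n_params or len(set(cells)) != n_params:
        return False
    sub = [design_row(i, j, n_rows, n_cols) for i, j in cells]
    return rank(sub) == n_params


# 3x3 table: the first row plus the first column identifies all 5 parameters.
cross = [(0, 0), (0, 1), (0, 2), (1, 0), (2, 0)]
# A full 2x2 block makes the submatrix rows linearly dependent.
block = [(0, 0), (0, 1), (1, 0), (1, 1), (2, 2)]
print(is_minimal_pattern(cross, 3, 3), is_minimal_pattern(block, 3, 3))
```

Exact rational arithmetic is used here so that the singularity test does not depend on a floating-point tolerance. For larger tables a numerical rank routine would be the more practical choice.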

MSC:

62H17 Contingency tables
62F35 Robustness and adaptive procedures (parametric inference)

Software:

R
