Cohen’s linearly weighted kappa is a weighted average. (English) Zbl 1284.62348

Summary: An \(n \times n\) agreement table \(F=\{f_{ij}\}\) with \(n \geq 3\) ordered categories can, for fixed \(m\) with \(2 \leq m \leq n - 1\), be collapsed into \(\binom{n-1}{m-1}\) distinct \(m \times m\) tables by combining adjacent categories. It is shown that the components (observed and expected agreement) of Cohen’s weighted kappa with linear weights can be obtained from these \(m \times m\) subtables. Consequently, linearly weighted kappa can be interpreted as a weighted average of the linearly weighted kappas of the \(m\times m\) tables, where the weights are the denominators of those kappas. Moreover, it can be interpreted as a weighted average of the linearly weighted kappas of all nontrivial subtables.
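
The result stated in the summary can be checked numerically. The following is a minimal Python sketch, not the paper's own code. It assumes the usual linear weights \(w_{ij} = 1 - |i-j|/(n-1)\) and takes "denominator" to mean one minus the expected weighted agreement. The function names and the \(4 \times 4\) table are hypothetical and serve only as illustration.

```python
from itertools import combinations


def agreement_parts(table):
    """Return (observed, expected) linearly weighted agreement of a square table."""
    n = len(table)
    total = sum(sum(row) for row in table)
    row_p = [sum(row) / total for row in table]
    col_p = [sum(table[i][j] for i in range(n)) / total for j in range(n)]
    po = pe = 0.0
    for i in range(n):
        for j in range(n):
            w = 1.0 - abs(i - j) / (n - 1)  # assumed linear weights
            po += w * table[i][j] / total
            pe += w * row_p[i] * col_p[j]
    return po, pe


def kappa(table):
    """Linearly weighted kappa: (po - pe) / (1 - pe)."""
    po, pe = agreement_parts(table)
    return (po - pe) / (1.0 - pe)


def collapse(table, cuts):
    """Merge adjacent categories; each cut is the index where a new group starts."""
    bounds = [0, *cuts, len(table)]
    groups = [range(bounds[k], bounds[k + 1]) for k in range(len(bounds) - 1)]
    return [[sum(table[i][j] for i in gi for j in gj) for gj in groups]
            for gi in groups]


def collapsed_tables(table, m):
    """All m x m tables obtained by combining adjacent categories."""
    n = len(table)
    return [collapse(table, cuts) for cuts in combinations(range(1, n), m - 1)]


def weighted_average_kappa(table, sizes):
    """Average the subtable kappas, weighting each by its denominator 1 - pe."""
    num = den = 0.0
    for m in sizes:
        for sub in collapsed_tables(table, m):
            po, pe = agreement_parts(sub)
            num += po - pe    # equals (1 - pe) * kappa(sub)
            den += 1.0 - pe   # the weight of this subtable
    return num / den


# Hypothetical 4 x 4 agreement table (made-up counts, for illustration only).
F = [[20, 5, 2, 1],
     [4, 15, 6, 2],
     [1, 5, 18, 4],
     [0, 2, 5, 10]]

print(kappa(F))
print(weighted_average_kappa(F, [2]))     # average over the 2 x 2 subtables
print(weighted_average_kappa(F, [3]))     # average over the 3 x 3 subtables
print(weighted_average_kappa(F, [2, 3]))  # average over all nontrivial subtables
```

Under these assumptions the four printed values should agree up to floating-point rounding. The sketch accumulates `po - pe` rather than dividing and multiplying by `1 - pe`, which gives the same weighted sum and avoids a division by zero if some subtable has expected agreement equal to one.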


62H20 Measures of association (correlation, canonical correlation, etc.)
62P10 Applications of statistics to biology and medical sciences; meta analysis
62P15 Applications of statistics to psychology

