Assessing observer agreement in studies involving replicated binary observations.


We propose a new approach to evaluating agreement between two observers who make binary observations on a set of subjects. The approach compares the probability of disagreement (discordance) between the observers with the probability of disagreement between replicated observations made by the same observer on the same subject. We consider two situations: (1) a symmetric assessment of agreement between two observers, and (2) an assessment of the agreement of a new observer with an imperfect "gold standard". We develop a nonparametric method for estimating the new agreement coefficients when observers make replicated readings on each subject, and we examine the reliability of this estimation method in a simulation study. Data from a study of the validity of breast cancer diagnosis based on mammograms are used to illustrate the new concepts and methods.
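
The comparison described above can be illustrated with a minimal sketch. It estimates, from replicated binary readings, the within-observer and between-observer disagreement probabilities as simple pair-counting proportions averaged over subjects. The function names, the toy data, and in particular the two ratios returned at the end are assumptions made for illustration; the abstract does not state the exact form of the agreement coefficients, so these ratios should not be read as the authors' definitions.

```python
from itertools import combinations, product


def within_disagreement(readings):
    """Mean over subjects of the fraction of same-observer replicate pairs that differ."""
    rates = []
    for reps in readings:
        pairs = list(combinations(reps, 2))
        if not pairs:
            continue  # fewer than two replicates: no within-observer information
        rates.append(sum(a != b for a, b in pairs) / len(pairs))
    return sum(rates) / len(rates)


def between_disagreement(readings_x, readings_y):
    """Mean over subjects of the fraction of (x reading, y reading) pairs that differ."""
    rates = []
    for reps_x, reps_y in zip(readings_x, readings_y):
        pairs = list(product(reps_x, reps_y))
        rates.append(sum(a != b for a, b in pairs) / len(pairs))
    return sum(rates) / len(rates)


def disagreement_summary(readings_x, readings_y):
    """Collect the three disagreement rates and two illustrative ratios.

    The ratios are an assumed way of comparing within- to between-observer
    disagreement; they are not taken from the abstract.
    """
    d_xx = within_disagreement(readings_x)
    d_yy = within_disagreement(readings_y)
    d_xy = between_disagreement(readings_x, readings_y)
    if d_xy == 0:
        symmetric = reference = None  # ratios undefined without any between-observer disagreement
    else:
        symmetric = (d_xx + d_yy) / 2 / d_xy  # situation (1): both observers treated alike
        reference = d_xx / d_xy               # situation (2): x plays the imperfect "gold standard"
    return {
        "within_x": d_xx,
        "within_y": d_yy,
        "between": d_xy,
        "symmetric_ratio": symmetric,
        "reference_ratio": reference,
    }


# Hypothetical toy data: 4 subjects, 2 replicated binary readings per observer.
toy_x = [[1, 1], [0, 0], [1, 0], [0, 0]]
toy_y = [[1, 1], [0, 1], [0, 0], [1, 1]]
summary = disagreement_summary(toy_x, toy_y)
```

On this toy data each observer disagrees with their own replicate on one of four subjects (rate 0.25), while the between-observer rate is 0.5, so both illustrative ratios equal 0.5. A ratio near 1 would suggest that the two observers disagree with each other about as often as each disagrees with their own repeated reading.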

