We wish to make clear at the outset of this section our gratitude to Michelle A. Amazeen for graciously making a draft copy of Checking available to help us conduct our review.
The Trail of Inquiry
When we inquired of Amazeen about Checking, she provided us a draft copy and invited questions about her research.
We sent Amazeen a number of questions, including whether the data set in Checking offers better evidence of fact checkers’ selection bias than of similarity in their post-selection evaluations.
ZEBRA FACT CHECK
Some of the elite fact checkers have already cited this or related research in support of their own reliability. And agreement near the 95 percent level sounds like good support at first blush. But thanks in part to the research methodology, the data set allows for a minimum agreement of 91.7 percent for the most robust comparison (FactCheck.org and PolitiFact). Only three fact checker evaluations failed to find fault. One could mix the data set randomly and still end up with a minimum 91.7 percent agreement. Isn’t this essential context for understanding the meaning of the 98 percent agreement reported in the paper? Doesn’t this speak more to the fact checkers’ tendency to focus on reviewing dubious claims than to agreement in their evaluations?
MICHELLE A. AMAZEEN
If this study was based upon a sample of fact-checks from the 2008 presidential election, then one would certainly need to account for sampling error. However, this was a census of all fact-checks by the leading national fact-checkers, so no projections are being made. Factcheck.org and PolitiFact.com agreed 98% of the time when they evaluated the same claims during the 2008 election.
I do note on p.27 that fact-checkers do not seek out accurate statements to check. Nevertheless, whatever the selection criteria they used, when the national fact-checkers examined the same claims, they agreed on its accuracy 98% of the time in 2008.
When Amazeen’s reply appeared to miss or skirt our point, we sent follow-up questions, but we received no reply to those.
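The floor-agreement arithmetic behind our question can be sketched briefly. The figures below are a hypothetical reconstruction from the numbers in our query (a 91.7 percent minimum agreement and three verdicts that failed to find fault), assuming 36 shared claims; the actual counts in Checking may differ.

```python
def minimum_agreement(n_claims, accurate_a, accurate_b):
    """Worst-case share of matching verdicts between two fact checkers
    rating the same claims on a binary (accurate/inaccurate) scale.

    If nearly every verdict is 'inaccurate', the checkers can disagree
    on at most the handful of claims where either issued an 'accurate'
    verdict, so agreement has a high floor no matter how the verdicts
    are paired up.
    """
    max_disagreements = min(accurate_a + accurate_b, n_claims)
    return (n_claims - max_disagreements) / n_claims

# Hypothetical reconstruction: 36 shared claims, three 'accurate'
# verdicts in total split between the two checkers.
floor = minimum_agreement(36, accurate_a=2, accurate_b=1)
print(round(floor, 3))  # 0.917 -- agreement cannot fall below this
```

On these assumed counts, even a random pairing of verdicts guarantees agreement of at least 33 of 36 claims, which is the 91.7 percent floor we cited.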
We next sought an expert on research and statistics to comment on the methods in Checking. Through the American Statistical Association, our query reached James Cochran, professor of statistics at the University of Alabama.
We provided brief background on the main point of Checking, then repeated to Cochran our key question to Amazeen.
Cochran responded by saying he had a number of concerns about Amazeen’s approach, much like the concerns we had expressed.
With Cochran’s permission, we forwarded his response to Amazeen.
Amazeen rejected the criticism:
While I appreciate your interest in this study, it has already been through double-blind academic peer review. The concerns raised by your … expert are addressed elsewhere in my report (one reason why it is not advisable to rely upon reviews of isolated portions of a study). Rather than relying solely upon frequency of agreement, I also use Krippendorff’s alpha as a conservative reliability estimate indicating the agreement between fact-checkers is not due to chance.
We wondered what bearing Krippendorff’s alpha might have on the problems Cochran pointed out. Krippendorff’s alpha measures the reliability of coding, that is, how consistently independent coders categorize the same content. An alpha of 0 represents reliability no better than chance, and an alpha of 1 represents perfect reliability; a figure of .66 means roughly one-third of the results were equivalent to chance results. Checking’s researchers examined the evaluations of the fact checkers to categorize (code) them as either false or true. We noted Checking’s application of Krippendorff’s alpha to that segment of the research, but Checking also applies Krippendorff’s alpha to the content analysis of the fact checkers themselves, filtered, as far as we can tell, through the researchers’ coding. We originally overlooked this dual application of Krippendorff’s alpha, in part because Checking provides little description of how or why it treats fact checkers like coders.
In a footnote, Checking acknowledges that Krippendorff’s alpha for the fact checkers comes in at a very pedestrian 0.66:
Despite very high levels of agreement, the relatively modest Krippendorff’s α of 0.66 (see Table 3) can be attributed to the constraints imposed by the lack of non-binary data from FactCheck.org. As detailed by Hayes and Krippendorff (2007: 87), the nominal version of α is always lower than the obtained α when the observers’ disagreements adhere to the metric of the selected α. For instance, in their example where ordinal data achieved an α of 0.7598, the same data treated nominally achieved an α of 0.4765. Despite this constraint, however, the agreement levels in the present study are still acceptable.
We suggest a more obvious reason for the relatively poor reliability of fact-checker agreement: the data show very low variability. As Klaus Krippendorff observes, when no variation occurs in the data, there is no evidence of reliability. Checking’s comparison of FactCheck.org with the Washington Post Fact Checker fits Krippendorff’s observation, as we illustrate in the third section. A very low degree of variation leads to low reliability on Krippendorff’s scale.
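The effect Krippendorff describes can be reproduced with a small numeric sketch. The verdict counts below are hypothetical, not taken from Checking’s data: two raters who issue “false” verdicts on nearly every claim agree over 97 percent of the time, yet Krippendorff’s alpha lands near the 0.66 the paper reports.

```python
from collections import Counter

def percent_agreement(a, b):
    """Raw fraction of units on which the two raters match."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def krippendorff_alpha_nominal(a, b):
    """Krippendorff's alpha for two observers, nominal data, no missing
    values: 1 - (observed disagreement / disagreement expected by chance)."""
    n = 2 * len(a)  # total number of pairable values
    # Observed disagreement: each mismatched unit contributes two
    # ordered pairs to the off-diagonal of the coincidence matrix.
    d_o = 2 * sum(x != y for x, y in zip(a, b)) / n
    # Expected disagreement from the pooled value frequencies.
    counts = Counter(a) + Counter(b)
    d_e = sum(counts[c] * counts[k]
              for c in counts for k in counts if c != k) / (n * (n - 1))
    return 1 - d_o / d_e

# Hypothetical low-variability data: 36 claims, almost all rated false.
rater_a = ["false"] * 34 + ["true", "true"]
rater_b = ["false"] * 35 + ["true"]

print(f"{percent_agreement(rater_a, rater_b):.3f}")          # 0.972
print(f"{krippendorff_alpha_nominal(rater_a, rater_b):.3f}")  # 0.657
```

Because almost all verdicts fall in one category, chance alone would produce agreement nearly as high as that observed, so alpha credits the raters with only modest reliability despite the impressive raw percentage.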
Krippendorff says, “Consider variables with reliabilities between α = .667 and α = .800 only for drawing tentative conclusions.” We do not know why Checking does not do more to justify relying on a reliability level below even what Krippendorff recommends for tentative conclusions. Instead of viewing the low reliability numbers as a warning about the support for its conclusions, Checking excuses them on dubious grounds and adduces its application of Krippendorff’s alpha as support for its conclusions.
That approach effectively turns science on its head. See Page 3 for additional notes on Krippendorff’s alpha. We think these notes help make the case that Checking fails to adequately justify accepting .66 as a minimum level of reliability and beyond that errs by treating fact checkers as coders.
Special thanks to Professor Cochran for his ongoing interest in and contributions to this subject.
Page 1: Main Review
Page 3: Additional notes on Krippendorff’s alpha