A skeptical treatment of Nyhan and Reifler’s latest study
The team of Brendan Nyhan and Jason Reifler recently created yet another stir with their latest study of media effects, “The Effects of Fact Checking Threat.” Nyhan is an assistant professor of government at Dartmouth College. Reifler is a senior lecturer in politics at the University of Exeter.
What was the buzz all about? Nyhan and Reifler explained themselves in a story published by Politico and carrying the title “Fact checkers can help keep ACA debate honest”:
In our study, 2.7 percent of legislators who were not sent reminders about fact checking received a negative PolitiFact rating or had the accuracy of their statements questioned publicly. Among legislators who were sent warning letters about fact checking, this likelihood declined to just 1 percent — a 63 percent decrease in relative risk. These results, which we describe further in a New America Foundation report and academic working paper, suggest that fact checking can help to change politicians’ behavior, which in turn could have major implications for how well the public understands important policy issues.
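The arithmetic behind the 63 percent figure is simple, and it is worth seeing next to the absolute change. A minimal Python sketch, using only the two rates quoted above (the function names are ours, for illustration):

```python
def relative_risk_reduction(control_rate: float, treated_rate: float) -> float:
    """Fractional drop in risk for the treated group relative to the control group."""
    return (control_rate - treated_rate) / control_rate


def absolute_risk_reduction(control_rate: float, treated_rate: float) -> float:
    """Drop in risk measured in raw probability (percentage points / 100)."""
    return control_rate - treated_rate


# Rates quoted by Nyhan and Reifler: 2.7 percent without a letter, 1 percent with one.
control, treated = 0.027, 0.010
print(f"relative: {relative_risk_reduction(control, treated):.0%}")  # prints "relative: 63%"
print(f"absolute: {absolute_risk_reduction(control, treated):.1%}")  # prints "absolute: 1.7%"
```

The same result can be described either as a 63 percent relative decrease or as a 1.7 percentage-point absolute decrease.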
Maybe the buzz was disproportionate.
The study in brief
Nyhan and Reifler randomly assigned a pool of state legislators to three groups, balancing the groups by party and other factors. The legislators all came from states with a PolitiFact franchise. One group, the “treatment group,” received letters warning that researchers were studying how legislators might react to having false statements fact checked, for instance by PolitiFact, and warning of the ill consequences of a poor rating. The second group, the placebo group, received a shorter letter announcing a study of politicians’ truthfulness but omitting any mention of PolitiFact or the negative consequences of a poor rating. The third group, the control group, received no letter at all.
What happened? PolitiFact published 16 fact checks with a “Half True” or worse rating for politicians chosen for the study. If that seems like a small number on which to base a conclusion, Nyhan and Reifler effectively concede the point on Page 14 of their report:
While the treatment effect falls just short of significance at the p < .05 (one-tailed) for the negative PolitiFact rating, the effect is in the expected direction.
Statpac.com explains why this matters:
Significance is a statistical term that tells how sure you are that a difference or relationship exists. To say that a significant difference or relationship exists only tells half the story. We might be very sure that a relationship exists, but is it a strong, moderate, or weak relationship? After finding a significant relationship, it is important to evaluate its strength. Significant relationships can be strong or weak. Significant differences can be large or small. It just depends on your sample size.
Nyhan and Reifler found that the “treatment condition” (sending the specific warning letter) reduced the risk of a negative PolitiFact rating by 55 percent. Despite that high percentage, the small sample size kept the result below the conventional threshold of statistical significance; that is, it fell short of the 95 percent confidence level.
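To illustrate the Statpac point that significance depends on sample size, here is a textbook one-tailed two-proportion z-test applied to the two rates quoted in the Politico piece (2.7 percent and 1 percent) at several group sizes. The group sizes are hypothetical, not the study's, and this is not the model Nyhan and Reifler used; with only 16 negative PolitiFact ratings in total, the normal approximation is itself rough.

```python
import math


def one_tailed_p_two_proportions(p_control: float, p_treated: float, n_per_group: int) -> float:
    """One-tailed p-value for H1: treated rate < control rate.

    Pooled two-proportion z-test assuming equal group sizes (an illustrative
    assumption) and that the normal approximation holds.
    """
    pooled = (p_control + p_treated) / 2  # equal group sizes, so the pooled rate is the mean
    se = math.sqrt(pooled * (1 - pooled) * (2 / n_per_group))
    z = (p_control - p_treated) / se
    # Upper-tail probability of the standard normal: P(Z > z) = erfc(z / sqrt(2)) / 2
    return 0.5 * math.erfc(z / math.sqrt(2))


# Hypothetical group sizes; identical rates, different sample sizes.
for n in (300, 400, 600, 1200):
    p = one_tailed_p_two_proportions(0.027, 0.010, n)
    print(n, round(p, 3), "significant" if p < 0.05 else "not significant")
```

Under these assumptions the identical rates miss the p < .05 cutoff with a few hundred legislators per group and clear it comfortably at 1,200.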
Why, then, have Nyhan and Reifler insisted on the significance of the study?
It’s true that falling short of the 95 percent standard still leaves room for confidence as high as 94 percent or so. The researchers also assigned a researcher to search Lexis Nexis for a second line of evidence: news or blog articles questioning statements by the legislators in the study. Nyhan and Reifler say that combining that second pool of data with the negative PolitiFact ratings produces a highly significant result, statistically speaking.
What to make of it
The opinions of the researchers and media aside, what does the study signify?
The study has relatively little to do with fact checking, and there are good reasons to question the data it produced.
The reporting on the study, including the version penned by Nyhan and Reifler, misses the mark somewhat. The study does not measure the effect of fact checking on politicians. If it measures anything, it measures the effect of threatening politicians with media attacks. Remember, the placebo letter also promises fact checking. The treatment letter adds warnings about the consequences of media reports, and it names perhaps the least-trusted major fact checker. How many politicians expect a fair shake from it even when they speak accurately?
One of the clearest statistical trends in the study was incidental to the researchers’ aim. The letters Nyhan and Reifler sent to politicians stressed it was essential that letter recipients confirm they received the information about the study. The politicians were instructed to confirm receipt of the letter by returning a postage-paid postcard to the researchers.
Thirty-four percent of those in the placebo group returned the postcards.
Twenty-one percent of those in the treatment group, the one mentioning PolitiFact, returned the postcards.
The researchers floated a theory about the disparity:
The postcards themselves provide suggestive evidence that the content of the treatment letter had a significant effect—only 21% of legislators in the treatment group returned a signed postcard compared to 34% of those in the placebo condition, suggesting that it may have displeased its recipients (p<.01; see SM).
Perhaps the threats in the treatment group letter displeased recipients. Or perhaps recipients had trouble taking the study seriously if PolitiFact was doing the fact checking. Could the study have value based on fact check ratings that are essentially subjective? The lopsided percentage for the returned postcards may represent the most significant aspect of the study.
Why question the numbers Nyhan and Reifler produced? Uncertainty compounds, and the study design leaves considerable uncertainty in the data. The researchers do not know how many in the treatment group read their letter. If only half the group read it, a 96 percent confidence level in the experimental effect would drop to reflect the smaller effective sample. But nobody knows the actual percentage. The researchers admit that even a returned postcard can’t guarantee that the targeted politician read the letter.
The Lexis Nexis search also raises questions. The researchers did not report the names of the politicians on their lists, but they did describe the search terms given to the researcher who combed Lexis Nexis for fact check stories. The list of terms printed in the report provides evidence of poor use of the Lexis Nexis database. It includes various terms for lying or deceitfulness but no search wildcards or truncation. A researcher looking for “mislead,” “misleads” and “misleading” should use an exclamation point to capture all three with one search term: “mislead!” The exclamation point tells the search engine to find every word beginning with “mislead.” The Nyhan and Reifler list contains a number of instances of the same word with different endings, but it leaves out many possibilities. The near-duplicates suggest that the research made no use of truncation or wildcards and therefore omitted a potentially substantial number of relevant stories.
The researchers’ seine net features a gaping set of holes, in other words. Perhaps the lone researcher in charge of collecting the data accounted for the problem, but Nyhan and Reifler offer no such assurance in their working paper. And obviously one cannot replicate their work without an accurate description of the search parameters, including their list of politicians.
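The difference between searching for a bare word and searching for its root can be sketched with a regular expression. The helper below is hypothetical: it imitates the Lexis Nexis exclamation-point behavior described above (match every word beginning with the root) using Python's re module, and it is not how Lexis Nexis implements the operator.

```python
import re


def expand_root(term: str) -> re.Pattern:
    """Build a case-insensitive regex for a search term.

    Hypothetical helper: a trailing '!' is treated like a root expander,
    matching any word that begins with the root; otherwise only the exact
    word matches.
    """
    if term.endswith("!"):
        root = re.escape(term[:-1])
        return re.compile(rf"\b{root}\w*", re.IGNORECASE)
    return re.compile(rf"\b{re.escape(term)}\b", re.IGNORECASE)


text = "Did the ad mislead? It misleads voters; critics called the claim misleading."
print(expand_root("mislead").findall(text))   # prints ['mislead']
print(expand_root("mislead!").findall(text))  # prints ['mislead', 'misleads', 'misleading']
```

A list of exact terms catches only the forms someone thought to type; a single truncated root catches the whole word family.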
Can fact checkers help keep the ACA debate honest?
The study by Nyhan and Reifler provides tenuous evidence at best of fact checkers’ ability to keep any public debate honest. As noted above, the study doesn’t measure the extent to which fact checkers keep politicians honest but rather the extent to which politicians respond to threats of media attacks.
If the fact checkers could at least keep themselves honest about the ACA, that would be something. Fact checks involving health care can carry their fair share of spin, as the Wall Street Journal’s Joseph Rago showed by example in a Pulitzer Prize-winning editorial back in 2010.
Without a consistently fair and accurate fact checker and a study design that minimizes uncertainties about how many in the treatment group received a meaningful treatment, we learn next to nothing about whether fact checkers can help keep a debate honest.
Clarification, Oct. 17, 2013: When introducing the writing of Nyhan and Reifler in Politico, added the title of the Politico article to make clear the connection to the title of this article.
Correction, Oct. 29, 2013: Joseph Rago’s prize-winning editorial was published in 2010, not 2009 as we originally reported.