The Red Team Challenge (Part 2): The Arbiter’s View

This post is the second in a series. The first part is "Why I placed a bounty on my own research"; the third part is "Current and Future Implementations of Red Teams".

I recently signed up to be a neutral arbiter in a Red Team Challenge. My role entailed reading the error reports, examining the author's responses, and deciding which of the reported issues counted as minor errors and which as major ones.

When Nicholas Coles asked me to serve as an arbiter, I was flattered and surprised. Flattered, because my own experience with error detection consists mainly of being a veteran of many corrections to my own work. That experience led me to institute a bug bounty program for my research, where submitters get paid depending on the severity of the bug they find. I have not always been happy with how I dealt with errors in my work, so I was glad that someone thought I had displayed enough sense to be considered as an arbiter. Surprised, because when I first met Nicholas at a workshop (topic: predicting the odds that studies would replicate), I basically told him that he was wasting his time still studying facial feedback, the topic of the research being evaluated (although I admitted it was one of the ideas that had made me want to study psychology before the replication crisis). Funnily enough, at that workshop I was awarded a "prize" for being the person most pessimistic about the odds that a study would replicate. Having evaluated studies in a team with me, Nicholas knew we disagreed frequently, including on the topic of demand characteristics, the other topic of the work he would submit to the Red Team Challenge. It did not seem that he was planning to go easy on himself.

I expected I would have to arbitrate many cases where the author refused to see a reasonable point made by the Red Teamers. When I had to issue corrections to my own work, I could, in retrospect, always recall a bargaining phase in which I was not yet willing to accept that I had made an error. Similarly, when I have critiqued others' research, the mood has often felt like that of a courtroom (admit as little as possible and go on the counter-offensive if you see an opening) rather than a collaborative attempt to get closer to the truth. The temptation to dig in your heels is strong; as Andrew Gelman has documented, in many instances people even double down.

In the end, this was the only disappointment of the whole process: Nicholas was the picture of a humble scientist and was eager to admit errors. In some cases, I even felt that I should prevent him from admitting to "errors" that, to me, seemed inconsequential, or not errors at all but simply alternative approaches.

Red Teamers varied in how strong a case they made for the problems they identified. Sometimes they reanalyzed the data or revisited the literature to build their case; other times, they just noted a suspicion that something might be off, without following up on it themselves. Although nobody quite threw everything at the wall to see what would stick, thresholds for submitting an error varied. One thing that struck me, though, was the contrast with typical peer review. It seems to me that most peer reviewers follow a low-effort strategy: for example, suggesting many robustness checks, some of which are certain not to make a substantial difference, or voicing a suspicion without laying out a path by which they might be reassured. My sense is that many authors have had the experience that reviewer comments are a mixed bag of crucial issues, nice-to-haves, annoying idiosyncrasies, and flat-out wrong comments. This is part of my frustration with peer review and a reason why I doubt that peer review always improves a paper.

Viewing the Red Team Challenge through this lens, I thought about how we teach criticism in science. We don't really have a unified approach, and I get the sense there is no track for the professionalization and specialization of the critic, as there is for research methods. I definitely have a notion of who I would "rehire" for the next Red Team Challenge, and the nascent subculture of error-detection Twitter makes me think that there are people who simply have a nose for crucial problems. Given how many errors sneak through undetected under academia's current approach, it would be great if we could pay these people to do what they are best at and to hone their skills. I would take the job in a heartbeat if it were on offer (and if I had some reasonable protection against retaliation). I really hope this continues to catch on.
