“The pursuit of knowledge is, I think, mainly actuated by love of power. And so are all advances in scientific technique.”
― Bertrand Russell
Today on the 100% CI we have to stop the jokes for a moment. Today, we will talk about being disappointed by your idols, about science criticism being taken too far. Here, we reveal how Dr. Andrew Gelman, the prominent statistician and statistics blogger, abused his power.
We do not make this accusation lightly, but our discoveries leave no other conclusion: In his skepticism of psychological science, Dr. Gelman lost sight of right and wrong. What follows is a summary of the evidence we obtained.
This March the four of us visited the Applied Statistics Center at Columbia University for a brief workshop on Stan, the probabilistic programming language. We were excited about this opportunity to learn from one of our personal heroes.
The course, however, did not live up to our expectations. Frequently, Dr. Gelman would interrupt the course with diatribes against psychological science. On the second-to-last afternoon, we were supposed to write our first Stan code in a silent study session. We were left alone in the Gelman lab and our minds wandered. Our attention was drawn to a particularly large file drawer that turned out to be unlocked. What we discovered inside can only be described as profoundly shocking: a trove of unpublished studies on Power Posing.
But this particular file drawer problem was different: The lab log revealed that Dr Gelman was desperate to obtain evidence against the phenomenon – and failed repeatedly. Initially, he invested enormous resources to run experiments with extraordinary four digit sample sizes to “nail the coffin shut on Power Pose”, as a hand-written note on an early report reads. The data painted a very clear picture, and it was not to his liking. As it dawned on him that, contrary to his personal convictions, Power Posing might be a real phenomenon, he began to stack the deck.
Instead of simple self-reports, he tried manifest behavioral observations and even field studies where the effect was expected to vanish. Power Pose prevailed. He deliberately reduced study samples to the absurdly low numbers often criticized on his very own blog. But even in his last attempts, with 1 − β almost equal to α: Power Pose prevailed. As more and more evidence in favor of Power Posing was gathered, the research became… sloppy. Conditions were dropped, outliers removed, moderators randomly added, and, yes, even p-values were rounded up. Much to Dr. Gelman’s frustration, Power Pose prevailed. He was *unable* to collect data in favor of the null hypothesis.
He thought he had one final Bayesian trick up his sleeve: By hiring a skilled hypnotist he manipulated his priors, his own beliefs (!) in Power Posing. But even with these inhumane levels of disbelief, the posterior always indicated beyond a doubt: Power Pose prevailed. It was almost like the data were trying to tell him something – but Dr. Gelman had forgotten how to listen to evidence a long time ago.
In a recent publication, Simmons and Simonsohn analyzed the evidential value of the published literature on Power Posing. The centerpiece of their research is a p-curve (figure below, left graph) on the basis of which they “conclusively reject the null hypothesis that the sample of existing studies examines a detectable effect.” Had Dr. Gelman not hidden his findings in a file drawer, Simmons and Simonsohn’s conclusions would have been dramatically different (right graph).
Initially, we couldn’t believe that he would go this far just to win an argument. We were sure there must have been some innocuous explanation – yet we also did not want to confront him with our suspicions right away. We wanted to catch him red-handed.
Thus, we decided to infiltrate one of his studies, which he was covertly advertising under the obvious pseudonym Mr. Dean Wangle. He administered the study wearing a fake moustache and a ridiculous French beret, but his voice is unmistakable. Below is a video of an experimental session that we were able to record with a hidden camera. The footage is very tough to watch.
Combined, the evidence leaves only one conclusion: Andrew Gelman betrayed science in his war on power posing.
Does playing violent video games increase aggression? No, but violent video game research kinda does.
What makes advertisements persuasive? David Hasselhoff, obvs.
Are ~~5%~~ ~~25%~~ 75% of the population addicted to social media? It’s almost like humans have a fundamental need for social interactions.
Who are these people that watch porn? Literally everyone everywhere.
Why do we enjoy cat videos so much? WHY???
These are the typical research questions media psychologists are concerned with. Broadly, media psychology describes and explains human behavior, cognition, and affect with regard to the use and effects of media and technology. Thus, it’s a hybrid discipline that borrows heavily from social, cognitive, and educational psychology in both its theoretical approaches and empirical traditions. The difference between a social psychologist and a media psychologist who both study video game effects is that the former publishes their findings in JPSP while the latter designs “What Twilight character are you?” self-tests (TEAM EDWARD FTW!) for perezhilton.com to avoid starving. And so it goes.
New is always better
A number of media psychologists are interested in improving psychology’s research practices and quality of evidence. Under the editorial oversight of Nicole Krämer, the Journal of Media Psychology (JMP), the discipline’s flagship journal (by “flagship” I mean one of two journals nominally dedicated to this research topic, the other being Media Psychology; it’s basically one of those People’s Front of Judea vs. Judean People’s Front situations), not only signed the Transparency and Openness Promotion Guidelines, it has also become one of roughly fifty journals that offer the Registered Reports format.
To promote preregistration in general and the new submission format at JMP in particular, the journal launched a fully preregistered Special Issue on “Technology and Human Behavior” dedicated exclusively to empirical work that employs these practices. Andy Przybylski (who, for reasons I can’t fathom, prefers being referred to as a “motivational researcher”) and I were fortunate enough to be the guest editors of this issue.
The papers in this issue are nothing short of amazing – do take a look at them even if it is outside of your usual area of interest. All materials, data, analysis scripts, reviews, and editorial letters are available here. I hope that these contributions will serve as an inspiration and model for other (media) researchers, and encourage scientists studying media to preregister designs and share their data and materials openly.
Media Psychology BC (Before Chambers)
If you already suspected that in all this interdisciplinary higgledy-piggledy, media psychology did not only inherit its parent disciplines’ merits, but also some of their flaws, you’re probably (unerring-absolute-100%-pinpoint-unequivocally-no-ifs-and-buts-dead-on-the-money) correct. Fortunately, Nicole was kind enough to allot our special issue editorial more space than usual in order to report a meta-scientific analysis of the journal’s past and to illustrate how some of the new practices can improve the evidential value of research. To this end, we surveyed a) the availability of data, b) errors in the reporting of statistical analyses, and c) sample sizes and statistical power of all 147 studies in N = 146 original research articles published in JMP between volume 20/1, when it became an English-language publication, and volume 28/2 (the most recent issue at the time this analysis was planned). This blog post is a summary of the analyses in our editorial, which — including its underlying raw data, analysis code, and code book — is publicly available at https://osf.io/5cvkr/.
Availability of Data and Materials
Historically, the availability of research data in psychology has been poor. Our sample of JMP publications suggests that media psychology is no exception: we were not able to identify a single publication reporting a link to research data in a public repository or in the journal’s supplementary materials.
Statistical Reporting Errors
A recent study by Nuijten et al. (2015) indicates a high rate of reporting errors in Null Hypothesis Significance Tests (NHSTs) in psychological research reports. To make sure such inconsistencies were avoided in our special issue, we validated all accepted research reports with statcheck 1.2.2, a package for the statistical programming language R that works like a spellchecker for NHSTs by automatically extracting reported statistics from documents and recomputing p-values. (p-values are recomputed from the reported test statistics and degrees of freedom. Thus, for the purpose of recomputation, it is assumed that test statistics and degrees of freedom are correctly reported, and that any inconsistency is caused by errors in the reporting of p-values. The actual inconsistencies, however, can just as well be caused by errors in the reporting of test statistics and/or degrees of freedom.)
For our own analyses, we scanned all n_emp = 131 JMP publications reporting data from at least one empirical study (147 studies in total) with statcheck to obtain an estimate of the reporting error rate in JMP. Statcheck extracted a total of 1,036 NHSTs reported in n_NHST = 98 articles. Forty-one publications (41.8% of n_NHST) reported at least one inconsistent NHST (max = 21), i.e., the reported test statistics and degrees of freedom did not match the reported p-values. Sixteen publications (16.3% of n_NHST) reported at least one grossly inconsistent NHST (max = 4), i.e., the reported p-value is < .05 while the recomputed p-value is > .05, or vice versa. Thus, a substantial proportion of publications in JMP seem to contain inaccurately reported statistical analyses, some of which might affect the conclusions drawn from them (see Figure 1).
Caution is advised when speculating about the causes of these inconsistencies. Many of them are probably clerical errors that do not alter the inferences or conclusions in any way. (For example, in 20 cases the authors reported p = .000, which is mathematically impossible; for each of these, the recomputed value was < .001. Other inconsistencies might be explained by authors not declaring that their tests were one-tailed, which is relevant for their interpretation.) With some concern, however, we observe that clerical error is unlikely to be the only cause: in 19 out of 23 cases of gross inconsistency, the reported p-value was equal to or smaller than .05 while the recomputed p-value was larger than .05, whereas the opposite pattern was observed in only four cases. Indeed, if incorrectly reported p-values resulted merely from clerical errors, we would expect inconsistencies in both directions to occur at approximately equal frequencies.
All of these inconsistencies can easily be detected using the freely available R package statcheck or, for those who do not use R, in your browser via www.statcheck.io.
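To make the recomputation concrete, here is a minimal Python sketch of such a consistency check (an illustration of the idea only, not statcheck’s actual implementation, which is an R package; the function names are ours). It recomputes the two-tailed p-value of a t-test from the reported test statistic and degrees of freedom via the regularized incomplete beta function, then flags gross inconsistencies:

```python
from math import gamma

def p_from_t(t, df):
    """Two-tailed p-value for a t statistic: p = I_x(df/2, 1/2) with
    x = df / (df + t^2), where I is the regularized incomplete beta
    function, computed here by Simpson integration. Assumes df >= 2."""
    a, b = df / 2.0, 0.5
    x0 = df / (df + t * t)          # upper integration limit, in (0, 1)
    n = 20000                        # even number of Simpson subintervals
    h = x0 / n

    def f(x):                        # beta-density kernel x^(a-1) (1-x)^(b-1)
        return x ** (a - 1.0) * (1.0 - x) ** (b - 1.0)

    s = f(0.0) + f(x0)
    for i in range(1, n):
        s += (4.0 if i % 2 else 2.0) * f(i * h)
    integral = s * h / 3.0
    # divide by Beta(a, b) to regularize
    return integral * gamma(a + b) / (gamma(a) * gamma(b))

def grossly_inconsistent(reported_p, t, df, alpha=0.05):
    """statcheck-style 'gross inconsistency': reported and recomputed
    p-values fall on opposite sides of the significance threshold."""
    return (reported_p < alpha) != (p_from_t(t, df) < alpha)
```

For instance, a report of “t(28) = 2.00, p = .03” would be flagged: the recomputed two-tailed p is about .055, on the other side of the .05 threshold, whereas a reported p of .055 would be consistent.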
Sample Sizes and Statistical Power
High statistical power is paramount in order to reliably detect true effects in a sample and, thus, to correctly reject the null hypothesis when it is false. Further, low power reduces the confidence that a statistically significant result actually reflects a true effect. A generally low-powered field is more likely to yield unreliable estimates of effect sizes and low reproducibility of results. We are not aware of any previous attempts to estimate average power in media psychology.
Strategy 1: Reported power analyses. One obvious strategy for estimating average statistical power is to examine the reported power analyses in empirical research articles. Searching all papers for the word “power” yielded 20 hits, and just one of these was an article that reported an a priori determined sample size. (In the 19 remaining articles, power is mentioned, for example, to demonstrate observed or post-hoc power, which is redundant with the reported NHSTs, to suggest that larger samples should be used in future research, or to explain why an observed nonsignificant “trend” would in fact be significant had the statistical power been higher.)
Strategy 2: Analyze power given sample sizes. Another strategy is to examine the power for different effect sizes given the average sample size (S) found in the literature. The median sample size in JMP is 139 with a considerable range across all experiments and surveys (see Table 1). As in other fields, surveys tend to have healthy sample sizes apt to reliably detect medium to large relationships between variables.
For experiments (including quasi-experiments), the outlook is a bit different. With a median sample size per condition/cell of 30.67, the average power of experiments published in JMP to detect small differences between conditions (d = .20) is 12%, 49% for medium effects (d = .50), and 87% for large effects (d = .80). Even when assuming that the average effect examined in the media psychology literature could be as large as those in social psychology (d = .43), our results indicate that the chance that an experiment published in JMP will detect it is 38%, worse than flipping a coin (an operation that would be considerably less expensive).
Table 1. Sample sizes and power of studies published in JMP volumes 20/1 to 28/2. n = number of published studies; MDS = median sample size; MDS/cell = median sample size per condition; 1−β(r = .1 / d = .2), 1−β(r = .3 / d = .5), 1−β(r = .5 / d = .8) = power to detect small, medium, and large bivariate relationships / differences between conditions.
For between-subjects designs, mixed designs, and the total, we assumed independent t-tests. For within-subjects designs we assumed dependent t-tests. All tests two-tailed, α = .05. Power analyses were conducted with the R package pwr 1.2-0.
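Those power figures can be reproduced to a good approximation in a few lines. The sketch below (Python, using a normal approximation rather than the exact noncentral-t computation that pwr performs, so the second decimal can differ slightly) treats the two-sample t statistic under the alternative as approximately normal with mean d·√(n/2):

```python
from math import erf, sqrt

def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def approx_power(d, n_per_group, z_crit=1.959964):
    """Approximate power of a two-tailed two-sample t-test (alpha = .05)
    with n_per_group participants per condition and true effect size d.
    Under the alternative, the test statistic is roughly N(d*sqrt(n/2), 1);
    power is the probability mass beyond the two critical values."""
    ncp = d * sqrt(n_per_group / 2.0)
    return normal_cdf(ncp - z_crit) + normal_cdf(-ncp - z_crit)

# Median cell size in JMP experiments is about 31:
for d in (0.20, 0.50, 0.80, 0.43):
    print(f"d = {d:.2f}: power ~ {approx_power(d, 31):.2f}")
```

With n ≈ 31 per cell this yields roughly .12, .50, .88, and .40, close to the exact t-based values of 12%, 49%, 87%, and 38% reported above.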
Feeling the Future of Media Psychology
The above observations could lead readers to believe that we are concerned about the quality of publications in JMP in particular. If anything, the opposite is true, as this journal recently committed itself to a number of changes in its publishing practices to promote open, reproducible, high-quality research. These analyses are simply another step in a phase of sincere self-reflection. Thus, we would like these findings, troubling as they are, to be taken not as a verdict, but as an opportunity for researchers, journals, and organizations to reflect similarly on their own practices and hence improve the field as a whole.
One key area which could be improved in response to these challenges is how researchers create, test, and refine the psychological theories used to study media. Like other psychology subfields, media psychology is characterized by the frequent emergence of new theories which purport to explain phenomena of interest. (As James Anderson recently put it in a very clever paper, as usual: “Someone entering the field in 2014 would have to learn 295 new theories the following year.”) This generativity may, in part, be a consequence of the fuzzy boundaries between exploratory and confirmatory modes of social science research.
Both modes of research – confirming hypotheses and exploring uncharted territory – benefit from preregistration. Drawing this distinction helps the reader determine which hypotheses carefully test ideas derived from theory and previous empirical studies, and it liberates exploratory research from the pressure to present an artificial hypothesis-testing narrative.
As technology experts, media psychology researchers are well positioned to use and study new tools that shape our science. A range of new web-based platforms have been built by scientists and engineers at the Center for Open Science, including their flagship, the OSF, and preprint services like PsyArXiv. Designed to work with scientists’ existing research flows, these tools can help prevent data loss due to hardware malfunctions, misplacement (including mindless grad students and hungry dogs), or relocations of researchers, while enabling scientists to claim more credit by allowing others to use and cite their materials, protocols, and data. A public repository for media psychology research materials is already in place.
Like psychological science as a whole, media psychology faces a pressing credibility gap. Unlike some other areas of psychological inquiry (such as meta science), however, media research — whether concerning the Internet, video games, or film — speaks directly to everyday life in the modern world. It affects how the public forms its perceptions of media effects, and how professional groups and governmental bodies make policies and recommendations. In part because it is key to professional policy, the findings disseminated to caregivers, practitioners, and educators should be built on an empirical foundation of sufficient rigor.
We are, on balance, optimistic that media psychologists can meet these challenges and lead the way for psychologists in other areas. This special issue and the Registered Reports submission track represent an important step in this direction, and we thank the JMP editorial board, our expert reviewers, and, of course, the dedicated researchers who devoted their limited resources to this effort.
The promise of building an empirically-based understanding of how we use, shape, and are shaped by technology is an alluring one. We firmly believe that incremental steps taken towards scientific transparency and empirical rigor will help us realize this potential.
If you read this entire post, there’s a 97% chance you’re on Team Edward.
(this post was jointly written by Malte & Anne; in a perfect metaphor for academia, WordPress doesn’t know how to handle multiple authors)
We believe in scientific openness and transparency, and consider unrestricted access to the data underlying publications indispensable. Therefore, we (not just the authors of this post, but all four of us) signed the Peer Reviewers’ Openness (PRO) Initiative, a commitment to only offer comprehensive review for, or recommend the publication of, a manuscript if the authors make their data and materials publicly available, unless they provide compelling reasons why they cannot do so, e.g. ethical or legal restrictions. (The data-hungry dog of a former grad student whose name you forgot is not a compelling reason.)
As reviewers, we enthusiastically support PRO and its values.
Also as reviewers, we think PRO can be a pain in the arse.
Ok, not really. But advocating good scientific practice (like data sharing) during peer review can result in a dilemma.
This is how it’s supposed to work: 1) You accept an invitation for review. 2) Before submitting your review, you ask the editor to relay a request to the authors to share their data and materials (unless they have already done so). 3) If authors agree – fantastic. If authors decline and refuse to provide a sensible rationale why their data cannot be shared – you reject the paper. Simple.
So far, so PRO. But here’s where it gets hairy: What happens when the action editor handling the paper refuses to relay the request for data, or even demands that such a request is removed from the written review?
Here is a reply Anne recently got from an editor after a repeated PRO request (the editor apologised for overlooking the first email – most likely an honest mistake; talk to Chris Chambers if you want to hear a few stories about the funny tendency of uncomfortable emails to get lost in the post):
“We do not have these requirements in our instructions to authors, so we can not ask for this without discussing with our other editors and associate editors. Also, these would need to involve the publication team. For now, we can relieve you of the reviewing duties, since you seem to feel strongly about your position.
Let me know if this is how we should proceed so we do not delay the review process further for the authors.”
Much like judicial originalists insist on interpreting the US constitution literally as it was written by a bunch of old white dudes more than two centuries ago, editors will sometimes cite existing editorial guidelines by which authors obligate themselves to share data on request, but only after a paper has been published – which has got to be the “Don’t worry, I use protection” argument of academia. (We picked a heterosexual male perspective here but we’re open to suggestions for other lewd examples.) Also, we know that this system simply does. not. work.
As reviewers, it is our duty to evaluate submitted research reports, and data are not just an optional part of empirical research – they are the empirical research (the German Psychological Society agrees!). You wouldn’t accept a research report based on the promise that “theory and hypotheses are available on request”, right? (Except when reviewing for Cyberpsychology.)
PRO sets “data or it didn’t happen” as a new minimum standard for scientific publications. As a consequence, comprehensive review should only be offered for papers that meet this minimum standard. The technically correct (the best kind of being correct) application of the PRO philosophy to the particular case of the Unimpressed Editor is straightforward: When they decide – on behalf of the authors! – that data should or will not be shared, the principled consequence is to withdraw the offer to review the submission. As they say, PRO before the status quo.
Withdrawing from the process, however, decreases the chance that the data will be made publicly accessible, and thus runs counter to PRO’s ideals. As we say in German, “Operation gelungen, Patient tot” – surgery successful, patient deceased.
Adhering strictly to PRO would work great if everybody participated: The pressure on non-compliant journals would become too heavy. Then again, if everybody already participated, PRO wouldn’t be a thing. In the world of February 2017, editors can just appoint the next best reviewer who might simply not care about open data (withdrawing from review might still have an impact in the absence of a major boycott by causing the editors additional hassle and delaying the review process – then again, this latter part would unfairly harm the authors, too) – and couldn’t you push for a better outcome if you kept your foot in the door? Then again, if all PRO signatories eroded the initiative’s values that way, the day of reaching the critical mass for a significant boycott would never come.
A major concern here is that the authors are never given the chance to consider the request, although they might be receptive to the arguments presented. If an increased rate of data sharing is the ultimate goal, what is more effective: boycotting journals that actively suppress such demands by invited reviewers, or loosening up the demands and merely suggesting that data should be shared, so that at least the gist of it gets through?
There are two very different ways to respond to such editorial decisions, and we feel torn because each seems to betray the values of open, valuable, proper scientific research. You ask: What is the best strategy in the long run? Door in the face! Foot in the door! Help, I’m trapped in a revolving door! We would really like to hear your thoughts on this!
RE: Door in the face
Thank you very much for the quick response.
Of course I would have preferred a different outcome, but I respect your decision not to request something from the authors that wasn’t part of the editorial guidelines they implicitly agreed to when they submitted their manuscript.
What I do not agree with are the journal’s editorial guidelines themselves, for the reasons I provided in my previous email. It seems counterproductive to invite peers as “gatekeepers” while withholding relevant information that is necessary for them to fulfill their duty until the gatekeeping process has been completed.
Your decision not even to relay my request for data sharing to the authors (although they might gladly do so!) unfortunately bars me from providing a comprehensive review of the submission. It is literally impossible for me to reach a recommendation about the research as a whole when I am only able to consider parts of it.
Therefore, I ask that you unassign me as a reviewer, and not invite me again for review except for individual manuscripts that meet these standards, or until the editorial policy has changed.
RE: Foot in the door
Thank you very much for the quick response.
Of course I would have preferred a different outcome, but I respect your decision not to request something from the authors that wasn’t part of the editorial guidelines they implicitly agreed to when they submitted their manuscript.
In fact, those same principles should apply to me as a reviewer, as I, too, agreed to review the submission under those rules. Therefore, in spite of the differences in my own personal standards versus those presented in your editorial guidelines, I have decided to complete my review of the manuscript as originally agreed upon.
You will see that I have included a brief paragraph on the benefits of data sharing in my review. I neither demand that the authors share their data, nor will I hold it against them if they refuse to do so at this point. I simply hope they are persuaded by the scientific arguments presented in my review and elsewhere – in fact, I hope that you are too.
I appreciate this open and friendly exchange, and I hope that you will consider changing the editorial guidelines to increase the openness, robustness, and quality of the research published in your journal.
A research parasite, a destructo-critic, a second-stringer, and a methodological terrorist walk into a bar. Their collective skepticism creates a singularity, so they morph into a flairless superbug and start a blog just to make things worse for everyone.
This is, roughly, our origin story. Who are we? We are The 100% CI, bound by a shared passion for horrible puns and improving our inferences through scientific openness and meta science.
You know what I need in my life right now? Another blog on meta science!, said no one ever. Ok, sure, that’s fair, BUT:
We are 4 Germans, which approximates 1 Gelman according to our calculations (analysis scripts are available upon request; if you request them, we will not respond to your emails for several months – also, a grad student ~~ate~~ lost them).
We will be blogging about other stuff. This week alone we will have posts on