What’s an age-effect net of all time-varying covariates?

TL;DR: What’s an age-effect net of all time-varying covariates?
The sound of one hand clapping.

Recently, we submitted a paper with some age trajectories of measures of individuals’ (un-)well-being. We thought of these trajectories in the most descriptive way: How do these measures change across the life course, all things considered? While this might not be the most interesting research question – it doesn’t directly answer why stuff happens – I’m a fan of simple descriptive studies and think they should have a place in our domain; Paul Rozin wrote a great piece on the importance of descriptive studies.

Anyway, the editor asked us to justify why we did not include any time-varying covariates (e.g. income, education, number of children, health) in our analysis of age trajectories. I thought the editor had requested an actual justification; my co-author (an economist) thought the editor just wanted to tell us that we should throw in all sorts of covariates. I felt too lazy to re-run all analyses and create new figures and tables, plus I always get a weird twitch in my left eye when somebody asks for “statistical control” without additional justification, so instead I looked into the (scientific) literature on the midlife crisis and tried to figure out how people have justified the inclusion of control variables in analyses of age effects on well-being.[1]

Cat ownership, a time-varying covariate. (Pic: pixabay.com)

Whether or not life satisfaction dips in middle adulthood (somewhere between ages 45 and 64) before rising again in older age[2] has been hotly debated by psychologists and economists. There are a lot of papers out there on the subject, and personally, I’m totally agnostic regarding the existence of the midlife crisis – ask me again in 20 years, if I’m not too busy driving my Porsche. But there are a lot of interesting methodological questions that arise when trying to answer this question.

A brief list of stuff I don’t want to talk about in this post, all of which is important nonetheless:

  • the Age-Period-Cohort conundrum: In short, this requires us to make certain assumptions when we want to identify age/period/cohort effects. That’s okay though, every researcher needs to make assumptions from time to time.
  • longitudinal vs. cross-sectional data: Both can have their pros and cons.
  • statistical control for covariates that don’t change over time, such as gender or race, as systematic differences in the composition of the age groups can cause issues
  • what we can learn from lab studies in which researchers recruit older people and then compare their performance on some arbitrary task X to the performance of their convenient undergraduate sample. How do you reasonably match 60-year-olds who decided to participate in lab studies onto a younger sample of psych majors who really just want to get their freakin’ course credit?
  • lots of other interesting stuff you can do with longitudinal data that is more interesting than simple descriptive trajectories

But let’s get back to the topic our editor raised: Should we control for time-varying covariates such as income, marital status, health? The logic seems straightforward: Wouldn’t we want to “purge” the relationship between age and life satisfaction from other factors?

Quite obviously, a lot of stuff changes as we age. We get older, get our degrees, start a decent job and make some money,[3] maybe marry and settle down, or travel to an Ashram to awaken our inner goddess and spite our conservative parents, or maybe just get a lot of cats.

Mount Midlife Crisis, not to be confused with Mount Doom. (Art: Hakuin Ekaku)

To control for these variables might be wrong for two distinct reasons, and I will start with the somewhat more obscure one.

First, our time-varying covariate might actually be causally affected by life satisfaction. This argument has been raised regarding the statistical control of marital status by the late Norval Glenn (2009). He simulated a data situation in which (1) life satisfaction is stable across the life course and (2) starting from age 21, only the 10 happiest people marry each year. He then demonstrated that controlling for marital status results in a spurious trajectory, that is, a pronounced decline of life satisfaction over the life course even though we know that there’s no age effect in the underlying data. If you have read this blog before and the data situation sounds somewhat familiar to you: Marital status would be one of the infamous colliders that you should not control for, because if you do, I will come after you.[4] If marital status is affected by age (the older you are, the more likely you are to have married) and satisfied people are more likely to marry, marital status becomes a collider of its two causes and should not be controlled.
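Glenn’s point can be sketched in a few lines of code. This is my own toy version with made-up numbers, not his original simulation: life satisfaction is flat across age, but within each age group it’s always the happiest who have married, and the married share grows with age.

```python
# Toy version of Glenn's argument (hypothetical parameters): satisfaction is
# stable across age, but selection into marriage depends on satisfaction.
import numpy as np

rng = np.random.default_rng(1)
ages = np.repeat(np.arange(21, 61), 200)          # 200 people per age
satisfaction = rng.normal(5, 1, ages.size)        # no true age effect

# The longer a cohort has been eligible, the larger the share that has
# married -- and it's always the happiest members of the cohort who do.
married = np.zeros(ages.size)
for a in np.unique(ages):
    idx = np.where(ages == a)[0]
    share = min(0.9, 0.05 * (a - 20))             # married share grows with age
    k = int(share * idx.size)
    top = idx[np.argsort(satisfaction[idx])[-k:]] # happiest k marry
    married[top] = 1

def slope_on_age(controls):
    """OLS coefficient on age, with optional extra covariates."""
    X = np.column_stack([np.ones(ages.size), ages] + controls)
    beta, *_ = np.linalg.lstsq(X, satisfaction, rcond=None)
    return beta[1]

print(slope_on_age([]))         # ~0: the raw data show no age trend
print(slope_on_age([married]))  # clearly negative: spurious decline
```

The unconditional age slope is essentially zero, as it should be; adding the collider as a covariate conjures a decline out of thin air.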

The second reason is somewhat more obvious: In many cases, the time-varying covariates will mediate the effects of age on your outcome. That is probably most obvious for health: Health declines with age. Decreases in health affect life satisfaction.[5] So life satisfaction might decrease with age because of declining health. Now what does it mean if we control for this potential mediator?

Well, it means that we estimate the age effect net of the parts that are mediated through health. That is not inherently nonsensical; we just have to interpret the estimate properly. For example, Andrew Oswald was quoted in Vol. 30 of the Observer: “[But] encouragingly, by the time you are 70, if you are still physically fit then on average you are as happy and mentally healthy as a 20 year old.” Now this might indeed be encouraging for people who think they are taking great care of their health and predict that they will be healthy by the time they are 70; but whether it’s encouraging on average strongly depends on the average health at age 70.
For example, if we assume that only the luckiest 1% of the population will be physically fit at that age, 99% will end up unhappier than 20-year-olds (whether or not 20-year-olds are very happy is a different question). That doesn’t sound very optimistic any more, does it? The lucky one percent might also be very special with respect to other characteristics such as income, and a message such as “the wealthy will still be happy at age 70, whereas the poor are wasting away because of a lack of health care” again doesn’t sound very encouraging. For the record, I’m not claiming that this is happening, but those are all scenarios that are consistent with the simple statement that those who are physically fit at age 70 are as mentally healthy as 20-year-olds.
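The mediation logic is easy to see in a toy simulation (entirely made-up numbers, for illustration only): if the age effect on life satisfaction runs fully through health, then “controlling for health” leaves nothing for age to explain.

```python
# Toy mediation: age -> health -> satisfaction, with no direct age effect.
import numpy as np

rng = np.random.default_rng(7)
n = 20_000
age = rng.uniform(20, 80, n)
health = 10 - 0.08 * age + rng.normal(0, 1, n)         # health declines with age
satisfaction = 3 + 0.5 * health + rng.normal(0, 1, n)  # health -> satisfaction

def coef_on_age(*controls):
    """OLS coefficient on age, with optional extra covariates."""
    X = np.column_stack([np.ones(n), age, *controls])
    return np.linalg.lstsq(X, satisfaction, rcond=None)[0][1]

print(coef_on_age())        # ~ -0.04: total age effect (0.5 * -0.08)
print(coef_on_age(health))  # ~ 0: the "age effect net of health"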

So the estimated association has its own justification but must be interpreted carefully.[6] Additionally, it renders the “remaining” age effect hard to interpret, so it might not be very enlightening to look at age effects net of the effects of time-varying covariates. Let’s assume you “control” for all sorts of stuff that happens in life as people age – marital status, education, income, number of children, maybe also number of good friends, cat ownership, changes in health, and while we’re at it, why don’t we also control for the stuff underlying changes in health, such as the functioning of organs and cell damage? – and still find a significant age effect.

What does that mean? Well, it means that you haven’t included all time-varying covariates that are relevant to life satisfaction because age effects must necessarily be mediated through something. The sheer passing of time only has effects because stuff happens in that time.
The “stuff” might be all sorts of things, and we might be inclined to consider that stuff more or less psychologically meaningful. For example, we might not consider changes in marital status or physical health to be “genuinely psychological”, so we might decide to control for these things to arrive at a purely psychological age effect. Such a “purely psychological” age effect might then be driven by, e.g., people’s attitude towards the world. For example, people might get more optimistic and thus more satisfied, net of other life circumstances. But I would again be careful with those interpretations, because of the collider problem outlined before and because of the somewhat arbitrary distinction between physical changes and social role changes as opposed to psychological changes.

In other words: what you should or shouldn’t control for always depends on your research question. If you study living things and control for life, don’t be surprised if your results seem a bit dull.

Footnotes

1. Ruben, on the other hand, would probably get a twitch in his right eye if he found out I did not automate making my figures and tables.
2. before dipping again in a terminal decline. If you read that footnote, you’re now in the mortality salience condition of my Terror Management Theory study.
3. Or, alternatively, start a blog.
4. And you should be scared because I can deliver angry rants about inappropriate treatment of third variables. I might bring cookies and hope that you have decent coffee because this might take longer.
5. They obviously do, though the fact that life satisfaction remains stable until a certain age despite decreases in health has been labeled the happiness paradox.
6. Actually, there is yet another problem that has been pointed out by Felix Thoemmes in the comments section of this post. A mediator is also almost always a collider, as it is by definition caused by the independent variable of interest and (most likely) by some other factors. So you would actually have to control the backdoor paths from the mediator in turn, or else your estimate of the direct effect, whatever it reflects, will actually be biased again.

That one weird third variable problem nobody ever mentions: Conditioning on a collider

Scroll to the very end of this post for an addendum.[1]

Reading skills of children correlate with their shoe size. The number of storks in an area correlates with the birth rate. Ice cream sales correlate with deaths by drowning. Maybe they used different examples to teach you, but I’m pretty sure that we’ve all learned about confounding variables during our undergraduate studies. After that, we’ve probably all learned that third variables ruin inference, yadda yadda, and obviously the only way to ever learn anything about cause and effect is a proper experiment, with randomization and stuff. End of story, not much more to learn about causality.[2] Throw in some “control variables” and pray to Meehl that some blanket statement[3] will make your paper publishable anyway.

Here’s the deal, though: there is much more to learn about causal inference. If you want to invest more time in this topic, I suggest you take a look at Morgan and Winship’s Counterfactuals and Causal Inference: Methods and Principles for Social Research.[4] If you don’t have the time to digest a whole book,[5] read Felix Elwert’s chapter on Graphical Causal Models and maybe also his paper on colliders.

Causal inference from observational data boils down to assumptions you have to make[6] and third variables you have to take into account. I’m going to talk about a third variable problem today: conditioning on a collider. You might not have heard of this before, but every time you condition on a collider, a baby stork gets hit by an oversized shoe filled with ice cream[7] and the quality of the studies supporting your own political view deteriorates.[8]

Let’s assume you were interested in the relationship between conscientiousness and intelligence. You collect a large-ish sample of N = 10,000[9] and find a negative correlation between intelligence and conscientiousness of r = -.372 (see Figure 1).

Figure 1: The relationship between IQ and conscientiousness in your hypothetical college sample. Anne expressed doubts regarding college students with an IQ of 75-85 and she might be right about that, but that’s what you get for sloppy data simulations.

However, your sample consisted only of college students. Now you might be aware that there is a certain range restriction in intelligence of college students (compared to the overall population), so you might even go big and claim that the association you found is probably an underestimation! Brilliant.

The collider – being a college student – rears its ugly head. Being a college student is positively correlated with intelligence (r = .426). It is also positively correlated with conscientiousness (r = .433).[10] Let’s assume that conscientiousness and intelligence have a causal effect on college attendance, and that they are actually not correlated at all in the general population, see Figure 2.

Figure 2. Oh no, where did your correlation go?

If you select a college sample (i.e. the pink dots), you will find a negative correlation between conscientiousness and intelligence of, guess what, exactly r = -.372, because this is how I generated my data. There is a very intuitive explanation for the case of dichotomous variables:[11] In the population, there are smart lazy people, stupid diligent people, smart diligent people and stupid lazy people.[12] In your hypothetical college sample, you would have smart lazy people, stupid diligent people, and smart diligent people, but no stupid lazy people, because they don’t make it to college.[13] Thus, in your college sample, you will find a spurious correlation between conscientiousness and intelligence.[14]
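The selection effect is easy to reproduce yourself. Here is a minimal sketch with my own made-up parameters (not the data behind Figures 1 and 2): intelligence and conscientiousness are uncorrelated in the population, both raise the odds of college attendance, and a negative correlation appears within the college sample.

```python
# Collider by selection: two independent causes of college attendance become
# negatively correlated once you condition on attending (hypothetical numbers).
import numpy as np

rng = np.random.default_rng(42)
n = 100_000
iq = rng.normal(100, 15, n)
consc = rng.normal(0, 1, n)            # z-scored conscientiousness, independent of IQ

# College attendance caused by both traits (plus luck):
latent = 0.08 * (iq - 100) + 1.2 * consc + rng.logistic(0, 1, n)
college = latent > 0

r_pop = np.corrcoef(iq, consc)[0, 1]
r_college = np.corrcoef(iq[college], consc[college])[0, 1]
print(round(r_pop, 3))      # ~ 0 in the full population
print(round(r_college, 3))  # clearly negative, purely from selection
```

No causal arrow between the two traits, no confounder – the negative correlation among students comes entirely from who makes it into the sample.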
By the way, additionally sampling a non-college sample and finding a similar negative correlation among non-college peeps wouldn’t strengthen your argument: You are still conditioning on a collider. From Figure 2, you can already guess a slight negative relationship in the blue cloud,[15] and pooling all data points and estimating the relationship between IQ and conscientiousness while controlling for the collider results in r = -.240. Maybe a more relevant example: If you find a certain correlation in a clinical sample, and you find the same correlation in a non-clinical sample, that doesn’t prove it’s real in the not-so-unlikely case that ending up in the clinical sample is a collider caused by the variables you are interested in.

On an abstract level: Whenever X1 (conscientiousness) and X2 (intelligence) both cause Y (college attendance) in some manner, conditioning on Y will bias the relationship between X1 and X2 and potentially introduce a spurious association (or hide an existing link between X1 and X2, or exaggerate an existing link, or reverse the direction of the association…). Conditioning can mean a range of things, including all sorts of “control”: Selecting respondents based on their values on Y?[16] That’s conditioning on a collider. Statistically controlling for Y? That’s conditioning on a collider. Generating propensity scores based on Y to match your sample on this variable? That’s conditioning on a collider. Running analyses separately for Y = 0 and Y = 1? That’s conditioning on a collider. Washing your hair in a long, relaxing shower at CERN? You better believe that’s conditioning on a collider. If survival depends on Y, there might be no way for you not to condition on Y unless you raise the dead.

When you start becoming aware of colliders, you might encounter them in the wild, aka everyday life. For example, I have noticed that among my friends, those who study psychology (X1) tend to be less aligned with my own political views (X2). The collider is being friends with me (Y): Psychology students are more likely to become friends with me because, duh, that’s how you find your friends as a student (X1->Y). People who share my political views are more likely to become friends with me (X2->Y). Looking at my friends, they are either psych peeps or socialist anti-fascist freegan feminists.[17] Even though those two things are possibly positively correlated in the overall population,[18] the correlation in my friends sample is negative (X1 and X2 are negatively correlated conditional on Y).

Other examples: I got the impression that bold claims are negatively correlated with methodological rigor in the published psychological literature, but maybe that’s just because both flashy claims and methodological rigor increase the chances of publication and we just never get to see the stuff that is both boring and crappy?[19]
At some point, I got the impression that female (X1) professors were somewhat smarter (X2) than male professors, and based on that, one might conclude that women are smarter than men. But female professors might just be smarter because tenure (Y) is less attainable for women (X1->Y)[20] and more likely for smart people (X2->Y), so that only very smart women become professors while some mediocre men can also make it. The collider strikes again!

Tenure and scientific eminence are nice examples in general because they are colliders for a fuckload of variables. For example, somebody suggested that women were singled out as instances of bad science because of their gender. Leaving aside the issue of whether women are actually overrepresented among the people who have been shamed for sloppy research,[21] such an overrepresentation would neither tell us that women are unfairly targeted nor that women are more prone to bad research practices.[22] Assuming that women (X1) have worse chances to get into the limelight than men, but overstating the implications of your evidence (X2) helps with getting into the limelight, we could find that women in the limelight (conditioning on Y) are more likely to have overstated their evidence, because the more tempered women simply didn’t make it. That’s obviously just wild speculation, but in everyday life, people are very willing to speculate about confounding variables, so why not speculate about a collider for a change?

Which leads to the last potential collider that I would like you to consider. Let’s assume that the methodological rigor of a paper (X1) makes you more likely to approve of it as a reviewer. Furthermore, let’s assume that you – to some extent – prefer papers that match your own bias (X2).[23] Even if research that favors your point of view is on average just as good as research that tells a different story (X1 and X2 are uncorrelated), your decision to let a paper pass or not (Y) will introduce a negative correlation: The published papers that match your viewpoint will on average be worse.[24]
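The reviewer-bias collider works the same way and can be sketched with hypothetical numbers: rigor and “matches the reviewer’s view” are independent across submissions, but if matching the reviewer’s view gives a bonus at acceptance, then among accepted papers, the view-matching ones cleared a lower rigor bar.

```python
# Publication as a collider (made-up parameters): rigor and viewpoint-match are
# independent among submissions, yet correlated among accepted papers.
import numpy as np

rng = np.random.default_rng(3)
n = 50_000
rigor = rng.normal(0, 1, n)
matches_view = rng.random(n) < 0.5     # independent of rigor by construction

# Acceptance: rigor helps, and matching the reviewer's view adds a bonus.
accepted = rigor + 1.0 * matches_view + rng.normal(0, 1, n) > 1.0

pub_rigor_match = rigor[accepted & matches_view].mean()
pub_rigor_other = rigor[accepted & ~matches_view].mean()
print(round(pub_rigor_match, 2))  # these papers cleared a lower bar...
print(round(pub_rigor_other, 2))  # ...while these needed extra rigor to pass
```

Among submissions the two variables are uncorrelated; among publications, the papers that pleased the reviewer are, on average, noticeably less rigorous.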

So peeps, if you really care about a cause, don’t give mediocre studies an easy time just because they please you: At some point, the whole field that supports your cause might lose its credibility because so much bad stuff got published.

The End. (credit: Martin Brümmer)

Addendum: Fifty Shades of Colliders

Since publishing this post, I have learned that a more appropriate title would have been “That one weird third variable problem that gets mentioned quite a bit across various contexts but somehow people seem to lack a common vocabulary so here is my blog post anyway also time travel will have had ruined blog titles by the year 2100.”

One of my favorite personality psychologists,[25] Sanjay Srivastava, blogged about the “selection-distortion effect” before it was cool, back in 2014.

Neuro-developmental psychologist Dorothy Bishop talks about the perils of correlational data in research on developmental disorders in this awesome blog post and describes the problems of within-group correlations.

Selection related to phenotypes can bias correlations in genetic studies, which has been pointed out (1) by James Lee in Why It Is Hard to Find Genes Associated With Social Science Traits by Chris Chabris et al. and (2) in Collider Scope by Marcus Munafò et al.

Last but not least, Patrick Forscher just started a series of blog posts about causality (the first and second posts are already up), starting from scratch. I highly recommend his blog for a more systematic yet entertaining introduction to the topic![26]

Footnotes

1. If you only see footnotes, you have scrolled too far.
2. YMMV and I hope that there are psych programs out there that teach more about causal inference in non-experimental settings.
3. “Experimental studies are needed to determine whether…”
4. Added bonus: After reading this, you will finally know how to decide whether or not a covariate is necessary, unnecessary, or even harmful.
5. I have been informed that only grad students can afford to actually read stuff, which is kind of bad, isn’t it?
6. There’s no free lunch in causal inference. Inference from your experiment, for example, depends on the assumption that your randomization worked. And then there’s the whole issue that the effects you find in your experiment might have literally nothing to do with the world that happens outside the lab, so don’t think that experiments are an easy way out of this misery.
7. Just to make sure: I don’t endorse any form of animal cruelty.
8. If you are already aware of colliders, you will probably want to skip the following stupid jokes and smugness and continue with the last two paragraphs in which I make a point about viewpoint bias in reviewers’ decisions.
9. As we say in German: “Gönn dir!”
10. Just to make sure: This is fake data. Fake data should not be taken as evidence for the actual relationship between a set of variables (though some of the more crafty and creative psychologists might disagree).
11. The collider problem is just the same for continuous measures.
12. Coincidentally, you will find each of the four combinations represented among the members of The 100% CI at any given point in time, but we randomly reassign these roles every week.
13. Ha, ha, ha.
14. Notice that you might be very well able to replicate this association in every college sample you can get. In that sense, the negative correlation “holds” in the population of all college students, but it is a result from selection into the sample (and not causal processes between conscientiousness and intelligence, or even good old fashioned confounding variables) and doesn’t tell you anything about the correlation in the general population.
15. If you are really good at guessing correlations (it’s a skill you can train!), you might even see that it’s about r = -.200.
16. or anything that is caused by Y, because the whole collider logic also extends to so-called descendants of a collider.
17. This might sound like I want to imply that the other authors of this blog are fascists, but that wasn’t true last time I checked.
18. Actually, I’m pretty damn sure that the average psych student is more likely to be a socialist anti-fascist freegan feminist than the average person who is not a psychology student.
19. This might come as less of a surprise to you if you’re a journal editor because you get to see the whole range.
20. For whatever reason, you can add mediators such as discrimination
21. I actually have no clue whether that’s true or not, I just don’t have any data and no intuition on that matter
22. Notice that both accounts would equal a causal effect of gender, as the arrows are pointing away from “gender” and end at “being criticised for bad research”, no matter what happens in between. Of course, the parts in between might be highly informative.
23. For example, I believe that the metric system is objectively superior to others, so I wouldn’t approve of a paper that champions the measurement of baking ingredients in the unit of horse hooves. If you think I chose this example because it sounds so harmless, you haven’t heard me rant about US-letter format yet.
24. Plomin et al. claimed that the controversy surrounding behavioral genetics led to the extra effort necessary to build a stronger foundation for the field, which is the flipside of this argument.
25. Also: One of the few people I think one can call “personality psychologist” without offending them. Not sure though. *hides*
26. No CERN jokes though. Those are the100.ci-exclusive!

Things are bad and yet I’m mildly optimistic about the future of psychology

Psychology is fucked. I’m not going to reiterate the whole mess because it has already been aptly summarized in Sanjay Srivastava’s “Everything is fucked: The syllabus”.[1]but cf. more recent real-world fuck-ups: For example, the retraction of a paper by William Hart, supposedly because a grad student faked data, but the whole story turns out to be way more messy: The study was ridiculous in the first place, and either Hart is exceptionally unlucky in his choice of grad students or he could have known about the issue earlier. The order of events has remained a mystery. And, of course, the instant classic fuck-up of the grad student who never said no: Brian Wansink recalls fond memories of a grad student who went on “deep data dives” under his guidance that resulted in four publications; three brilliant data detectives in turn dive into these papers and find about 150 numerical inconsistencies; Wansink promises to clean up the mess (but really, how much hope is there when people are not even able to consistently calculate a percentage change from two numbers). But who cares, it’s not like we are doing chemistry here.
Quite a few people seem to be rather pessimistic about the future of psychology. This includes both old farts[3]Self-designation introduced by Roberts (2016). who are painfully aware that calls for improved methodology[4]cf. Paul Meehl, the Nostradamus of psychology. have been ignored before, as well as early-career researchers who watch peers ascend Mount Tenure with the help of a lot of hot air and little solid science.

But hey, I’m still somewhat optimistic about the future of psychology, here’s why:[5]Alternative explanation: I’m just an optimistic person. But I’ve noticed that heritability estimates don’t really make for entertaining blog posts.

THE PAST

Sometimes, it helps to take a more historical perspective to realize that we have come a long way: from an Austrian dude with a white beard who sort of built his whole theory of the development of the human mind on a single boy who was scared of horses, and who didn’t seem to be overly interested in rigorous tests of his own hypotheses, to, well, at least nowadays psychologists acknowledge that stuff should be tested empirically. Notice that I don’t want to imply that Freud was the founding father of psychology.[2]It was, of course, Wilhelm Wundt, and I’m not only saying this because I am pretty sure that the University of Leipzig would revoke my degrees if I claimed otherwise. However, he is of – strictly historical – importance to my own subfield, personality psychology. Comparing the way Freud worked to the way we conduct our research today makes it obvious that things have changed for the better. Sure, personality psychology might be more boring and flairless nowadays, but really, all I care about is that it is accurate.
You don’t even have to go back in time that far: Sometimes, I have to read journal articles from the 80s.[6]Maybe the main reason why I care about good science is that sloppy studies make literature search even more tedious than it would be anyway. Sure, not all journal articles nowadays are the epitome of honest and correct usage of statistics but really you don’t stumble across “significant at the p < .20 level” frequently these days. And if you’re lucky, you will even get a confidence interval or an effect size estimate!
And you don’t even have to look at psychology. A Short History of Nearly Everything used to be my favorite book in high school, and later, as a grad student, reading about the blunder years of other disciplines that grew up fine nonetheless[7] gave me great hope that psychology is not lost.

This giraffe predates Sigmund Freud, yet it doesn’t wear a beard. (Photo: Count de Montizon)

THE PRESENT

Psychologists are starting to try to replicate their own as well as other researchers’ work – and often fail, which is great for science because this is how we learn things.[8]
We now have Registered Reports, in which peer review happens before the results are known – a simple yet brilliant idea that prevents undesirable results from simply disappearing in the file drawer.
To date, 367 people have signed the Peer Reviewers’ Openness Initiative and will now request that data, stimuli, and materials be made public whenever possible (it can get complicated though), and 114 people have signed the Commitment to Research Transparency, which calls for reproducible scripts and open data for all analyses, but also states that the grading of a PhD thesis has to be independent of statistical significance[9] or successful publication.
The psychology department of the Ludwig-Maximilians-Universität Munich explicitly embraced replicability and transparency in their job ad for a social psychology professorship. That’s by no means the norm yet, and I’m not sure whether this particular case worked out, but one can always dream.
The publication landscape is changing, too.
People are starting to upload preprints of their articles, which is a long overdue step in the right direction.
Collabra is a new journal with a community-centered model to make Open Access affordable to everyone.
Old journals are changing, too: Psychological Science now requires a data availability statement with each submission. The Journal of Research in Personality requires a research disclosure statement and invites replications.[10]

Additionally, media attention has been drawn to failed replications, sloppy research, and overhyped claims such as the power pose, the whole infamous pizzagate[11] story, and the weak evidence behind brain training games. You might disagree, but I take it as a positive sign that parts of the media are falling out of love with catchy one-shot studies, because that whole love affair has probably been damaging psychology by rewarding all the wrong behaviors.[12]

And last but not least, we are using the internet now. A lot of the bad habits of psychologists – incomplete method sections, unreported failed experiments, data secrecy – are legacy bugs of the pre-internet era. A lot of the pressing problems of psychology are now discussed more openly thanks to social media. Imagine a grad student trapped in a lab, stubbornly trying to find evidence for that one effect, filing away one failed experiment after the other. What would that person have done 20 years ago? How would they ever have learned that this is not a weakness of their own lab, but a problem endemic to a system that only allows for the publication of polished, too-good-to-be-true results?[13] Nowadays, I’d hope that they would get an anonymous blog and bitch about these issues in public. Science benefits from less secrecy and less veneer.

Students be like: They are doing what with their data? (Photo: user:32408, pixabay.com)

THE FUTURE

Sometimes I get all depressed when I hear senior or mid-career people stating that the current scientific paradigm in psychology is splendid; that each failed replication only tells us that there is another hidden moderator we can discover in an exciting new experiment performed on 70 psychology undergrads; that early-career researchers who are dissatisfied are just lazy or envious or lack flair; and that people who care about statistics are destructive iconoclasts.[14]
This is where we have to go back to the historical perspective.
While I would love to see a complete overhaul of psychology within just one generation of scientists, maybe it will take a bit longer. Social change supposedly happens when cohorts are replaced.[15]
Most people who are now professors were scientifically socialized under very different norms, and I can see how it is hard to accept that things have changed, especially if your whole identity is built around successes that are now under attack.[16] But what really matters in the long run – and I guess we all agree that science will go on after we are all dead – is that the upcoming generation of researchers is informed about past mistakes and learns how to do proper science. Which is why you should probably go out and get active: teach your grad students about the awesome new developments of the last few years; talk to your undergraduates about the replication crisis.
The sight of bewildered students who are unable to grasp why psychologists haven’t been pre-registering their hypotheses and sharing their data all along is what keeps me optimistic about the future.

Footnotes

1. But cf. more recent real-world fuck-ups. For example, the retraction of a paper by William Hart, supposedly because a grad student faked data – but the whole story turns out to be way messier: The study was ridiculous in the first place, and either Hart is exceptionally unlucky in his choice of grad students or he could have known about the issue earlier. The order of events has remained a mystery. And, of course, the instant classic fuck-up of the grad student who never said no: Brian Wansink recalls fond memories of a grad student who went on “deep data dives” under his guidance that resulted in four publications; three brilliant data detectives in turn dive into these papers and find about 150 numerical inconsistencies; Wansink promises to clean up the mess (but really, how much hope is there when people are not even able to consistently calculate a percent change from two numbers). But who cares, it’s not like we are doing chemistry here.
2. It was, of course, Wilhelm Wundt, and I’m not only saying this because I am pretty sure that the University of Leipzig would revoke my degrees if I claimed otherwise.
3. Self-designation introduced by Roberts (2016)
4. cf. Paul Meehl, the Nostradamus of psychology.
5. Alternative explanation: I’m just an optimistic person. But I’ve noticed that heritability estimates don’t really make for entertaining blog posts.
6. Maybe the main reason why I care about good science is that sloppy studies make literature search even more tedious than it would be anyway.
7. to varying degrees, obviously. But hey, did you know that plate tectonics became accepted among geologists as late as the 1960s?
8. For example, that some effects only work under very strict boundary conditions, such as “effect occurs only in this one lab, and probably only at that one point in time.”
9. Really this seems to be a no-brainer, but then again, some people seem to mistake the ability to find p < .05 for scientific skill.
10. There are more examples but these two come to my mind because of their awesome editors. There are also journals that take a, uhm, more incremental approach to open and replicable science. For example, I think it’s great that the recent editorial of the Journal of Personality and Social Psychology: Attitudes and Social Cognition concludes that underpowered studies are a problem, but somehow I feel like the journal (or the subfield?) is lagging a few years behind in the whole discussion about replicable science.
11. Not the story about the pedophile ring, the story about the psychological study that took place at an all-you-can-eat pizza buffet.
12. Anne is skeptical about this point because she doubts that this is indicative of actual rethinking as compared to a new kind of sexiness: debunking of previously sexy findings. Julia is probably unable to give an unbiased opinion on this as she happens to be the first author of a very sexy debunking paper. Now please excuse me while I will give yet another interview about the non-existent effects of birth order on personality.
13. In case you know, tell me! I’d really like to know what it was like to get a PhD in psychology back then.
14. What an amazing name for a rock band.
15. cf. Ryder, N. B. (1965). The cohort as a concept in the study of social change. American Sociological Review, 30, 843-861.
16. I have all the more respect for the seniors who are not like that but instead update their opinions, cf. Kahneman’s comment here, or even actively push others to update their opinions, for example, the old farts I met at SIPS.

Climate changes: How can we make people feel welcome in academia?

Academia is a strange place. There are a lot of implicit norms and unspoken rules which, to make it worse, can vary by field, subfield, across countries, and over time. For example: How do you write an email in an academic setting? Should your mails be polite, or is it already impolite to waste the reader’s time with polite fluff? How do you address a professor who you (1) have never met in person, (2) met in person once but they likely don’t remember, (3) had a beer with at a conference but they likely don’t remember? Do you shake hands? How do you start a collaboration, and are you sure you want to wear this pair of jeans/cat shirt/three-piece suit to the next conference?

It takes time to figure these things out and to finally feel comfortable in academic interactions – even more so for students from working class families who can’t draw on experiences from their parents, or for students from parts of the world with considerably different academic norms.

Academia can be scary for newcomers. (Photo: Timon Studler, unsplash.com)

So how can we help people to feel welcome in our strange insider’s club?

I have three suggestions, and as it happens, none of them is about tone. Actually, I believe that changing the tone is a quite ineffective way to make people feel welcome and valued because it is just cosmetics: Communication can be extremely hostile while maintaining a picture-perfect, all-friendly, nonviolent surface. I suspect that people who champion tone monitoring hope that talking nicely for long enough will transform attitudes. However, I faintly remember learning in the first year of my undergraduate that Sapir-Whorf is not well substantiated.[1][2][3]

Furthermore, setting well-intentioned rules about the tone of interactions might just add another layer of conventions that poses yet another obstacle for outsiders. What I suggest is that we do not tackle tone, but instead try to change the underlying climate.

Start admitting that you are sometimes wrong

Many students start with the assumption that people with a fancy “Dr.” attached to their name or (gasp) professors have privileged access to the secrets of the world and are thus close to infallible. Anne pointed me towards Perry’s Scheme, a model of how college students come to understand knowledge, which succinctly summarizes this first level of understanding: The authorities know.

However, social interactions get pretty one-sided if one side assumes that the other side is never wrong, and it unnecessarily reinforces power differentials (which exist anyway, and which are probably not always conducive to scientific progress, but we will save that for another blog post). It also greatly obscures how science – as opposed to esotericism – is supposed to work.
Anecdotal data ahead: I have never felt particularly unwelcome in academia, and I blame this on the fact that both my parents have a PhD. Now, before we all get excited about the social transmission of educational attainment, I will quickly add that I was not raised by the doctors but by my down-to-earth, mother-of-eight Catholic grandma. However, I still got the strong impression that academic rank does not predict how often a person is right about things outside of their specific narrow subfield. Of course there is a German word for this idea: Fachidiot, a narrowly specialized person who is an idiot when it comes to anything else. In fact, I might have had a phase in which I firmly believed that a PhD indicates that a person is always wrong.[4]

Even seniors can’t know everything. (Photo: Miriam Miles, unsplash.com)


Coincidentally, this also relates to the one piece of career advice I got from my dad: It’s important to hang out at conferences because there you can actually see with your own eyes that everybody cooks with water, which is the German way to say that everybody puts their pants on one leg at a time.
There is a quick fix to the misconception that academics are always right: Just communicate that you are fallible and be honest about the things you are uncertain about. If you need a role model for this type of behavior, I recommend Stefan Schmukle, who has been my academic advisor since the second year of my undergraduate and is probably the main reason why I did not leave psychology for a more lucrative and less frustrating career path. Stefan openly admits his knowledge gaps when he teaches and stresses that he keeps learning a lot. Funnily enough, according to the data available to me – which includes both quantitative (student evaluations and teaching awards) and qualitative (intensive student interviews over a beer or two) evidence – this does not undermine his authority[5] in front of his students.

Positive side effects of admitting that you are sometimes wrong might include (1) students feeling more respected because of your honesty, (2) students learning that psychology is not an arcane art accessible only to privileged old white men, and (3) sending a strong signal that you are, in fact and despite all your glorious achievements, a human being. Which already leads to my second suggestion.

Show others it’s okay to have a life. Have a life.

This is important not only because you probably enjoy having a life, but also because it avoids any mystification of what it means to be an academic. If we establish the norm that being an academic means working from early morning until late at night, seven days a week and especially between Christmas and New Year’s Eve, a lot of people might decide that they don’t want to feel welcome in academia after all. If your subfield actually requires this type of commitment, then please be frank about it so that junior researchers can decide early on whether they want to sacrifice literally everything else that makes life fun.
However, if your job does not require you to sacrifice your life completely, it’s great to signal to others that you are, in fact, a human being with a family, hobbies, and other stuff that you do in your free time, like binge-watching Gilmore Girls or doing blindfolded speed runs of your favorite childhood video games.[6]

I don’t have any data to back up this claim, but I’m pretty sure most humans prefer the company of other humans to the company of restless and efficient publication machines. Overworking is not a sustainable lifestyle for most people, and it does not create a particularly welcoming climate. It also leads to a race to the bottom that makes life worse for everyone, so maybe work less (and unionize). As a senior scientist, don’t make overworking the norm.

As it happens, this point also maps onto the one piece of solid career-related advice my mother passed on to me. Her professor told her that she was spending too much time in the library instead of getting to know her peers in the evenings. In my personal interpretation, that is not advice to go “networking”, but advice to do things that are actually fun, because we know what all work and no play did to Jack.

Don’t act as if willpower/grit/self-control/discipline/ambition/perseverance will lead to success

In the current predominant culture, especially in academia – and a bit more in the US than in my control group, Germany – success is often treated as the result of some sort of internal strength. If only you tried a bit harder, if only you got a bit more organized, if only you started getting up earlier, if only you gave a bit more, if only you networked more efficiently, your efforts would finally pay off. It’s all fine and dandy to try your best and to actively regulate your behavior, but I fear we have reached a point at which this attitude is getting toxic.
First, it opens the door to self-exploitation. Second, it makes people more willing to comply with exploitative structures, which is great for the maintenance of the status quo, but not so much for early career researchers who end up working endless hours. Third, if internal strength inevitably leads to success, having no success implies that you lack some sort of internal strength, or worse, that you are a failure.

Some things take patience, not willpower. (Photo: Kleber Varejão Filho, unsplash.com)


But, most importantly, it’s just not true that trying as hard as possible will lead to success, and that success will lead to some sort of bliss that compensates for all the hard work.
Success depends on multiple factors, and even if we assume that effort contributes quite a bit, there are still plenty of factors outside of our control: innate abilities, external factors such as being surrounded by people who support you (vs. having an advisor who is still fully absorbed in the rat race and exploits you for their own purposes), and a lot of randomness. Anyone who has ever submitted a paper to a journal will probably agree that there is a lot of randomness in the current academic system in psychology – if you’ve never encountered some level of arbitrary decision making, you’ve been pretty lucky (q.e.d.).
Then, the story goes, you should of course accept the things outside of your control but work hard on those that you can control – such as your ambitions and your perseverance. But can we even control these things? Frankly, I don’t know. But Ruben pointed out that we know of few interventions that reliably improve conscientiousness, and that grit (which is basically conscientiousness) is partially heritable. Based on my experience, trying hard is much harder for some people than for others. I can indeed be as disciplined as I want, but I cannot will what I want.[7]


Last but not least, I don’t think that bliss necessarily awaits those who work hard and end up being successful. We have yet to hear of the lucky person who got tenure and immediately reached a state of inner peace as a result. In fact, when I look into the office next door, I get the impression that the daily grind is not that different with a nice title in front of your name. (It certainly is more comfortable with respect to financial security, but not everybody can end up being a professor, so maybe we need structural change instead of individual struggle to tackle the precarious employment situation in academia.) However, this outlook does not seem too dull to me: In our lab, we are nice to each other, and we agree that our job is (to some extent) about doing science – not so much about gaming the system to get somewhere where you can finally, if you are lucky, do science.

TL;DR: It’s fine if you are sometimes wrong, don’t sweat it. Don’t make overworking the norm. Don’t give students the impression they just have to try hard enough to make it because deep down, we all know that this is not how it works.

Footnotes

1. This memory also spoiled Arrival for me. Still a great movie though.
2. After publishing this post, it has been pointed out three times that I shouldn’t bash linguistic relativity. To add more nuance to my argument, let me add that, as far as I know, there is substantial evidence for a weaker form of the Sapir-Whorf account (which also seems to be misnamed), which I consider plausible. Of course, I’m only bashing the form of linguistic relativity displayed in Arrival.
3. After publishing this post, my boyfriend read it and I additionally have to add the disclaimer that yes, Arrival was a great movie and, yes, maybe, assuming that those aliens are so different from humans and way more advanced etc., probably in a parallel universe that does not adhere to our physics, maybe it could work like this.
4. I’m sorry, Mum. It wasn’t you, it was puberty.
5. You know what does undermine your authority in front of the students? Pretending to know something that you don’t know while not even being aware that the smarter students can easily tell you are just pretending. There is a German word for the student’s feeling in such a situation, it’s called fremdschämen.
6. The author of this article only indulges in one of these two activities but knows for a fact that at least one tenured individual in her proximity indulges in the other one.
7. I’m pretty sure there is a reason why Schopenhauer is not particularly popular with motivational coaches.