Sometimes a causal effect is just a causal effect (regardless of how it’s mediated or moderated)

TL;DR: Tell your students about the potential outcomes framework. It will have (heterogeneous) causal effects on their understanding of causality (mediated through unknown pathways), I promise.

It’s probably fair to say that many psychological researchers are somewhat confused about causal inference. That’s very understandable given the minimal amount of training most of us receive on the topic, but also rather unfortunate given that a big chunk of psychological research is about understanding broad causal patterns.[1]

In this blog post, I want to tackle two related misconceptions that I have encountered. The first one is the notion that whether something is a causal effect or not depends on the specific mechanisms involved. The second one is the notion that when causal effects vary between people, that’s somehow a big issue and invalidates our inferences. To get these out of the way, we will start with some basics—i.e., the big secret of what we actually mean when we talk about causal effects.

The interventionist framework – What even is a cause? 

Nobody knows for sure. I have read a surprising number of articles in psychology that vaguely point to philosophical problems in the definition of causality and then somehow end up citing David Hume (“There must be a constant union betwixt the cause and effect”).[2] There are indeed philosophical discussions surrounding the concept of causality, and I’m sure some of these are also relevant to applied researchers. At the same time, it’s probably fair to say that most researchers across most quantitative empirical fields have subscribed to an interventionist framework of causality—sometimes explicitly, in psychology usually implicitly.

In that framework, causal effects are defined with reference to some (sometimes merely hypothetical) intervention. For example (Figure 1), what would it mean for my well-being right now (Y) if I had taken an aspirin this morning (X = 1)? The causal effect of said aspirin on my well-being is then defined as the contrast between my well-being with aspirin (Y(X=1)) and my well-being without aspirin (Y(X=0)).[3] As I cannot simultaneously take the aspirin (X = 1) and not take it (X = 0), we can only observe one of the outcomes involved in the causal effect; the other one remains a so-called counterfactual. Y(X=0) and Y(X=1) are referred to as potential outcomes because they are, well, the potential outcomes that could be observed depending on the state of the world (aspirin vs. no aspirin).

Figure 1. This is where the magic happens.
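The definition fits in a couple of lines. Here is a minimal sketch with made-up numbers, in which both potential outcomes are assumed known—which is exactly what reality never grants us:

```python
# One hypothetical person's potential outcomes (made-up values):
y_aspirin = 7.0      # Y(X=1): my well-being had I taken the aspirin
y_no_aspirin = 5.5   # Y(X=0): my well-being had I not taken it

# The individual-level causal effect is simply the contrast:
individual_effect = y_aspirin - y_no_aspirin
print(individual_effect)  # 1.5 -- in reality, only one of the two is observed
```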

This is how we define individual-level causal effects. It’s really quite narrow and unassuming; the complications and assumptions all enter because one of the potential outcomes is destined to remain unobserved. This is also referred to as the fundamental problem of causal inference (a phrase attributed to Holland, 1986). The standard “solution” involves declaring that we cannot possibly know the individual-level effects and then trying to come up with some smart solution that still allows us to make certain statements. For example, if we randomly assign people to conditions (aspirin versus no aspirin), the mean difference between the groups should be an unbiased estimator of the average of the individual-level causal effects.
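To see why randomization earns the word “unbiased,” here is a toy simulation (all numbers invented): every simulated person gets their own pair of potential outcomes, random assignment picks which one we observe, and the difference in group means recovers the average of the heterogeneous individual-level effects.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Every simulated person has their own pair of potential outcomes,
# so the individual-level effects (tau) are heterogeneous.
y0 = rng.normal(5.0, 1.0, n)       # well-being without aspirin, Y(X=0)
tau = rng.normal(0.7, 0.5, n)      # individual-level causal effects
y1 = y0 + tau                      # well-being with aspirin, Y(X=1)

# Fundamental problem of causal inference: only one potential outcome
# per person is ever observed, decided here by random assignment.
x = rng.integers(0, 2, n)
y_obs = np.where(x == 1, y1, y0)

# The difference in group means estimates the average of the
# individual-level effects (true average: 0.7).
ate_hat = y_obs[x == 1].mean() - y_obs[x == 0].mean()
print(round(ate_hat, 2), round(tau.mean(), 2))
```

Note that the estimator never sees any individual effect; it only needs the groups to be exchangeable, which randomization provides.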

Causal effects are indifferent to the mechanism

Importantly, if my well-being is better if I take the aspirin rather than not (Y(X=1) > Y(X=0)), that’s a causal effect of taking the aspirin—regardless of the specific underlying mechanism. Maybe it’s because the aspirin exerted its anti-inflammatory properties, thus biologically reducing some underlying cause of suffering. Maybe it’s because I told myself “now that I have taken an aspirin, I’ll be fine” and subsequently went out and enjoyed the day rather than hiding in the dark bedroom wearing one of those cooling masks that make you look like migraine man, one of the less effective superheroes. There’s a certain black box character to causal effects, but that’s not really a bug. Imagine we always needed to understand the full underlying mechanism to be able to declare a causal effect. The causal chain could be broken down further and further, leading to levels of abstraction that are probably not very conducive to human understanding (“and then this specific precursor of prostaglandin is blocked from reaching this one particular active site of the enzyme, and then…”).[4]

Still, defining causal effects without regard to mechanisms can lead to some degree of confusion. Consider Sandy Jencks’ thought experiment (recounted by Kathryn Paige Harden in The Genetic Lottery) of a world in which a nation refuses to send children with red hair to school. That would constitute a causal effect of red hair on literacy. The intervention here is easy enough to imagine; if for some sick reason we decided to dye a kid’s hair red, that would (causally) decrease their chances of becoming literate. We can even go a step further and say that red-hair genes causally affect literacy (that’s the original point of the thought experiment). All of this may feel wrong for various reasons.

People’s ideas of causality are sometimes entangled with the notion of blame (“Guns don’t kill people, people kill people”)—and clearly, we should not blame redheads for their bad outcomes, but rather society for its sick ways. So how could we say that red hair is a (or even the) cause? And the idea of causal effects of red hair seems to imply certain pathways, even more so when genes are invoked as a cause. It feels like the affected individual should do something that results in the outcome (maybe redheads are intrinsically lazier?), or maybe there should even be some biological explanation (maybe genes that cause red hair also impair brain function?). In any case, for a causal effect of red hair, it seems like something more deterministic and inevitable should be going on than “this weird nation decided that redheads are not allowed to go to school, you won’t believe what happened next.”

But the definition of a causal effect within the interventionist framework is indifferent to all of that. Within the specific population, changing hair color at a young age changes literacy later in life, so it’s a causal effect. The precise mechanisms don’t matter.

This indifference to causal pathways also applies to experiments. Consider the notion of demand effects: In one of the experimental conditions, the experimenter implicitly communicates that the participant ought to behave a certain way, and the participant complies. This may not be the mechanism you had in mind when planning the experiment, but it’s still a causal effect of the experimental condition—had the participant been in a different condition, the experimenter would have communicated something else implicitly, and the participant’s behavior would have been different. Sometimes, experimentalists will refer to such unintended pathways from the experimental condition to the outcome as “confounds.” Fair enough, they do confound conclusions with respect to the effects of the intended pathway (Figure 2). However, they do not constitute confounders in another sense; they are not common causes of both the independent and the dependent variable of interest. Instead, they are unwanted mediators.[5]

Figure 2. Two different ways that people with different backgrounds may use the term “confounding” to confound the other side.
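The mediator-versus-confounder distinction can be made concrete in a toy simulation (effect sizes invented): the condition is randomized, the outcome responds only through the implicit demand, and the condition nevertheless has a genuine total causal effect.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

# Randomized condition -- nothing sits upstream of x, so nothing
# can confound the x -> y relationship.
x = rng.integers(0, 2, n)

# Unwanted mediator: the experimenter implicitly communicates a demand
# that depends on the condition ...
demand = 0.8 * x + rng.normal(0, 1, n)

# ... and the outcome responds only to that demand, not to the
# pathway the experimenter had in mind.
y = 0.5 * demand + rng.normal(0, 1, n)

# The condition still has a genuine total causal effect on y,
# carried entirely by the unwanted mediator (0.8 * 0.5 = 0.4).
total_effect = y[x == 1].mean() - y[x == 0].mean()
print(round(total_effect, 2))
```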

Causal effects may vary, always & forever

From the redhead example above, it should already be clear that causal effects are not to be thought of as immutable building blocks of reality, as fixed laws of nature. If the anti-redhead nation stops discriminating based on hair color, or if we look at a neighboring nation that instead discriminates against blondes, the effects of red hair on literacy will look different. These would usually be filed under concerns of generalizability.[6]

But there is really no reason to only think in terms of “tractable” variation that we can explain. Earlier, we discussed the notion of the individual-level causal effect, which already sort of implies that every individual may have their own causal effect. In many ways, such (unexplained) effect heterogeneity is the default assumption in the causal inference literature; the work then consists of finding ways around it to estimate some average of the individual-level effects.

I often have the impression that the psychological literature starts from the opposite notion that causal effects are the same for everyone. If somebody then raises the possibility of heterogeneity (or “omitted moderators”), some people will be like “stop the presses, this changes everything.” Some go so far as to say that the estimated (average) effects are suddenly meaningless because they do not necessarily reflect anybody’s individual-level causal effect.[7] Sure, it would be very nice to know everybody’s individual-level causal effect, but for many research questions, that’s simply out of the question due to the fundamental problem of causal inference; so some sort of average is often the best we can get. If we can get that at all.

On a related note, psychological researchers will sometimes hear (or actively teach) that main effects cannot be interpreted when there is an interaction. One version restricts this prohibition to cross-over interactions (in which the effect actually changes sign depending on a third variable), another one says “don’t interpret main effects in the presence of an interaction PERIOD.” From the perspective that effects may always be heterogeneous, that prohibition appears a bit puzzling. After all, the treatment may well interact with something we did not observe, or just happen to have an effect with the opposite sign in some random individuals for inexplicable reasons—would the prohibition extend to those scenarios?

But I think it helps make sense of the prohibition if we consider it within the context of a fully factorial experimental study. Let’s say that A has a positive effect on the outcome when B = 0, and a negative effect when B = 1; these are the conditional effects. What’s the main effect of A? Some average of the two conditional effects; conventionally we may want it to be precisely in the middle between the two. This would be the average effect of A in a population in which B = 0 for half of the people, and B = 1 for the other half. But remember that this is a fully factorial experimental design, so we decide how many people get B = 0 and how many get B = 1. So we could adjust those numbers to get literally any average effect that lies between the two conditional effects; the average effect would be up to us (Figure 3).[8]

Figure 3. Honk
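The arithmetic of the argument is trivial but worth spelling out (the conditional effects below are invented): the “main effect” of A is just a weighted average of the two conditional effects, with the weights set by our design.

```python
# Conditional effects of A in a fully factorial design (invented numbers):
effect_given_b0 = +0.5   # effect of A when B = 0
effect_given_b1 = -0.3   # effect of A when B = 1 (a cross-over)

def average_effect(p_b1):
    """Average effect of A if a fraction p_b1 of people get B = 1."""
    return (1 - p_b1) * effect_given_b0 + p_b1 * effect_given_b1

# Balanced design: the conventional main effect, midway between the two.
print(round(average_effect(0.5), 2))    # 0.1

# But the allocation is ours to choose -- we can make the "main effect"
# anything between -0.3 and +0.5:
print(round(average_effect(0.1), 2))    # 0.42
print(round(average_effect(0.625), 2))  # 0.0 -- we can even make it vanish
```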

But now imagine that only A is an experimentally manipulated factor, maybe some emergency medication administered after accidents to stop bleeding. B is a particular genetic mutation. For people without the genetic mutation (B = 0), the medication works just fine. For people with the genetic mutation (B = 1), it actually makes things worse. How many people have B = 0 and how many have B = 1 is completely outside of our control. Let’s say only 0.05% of the population carry the mutation. Now, the average effect comes out positive; if we give somebody the medication, we can expect to help them, potentially saving lives. We may thus recommend the medication in emergency situations, even if we usually won’t know whether the patient belongs to the 0.05% who are hurt by it. Of course, if we learned that in fact 30% of the population carry the mutation, the average effect would change and a different recommendation might result. But in any case, we need to worry about the average effect in the actual population that we want to treat, which is meaningful and can inform our decisions, despite the presence of a cross-over interaction.[9]
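This is the same weighted average as in the factorial case, except that the weights are now fixed by the population rather than by our design (the prevalences come from the example; the effect magnitudes are invented):

```python
# Cross-over interaction with population-determined weights: the drug
# helps people without the mutation and harms carriers (invented sizes).
effect_without_mutation = +2.0   # B = 0
effect_with_mutation = -5.0      # B = 1

def population_average_effect(p_mutation):
    """Average effect given the prevalence of the mutation -- which,
    unlike a factorial design allocation, is not ours to choose."""
    return ((1 - p_mutation) * effect_without_mutation
            + p_mutation * effect_with_mutation)

# With 0.05% carriers, the average effect is clearly positive:
print(f"{population_average_effect(0.0005):+.1f}")  # +2.0 -> recommend

# If 30% carried the mutation, the recommendation would flip:
print(f"{population_average_effect(0.30):+.1f}")    # -0.1
```

This is why the target population matters: the very same pair of conditional effects supports opposite recommendations under different prevalences.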

Some psychologists have taken the idea that things may be moderated to an extreme. First, if you don’t find an effect, maybe check for moderators; there could be some subtle cross-over interaction that leads to an average effect of zero when there is so much exciting stuff happening. Second, if somebody else fails to replicate your study, clearly that means that there must be some hidden moderator that just happens to have the wrong value in the replication study. While both of these things can be plausible in certain scenarios, in combination, they lead to a literature filled with unreplicable zombie claims that just cannot be killed. So at this point, we should also acknowledge that (1) cross-over interactions that lead to effect cancellation are probably rare in the wild[10] and (2) if you start invoking hidden moderators, it probably means that you failed to clearly define a target population in the first place.[11]

Wrap-up

So: Causal effects are indifferent to the mechanisms that contribute to them.[12] For example, if we estimate the causal effect of gender on income, that will include all sorts of things: differences in interests and preferences, women dropping out of the labor force after having children, direct discrimination by decision makers. If that “feels wrong” to you because some of these things shouldn’t count, it’s likely that you are not interested in the causal effect of gender on income per se, but in some more specific mechanism (maybe you’re just interested in the effect that remains when removing the influence of certain pathways that you deem justified, i.e., a bias). Or maybe you are not interested in the lifetime effects of gender (“What if you had been born a boy?”) but rather in the effects of a more immediate gender change (“What if you suddenly turned into a boy today but kept all of your previous credentials?”). Admittedly, for gender (and for biological sex),[13] the interventions are a bit more hypothetical than for red hair; but if it is possible in Animal Crossing, it’s certainly in the realm of the conceivable.

One area in which people seem to have a particularly hard time differentiating between questions about causal effects and questions about specific mechanisms is genetics. It just seems very tempting to interpret the effects of genes as immutable laws of nature that arise in all contexts, which makes them a lot more contentious than they would be in some alternative universe in which everybody is assigned to read Harden’s “The Genetic Lottery.”[14] Maybe there is something special about the possibility of biological pathways that triggers brain areas responsible for specific deterministic causal narratives. More fMRI studies are needed.[15]

Causal effects may also vary in arbitrary ways. Some of this variability may be tractable, some may be intractable. Either way is fine; it doesn’t mean that average effect estimates are somehow wrong—if all goes well (big if!), they are exactly that: average effect estimates. If it feels like that’s “not enough” to do proper science, it may be helpful to recall that, due to the fundamental problem of causal inference, the individual-level effects will often be out of reach. Holland (1986) distinguishes between the “scientific solution” and the “statistical solution” to this problem. In the former, scientists can employ reasonable invariance assumptions. For some laboratory equipment, I may assume that its measured outcome at an earlier time point is equivalent to its potential outcome for the same condition right now (i.e., if I did the same thing to it, the same thing would happen), so if I do something else to it and then something different happens, it’s plausible to conclude that my action had a causal effect. This type of invariance assumption may not always be feasible for living beings,[16] and so we need some other solution. The statistical solution relies on the fact that things can average out to return meaningful answers. Maybe the naming of the two solutions is unfortunate given how researchers sometimes regress into physics envy (“Are you saying that behavioral research does not rely on scientific solutions?”).

But, to put it another way: the scientific solution works if your research object is so well-behaved that it makes causal inference easy. The statistical solution is needed if your research object is trying its very best to give you a hard time, as humans, cats, and other critters are prone to do.

Footnotes
1 I lifted the phrase “broad causal patterns” from Angela Potochnik’s “Idealization and the Aims of Science” which argues that it is such patterns that are the path to human understanding. Due to the causal complexity of the world, getting there requires idealizations – assumptions that are made without regard for whether they are true (and often in full knowledge that they are false); hence the title of the book.
2 And that’s still the better version of this genre, the worse one starts talking about quantum physics.
3 In Pearl’s framework, the notion of a hypothetical, surgical intervention is represented by the do()-operator.
4 Quantum physics intensifies.
5 In the Campbellian validity system, those would be threats to construct validity (but not to internal validity).
6 Which can also be tackled from within a causal inference framework, see e.g., Deffner et al., 2022. In this particular example, we would be able to supplement our knowledge that the effects of red hair on literacy fully depend on laws that prohibit redheads from attending school. We would then reasonably conclude that the effects only generalize to other nations with such laws.
7 I have rambled about this before in footnote 2 of this blog post. I know, I know; I’m getting old and repetitive.
8 Do the main effects in your ANOVA output actually correspond to meaningful average effects? Maybe; it depends on the design and the sum of squares used (Graefe et al., 2022). Frankly, everything that I learn about ANOVA squarely sums up to the conclusion that ANOVA is just too confusing. ENOV already!
9 When I worked on a manuscript with Arthur Chatton, a biostatistician, I noticed that he seemed to care a lot about target populations. But that makes a lot of sense if you start from the notion that your intervention may even harm some people – any relevant conclusion will depend on how things average out in your population of interest. In contrast, if you think that the effect is about the same for everyone, you don’t really need to care about representing your target population well, and I guess that’s how psychologists usually operate (despite their insistence that everything is super complex and moderated in subtle ways).
10 Although it looks like they occupied the fantasies of experimental social psychologists of a certain era.
11 To be fair, most of psychology is bad at that. So maybe we do deserve the endlessly repetitive “hidden moderators” debate for our sins.
12 This does not imply that mechanisms are indifferent to causality; claims about mechanisms are claims about how things causally unfold in the world. That means that claims about mechanisms come with all the standard causal inference problems, and then some – because they require the successful causal identification of multiple path-specific effects. Sometimes people try to weasel their way out by claiming they are merely “demonstrating that the data are compatible with a theoretically plausible mechanism”; alas, such demonstrations only provide a severe test of the underlying theory if the underlying causal assumptions are plausibly met.
13 There is a whole literature on the question whether sex and/or gender can be meaningful causal variables. The funny thing about biological sex in particular is that its effects can be identified quite plausibly, as biological sex seems to be pretty much randomized at conception, turning this into a natural experiment. But if you actually try to define the individual-level causal effect for biological sex, you are comparing “you with the biological sex you actually have” with “you if you had received different chromosomes etc., which arguably might no longer be you but is instead a different person.”
14 Chapter 5, “A Lottery of Life Chances”, includes a very nice discussion of what causal effects of genes are (and aren’t). There is also an article by Madole and Harden (2022) which elaborates on the matter and which is likely well worth your time.
15 I am currently reading Sarah Thornton’s “Tits Up” in which at some point Thornton argues that associations between breastfeeding and IQ are not just confounding because “recent MRI scans have revealed a human milk ‘dose-response’ in brain morphology.” That does not seem like a particularly compelling argument from a causal inference perspective – after all, there may as well be common cause confounders between breastfeeding and white matter – but the argument “feels” like it works.
16 Except maybe for some cases of repeated within-subject experimentation for which it is plausible to assume no carry-over effects. If this happens to be applicable to your research question—knock yourself out, it’s a great design.
