Longitudinal data don’t magically solve causal inference

While reviewing papers, I’ve noticed some boilerplate that keeps creeping up in the “Limitations” sections of studies using cross-sectional, observational designs:

“Of course, we are unable to draw causal conclusions…but future studies with experimental manipulations or longitudinal data will fix this.”

Now, let’s ignore for a second that usually authors of these studies, of course, nonetheless draw causal conclusions,[1] or that I’m mostly reviewing studies in personality psychology where it is often utterly unclear what a good experimental manipulation would look like.[2] The part that keeps irking me is that such statements imply longitudinal data are a safe way to draw causal conclusions from observational data. In fact, some papers read as if longitudinal data, combined with the right statistical model, were some sort of magical causal inference machine. Collect longitudinal data, run a cross-lagged panel model (CroLaPaMo)[3] or a variation of it (more on those later), BAM!, causality. In this blog post, I’m going to explain why it’s not that easy.

In the ideal case, CroLaPaMos get at something called Granger causality. For example, let’s say you’ve measured talkativeness and headaches repeatedly over time. Talkativeness Granger-causes headaches if additional knowledge of past talkativeness leads to better predictions of headaches than knowledge of past headaches alone. According to Wikipedia, “[A]s its name implies, Granger causality is not necessarily true causality.” Silly me, somehow I’d have assumed that something called Granger causality was about causality! Rumor has it that even the late Clive Granger wasn’t entirely happy about the name.
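To make the definition a bit more concrete, here is one common way to write it down. This is a sketch that assumes linear predictions and a single lag, which is a simplification I’m adding for illustration; H_t (headaches at time t) and T_t (talkativeness at time t) are symbols picked for this example only.

```latex
\begin{align}
\text{restricted:}   \quad & H_t = a_0 + a_1 H_{t-1} + e_t \\
\text{unrestricted:} \quad & H_t = b_0 + b_1 H_{t-1} + b_2 T_{t-1} + u_t
\end{align}
```

Under these assumptions, talkativeness Granger-causes headaches if the unrestricted model predicts H_t better than the restricted one, which in this linear setup amounts to b_2 being different from zero. Note that nothing in these two equations says why past talkativeness helps with the prediction.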

So, what’s up with Granger causality? Take my previous example of talkativeness and headaches. Actually, I’d be willing to bet that for me personally, talkativeness does indeed Granger-cause headaches. But, most likely, talkativeness does not actually cause my headaches. How is that possible, Granger causality without causality? It’s the same reason why we can get a correlation between two variables without them causally affecting each other: third variables. It’s just that in this case, the nasty third variables are the ones that affect both variables with different time lags.[4] For example, in my case, the confounder would be alcohol. After a few beers, I get more talkative. Then, with some time lag, I get a headache. If you wanted to reduce my risk of headaches, you’d have to stop me from drinking rather than getting me to shut up. Sounds complicated? Well, take it from this Granger:

Granger causality may be useful in the context in which it was developed in economics,[5] and, in some sense, it gives you exactly the answer to the question that you are asking: do past values of X1 predict X2 above and beyond past values of X2? But, in many cases, that is probably not the question psychologists are trying to answer to begin with. At least, I haven’t seen a paper in my own field convincingly arguing why Granger causality was of particular substantive interest for a given research question. Most studies very much sound as if the authors were interested in plain old causality.[6]
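The alcohol example can be played through with a small simulation. What follows is a minimal sketch with made-up numbers, not an analysis of real data: the function names (`ols`, `rss`, `granger_demo`), the effect sizes, and the assumption that alcohol raises talkativeness in the same period and headaches exactly one period later are all hypothetical choices for illustration. Talkativeness never enters the equation for headaches.

```python
import random

def ols(X, y):
    """Least-squares coefficients via the normal equations (X'X) b = X'y."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(X, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]

def rss(X, y, beta):
    """Residual sum of squares of the linear prediction X @ beta."""
    return sum((yi - sum(b * x for b, x in zip(beta, row))) ** 2
               for row, yi in zip(X, y))

def granger_demo(n=5000, seed=1):
    rng = random.Random(seed)
    alcohol = [rng.gauss(0, 1) for _ in range(n)]
    # Talkativeness reacts to alcohol in the same period.
    talk = [a + rng.gauss(0, 1) for a in alcohol]
    # Headaches react to alcohol one period later; talk plays no causal role.
    head = [rng.gauss(0, 1)] + [alcohol[t - 1] + rng.gauss(0, 1)
                                for t in range(1, n)]

    y = head[1:]
    restricted = [[1.0, head[t - 1]] for t in range(1, n)]
    unrestricted = [[1.0, head[t - 1], talk[t - 1]] for t in range(1, n)]
    adjusted = [[1.0, head[t - 1], talk[t - 1], alcohol[t - 1]]
                for t in range(1, n)]

    b_r, b_u, b_a = ols(restricted, y), ols(unrestricted, y), ols(adjusted, y)
    return {
        "rss_restricted": rss(restricted, y, b_r),
        "rss_unrestricted": rss(unrestricted, y, b_u),
        "talk_coef": b_u[2],           # past talk, alcohol ignored
        "talk_coef_adjusted": b_a[2],  # past talk, alcohol controlled
    }

print(granger_demo())
```

In this setup, past talkativeness clearly improves the prediction of headaches (so it Granger-causes them), but its coefficient shrinks to roughly zero once past alcohol is included. Of course, that only works because the simulation knows what the confounder is, which is exactly the luxury we usually don’t have.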

Apart from this fundamental issue with Granger causality, there are additional technical intricacies with CroLaPaMos and their like. There’s the by now essentially classic paper by Hamaker, Kuiper, and Grasman (2015) on how additional random intercepts are necessary to estimate within-person associations over time. Less well known seems to be the insight that, applied to continuous processes, the cross-lagged coefficients depend on the time interval between measurements (e.g., Oud, 2002), so a naive interpretation of the coefficients of a CroLaPaMo might vary depending on the timing of the data collections, even if the underlying process is exactly the same. While these are interesting methods issues and developments to consider, they don’t change anything about the more fundamental and conceptual issue that Granger causality is not the same as causality.[7]
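The time-interval issue can be illustrated with a bit of matrix arithmetic. As a simplification, the sketch below assumes a discrete-time process x_{t+1} = A x_t + noise rather than a truly continuous one (the continuous-time case is what Oud discusses); the matrix `A` and the helper names are hypothetical. If you only measure every k steps, the coefficients you would recover are those of A multiplied by itself k times, not those of A.

```python
def matmul(P, Q):
    """Plain matrix product of two lists-of-lists."""
    return [[sum(P[i][m] * Q[m][j] for m in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]

def lagged_coefficients(A, k):
    """Matrix linking x_t to x_{t+k} when x_{t+1} = A x_t + noise (A to the power k)."""
    out = A
    for _ in range(k - 1):
        out = matmul(out, A)
    return out

# Hypothetical one-step dynamics: rows are outcomes, columns are predictors.
A = [[0.8, 0.0],   # X depends only on its own past
     [0.3, 0.8]]   # Y depends on past X (cross-lagged 0.3) and on its own past

for k in (1, 2, 4, 8):
    cross = lagged_coefficients(A, k)[1][0]
    print(f"measured every {k} step(s): cross-lagged X -> Y = {cross:.4f}")
```

With these made-up numbers, the very same process yields a cross-lagged coefficient of 0.30 at an interval of one step, 0.48 at two steps, and about 0.61 at four steps, before it drops again at eight steps. Two studies could thus report quite different “effects” simply because they spaced their waves differently.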

Still, there is tremendous value in longitudinal data. For example, by looking at within-subject correlations, one can rule out that stable between-person factors (personality, anyone?) can fully explain an association. Looking at people’s trajectories as they live through certain life events can be super informative. Some research questions (e.g., about intra-individual variability) only make sense with longitudinal data to begin with. It’s just that there is no causal inference machine that can be fed with longitudinal, observational data and will then spit out the correct answer, no strings attached — as always, causal inference on the basis of observational data is very hard and dependent on additional assumptions.
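The point about within-subject correlations can also be sketched in a few lines. This is a toy simulation under strong assumptions I’m making up for illustration: a single stable trait per person pushes both variables up or down, and there is no within-person link between them at all. The function names and sample sizes are hypothetical.

```python
import random

def pearson(a, b):
    """Pearson correlation of two equally long lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5

def center(values):
    """Subtract the person's own mean from each of their observations."""
    m = sum(values) / len(values)
    return [v - m for v in values]

def pooled_vs_within(n_people=300, n_waves=10, seed=7):
    rng = random.Random(seed)
    pooled_x, pooled_y, within_x, within_y = [], [], [], []
    for _ in range(n_people):
        trait = rng.gauss(0, 1)  # stable between-person factor
        x = [trait + rng.gauss(0, 1) for _ in range(n_waves)]
        y = [trait + rng.gauss(0, 1) for _ in range(n_waves)]
        pooled_x.extend(x)
        pooled_y.extend(y)
        within_x.extend(center(x))  # person-mean centering removes the trait
        within_y.extend(center(y))
    return pearson(pooled_x, pooled_y), pearson(within_x, within_y)

pooled_r, within_r = pooled_vs_within()
print(f"pooled r = {pooled_r:.2f}, within-person r = {within_r:.2f}")
```

Here the pooled correlation is substantial while the within-person correlation hovers around zero, because centering each person on their own mean removes anything that is stable within that person. It does nothing about confounders that change over time, such as the beers from earlier.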

With these remarks in mind, fellow Power Grangers, it is, as they say, morphin’ time.

Many thanks to Stephan “Dr. π” Poppe who proofread this post to ensure that I’m not talking total nonsense. However, he does not bear any responsibility for any mild nonsense. All typos are to be blamed on my co-bloggers.


Footnotes

1. As always, including my own work.
2. Repeatedly telling participants that they are sore losers just for the sake of finally figuring out whether self-esteem does, indeed, causally affect happiness? Why not.
3. All together now: Cro-La-Pa-Mo-La Blanca!
4. There are probably other scenarios in which Granger causality differs from actual causality; this just seemed like the simplest “canonical” case.
5. I’m not entirely sure about this because I’ve seen some smart economists like Paul Hünermund dealing with (Gr)anger issues.
6. (don’t tell David Hume)
7. It’s one of those things that methods people know and acknowledge in footnotes or brief paragraphs, but somehow it tends to get lost on the way to the average user of those models.

2 thoughts on “Longitudinal data don’t magically solve causal inference”

  1. I take little issue with Granger causality if it isn’t used as a proxy for causality. Personally, I often use ‘X Granger-causes Y’ in my work, and find it very interesting if 1) X temporally precedes Y & 2) X also predicts Y. Most relations in psychology come from causation, although the causal process involves a ton more variables than X and Y. But it’s a solid point to start further investigation.

    1. Fully agree that it can be interesting in its own right! I think saying “X Granger-causes Y” is a great solution because it makes readers stop and think exactly at the moment when you make the claim — much better than using vague language and adding a paragraph in the discussion “oh BTW all of this wasn’t meant to be read as causal claims”
