TL;DR: What’s an age-effect net of all time-varying covariates?
The sound of one hand clapping.
Recently, we submitted a paper with some age trajectories of measures of individuals’ (un-)well-being. We thought of these trajectories in the most descriptive way: How do these measures change across the life course, all things considered? And really, while this might not be the most interesting research question because it doesn’t directly answer why stuff happens, I’m a fan of simple descriptive studies and think they should have a place in our domain; Paul Rozin wrote a great piece on the importance of descriptive studies.
Anyway, the editor asked us to justify why we did not include any time-varying covariates (e.g. income, education, number of children, health) in our analysis of age trajectories. I thought the editor had requested an actual justification; my co-author (an economist) thought the editor just wanted to tell us that we should throw in all sorts of covariates. I felt too lazy to re-run all analyses and create new figures and tables, plus I always get a weird twitch in my left eye when somebody asks for “statistical control” without additional justification, so instead I looked into the (scientific) literature on the midlife crisis and tried to figure out how people have justified the inclusion of control variables in the analyses of age effects on well-being.[1]
Cat ownership, a time-varying covariate. (Pic: pixabay.com)
Whether or not life satisfaction dips in middle adulthood (somewhere between age 45-64) before rising again in older age[2] has been hotly debated by psychologists and economists. There are a lot of papers out there on that subject and personally, I’m totally agnostic regarding the existence of the midlife crisis – ask me again in 20 years, if I’m not too busy driving my Porsche. But there are a lot of interesting methodological questions that arise when trying to answer this question.
A brief list of stuff I don’t want to talk about in this post, which are important nonetheless:
- the Age-Period-Cohort conundrum: In short, this requires us to make certain assumptions when we want to identify age/period/cohort effects. That’s okay though, every researcher needs to make assumptions from time to time.
- longitudinal vs. cross-sectional data: Both can have their pros and cons.
- statistical control for covariates that don’t change over time, such as gender or race, as systematic differences in the composition of the age groups can cause issues
- what we can learn from lab studies in which researchers recruit older people and then compare their performance on an arbitrary task X to the performance of their convenient undergraduate sample. How do you reasonably match 60 year old people that decided to participate in lab studies onto a younger sample of psych majors that really just want to get their freakin’ course credit?
- lots of other interesting stuff you can do with longitudinal data that is more interesting than simple descriptive trajectories
But let’s get back to the topic our editor raised: Should we control for time-varying covariates such as income, marital status, health? The logic seems straightforward: Wouldn’t we want to “purge” the relationship between age and life satisfaction from other factors?
Quite obviously, a lot of stuff changes as we age. We get older, get our degrees, start a decent job and make some money,[3] maybe marry and settle down or travel to an Ashram to awaken our inner goddess and spite our conservative parents or maybe just get a lot of cats.
Mount Midlife Crisis, not to be confused with Mount Doom. (Art: Hakuin Ekaku)
To control for these variables might be wrong for two distinct reasons, and I will start with the somewhat more obscure one.
First, our time-varying covariate might actually be causally affected by life satisfaction. This argument has been raised regarding the statistical control of marital status by the late Norval Glenn (2009). He simulated a data situation in which (1) life satisfaction is stable across the life course and (2) starting from 21, only the 10 happiest people marry each year. He then demonstrated that controlling for marital status will result in a spurious trajectory, that is, a pronounced decline of life satisfaction over the life course even though we know that there’s no age effect in the underlying data. If you have read this blog before and the data situation sounds somewhat familiar to you: Marital status would be one of the infamous colliders that you should not control for because if you do, I will come after you.[4] If marital status is affected by age (the older you are, the more likely you are to be married), and if satisfied people are more likely to marry, marital status becomes a collider of its two causes and should not be controlled.
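A data situation like the one Glenn describes can be sketched in a few lines. This is not his original simulation: the population size, the satisfaction scale, the age range and the function name `simulate_selective_marriage` are all assumptions made for illustration. Everyone keeps the same satisfaction score for life, and each year the 10 happiest people who are still unmarried get married.

```python
import random
from statistics import mean

def simulate_selective_marriage(n_people=1000, marriages_per_year=10,
                                first_age=21, last_age=60, seed=1):
    """Follow one cohort whose life satisfaction never changes with age."""
    rng = random.Random(seed)
    # Each person gets a fixed satisfaction score (invented scale), sorted
    # happiest first, so "the happiest unmarried people" are simply next in line.
    satisfaction = sorted((rng.gauss(7.0, 1.5) for _ in range(n_people)),
                          reverse=True)
    rows = []
    n_married = 0
    for age in range(first_age, last_age + 1):
        # Only the happiest still-unmarried people marry this year.
        n_married = min(n_married + marriages_per_year, n_people - 1)
        married = satisfaction[:n_married]
        unmarried = satisfaction[n_married:]
        rows.append((age, mean(satisfaction), mean(married), mean(unmarried)))
    return rows

rows = simulate_selective_marriage()
for age, overall, married, unmarried in rows[::13]:
    print(f"age {age}: overall {overall:.2f} | "
          f"married {married:.2f} | unmarried {unmarried:.2f}")
```

The overall mean is identical at every age because nobody's satisfaction ever changes. Within each marital status group, however, the mean falls year after year: the married group keeps admitting people who are less happy than those already in it, and the unmarried group keeps losing its happiest members. Looking at age "holding marital status constant" therefore shows a decline that does not exist in the underlying data.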
The second reason is somewhat more obvious: In many cases, the time-varying covariates will mediate the effects of age on your outcome. That is probably most obvious for health: Health declines with age. Decreases in health affect life satisfaction.[5] So life satisfaction might decrease with age because of declining health. Now what does it mean if we control for this potential mediator?
Well, it means that we estimate the age effect net of the parts that are mediated through health. That is not inherently nonsensical, we just have to interpret the estimate properly. For example, Andrew Oswald was cited in Vol. 30 of the Observer: “[But] encouragingly, by the time you are 70, if you are still physically fit then on average you are as happy and mentally healthy as a 20 year old.” Now this might be indeed encouraging for people who think they are taking great care of their health and predict that they will be healthy by the time they are 70; but whether it’s encouraging on average strongly depends on the average health at age 70.
For example, if we assume that only the luckiest 1% of the population will be physically fit at that age, 99% will end up unhappier than 20 year olds (whether or not 20-year-olds are very happy is a different question). That doesn’t sound very optimistic any more, does it? The lucky one percent might also be very special with respect to other characteristics such as income, and a message such as “the wealthy will be still happy at age 70, whereas the poor are wasting away because of a lack of health care” again sounds not very encouraging. For the record, I’m not claiming that this is happening, but those are all scenarios that are aligned with the simple statement that those who are physically fit at age 70 are as mentally healthy as 20 year olds.
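To see what "the age effect net of health" looks like in numbers, here is a minimal sketch with simulated data. Every coefficient is invented: health is assumed to drop by 0.10 per year of age, each unit of health is assumed to add 0.50 to satisfaction, and age is assumed to add 0.03 per year through everything else. Under these assumptions the total age effect is 0.03 + 0.50 × (−0.10) = −0.02 per year. The helper functions are plain least squares written out by hand, not calls into any statistics package.

```python
import random
from statistics import mean

def centered(values):
    m = mean(values)
    return [v - m for v in values]

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def slope(y, x):
    """OLS slope of y on a single predictor x."""
    yc, xc = centered(y), centered(x)
    return dot(xc, yc) / dot(xc, xc)

def slopes_two_predictors(y, x1, x2):
    """OLS slopes of y on x1 and x2, solving the 2x2 normal equations."""
    yc, a, b = centered(y), centered(x1), centered(x2)
    s11, s22, s12 = dot(a, a), dot(b, b), dot(a, b)
    s1y, s2y = dot(a, yc), dot(b, yc)
    det = s11 * s22 - s12 ** 2
    return (s1y * s22 - s2y * s12) / det, (s2y * s11 - s1y * s12) / det

rng = random.Random(42)
n = 20000
age = [rng.uniform(20, 80) for _ in range(n)]
# Assumed data-generating process (all coefficients are made up):
# age lowers health, health raises satisfaction, and age also has a
# positive effect on satisfaction through everything other than health.
health = [-0.10 * a + rng.gauss(0, 1) for a in age]
satisfaction = [0.03 * a + 0.50 * h + rng.gauss(0, 1)
                for a, h in zip(age, health)]

total_effect = slope(satisfaction, age)
direct_effect, health_effect = slopes_two_predictors(satisfaction, age, health)
print(f"age slope, all things considered: {total_effect:+.3f}")
print(f"age slope, controlling for health: {direct_effect:+.3f}")
```

Both numbers are legitimate estimates, but they answer different questions. The first says what happens to satisfaction as people age, all things considered; the second says what would happen if health did not change with age, which in this made-up world even has the opposite sign.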
So the estimated association has its own justification but must be interpreted carefully.[6] Additionally, it renders the “remaining” age effect hard to interpret, so it might not be very enlightening to look at age effects net of the effects of time-varying covariates. Let’s assume you “control” all sorts of stuff that happens in life as people age – marital status, education, income, number of children, maybe also number of good friends, cat ownership, changes in health, and when we are already at it, why don’t we also control for the stuff that is underlying changes in health, such as functioning of organs and cell damage? – and still find a significant age effect.
What does that mean? Well, it means that you haven’t included all time-varying covariates that are relevant to life satisfaction because age effects must necessarily be mediated through something. The sheer passing of time only has effects because stuff happens in that time.
The “stuff” might be all sorts of things, and we might be inclined to consider that stuff more or less psychologically meaningful. For example, we might not consider changes in marital status or physical health to be “genuinely psychological”, so we might decide to control for these things to arrive at a purely psychological age effect. Such a “purely psychological” age effect might then be driven by e.g. people’s attitude towards the world. For example, people might get more optimistic and thus more satisfied controlling for other life circumstances. But I would again be careful with those interpretations, because of the collider problem outlined before and because of the somewhat arbitrary distinction between physical changes and social role changes as opposed to psychological changes.
In other words: what you should or shouldn’t control for always depends on your research question. If you study living things and control for life, don’t be surprised if your results seem a bit dull.
Update: In the meantime, I have written an article loosely based on this blog post which gives a somewhat more formal introduction to issues of causal inference. Check out the preprint, Thinking Clearly About Correlations and Causation: Graphical Causal Models for Observational Data
Footnotes
[1] Ruben, on the other hand, would probably get a twitch in his right eye if he found out I did not automate making my figures and tables.
[2] before dipping again in a terminal decline. If you read that footnote, you’re now in the mortality salience condition of my Terror Management Theory study.
[3] Or, alternatively, start a blog.
[4] And you should be scared because I can deliver angry rants about inappropriate treatment of third variables. I might bring cookies and hope that you have decent coffee because this might take longer.
[5] They obviously do, though the fact that life satisfaction remains stable until a certain age despite decreases in health has been labeled the happiness paradox.
[6] Actually, there is yet another problem that has been pointed out by Felix Thoemmes in the comments section of this post. A mediator is also almost always a collider, as it is per definition caused by the independent variable of interest and (most likely) by some other factors. So you would actually have to control the backdoor paths from the mediator in turn, or else your estimate of the direct effect, whatever it reflects, will actually be biased again.
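The problem raised in footnote 6 can also be sketched with simulated data. The setup below is an assumption made purely for illustration: X affects the mediator M, an unobserved variable U affects both M and Y, M affects Y with a coefficient of 0.5, and X has no direct effect on Y at all. The `ols` helper is hand-written least squares (normal equations solved by Gauss-Jordan elimination), not a library function.

```python
import random
from statistics import mean

def ols(y, *predictors):
    """OLS slopes of y on the predictors (intercept removed by centering)."""
    def centered(values):
        m = mean(values)
        return [v - m for v in values]
    yc = centered(y)
    cols = [centered(p) for p in predictors]
    k = len(cols)
    # Augmented normal equations [X'X | X'y].
    a = [[sum(p * q for p, q in zip(cols[i], cols[j])) for j in range(k)]
         + [sum(p * q for p, q in zip(cols[i], yc))] for i in range(k)]
    # Gauss-Jordan elimination; X'X is positive definite, so pivots are nonzero.
    for i in range(k):
        pivot = a[i][i]
        a[i] = [v / pivot for v in a[i]]
        for r in range(k):
            if r != i:
                factor = a[r][i]
                a[r] = [vr - factor * vi for vr, vi in zip(a[r], a[i])]
    return [a[i][k] for i in range(k)]

rng = random.Random(7)
n = 40000
x = [rng.gauss(0, 1) for _ in range(n)]
u = [rng.gauss(0, 1) for _ in range(n)]  # unobserved common cause of M and Y
m = [xi + ui + rng.gauss(0, 1) for xi, ui in zip(x, u)]
# Assumed truth: Y depends on M and U only, so the direct effect of X is 0.
y = [0.5 * mi + ui + rng.gauss(0, 1) for mi, ui in zip(m, u)]

naive_x, naive_m = ols(y, x, m)              # "control" for the mediator only
adjusted_x, adjusted_m, _ = ols(y, x, m, u)  # additionally control for U
print(f"direct effect of X, controlling for M only: {naive_x:+.2f}")
print(f"direct effect of X, controlling for M and U: {adjusted_x:+.2f}")
```

Controlling for M alone produces a clearly nonzero "direct effect" of X even though none was built in, because conditioning on M links X and U. Only when U is controlled as well does the estimate return to roughly zero, and in real data U is exactly the kind of variable nobody has measured.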
Since you brought up colliders and mediators, it is useful to realize that in virtually all circumstances a mediator M on a causal pathway between X and Y is most likely also a collider on pathways
X -> M Y, and thus will bias the relationship between X and Y by opening a previously closed front-door path. So you are not estimating some age effect net of some other pathways, but a biased net effect.
Nice post.
Mhm, looks like my little path diagram was deleted when uploaded. I was drawing a path from X to M and then added an unobserved variable U that is a common cause of M and Y thus rendering M a common effect of X and U.
Hey Feli(x), thanks for your comment! And you are totally right, mediators will almost necessarily be colliders, so yet another reason why mediation analysis is fucked. I’ll add a note to the post to acknowledge this unfortunate additional problem.
Hi Julia,
Haha – that was Felix with the x – typo.
I think mediation analysis might deserve its own blog post, with some refs to the “classic” Bullock et al. paper, along with some recent papers by Pearl, Vanderweele, Imai, and Vansteelandt.
Yup, a full blog post might be a good idea. And thanks for all the references – in case you could imagine writing said post yourself, let me know! Guest posts are highly appreciated and maybe it would be a nice change to have a post by an actual pro 🙂
Yeah, I guess blogging is something that I should add to my scholarly activities – seems like the thing to do these days… 😉
I would definitely be interested in a post on mediation. As I understand it (n.b. I do not understand it), the causal mediation framework of Vanderweele et al has some really strong assumptions of no unmeasured confounders. I fairly recently tried to find some examples of studies actually using their methodology, so I went to Google Scholar and looked up papers that cite some of their main articles. The majority of papers were about extending the methodology: to survival analysis, multiple mediators, etc, etc. Most of the few articles on actual studies seemed to not actually even try to justify meeting the unmeasured confounding assumptions. Which made me wonder how useful these methods are if they are so difficult to implement. However, I really don’t understand them well at all, and I didn’t do a lit review, just a bit of googling, so there’s a good chance I’m missing something.
That’s an interesting observation, Will. I am pretty familiar with the methods literature on that topic, but truth be told, I never checked whether applied researchers are actually using them. According to your search, they are not.
That’s unfortunate, and in a sense surprising, because somewhat user-friendly software in R already exists. I guess it takes some time to permeate the field. And the methods really are useful – their advantage being that they make the (very strict) assumptions transparent, as opposed to ignoring them.