Reviewer notes: Avoid any ambiguity about analysis aims

For any central statistical analysis that you report in your manuscript, it should be absolutely clear to readers why the analysis is being conducted in the first place – that is, the analysis goal should be transparently communicated. A helpful concept here is the so-called theoretical estimand, the target quantity of your analysis. This quantity should be stated in precise terms that exist outside of any statistical model (Lundberg et al., 2021). So, a theoretical estimand is not something like “the cross-lagged path coefficient in this model that I’m fitting” or “the centrality measure my network analysis returns” or even “the mean difference between these two experimental groups;” it’s what those numbers are supposed to tell you about the world: How does X affect Y over time and vice versa? What’s the causal role of this variable for these other variables? How does my experimental manipulation affect the outcome?
In general, statistical models aren’t research questions. At the very best, they are tools that provide answers to research questions.

What’s in an estimand?

An estimand has two components: (1) a unit-specific quantity and (2) a target population over which it should be aggregated. For example, a well-defined estimand could be the prevalence of a certain diagnosis in the German population (unit-specific quantity: diagnosis yes/no; target population: German population). It could also, in some circumstances, be a simple association, such as the association between only-child status and certain personality traits.[1] And, more often than not, it will be some (population average) causal effect of interest, which can be technically defined as a contrast between potential outcomes under certain (hypothetical) interventions.[2]
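The contrast-of-potential-outcomes idea can be made concrete with a minimal simulation. This is a hypothetical sketch (the population size and the effect of 0.3 are made-up numbers): the estimand is defined directly on the units of the target population, before any statistical model or estimator enters the picture.

```python
# Minimal sketch: a (population average) causal effect as a contrast
# between potential outcomes, defined without any statistical model.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000  # size of the simulated target population (arbitrary)

# Potential outcomes: y0 under no intervention, y1 under the intervention.
# (In real data we would only ever observe one of the two per unit.)
y0 = rng.normal(loc=0.0, scale=1.0, size=n)
y1 = y0 + 0.3  # assumed constant unit-level effect of 0.3 (made up)

# The estimand: the unit-specific quantity (y1 - y0), aggregated over
# the target population -- no regression or estimator involved yet.
average_causal_effect = np.mean(y1 - y0)
print(round(average_causal_effect, 2))  # 0.3
```

In practice, of course, only one potential outcome per unit is observed; the whole identification problem is about when observed data can still recover this quantity.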

Consistency is key

Once you have figured out the central target quantity of your main analysis, it should remain in focus throughout your article. Another way to put this: the implied research question should be the same from introduction to method and results to discussion.[3]

Not-too-casual causal questions

For example, if your introduction leads up to a causal research question (“Can money buy happiness?”), then your statistical analysis of course has a causal estimand (say, the effect of increasing income by X on well-being), and the discussion can consider its implications, but also the necessary assumptions.

Association Shmassociation

If, instead, you have committed to an associational estimand, the introduction should make it clear why the association is of interest in its own right, regardless of causality. I have rarely encountered this so far, but it’s possible. For example, one may simply be interested in whether only children are, on average, different from people with siblings – is there any truth to the stereotypes? The corresponding analyses should end up fairly simple.

The discussion is a bit trickier here – even if you use “explicitly associational” language, humans just tend to gravitate toward causal stories they find plausible (e.g., Hill et al., 2024). A good way to prevent overinterpretation is to spell out multiple causal explanations that vary in structure and that are actually plausible. For example, only children may lack certain social experience and thus score higher on X (only child -> social experience -> X). But it may also be because they enjoy more parental attention (only child -> parental attention -> X). Importantly, confounding could also provide an explanation, for example, only children may result from divorce (only child <- divorce -> X) or from women delaying childbearing as they prolong education (only child <- maternal education -> X). Last but not least, child characteristics may actually affect whether subsequent children are born (X <- child temperament -> only child). And all of these may be true simultaneously, resulting in the observed association.
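One of the confounding structures above (only child <- divorce -> X) can be demonstrated in a few lines. This is a hypothetical sketch with made-up probabilities and effect sizes: only-child status has zero causal effect on the trait, yet a clear association shows up because the confounder drives both.

```python
# Sketch: a confounder produces an association between only-child status
# and trait X even though only-child status has no effect on X at all.
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

divorce = rng.binomial(1, 0.3, size=n)              # confounder
# Divorce raises the probability of being an only child (made-up numbers)...
only_child = rng.binomial(1, 0.2 + 0.3 * divorce)
# ...and independently shifts trait X; only_child itself has ZERO effect.
x = 0.5 * divorce + rng.normal(size=n)

r = np.corrcoef(only_child, x)[0, 1]
print(round(r, 2))  # clearly positive, despite no causal effect
```

Any of the other structures listed above (mediation via social experience or parental attention, reverse pathways via child temperament) could be simulated the same way and would produce the same kind of marginal association.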

Sometimes, researchers are interested in associations, but upon closer look, it becomes clear that those associations are only deemed interesting because of certain causal interpretations. One sign that this is the case is when researchers start to condition on third variables to “rule out alternative explanations” or “remove spurious associations.” But an association really just is what it is; it cannot be “contaminated” by spuriosities – what can be contaminated by spurious associations is a causal effect estimate. And the existence of an “alternative explanation” implies the existence of a favored explanation, which is most likely a causal one. So, if you feel the urge to “control away” some parts of the association of interest, it’s quite likely that your estimand is in fact a causal effect (see also Wysocki et al.’s “Statistical control requires causal justification”).[4]

Predictive Modeling

A special case of an associational research endeavor is building a predictive model. If you’re doing that, your introduction needs to motivate an actual predictive use case and your discussion should be focused on implications for the application of the model for said use case. Predictive utility is always tied to a specific use case (see e.g., Hunsley & Meyer, 2003) and a prediction model is always only validated within a specific context (Van Calster et al., 2023). For example, psychologists may be interested in whether X predicts Y beyond some other information Z that is available. But even if such “predictive utility” has been demonstrated, it need not hold in a scenario in which the base rate of Y is different, in which the sample was selected in a different manner, or in which the other information that is available changes. 
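The base-rate point can be made precise with Bayes’ rule. Below is a small sketch with made-up sensitivity, specificity, and base rates: the model’s error rates are held fixed, yet its positive predictive value – how much a positive prediction is actually worth to a user – changes dramatically across deployment contexts.

```python
# Sketch: predictive utility is tied to the deployment context.
# Same sensitivity and specificity, very different usefulness.
def ppv(sensitivity: float, specificity: float, base_rate: float) -> float:
    """Positive predictive value via Bayes' rule."""
    true_pos = sensitivity * base_rate
    false_pos = (1 - specificity) * (1 - base_rate)
    return true_pos / (true_pos + false_pos)

# "Validated" in a sample where 30% have the outcome...
print(round(ppv(0.80, 0.90, 0.30), 2))  # 0.77
# ...but far less useful where the outcome is rare.
print(round(ppv(0.80, 0.90, 0.02), 2))  # 0.14
```

The same logic applies when the selection of the sample or the set of available predictors changes between validation and deployment.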

If you are looking for generalizable knowledge, a predictive model may not be the tool you are looking for. If you find it hard to generate a specific use case for your model, or if you are mainly focusing on the “contributions” of individual predictors and what they may tell you about the world, again, maybe a predictive model is not the right tool for your job. Your research question may in fact be a causal one.[5]

What to avoid

Don’t write an article with the following structure:

  • introduction that motivates a causal research question
  • analysis framed in associational terms
  • discussion that interprets the findings causally
  • limitation section that says that findings should not be interpreted causally

You can’t have a causal cake and correlate it too. People have repeatedly pointed out that this style of research is bad for a variety of reasons (most prominently, Hernán, 2018; for a take for psychologists, see Grosz et al., 2020). But I think the logical inconsistency and disingenuousness of it is actually a sufficient reason to stay away from it.

Beckham "Be Honest. Thank you." meme.
First row
Victoria: I want to know whether personality predicts important life outcomes above and beyond basic demographics.
David: Be honest.
Second row
Victoria: I am being honest.
David: Why would anybody be interested in that?
Third row
Victoria: Maybe personality causes stuff.
David: Thank you.

You may feel uncomfortable trying to support causal claims with observational data, but trying to hide your causal estimand is just not a good way to deal with it. One better way involves tackling the issue head-on: explicating both the causal estimand and the identification assumptions under which your data can indeed inform you about the theoretical estimand. An upside of this approach is that you already have something substantial for your discussion section, namely a critical discussion of the plausibility of your identification assumptions.

(Another option would be to change the theoretical estimand, which would usually entail writing a different paper altogether.)

One special case to consider here is a situation in which you have a (usually causal) theory which predicts that you should observe certain associations, and then you check whether these associations exist in your data. This is, as far as I can tell, just another way to do causal inference and should thus be treated as such. That is, the necessary rigor should be applied – under which assumptions do these associations support the favored interpretation? What may be alternative explanations?

Quality vs. quantity

Last but not least, you may of course have multiple main research questions and thus multiple theoretical estimands. In that case, all of the above applies to each of them.

If this sounds daunting to you – it is my general impression that (at least in my line of research) psychologists try to squeeze too many ambitious research questions (and thus too many estimands) into a single manuscript. Maybe they are simply underestimating how much effort it would take to provide a rigorous and reliable answer to any single one of them.[6]

One step to improve the quality of inferences in psychology may be trying to make fewer inferences in the first place and take things a bit more slowly.[7]

Footnotes

1 In a given target population. We will ignore the target population aspect from now on because the unit-specific quantity aspect seems to be the greater source of conceptual confusion. But target populations are certainly almost always woefully underdefined in psychology.
2 Lundberg et al. (2021) are an excellent source to learn more about how to specify estimands; I also blogged a bit more about it in “Who would win, 100 duck-sized strategic ambiguities vs. 1 horse-sized structured abstract?”
3 There can of course be exceptions to this when the research question deliberately changes – for example, you may start with a causal estimand but then realize that it cannot be recovered and a more modest research question has to be asked. In that case, the transition must be clearly explicated and justified to readers.
4 There could of course be scenarios in which the estimand of interest is in fact a conditional association of some sort. In that case, the introduction needs to make clear why this conditional association is of interest. For example, we may be plausibly interested in whether firstborn children are systematically different from laterborn children within families of the same size.
5 I ramble more about “prediction” in psychology in Rohrer (2024).
6 Malte asks: Isn’t Estimalami slicing a problem? The answer will depend on how the estimands are logically connected. For example, if you have an experimental manipulation X and want to look at its effects on Y1, Y2, and Y3, which are all related constructs – by all means, do that in one paper. If you are looking at the correlation between one of the Big Five personality traits and some other variable, you might as well also throw in the other Big Five. In fact, that’s expected in personality psychology. If, instead, you are trying to estimate the effect of X on Y and you barely have a proper identification strategy in the first place, and then you add “oh and is this effect mediated by M1 depending on the level of the moderator M2?” then you are probably trying too much at once.
7 For one more sketched-out vision of what that could look like, see the conclusion of Rohrer et al. (2022).