The Hare-Brained Generation: Teen mental health crisis or lacklustre record keeping?

In The Anxious Generation, Jon Haidt argues that social media is driving a mental health crisis among teens. It’s a compelling thesis, widely discussed in the media, mostly accepted by my students, and, for a while, even by me. I felt I owed the book a read, given that many of my students are very interested in the topic. And some of the arguments I’ve seen levelled against Haidt’s thesis just didn’t seem convincing.

The Reference Class Problem, or Simple Arguments Contra Haidt

A common argument against Haidt’s thesis goes like this: previous worries about technology have turned out to be panics. Nothing truly bad happened because of novels, radio, Dungeons and Dragons, anime, video games, and so on, so nothing bad will happen because of social media. I think this argument is weaker than it appears. How do we know previous media and entertainment are the right reference class? Arguing from first principles, it could be that companies trying to maximise engagement have finally hit on the right combination of behaviourist principles and devices to really sink in their hooks. Maybe we should place social media in the larger reference class of superstimuli, some of which we know have actually caused problems (like the Western diet, casinos, or opioids). And of course, COVID was the first in its (recent) reference class to truly become a pandemic, notwithstanding behavioural science experts like Gerd Gigerenzer and Cass Sunstein calling it a panic and likening it to bird flu, swine flu, and so on.

Why might we doubt that anime and D&D are the right reference class? Many of the affected youth actually believe that social media is bad for them. That seems unusual to me, compared to other technology panics I’ve lived through. Neither I nor my friends truly believed that violent video games would turn us into killers.[1] But when I was around 16, Facebook clones entered the European market. Never fazed by GTA, I was wary of social media, and even of MMORPGs like World of Warcraft.

Even though I’d been quite online since I was 12 and had spent time in many chat rooms, it seemed like social networks and neverending games might be more dangerous, perhaps even irresistible. I wondered if I could get so drawn into WoW that I’d lose out in real life. My views were of course influenced by the discourse at the time, but some personal experiences entered too. I had a friend who once didn’t open his door when his girlfriend rang the doorbell, because he was on a raid in WoW. She had ridden her moped for 30 minutes out to the countryside where he lived, presumably to sleep with him. That didn’t bode well for the future of our species, I thought.

I never played WoW.[2] But during an exchange year in Sweden, I caved and signed up for Facebook, because I was hoping I would get invited to more parties.[3] I resisted smartphones a while longer.[4] I remember carrying around a compass in my first years in Berlin, so my sense of orientation was definitely already shit before I got GPS.[5] So, I understand the scepticism about social media. I can also empathise with the parenting angle. Haidt tells a story about his daughter asking him to put away her iPad because she couldn’t bring herself to put it away. I’m a parent too now and can tell you about the time my child had a little “accident” while playing a highly gamified letter-writing game on the iPad.

All this to say: I get it. I’ve seen and felt the “addictive” pull of various new media. But I don’t feel like they ruined my life or even affected my mental health. I shared that anecdote about my friend, but, truthfully, I never had the impression that video games, porn, social media, and so on could compete for long with attention from someone attractive, for any of us. Yes, my friend ignored his girlfriend, but to be honest, I didn’t like her very much; maybe he didn’t either. He is now in a stable relationship with someone else and has kids.[6]

So, my foreboding feeling was off, and I didn’t witness a moment of voluntary self-extinction there. I remain a bit sceptical of social media’s pull on me, and I’ve set some screen time limits on my phone with a code that only my wife knows. But I’ve mostly learned and adapted. And my child? Recently had an “accident” while playing Lotti Karotti/Funny Bunny, a completely analog game that is barely exciting enough to keep me conscious. Kids learn and adapt.[7]

Figure 1. The Hare-Brained Generation?

A Systemic Problem is Hard to Prove

So, I have heard the stories and can contribute some of my own. Haidt’s thesis assumes these stories aggregate into systemic problems, but my personal observations suggest adaptation is also common. Are my observations just different from his? It could certainly be true that there’s heterogeneity in at least two senses.

There is the sense that social media might affect some people positively, some negatively, and others not at all. This is my default assumption, which is to say, I would assume the same for alcohol, casinos, video games, psychotherapy, cilantro, and so forth. Still, believing heterogeneity is likely does not mean I believe we can easily infer the counterfactual of an adolescence without social media for any individual case. But that is what Haidt invites the reader to do when he highlights individual cases and implicates social media in their suffering.

There is also the sense that social media and smartphones are a heterogeneous group of activities and devices. Very different platforms can be accessed, and they can be used very differently. Focusing on the medium rather than the content people consume seems silly to me. I’ll fight anyone who says Bluey is worse for kids than Elmer the Patchwork Elephant[8], simply because one is a TV series and one a book. Does Haidt acknowledge that content matters? Barely. He does distinguish between synchronous (e.g., FaceTime) and asynchronous (consuming content, writing messages) behaviour on phones, but that’s about it. I think this is extremely problematic for his policy advocacy. By banning social media wholesale, you might take it away from people who benefit (e.g., people who feel different and don’t have an offline community, such as queer people in a small town, or 13-year-old boys who desperately need help from the SelfHTML community to build their website with crude jokes and puns). If you instead focus on clearly harmful content (such as pro-anorexia content or misinformation about autism), you could probably unite a much bigger coalition for a ban and risk fewer unintended side effects for those who benefit.

Now, if you believe there’s all this heterogeneity, you do not expect that small experimental studies, where people go off some social media platform for a few weeks, will tell you much about what a proposed country-wide ban on social media for teens would cause.[9] Haidt does cite such small experimental studies, and there’s been some brouhaha about meta-analyses of these studies, but, honestly, none of this moves the needle much for me. If I start by believing that some benefit and some suffer, arguing whether average treatment effects are zero or just pretty close to zero feels beside the point, as the sketch below makes concrete.
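Here is a minimal simulation in Python. The subgroup shares and effect sizes are entirely made up for illustration; the point is only that substantial individual-level benefits and harms can wash out into an average treatment effect near zero.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 100_000

# Hypothetical subgroups: 25% benefit, 25% are harmed, 50% unaffected.
# Shares and effect sizes are invented for illustration only.
effect = rng.choice([0.5, -0.5, 0.0], size=n, p=[0.25, 0.25, 0.50])

baseline = rng.normal(0, 1, size=n)  # well-being without social media
outcome = baseline + effect          # well-being with social media

print(f"Average treatment effect: {(outcome - baseline).mean():+.3f}")  # ~0
print(f"Helped: {(effect > 0).mean():.0%}")   # ~25%
print(f"Harmed: {(effect < 0).mean():.0%}")   # ~25%
```

A study powered only to detect the average effect would report “no effect”, which is true of the average and false for half the sample.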

Time Trends in Mental Illness

My impression is that Haidt himself puts more weight on the evidence from time trends in mental health. He compiles many graphs of mental illness, suicidal ideation, and suicides going up over time, always highlighting the rise of the smartphone and social media (as a gray bar from 2010 to 2015[10]) as the point when things turn south. That is also the part I found interesting.

I did not expect to find strong causal evidence that social media causes mental illness. But, regardless of the cause, I was worried to hear that there is a mental illness epidemic among teens worldwide. That seems like something we should know about.

But, and I cannot stress this enough, even the evidence for real changes in teen mental illness over time is weak. I’m not a mental health researcher, and I was surprised by the low quality of the evidence Haidt could marshal. Apparently, good record keeping about mental health is not something all rich countries have on their agenda.

Of course, it’s challenging to measure mental health to begin with. What Mental Illness Really Is… and What It Isn’t by Lucy Foulkes offers a good lay introduction to this problem. We do not have objective biomarkers of diseases like depression.[11]

Lacking objective measures like viral load, most researchers investigating time trends are reduced to looking at self reports, diagnoses, or clinical interviews.[12] Self reports can be affected by changes in awareness, in how we use language (“a bit depressed”, “that’s my OCD”, anyone?), in how forthcoming we are, and even by fads (TikTok tics, etc.), and so on and so forth. Diagnoses can be affected by all of these vibe shifts, as well as by explicit expansions of our diagnostic manuals (such as the DSM-III to DSM-IV to DSM-5 expansions[13]), lobbying from pharmaceutical companies, patient groups, changes in insurance coverage, and other features of the health system. For example, in Germany it’s common for parents to be diagnosed with adjustment disorder if their child is hospitalised for a long time. It’s not that feeling sad when your child is sick is a sign of mental illness; the diagnosis is just a wink-wink way, within the system, to be excused from work with pay. Finally, diagnoses are often made by GPs, who rarely use gold-standard diagnostic procedures[14], not that specialists always do.[15]
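To see how much measurement alone can move a trend line, here is a toy simulation (my own, with invented numbers): the true prevalence is held flat, but the probability that a case gets recognised and recorded drifts upward, and the recorded trend rises anyway.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50_000               # teens surveyed per year
true_prevalence = 0.10   # held flat: nobody is actually feeling worse

for year in range(2008, 2020):
    # Invented mechanism: awareness, destigmatisation, and screening
    # mandates raise the chance that a true case is recognised and
    # recorded, from 40% to 73% over the period.
    reporting_rate = 0.40 + 0.03 * (year - 2008)
    truly_ill = rng.random(n) < true_prevalence
    recorded = truly_ill & (rng.random(n) < reporting_rate)
    print(f"{year}: recorded prevalence {recorded.mean():.1%}")

# Recorded prevalence climbs from about 4% to about 7%, an increase of
# over 70%, although the underlying rate never moved.
```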

In contrast to Haidt, Lucy Foulkes transparently discusses these problems and reviews a lot of evidence. She comes away believing there is uncertain evidence for a slight increase in recorded mental illness, but, because we cannot rule out changes in awareness and language, it’s difficult to infer that people are actually feeling worse. Those are frustrating levels of nuance. Are you a university teacher, frustrated by the exemptions students have asked for in recent years? Instead of validating you, she forces you to reckon with thorny measurement problems. If there’s one clear conclusion, it is that psychological science has failed to solve the measurement and record-keeping challenges needed to distinguish time trends in mental illness from time trends in how we measure, diagnose, and discuss mental illness.

Haidt, to phrase it carefully, does not go for this nuance. Instead, he shows mainly self report charts, usually dichotomised and analysed to show a percentage increase, presented as if there were no uncertainty, statistical or otherwise (the sketch below shows how that framing can mislead). Haidt acknowledges that self report could be biased, but stresses that suicide and self harm are also up. Suicide and self harm are real physical acts in the world, so if they are also increasing, all these nuanced arguments about bias, awareness, and diagnostics ring hollow. Or do they?
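A quick sketch of the dichotomisation problem, with entirely made-up numbers: a small shift in a continuous symptom score, or in where respondents draw the line, turns into a dramatic-sounding relative increase in the share above a fixed threshold.

```python
from scipy.stats import norm

# Hypothetical symptom score, standardised to mean 0 and SD 1, with a
# made-up cut-off of 1.5 above which a respondent counts as a "case".
cutoff = 1.5
before = norm.sf(cutoff, loc=0.00)  # share above cut-off at baseline, ~6.7%
after = norm.sf(cutoff, loc=0.15)   # after a small 0.15 SD mean shift, ~8.9%

print(f"before: {before:.1%}, after: {after:.1%}")
print(f"relative increase: {after / before - 1:.0%}")  # ~32%
```

A shift of 0.15 standard deviations, small by any convention, becomes a headline-ready jump of roughly a third in the percentage of “cases”.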

Motte and Bailey

Unfortunately for this argument, neither of these charts, on self harm and on suicide, can be taken at face value: whether something is recorded as intentional (rather than accidental) self harm, or as suicide, is not immune to bias, because it is an interpretation of an event by a human being who (ideally) follows diagnostic guidelines. And, crucially, these guidelines have in fact changed in the time frame shown in his graphs. In other countries, which did not undergo guideline changes but were affected by the social media boom, teen suicides are not uniformly up. Haidt, despite signalling that he is all about the debate and posting a sparsely populated public errata page on his website, does not acknowledge these issues in the book and has not yet noted them on his errata page.[16]

Journalists have published Haidt’s emails to Australian policymakers, obtained through freedom of information requests. These emails clearly show him using the classic motte-and-bailey strategy: in public debates, he admits some uncertainty, praises debate and sceptics, and makes claims that are easier to defend. He stays in the “motte”, the defensible castle. But when he doesn’t expect to be heavily challenged, he moves into the “bailey”: as in this email to a policymaker, he makes much stronger claims, including unsubstantiated accusations that his critics bury evidence.

So, I read two books about teen mental illness trends. Is there cause for concern? I remain frustratingly uncertain. In fact, I would say we are not even in a position to know whether we should be running mental health awareness or unawareness campaigns. I am certain, though, that we should be worried about psychology’s lacklustre record keeping and measurement of mental health over time. I hope my pessimistic side is proven wrong for once and we fix some of these issues and collect better data before the hype cycle moves on to the next thing, presumably AI companions.

Footnotes
1 I have not seen studies comparing this for different purported “panics”. Might be worthwhile.
2 Nor did I have a girlfriend who would have ridden her moped out to the countryside for me, so there is that.
3 Causal inference is difficult, but I did go to more parties during the second half of my year. An obvious confounding factor being my much better haircut.
4 Also, I couldn’t afford one. But I prefer to think my moral fortitude was the real reason.
5 For some reason there seems to be no good research on the oft-repeated idea that GPS ruined our sense of orientation.
6 Is World of Warcraft the hero of that story? I’ll let you be the judge.
7 And master bladder control, hopefully.
8 So. Lame.
9 I also agree with the cautionary note that an individual going off social media is very different from their whole friend group doing so.
10 See here and here for an argument that this is actually too late.
11 Even brain patterns associated with depression can potentially be voluntarily controlled (see also this excellent piece on placebos).
12 Not that it’s easy to measure time trends in potentially objectively measurable quantities; see the sad story of testosterone.
13 Along with getting rid of Roman numerals, the DSM-5 got rid of the bereavement exclusion criterion for major depression, made it easier to get an ADHD diagnosis as an adult and so on.
14 In Cybulski et al., the researchers note that UK primary care physicians may engage in ‘strategic labeling’, i.e. diagnose depression less often and related symptoms more often, because diagnosing depression can cause them more work.
15 Structured clinical interviews are considered the gold standard. Better than simply responding to a self report survey; higher standards for the diagnosis than in the day-to-day hustle of clinical practice. But many of the problems above affect them too and of course they require much more dedicated effort than self report surveys or making use of insurance databases on diagnoses.
16 There are rebuttals by Zach Rausch and Jean Twenge to this criticism, and Zach Rausch reached out to me recently about some of my critical tweets, which I appreciated. But though I value scientific debate, this engagement with critics and the subsequent rebuttals fell short of my ideal. They don’t steelman the critics or even fully quote them. I had the impression they are trying to win the argument, not to get to the bottom of this. To briefly explain why I did not find the rebuttals convincing: the argument about the ACA guideline change was that the “Department of Health and Human Services issued a new set of guidelines that recommended that teenage girls should be screened annually for depression by their primary care physicians and that same year required that insurance providers cover such screenings in full.” That was in 2011. Rausch writes that the “changing diagnostics hypothesis predicts that the big rise in self-harm hospitalizations should happen after 2015”, but does not make the case why the ACA-mandated screenings for depression would not also turn up evidence of self harm. Rausch also dismisses the evidence from New Jersey. New Jersey does not matter per se; it’s a small state. But it seems to be one of the few places where the mechanism of diagnostic change was investigated, and since we know diagnostic change also occurred outside NJ, the point still stands. Twenge writes that “the increase in self-harm among teen girls shows no evidence of being caused by a coding change”, but that is exactly what that study is about. We shouldn’t infer that the entirety of observed change is due to coding, but we should not minimise the issue either. She cites time trends predating the coding change, but these are not a strong contradiction, because coding changes in diagnoses made by humans do not necessarily flip like a switch.