
Why Con Air Likely Did Not Cause Anyone to Drown in a Pool
Rob J. Mann
University of Chicago
February 2022
In the decade leading up to 2009, the annual number of pool drownings in the U.S. rose and fell in close step with the number of films Hollywood studios released each year starring actor Nicolas Cage (fig. 1). We can thus conclude that there exists a strong causal relationship between Cage's movies and pool drownings. Right?!
Even for careful thinkers, confusing correlation with causation is a strikingly common cognitive pitfall. Upon reflection, the error may prove to be a product of oversight, a cognitive blind spot, motivated reasoning, or perhaps just a missed breakfast. To avoid being seduced by simple correlation, researchers must prioritize investigating the reasons behind apparent connections. A. Bradford Hill, in a paper on how to judge whether an environmental exposure causes illness, offers a handy toolbox of nine guidelines, often dubbed the “Bradford Hill Criteria,” which are widely accepted as the standard means of assessing whether an association is causal (Hill, 1965). Tempting as it would be to occasionally accept correlation as a substitute for causation (and believe, for example, that Nicolas Cage really did have something to do with all those drownings), I work through several studies below that highlight both the value of applying Hill's criteria and the shortcomings of not digging deeper into the causes of an apparent association.*

Hill on Strength: Hill observed that when the association between two sets of observations is especially strong, that strength alone makes a causal relationship worth considering. As an example, Hill pointed out that the lung cancer mortality rate among heavy tobacco smokers was twenty to thirty times that of nonsmokers. Even before hypothesizing biological reasons why smoking might cause cancer, a researcher could not ignore such a stark difference in lung cancer mortality between heavy smokers and nonsmokers, and would be compelled to investigate further, rightly supposing that a meaningful association was present.
Cummings (2010) and consistency and plausibility: Rather than investigate the possible causes of a well-observed correlation like this one, researchers are often incentivized to spend their effort (and funding) uncovering novel associations that may be equally strong. When encountering such studies we should look to other criteria in the Hill arsenal, such as his principle of consistency. Cummings’ (2010) research on the connection between sunshine and the seasonality of human birth rates drew on data from nine distinct locations around the globe, with an average study period of nearly 17 years. When Hill suggested that consistency was key to asserting causal connections, he noted the value of data, “repeatedly observed by different persons, in different places, circumstances and times”. The Cummings study did just that, and sought what data was available from locations as disparate as Vietnam, Finland, and South Africa, among others. The study was looking for indications that exposure to environmental light intensity (ELI) in the 1-2 months preceding conception increased the number of children born 9-10 months later. The breadth of Cummings' data and his rigorous approach to isolating actual light exposure (and not being confounded by dark, rainy skies on long summer days, for example) allowed Cummings to conclude with some confidence that there are “significant positive correlations between ELI and birth seasonality in six culturally, geographically diverse environments.” Hill also stressed plausibility as a criterion for studying causal connection, and the Cummings study addressed that as well, citing the human body’s increased production of reproductive hormones in response to sunlight exposure. Cummings further grounded his theory in biological, not cultural, origins by noting similar evidence of birth seasonality among baboons and chimpanzees.
Verma & Verma (2015) and temporality: Verma (2015) summarizes a body of scientific literature about links between myopia and high intelligence. The study acknowledges the myriad challenges that proponents of this relationship must overcome, not the least of which is the lack of a universally accepted definition of “intelligence”. Anyone hearing this assertion for the first time quickly tries to imagine the alleged scientific basis for the claim. Verma summarizes what various studies purport are the medical links between myopia and intelligence, such as the association between ocular and brain development, and the likelihood that myopes read a lot and are thus better prepared for “intelligence tests” that reward heavy readers. The latter claim is really behavioral, not biological, and this study tries to isolate stronger medical reasons why myopia develops in intelligent individuals, but struggles to find objective, observable data that may be independent of behavior.
Verma recounts another hypothesis of the myopia-intelligence correlation, dubbed the “eye-brain gene.” The idea requires considerable mental yoga to accept as plausible and is not worth regurgitating; however, there is a more fundamental Hill criterion that the various studies on this topic fail to satisfy, and that is the principle of temporality. Hill observes that in strong cases for causation, the data indicate which condition leads to which outcome. In other words, it should be clear which is the horse and which is the cart. The literature summarized by Verma is woefully deficient in this respect: while the studies can point to clear associations between the two conditions, researchers cannot articulate whether myopia is the product of high intelligence or somehow causes it. If, for example, it is true that myopia and high intelligence are the result of some shared gene, how can one refer to their relationship as “causal”?
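To see how a shared cause can manufacture a correlation all by itself, consider a minimal simulation in Python. Everything in it is an assumption made for illustration: the hidden factor `shared`, the two traits `trait_a` and `trait_b`, the equal weights, and the noise levels are synthetic and are not drawn from Verma's review or from any study it summarizes.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


random.seed(0)  # fixed seed so the sketch is repeatable
n = 5000

# Hypothetical hidden factor (think "shared gene"); purely synthetic.
shared = [random.gauss(0, 1) for _ in range(n)]

# Each trait is the shared factor plus its own independent noise.
# Neither trait appears in the other's formula, so neither causes the other.
trait_a = [s + random.gauss(0, 1) for s in shared]
trait_b = [s + random.gauss(0, 1) for s in shared]

raw_r = pearson(trait_a, trait_b)

# Subtract the shared factor (its weight is known to be 1 in this toy setup)
# and correlate what remains of each trait.
resid_a = [a - s for a, s in zip(trait_a, shared)]
resid_b = [b - s for b, s in zip(trait_b, shared)]
adjusted_r = pearson(resid_a, resid_b)

print(f"raw correlation between traits: {raw_r:.2f}")
print(f"after removing shared factor:   {adjusted_r:.2f}")
```

In a run like this the two traits correlate at roughly 0.5 even though neither influences the other, and the correlation falls to about zero once the shared factor is removed. Real data rarely hand us the hidden factor so conveniently, which is why Hill's criteria are needed at all.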
Rogan & Gladen (1993) and biological gradient: Another novel criterion suggested by Hill as useful in determining causation is the principle of biological gradient. It holds that an exposure alleged to cause an outcome should do so measurably along a dose-response curve. For example, if exposure to some airborne pollutant causes a respiratory disorder, then exposure to a higher dose of that pollutant should cause a more severe case of that disorder. It is interesting to consider this criterion in the context of studies asserting links between infant breastfeeding and cognitive development. The Rogan (1993) study does just that, and focuses particularly on how longer breastfeeding periods may produce relatively higher cognitive development than would shorter periods (fig. 2).

The Rogan (1993) research is not without shortcomings (the researchers admit their measure of “mental development” may be a crude gauge of several complex traits, and they note the study’s subjects are a fairly homogeneous group, which can produce confounding data from cultural sources), but for the purposes of considering biological gradient, it is a compelling study, especially in asserting how exposure to the nutrition found in breastmilk may be the cause of the cognitive development experienced by individuals who were breastfed for prolonged periods.
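A first-pass check for a biological gradient can be sketched in a few lines of Python. The numbers below are placeholders invented for illustration (they are not Rogan and Gladen's data), and the helper names `least_squares_slope` and `is_monotone_increasing` are hypothetical. A real analysis would also need confidence intervals and adjustment for confounders.

```python
def least_squares_slope(doses, responses):
    """Slope of the ordinary least-squares line of response on dose."""
    n = len(doses)
    mean_d = sum(doses) / n
    mean_r = sum(responses) / n
    sxy = sum((d - mean_d) * (r - mean_r) for d, r in zip(doses, responses))
    sxx = sum((d - mean_d) ** 2 for d in doses)
    return sxy / sxx


def is_monotone_increasing(values):
    """True if each value is at least as large as the one before it."""
    return all(a <= b for a, b in zip(values, values[1:]))


# Placeholder exposure levels and mean outcomes, invented for this sketch.
doses = [0, 1, 2, 3, 4]
graded = [10.0, 11.0, 12.5, 13.0, 15.0]  # outcome climbs with dose
ungraded = [12.0, 14.5, 10.0, 13.0, 12.0]  # outcome wanders with dose

slope_graded = least_squares_slope(doses, graded)
slope_ungraded = least_squares_slope(doses, ungraded)

print(f"graded:   slope={slope_graded:.2f}, monotone={is_monotone_increasing(graded)}")
print(f"ungraded: slope={slope_ungraded:.2f}, monotone={is_monotone_increasing(ungraded)}")
```

A clearly positive slope with a monotone pattern is what Hill's gradient criterion looks for. A slope near zero with outcomes that wander up and down offers no such support, however strong a single comparison between two groups might look.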
Manfredini et al. (2017) and experimentation: Hill quietly remarked that experiments may offer the most powerful case for causation. While his observation is remarkably understated, most scientists today agree that systematic, objective, and repeatable experiments, and randomized controlled trials (RCTs) in particular, are the modern gold standard for distinguishing causation from mere correlation. RCTs and other such experiments appear to be especially compelling when conclusions are confounded by innumerable possible variables, or when common sense points to a seemingly obvious conclusion (though I did not dive into additional data sources to support this prevailing hypothesis).
Manfredini et al. (2017) demonstrated the value of this approach in a large RCT examining the links between exercise and improved health in dialysis patients. Because exercise is so widely accepted as beneficial to health, the researchers could have simply built a study on observational data, for example by asking dialysis patients about their existing exercise habits, and then rested their conclusions on the assumption that those who exercised more would have healthier outcomes. But they went further with the RCT, designing an exercise regimen, recruiting volunteers, and examining how participants in the study benefitted from increased exercise, for example through increased mobility, walking times, and quality of life, versus those in the control group. Such a targeted and customized approach minimizes the chance of unrelated variables producing similar outcomes and can illuminate genuine causality.
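The difference between the two designs can be illustrated with a toy simulation in Python. It is a sketch under stated assumptions, not a model of the Manfredini trial: the `TRUE_EFFECT` of 1.0, the single confounder (baseline health), and the rule that healthier patients are more likely to exercise on their own are all invented for the example.

```python
import random

random.seed(1)  # fixed seed so the sketch is repeatable

TRUE_EFFECT = 1.0  # hypothetical benefit of exercise, chosen for this sketch


def mean(values):
    return sum(values) / len(values)


def estimated_effect(n, randomized):
    """Difference in mean outcome between exercisers and non-exercisers."""
    exercisers, others = [], []
    for _ in range(n):
        health = random.gauss(0, 1)  # baseline health, the confounder
        if randomized:
            # Trial: a coin flip decides who exercises, regardless of health.
            exercises = random.random() < 0.5
        else:
            # Observation: healthier patients tend to exercise on their own.
            exercises = health + random.gauss(0, 1) > 0
        outcome = health + (TRUE_EFFECT if exercises else 0.0) + random.gauss(0, 1)
        (exercisers if exercises else others).append(outcome)
    return mean(exercisers) - mean(others)


observational = estimated_effect(20000, randomized=False)
trial = estimated_effect(20000, randomized=True)

print(f"true effect:            {TRUE_EFFECT:.2f}")
print(f"observational estimate: {observational:.2f}")
print(f"randomized estimate:    {trial:.2f}")
```

Under these assumptions the observational comparison overstates the benefit, because the exercisers were healthier to begin with, while the randomized comparison lands close to the true effect. Random assignment is what breaks the link between baseline health and exercise.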
Conclusion: “A new study finds…[insert provocative clause (propagated by an online algorithm to suit the reader’s biases)]”. Even before the COVID-19 pandemic brought topics such as epidemiology, vaccine efficacy, and the lethality of the Spanish flu into everyone’s home and Twitter feed, the proliferation and democratization of data meant online news readers may have a dangerously inflated sense of their own scientific literacy. In this age of information, citizens must have some fluency with statistical reasoning, and should be attuned to the perils of confusing correlation with causation. We laugh at the purported connection between a Hollywood action star and accidental drownings because it is so obviously ridiculous. But in terms of scientific objectivity, is it really more ridiculous than the pundit MD on TV alleging video games must cause brain cancer because brain tumor diagnoses have increased while video games became more popular? Without understanding the crucial differences between genuine causation and correlation, we are vulnerable to how “scientific” data is manipulated to achieve political, social, or material outcomes that have nothing to do with science. Perhaps more perilously, if we let down our guard in our own thinking, when the invitation finally comes, we might spurn the offer to attend one of Nick Cage's legendary pool parties.
*Vigen's website, Spurious Correlations, is a brilliant and highly illustrative reminder of just how wrong-headed it is to assume all correlations have causal links.
References:
Cummings, D.R. (2010) Human birth seasonality and sunshine. American Journal of Human Biology 22:316-324.
Hill, A.B. (1965) The environment and disease: association or causation? Proceedings of the Royal Society of Medicine 58:295-300.
Manfredini, F., et al. (2017) Exercise in patients on dialysis: A multicenter, randomized clinical trial. Journal of the American Society of Nephrology 28(4):1259-1268.
Rogan, W.J. & Gladen, B.C. (1993) Breast feeding and cognitive development. Early Human Development 31:181-193.
Verma, A. & Verma, A. (2015) A novel review of the evidence linking myopia and high intelligence. Journal of Ophthalmology 2015, 271746:1-8.
Vigen, T. Spurious correlations. https://tylervigen.com/spurious-correlations