Another excellent commentary by Tim Harford, this time on how a correlation between different phenomena (for example, between the number of storks' nests on Danish houses and the number of children born in those houses) must not mislead us into assuming that there is also a causal link between them. At the same time, we must not overcorrect in the other direction: we should not simply dismiss causation whenever we cannot (yet) explain a direct causal link between two phenomena.
Pay attention to the well-known story at the end about Ronald Fisher, the great statistician, biologist and geneticist without whom there would be no modern statistics, no modern genetics and no neo-Darwinian synthesis. Yet in the debate over whether smoking causes cancer, he took the position that there was no evidence of causation. It later emerged that Fisher was on the tobacco industry's payroll. I know what you immediately thought… But here is the twist that Harford brilliantly draws out: did consulting for the tobacco industry fuel Fisher's scepticism? Or did his personal convictions and habits (he was a passionate pipe smoker) bring him the consulting work? Put differently, would the experts' opinion have been any different had it not been paid for?
It is said that there is a correlation between the number of storks’ nests found on Danish houses and the number of children born in those houses. Could the old story about babies being delivered by storks really be true? No. Correlation is not causation. Storks do not deliver children but larger houses have more room both for children and for storks.
This much-loved statistical anecdote seems less amusing when you consider how it was used in a US Senate committee hearing in 1965. The expert witness giving testimony was arguing that while smoking may be correlated with lung cancer, a causal relationship was unproven and implausible. Pressed on the statistical parallels between storks and cigarettes, he replied that they “seem to me the same”.
The witness’s name was Darrell Huff, a freelance journalist beloved by generations of geeks for his wonderful and hugely successful 1954 book How to Lie with Statistics. His reputation today might be rather different had the proposed sequel made it to print. How to Lie with Smoking Statistics used a variety of stork-style arguments to throw doubt on the connection between smoking and cancer, and it was supported by a grant from the Tobacco Institute. It was never published, for reasons that remain unclear. (The story of Huff’s career as a tobacco consultant was brought to the attention of statisticians in articles by Andrew Gelman in Chance in 2012 and by Alex Reinhart in Significance in 2014.)
Indisputably, smoking causes lung cancer and various other deadly conditions. But the problematic relationship between correlation and causation in general remains an active area of debate and confusion. The “spurious correlations” compiled by Harvard law student Tyler Vigen and displayed on his website (tylervigen.com) should be a warning. Did you realise that consumption of margarine is strongly correlated with the divorce rate in Maine?
We cannot rely on correlation alone, then. But insisting on absolute proof of causation is too exacting a standard (arguably, an impossible one). Between those two extremes, where does the right balance lie between trusting correlations and looking for evidence of causation?
…
It’s not clear why Huff and Fisher were so fixated on the idea that the growing evidence on smoking was a mere correlation. Both of them were paid as consultants by the tobacco industry and some will believe that the consulting fees caused their scepticism. It seems just as likely that their scepticism caused the consulting fees. We may never know.
Source: Tim Harford