The results of economic studies are also mostly wrong

Our old acquaintance, John Ioannidis of Stanford, who shocked the academic community a little over a decade ago with a study arguing that most published scientific findings are false (because they do not adequately account for statistical power), and who last year argued that most clinical trials in medicine are misjudged, has now turned to economics in a freshly published study. Here, too, he found much the same: of 6,700 empirical studies, roughly half report exaggerated results. Put simply, because the share of explained variance is too low (a too-low coefficient of determination), the values of the estimated coefficients are incorrectly (excessively) estimated: on average by a factor of 2, and in one third of the studies by a factor of 4 or more.

For those uninitiated in empirical work, the introductory part of the article is reproduced below. It explains why it is a problem that empirical studies neglect the strength of the statistical relationship (the coefficient of determination, which shows how well the explanatory variables explain the variability of the dependent variable).
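As a quick illustration of the coefficient of determination mentioned above, here is a minimal sketch on invented data: R² = 1 - SS_res / SS_tot, i.e. the share of the dependent variable's variation that the model's predictions account for.

```python
import numpy as np

# Hypothetical observations and model predictions (invented numbers).
y = np.array([2.0, 4.0, 6.0, 8.0])       # observed dependent variable
y_hat = np.array([2.5, 3.5, 6.5, 7.5])   # model predictions

ss_res = np.sum((y - y_hat) ** 2)        # unexplained variation
ss_tot = np.sum((y - y.mean()) ** 2)     # total variation around the mean
r2 = 1 - ss_res / ss_tot
print(f"R^2 = {r2:.2f}")                 # here 1 - 1/20 = 0.95
```

An R² of 0.95 would mean the model explains 95% of the variability; the article's concern is precisely that real models explain far less than researchers' interpretations presume.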

Suppose that in a study of the effect of a minimum-wage increase on employment, the explanatory variables can explain only 50% of the variability of the dependent variable, yet we interpret the estimated coefficients as if the coefficient of determination were 0.9 or more (as if the model explained at least 90% of the variability of the dependent variable). This means that our coefficients (say the key one, showing the relationship between the minimum wage and the level of employment) are incorrectly estimated. If we were to include all the other necessary explanatory variables in the model (if we knew them), the values of our estimated coefficients would of course change accordingly.
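The omitted-variable problem described above can be sketched with simulated data. All names and effect sizes here are invented for illustration, not taken from any real minimum-wage study: when a regressor correlated with the minimum wage is left out, the estimated minimum-wage coefficient shifts away from the true value.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# True data-generating process: employment depends on the minimum wage
# AND a second factor (say, local demand) correlated with the minimum wage.
demand = rng.normal(size=n)
min_wage = 0.6 * demand + rng.normal(size=n)          # correlated regressor
employment = -0.5 * min_wage + 1.0 * demand + rng.normal(size=n)

def ols(y, regressors):
    """OLS coefficients (intercept first) via least squares."""
    X = np.column_stack([np.ones(len(y))] + list(regressors))
    return np.linalg.lstsq(X, y, rcond=None)[0]

full = ols(employment, [min_wage, demand])   # correctly specified model
short = ols(employment, [min_wage])          # demand omitted

print("true effect of min_wage: -0.5")
print(f"full-model estimate:       {full[1]: .3f}")
print(f"omitted-variable estimate: {short[1]: .3f}")
```

In this setup the misspecified model's estimate is biased toward zero, so the researcher could wrongly conclude the minimum wage barely affects employment; with a different correlation structure the bias could just as easily exaggerate the effect or flip its sign.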

And this is the key problem of all empirical research in science, not only in economics. Our models cannot approximate reality well enough; we cannot identify all the factors that influence a given phenomenon. Usually this is objectively impossible, so we settle for what we consider the best approximation. But in doing so we obtain incorrect coefficients, which may be wrong not only in magnitude but also in statistical significance (they may not be statistically significantly different from zero, or they may have the opposite sign from the true one). And we do not know this. The problem arises when such studies are taken at face value as the basis for policy measures, because we may then give decision-makers (from managers to politicians, from medicine and pharmacy to economic policy) entirely wrong or exaggerated advice.

Good policy and practice is built on the foundations of reliable scientific knowledge. Unfortunately, there are long-held suspicions that much of what passes as evidence in economics, medicine or in psychology (and possibly other fields) lacks sufficient credibility (De Long and Lang, 1992; Ioannidis, 2005b; Leamer, 1983; Ioannidis and Doucouliagos, 2013; Maniadis et al., 2017). For example, it has been difficult to reproduce and verify significant bodies of observational and experimental research independently (Ioannidis, 2005a; Begley and Ellis, 2012; Begley and Ioannidis, 2015; Duvendack et al., 2015; Nosek et al., 2015). Moreover, empirical research is plagued by a range of questionable practices and even the fabrication of results. Consequently, some argue that science is experiencing a credibility crisis. This crisis of confidence in research permeates multiple scientific disciplines. While there are discipline-specific nuances, there are also many shared experiences and distorted incentives. Just as declining credibility may spill over from one discipline to another, successful strategies and practices can benefit other disciplines. Hence, a multidisciplinary approach may advance all sciences.

Statistical power is a critical parameter in assessing the scientific value of an empirical study. Power’s prominence increases with policy importance. The more pressing it is to have evidence-based policy, the more critical it is to have the evidence base adequately powered and thereby credible. By definition, adequate power means that the empirical methods and data should be able to detect an effect, should it be there. Low power means high rates of false negatives. However, as Ioannidis (2005b) has argued, low power also causes high rates of false positives, where non-existent effects are seemingly detected. Aside from the prior probability that a given economic proposition is true (a magnitude that would likely cause endless debate among economists), the key parameters for assessing the validity of any given reported research result are: statistical power and the proportion of reported non-null results that are the artefact of some bias (e.g. misspecification bias and publication selection bias).
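The link between low power and exaggerated findings can be illustrated with a small simulation (the effect size and standard error below are invented, not taken from the paper): when power is low, the estimates that clear the significance threshold are a selected sample, and their average substantially overstates the true effect.

```python
import numpy as np

rng = np.random.default_rng(1)

true_effect = 0.1      # hypothetical true effect
se = 0.1               # standard error of each study's estimate
n_studies = 100_000    # simulated replications of the same study

# Each study's estimate is the true effect plus sampling noise.
estimates = rng.normal(true_effect, se, n_studies)

# Keep only estimates that are "statistically significant" (|t| > 1.96).
significant = estimates[np.abs(estimates / se) > 1.96]

power = len(significant) / n_studies
exaggeration = significant.mean() / true_effect
print(f"power of each study:        {power:.2f}")
print(f"exaggeration factor among significant results: {exaggeration:.1f}")
```

With these (invented) parameters each study has power of roughly 17%, and the significant estimates overstate the true effect by about a factor of 2, echoing the order of magnitude of exaggeration reported in the survey.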

How credible is empirical economics? Is empirical economics adequately powered? Many suspect that statistical power is routinely low in empirical economics. However, to date, there has been no large-scale survey of statistical power widely across empirical economics. The main objectives of this article are to fill this gap, investigate the implications of low power on the magnitude of likely bias and recommend changes in practice that are likely to increase power, reduce bias and thereby increase the credibility of empirical economics.

For many researchers, a key consideration is whether a particular research project is publishable. In contrast, from a social welfare perspective, the more important consideration is the contribution that the research inquiry makes to science. The validity and credibility of empirical economics has long been questioned. For example, Leamer (1983) famously pointed out that empirical economics is vulnerable to a number of biases and, as a result, produces rather fragile results that few economists take seriously. De Long and Lang (1992) found evidence of publication selection bias among the top economic journals. Ziliak and McCloskey (2004) searched papers in the American Economic Review (AER) and found that only 8% of the empirical studies published in the 1990s actually consider statistical power. Doucouliagos and Stanley (2013) quantitatively surveyed 87 empirical economics areas and found evidence of widespread publication selection bias. Ioannidis and Doucouliagos (2013, p. 997) recently reviewed and summarised available evidence of prevalent research practices and biases in the field and called into question the credibility of empirical economics, arguing that overall ‘the credibility of the economics literature is likely to be modest or even low’.

Source: Ioannidis et al. (2017), The Power of Bias in Economics Research
