'Big data' has become the catchcry of many data scientists and researchers in recent years, and the term is increasingly common in economics. However, by itself, the analysis of big data provides nothing more than correlations. Even when big datasets are available, there is still a place for randomised controlled trials (RCTs). That is the essence of this new article by Andrew Leigh (Parliament of Australia), published in the journal Australian Economic Review (sorry, I don't see an ungated version online).
It should come as no surprise that Leigh is pro-RCT. After all, he is the author of the book Randomistas (which I reviewed here), which was essentially a tribute to RCTs. Leigh clearly sees the rise of big data, and its increasing use as a substitute for RCTs, as a threat to good research. In the article, he takes great pains to point out instances where the analysis of big data has drawn the wrong conclusions, compared with RCTs on the same topic. For example:
Randomised trials have demonstrated a strongly beneficial effect of statins on reducing cardiovascular mortality. Yet when they analysed a database covering the entire Danish population, researchers found that the chance of death from cardiovascular causes was one‐quarter higher among those who took statins than among those who did not. The explanation is straightforward: people who were prescribed statins were at elevated risk of having a heart attack. Yet even when researchers made statistical adjustments, using all the variables available in the database, they were unable to reproduce the well‐known finding that statins have a beneficial effect on cardiovascular mortality.
Analysis of the Danish database also suggested that the relative risk of cancer was 15% lower among patients who took statins, an effect that remained statistically significant even after controlling for other observed factors about the patients. Yet this result is at odds with the evidence from randomised trials. A meta‐analysis of randomised trials, covering more than 10,000 cases of cancer, found no effects of statins on the incidence of cancer, nor on deaths from cancer...
The observational data was doubly wrong. Observational data failed to replicate the well‐known finding that statins improve heart health. And observational data wrongly suggested that statins reduce the risk of cancer. Randomised trials, which were not biased by selection effects, provided the correct answer.
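To make the selection-effect mechanism concrete, here is a minimal simulation sketch in Python (the numbers are entirely made up for illustration, and are not from the Danish study): patients at higher baseline risk are more likely to be prescribed the drug, so the naive observational comparison makes a genuinely beneficial drug look harmful, while randomisation recovers the true effect.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Hypothetical baseline risk of a cardiovascular event (made-up numbers,
# not from the Danish study).
baseline_risk = rng.uniform(0.01, 0.20, size=n)

# Assumed true treatment effect: the drug cuts event risk by 30%.
def event_prob(risk, treated):
    return risk * np.where(treated, 0.7, 1.0)

# Observational world: sicker patients are more likely to be prescribed
# the drug (confounding by indication).
p_prescribed = np.clip(0.1 + 4.0 * baseline_risk, 0.0, 1.0)
treated_obs = rng.random(n) < p_prescribed
events_obs = rng.random(n) < event_prob(baseline_risk, treated_obs)

# RCT world: treatment assigned by coin flip, independent of baseline risk.
treated_rct = rng.random(n) < 0.5
events_rct = rng.random(n) < event_prob(baseline_risk, treated_rct)

def risk_ratio(events, treated):
    return events[treated].mean() / events[~treated].mean()

print(f"Observational risk ratio: {risk_ratio(events_obs, treated_obs):.2f}")  # above 1: drug looks harmful
print(f"RCT risk ratio:           {risk_ratio(events_rct, treated_rct):.2f}")  # about 0.70: the true benefit
```

In this toy setup the naive observational comparison flips the sign of the effect purely because treatment assignment is correlated with baseline risk, which is exactly the pattern Leigh describes in the Danish data.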
The statin case is only one example of many in the article. However, while Leigh is pro-RCT, he is not anti-big-data. He notes that:
Large data sets are a valuable complement to randomised trials. But big data is not a substitute for randomisation.
If we take anything away from Leigh's article, it should be that point. Big data is incredibly useful. However, it must be analysed using the tools of causal inference (of which randomised controlled trials are just one example) if we want to move beyond finding correlations. The problem with big data is compounded by a focus on statistical significance (as Ziliak and McCloskey noted in their book The Cult of Statistical Significance, which I reviewed here). Big datasets will turn up statistically significant correlations even when the underlying relationship is trivially small: with a million observations, a correlation of just 0.005 is 'significant' at conventional levels. That sensitivity is an asset when causal methods are applied, but very much a liability when big data is analysed without consideration of causality. RCTs are one way of disciplining our research approach to ensure that the effects we estimate are causal, and as Leigh notes:
While correlations in large data sets do not necessarily indicate causation, administrative data can be enormously helpful in ensuring the precision of estimates from randomised trials.
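Here is a minimal sketch of what that complementarity can look like in practice, assuming numpy and statsmodels are available (the covariate and all numbers are hypothetical, not from Leigh's article): adjusting an RCT estimate for a prognostic baseline covariate drawn from administrative records leaves the estimate unbiased but shrinks its standard error.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 5_000

# Hypothetical RCT: treatment is randomly assigned, and the outcome depends
# strongly on a baseline covariate available from administrative records
# (all names and numbers here are illustrative assumptions).
treat = rng.integers(0, 2, size=n)
admin_covariate = rng.normal(0.0, 1.0, size=n)
outcome = 0.2 * treat + 1.5 * admin_covariate + rng.normal(0.0, 1.0, size=n)

# Unadjusted estimate: simple difference in means via OLS on treatment alone.
unadj = sm.OLS(outcome, sm.add_constant(treat.astype(float))).fit()

# Adjusted estimate: add the administrative covariate as a control.
X = sm.add_constant(np.column_stack([treat, admin_covariate]))
adj = sm.OLS(outcome, X).fit()

print(f"Unadjusted: {unadj.params[1]:.3f} (SE {unadj.bse[1]:.3f})")
print(f"Adjusted:   {adj.params[1]:.3f} (SE {adj.bse[1]:.3f})")
# Randomisation makes both estimates unbiased, but the adjusted standard
# error is much smaller because the covariate absorbs outcome variance.
```

This is just standard regression adjustment, but it illustrates one concrete sense in which linked administrative data can tighten the confidence intervals around an RCT estimate without threatening its unbiasedness.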
The article finishes with high-level strategies that policy makers and practitioners can use to ensure that RCTs are embedded within the analysis of public policy:
I advocate five approaches. Encourage curiosity in yourself and those you lead. Seek simple trials, especially at the outset. Ensure experiments are ethically grounded. Foster institutions that push people towards more rigorous evaluation. Collaborate internationally to share best practice and identify evidence gaps.
Those all sound like good approaches. I would add a sixth: employ analysts with a thorough grounding in causal inference methods generally, if not in RCTs specifically. We need more policy analysis that establishes causal evidence of impact.