I've written a number of times about studies that have used experimental methods to answer questions in economics. Experimental economics has some advantages over empirical economics that uses 'found' data, because in experiments the researcher has more control, and can therefore more easily test their hypotheses through clever use of experimental design. However, economics experiments are not fool-proof (nothing is!), and economists are constantly improving experimental methods. So, I was interested to read this 2019 article by Eszter Czibor (University of Chicago), David Jimenez-Gomez (University of Alicante), and John List (University of Chicago), published in the Southern Economic Journal (ungated earlier version here).
The paper enumerates twelve things that experimental economists should do more of, each backed up by numerous examples (so these are clearly not things that no experimental economist does; they are just things that Czibor et al. believe should be done more often).
The first thing is "Appropriately consider generalizability, across the lab and the field". Under this heading, Czibor et al. recommend that experimental economists clearly consider the potential threats to the generalisability of their experiments. These threats include interaction between treatment and other characteristics of the experiment (including problems like Hawthorne effects and John Henry effects), selective noncompliance (where research participants essentially try to change whether they are part of the treatment or control group), nonrandom selection into the experiment (participation bias in terms of who decides to participate in the experiment), and differences in populations (which is an issue that I have written about before).
Second, Czibor et al. suggest that experimental economists should "Do more field experiments, especially NFEs". NFEs are 'natural field experiments', where the research is conducted in a natural setting and the research participants have no idea that they are part of a research project and are just going about their daily lives. Czibor et al. argue that NFEs are "often less subject to the threats to generalizability than other types of experiments", which is likely to be true.
Third, experimental economists should "Use lab and field experiments as complementary approaches in the production of scientific knowledge". Czibor et al. note that these complementary approaches also improve the generalisability of the results.
Fourth, Czibor et al. note that "For proper inference, go beyond p-values". This is not a new insight (see here, for example), and they even cite Ziliak and McCloskey's excellent book The Cult of Statistical Significance (which I reviewed here).
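One common way of going beyond a bare p-value is to report the estimated effect size together with a confidence interval, which conveys both the magnitude of the effect and the uncertainty around it. A minimal sketch in Python, using invented outcome data purely for illustration:

```python
from statistics import NormalDist, mean, stdev

# Hypothetical outcome data for treatment and control groups (invented for illustration)
treatment = [12.1, 14.3, 11.8, 15.0, 13.2, 14.8, 12.9, 13.7]
control = [11.0, 12.2, 10.9, 12.8, 11.5, 12.0, 11.3, 12.5]

# The effect size: the difference in mean outcomes between the two groups
diff = mean(treatment) - mean(control)

# Standard error of the difference in means (normal approximation)
se = (stdev(treatment) ** 2 / len(treatment)
      + stdev(control) ** 2 / len(control)) ** 0.5

# A 95% confidence interval gives a range of plausible effect sizes,
# which is more informative than a bare 'significant or not' verdict
z = NormalDist().inv_cdf(0.975)
ci_low, ci_high = diff - z * se, diff + z * se

print(f"difference in means: {diff:.2f}, 95% CI: ({ci_low:.2f}, {ci_high:.2f})")
```

A reader can see at a glance whether the interval is narrow enough for the estimate to be practically useful, which a p-value alone never shows.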
The fifth suggestion is to "Replicate early and often". Again, this is not a new insight (see here, for example), and it's not a recommendation that is specific to economics, given the widespread replication crisis across the social and medical sciences.
Sixth, Czibor et al. advocate that experimental economists "Consider statistical power in the design phase". Power calculations, especially power calculations conducted before the analysis is conducted, are surprisingly rare in economics, although in common use in health and medicine. Experimental economists would do well to learn from those other disciplines, in order to raise the credibility of their experimental research.
Seventh, experimental economists should "Adjust for MHT [multiple hypothesis testing], in power tests and in data analysis". This is uncommon in experimental economics and, from what I have seen, also rare in other experimental social sciences such as psychology. The problem of multiple hypothesis testing is well known - if you test the impact of the experimental treatment on enough variables, then simply by chance at least one of them will turn out to exhibit statistically significant effects. Adjusting for multiple hypothesis testing is therefore important, to avoid false positives.
Eighth, Czibor et al. recommend that experimental economists "Use blocked randomization to increase power and credibility". They note that "Blocking (also known as stratification) refers to the practice of dividing experimental subjects into blocks (strata) by observable characteristics, such that randomization is performed within, but not between, these blocks..." The advantage here is that blocking increases the statistical power, allowing experimental economists to achieve meaningful results with a smaller sample size (or reducing the chance of false positive results).
The ninth suggestion is to "Use within-subject designs when appropriate". A 'within-subject' research design exposes each research participant to both the treatment and control condition (sequentially, randomising which condition they experience first). This contrasts with a 'between-subject' design, where each research participant is in either the treatment group or the control group. A within-subject design improves the statistical power of the experiments, and it also has the advantage of reducing costs. However, as Czibor et al. note, there are experiments where a within-subject design simply wouldn't work well.
Tenth, Czibor et al. suggest that experimental economists should "Go beyond A/B testing by using theoretically guided designs". In this case, they:
...advocate for experimental economists to use economic theory to inform their experimental design whenever possible... and to incorporate results from experiments into existing economic theory... thereby creating a feedback process that guides the development of theory and the design of future experiments...
I found this bit interesting though:
We also do not advocate for journals to demand that authors include ad-hoc economic models after the experiment has been conducted and the data analyzed. Such models add little value in our opinion and can confuse readers as to the true intent and nature of the studies.
I suspect that a fair amount of clever economic theory in the papers that I read was constructed ex post in response to the findings of the empirical analysis.
The eleventh suggestion is to "Focus on the long run, not just on the short run". Behaviour adjusts to experiments, and in the long run behaviour can adjust by more. Or, it may take people time to adapt. Either way, it would be good to know the long run effects (although there are, of course, cost implications to doing experiments where the outcome variables are being measured in the long run).
And finally, Czibor et al.'s twelfth suggestion is that experimental economists should "Understand the science of scaling ex ante and ex post". This point relates almost directly to John List's book The Voltage Effect (which I reviewed here), which shows that when a programme is scaled up, the effects are usually much smaller than they were in a smaller-scale experiment.
Czibor et al. close with some brief comments about other factors that did not make their list of twelve. Of those, the most surprising was preregistration, which is increasingly being required of experimental studies by top academic journals. Czibor et al. argue that, as a result, most experimental economists are already preregistering their studies. However, this is something that I think needs some further investigation, as it may be possible to preregister a study after the data have been collected.
The article is long, and there is a lot for readers to unpack. As I've noted, not all of the insights are new. However, these suggestions come from some of the top people in the field (of field experiments), so they should be taken very seriously. Although this paper was published in 2019, and was based on a plenary talk given by John List at the Economic Science Association in 2016, I haven't noticed much of an impact. Mind you, I don't read a huge number of experimental economics papers, so perhaps it is just having an effect in those that I haven't read. Nevertheless, from what I have seen, adjusting for multiple hypothesis testing in particular seems rare in experimental economics, and sadly replication is (and possibly always will be) unattractive to researchers - the incentives simply do not encourage replication studies. Clearly, there is still scope for more of these things to be done.