Monday, 21 November 2022

An important note of caution on meta-analysis

I've written a number of posts that mention meta-analysis (most recently here), where the coefficient estimates from many studies are combined quantitatively to arrive at an overall measure of the relationship between variables. Meta-analysis has the advantage that, while any one study can give an incorrect impression of the relationship simply by chance, that problem is much less likely when you look at a whole lot of studies. At least, that problem is much less likely when there is no publication bias (which occurs when studies that show statistically significant effects, often in an expected direction, are much more likely to be published than studies that show statistically insignificant effects). Fortunately, there are methods for identifying and correcting for publication bias in meta-analyses.
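To make the mechanics a little more concrete, here is a minimal sketch (in Python, with invented numbers) of the two pieces mentioned above: pooling coefficient estimates across studies using inverse-variance weights, and an Egger-style regression check for funnel asymmetry, which is one common way of flagging possible publication bias. None of the numbers, and none of the variable names, come from a real meta-analysis.

```python
# Minimal sketch: inverse-variance pooling of study estimates, plus an
# Egger-style regression test for funnel asymmetry (a symptom that is
# consistent with, though not proof of, publication bias).
# All numbers below are invented for illustration only.
import numpy as np
import statsmodels.api as sm

# Hypothetical coefficient estimates and standard errors from six studies
estimates = np.array([0.30, 0.25, 0.41, 0.12, 0.38, 0.22])
std_errors = np.array([0.10, 0.08, 0.20, 0.05, 0.15, 0.09])

# Fixed-effect (inverse-variance weighted) pooled estimate
weights = 1.0 / std_errors**2
pooled = np.sum(weights * estimates) / np.sum(weights)
pooled_se = np.sqrt(1.0 / np.sum(weights))
print(f"Pooled estimate: {pooled:.3f} (SE {pooled_se:.3f})")

# Egger-style test: regress the standardised estimate (estimate / SE)
# on precision (1 / SE); an intercept far from zero suggests funnel asymmetry
precision = 1.0 / std_errors
z_scores = estimates / std_errors
egger = sm.OLS(z_scores, sm.add_constant(precision)).fit()
print(f"Egger intercept: {egger.params[0]:.3f} (p = {egger.pvalues[0]:.3f})")
```

In practice a random-effects model, which allows the true effect to differ across studies, would often be preferred to the simple fixed-effect average shown here, but the weighting idea is the same.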

However, a recent post on the DataColada blog by Uri Simonsohn, Leif Nelson and Joe Simmons (the first of a series of posts), raises what appear to be two fundamental issues with meta-analysis:

Meta-analysis has many problems... But in this series, we will focus our attention on only two of the many problems: (1) lack of quality control, and (2) the averaging of incommensurable results.

In relation to the first problem:

Some studies in the scientific literature have clean designs and provide valid tests of the meta-analytic hypothesis. Many studies, however, do not, as they suffer from confounds, demand effects, invalid statistical analyses, reporting errors, data fraud, etc. (see, e.g., many papers that you have reviewed). In addition, some studies provide valid tests of a specific hypothesis, but not a valid test of the hypothesis being investigated in the meta-analysis...

When we average valid with invalid studies, the resulting average is invalid.

And in relation to the second problem:

Averaging results from very similar studies – e.g., studies with identical operationalizations of the independent and dependent variables – may yield a meaningful (and more precise) estimate of the effect size of interest. But in some literatures the studies are quite different, with different manipulations, populations, dependent variables and even research questions. What is the meaning of an average effect in such cases? What is being estimated? 

Both of these problems had me quite concerned, not least because I currently have a PhD student working on a meta-analysis of the factors associated with demand for healthcare in developing countries. However, I've been reflecting on this over the last couple of weeks, and I'm feeling a bit more relaxed now.

That's because the first problem is relatively easy to address. The initial step in a meta-analysis is to identify all of the studies that could potentially be included. That set might contain some invalid studies. However, if as a second step we subject all of the identified studies to a quality check, and exclude those that do not meet a minimum quality standard, I think we eliminate most of the problem. Of course, some studies with reporting errors, or outright fraudulent results, might still sneak through, but poorly designed studies, which fail on basic statistical or sampling criteria, will be excluded. That is the approach that my PhD student has adopted.
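To illustrate what that screening step might look like in practice, here is a small hypothetical sketch: candidate studies are scored against explicit quality criteria, and only those that pass every criterion are carried forward into the meta-analysis. The criteria and studies below are placeholders, not the actual protocol my student is using.

```python
# Hypothetical screening step: keep only candidate studies that satisfy
# every minimum-quality criterion. Studies and criteria are invented.
import pandas as pd

candidates = pd.DataFrame({
    "study": ["A", "B", "C", "D"],
    "representative_sample": [True, True, False, True],
    "valid_statistical_inference": [True, False, True, True],
    "reports_standard_errors": [True, True, True, False],
})

criteria = ["representative_sample",
            "valid_statistical_inference",
            "reports_standard_errors"]

# A study is included only if it passes all of the quality criteria
included = candidates[candidates[criteria].all(axis=1)]
print(included["study"].tolist())  # ['A'] in this toy example
```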

The second problem may not be as bad as Simonsohn et al. suggest. Their example relates to experimental research, where the treatment varies across studies, so that averaging the effect of the treatment makes little sense. However, not all meta-analyses focus on experimental treatments. Some combine the results of many observational or quasi-experimental studies, where the variable of interest is much more similar across studies. For instance, when looking at the effect of income on health services demand, we still need to worry about how income (and health services demand) is measured in each study. However, if we use standardised effects in the meta-analysis (so that every estimate measures the effect of a one-standard-deviation change in income on health services demand, itself measured in standard deviations), then I think we deal with most of the problems here as well. Again, that is the approach that my PhD student has adopted.
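Here is a small sketch of that standardisation step, again with made-up numbers: each study's raw coefficient is rescaled by the ratio of the standard deviation of income to the standard deviation of demand in that study, and the rescaled (now comparable) effects can then be pooled as in the earlier sketch. It assumes the relevant standard deviations can be recovered from each study, which is not always the case.

```python
# Hypothetical conversion of raw coefficients (in different income units)
# into standardised effects before pooling. All numbers are invented.
import numpy as np

raw_coefs = np.array([0.04, 0.002, 1.5])      # income measured on different scales
raw_ses   = np.array([0.01, 0.0008, 0.60])
sd_income = np.array([500.0, 12000.0, 20.0])  # SD of income in each study
sd_demand = np.array([80.0, 95.0, 110.0])     # SD of demand in each study

# Standardised effect: coefficient * SD(income) / SD(demand);
# the standard error rescales in the same way
std_effects = raw_coefs * sd_income / sd_demand
std_ses = raw_ses * sd_income / sd_demand

# The now-comparable effects can be pooled with inverse-variance weights
weights = 1.0 / std_ses**2
pooled = np.sum(weights * std_effects) / np.sum(weights)
print(np.round(std_effects, 3), round(pooled, 3))
```

Where the standard deviations aren't reported, partial correlations or other standardised metrics are sometimes used instead, but the underlying idea is the same: put all of the estimates onto a common scale before averaging them.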

None of this is to say that all (or most, or perhaps even many) meta-analyses are bulletproof. It's just that the critiques of Simonsohn et al. may be overplayed. However, it is important to keep these issues in mind. I recommend reading the other posts in the DataColada series on this topic, of which there is just one so far, on meta-analyses of 'nudging', with a promise of more to come.

In the meantime, I think my PhD student can rest a little easier, if not entirely easy, secure in the knowledge that so far their work addresses these critiques. But with more posts to come, I reserve the right to change my mind. Simonsohn et al. are usually quite persuasive, so I'm expecting a stronger case against meta-analysis from them in the rest of the series. On the other hand, I'm hoping they don't make one!

[HT: David McKenzie at Development Impact, among others]
