Saturday 20 May 2023

How not to analyse the relationship between climate and international migration

I've done research before on the relationship between climate and migration (see this post, and the paper published here, or ungated here). So, I was really interested to read this new article by Dennis Wesselbaum (University of Otago), published in the journal Letters in Spatial and Resource Sciences (open access). Wesselbaum uses data on migration flows from 198 countries to 16 OECD countries, along with temperature data from the Berkeley Earth database, and weather-related disasters data from the EM-DAT (international disasters) database. Controlling also for GDP, population, political freedom, life expectancy, and share of agricultural land, he finds that:

...temperature, but not weather-related disasters, have a significant direct effect on migration in our sample. Temperature has a smaller effect on migration towards OECD countries in Asia compared to Europe, Africa, and North America. For disasters, we only find a stronger effect on migration in Asia compared to Africa. Temperature matters in most regions while disasters do not.

However, as the Economics Discussion Group students and I discussed in our most recent session, there are two key statistical problems with Wesselbaum's analysis. The first is the way that migration flows equal to zero (of which there are likely to be many) are dealt with. Because the dependent variable is the log of migration, and the log of zero is undefined, Wesselbaum deals with this by "adding one to all flows". That biases the estimates, as I noted in this recent post, because adding one is a proportionally much larger change for small flows than for large ones. Most migration researchers have instead adopted the Poisson pseudo-maximum likelihood (PPML) approach (see this working paper, for example), as it not only copes with zero values, but also deals with over-dispersion.
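To make the zero-flows problem concrete, here is a minimal sketch on simulated data (not Wesselbaum's data, and not his specification). It assumes a single made-up covariate `x` and flows drawn from a Poisson distribution whose mean is `exp(-1 + 1*x)`, so the true slope is 1 and a sizeable share of flows are zero. It then compares OLS on `log(1 + flow)` with a Poisson regression fitted by Newton's method, which solves the same first-order conditions that PPML uses. All names and parameter values are illustrative assumptions.

```python
import math
import random

rng = random.Random(42)  # fixed seed so the simulation is repeatable


def draw_poisson(lam):
    """Draw one Poisson(lam) count using Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k, p = 0, rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k


# Assumed data-generating process: E[flow | x] = exp(-1 + 1 * x)
TRUE_INTERCEPT, TRUE_SLOPE = -1.0, 1.0
n = 5000
x = [rng.uniform(0.0, 2.0) for _ in range(n)]
flows = [draw_poisson(math.exp(TRUE_INTERCEPT + TRUE_SLOPE * xi)) for xi in x]
share_zero = sum(1 for f in flows if f == 0) / n


def ols_slope(xs, ys):
    """Slope from a simple OLS regression of ys on xs (with intercept)."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    return sxy / sxx


def poisson_fit(xs, ys, iterations=25):
    """Fit E[y | x] = exp(a + b * x) by Newton's method on the Poisson score."""
    a, b = math.log(sum(ys) / len(ys)), 0.0
    for _ in range(iterations):
        mu = [math.exp(a + b * xi) for xi in xs]
        # Score (gradient of the log-likelihood)
        g0 = sum(yi - mi for yi, mi in zip(ys, mu))
        g1 = sum((yi - mi) * xi for xi, yi, mi in zip(xs, ys, mu))
        # Information matrix entries
        h00 = sum(mu)
        h01 = sum(mi * xi for xi, mi in zip(xs, mu))
        h11 = sum(mi * xi * xi for xi, mi in zip(xs, mu))
        det = h00 * h11 - h01 * h01
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b


# "Add one to all flows", then take logs and run OLS
log1p_slope = ols_slope(x, [math.log(1 + f) for f in flows])
# Poisson regression on the raw flows, zeros included
ppml_intercept, ppml_slope = poisson_fit(x, flows)

print(f"share of zero flows:      {share_zero:.2f}")
print(f"true slope:               {TRUE_SLOPE:.2f}")
print(f"OLS on log(1 + flow):     {log1p_slope:.2f}")
print(f"Poisson (PPML-style) fit: {ppml_slope:.2f}")
```

In this simulated setting, the log-plus-one slope should land well below the true value, while the Poisson fit should sit close to it. A real application would add fixed effects and robust standard errors, which this sketch leaves out.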

The second issue is likely to be more problematic. The three key variables in the analysis (migration, temperature variation, and weather-related disasters) are all trended over time. When you run an analysis with a long time-series (or, as in this case, a long panel dataset), time trends in the variables can lead to spurious correlations. That's the reason why per capita cheese consumption is highly correlated with the number of deaths by bedsheet entanglement:


Two variables that are both trended over time will tend to look like they are closely correlated, even when a change in one of the variables does not cause a change in the other. Using more complicated statistical methods does not make this problem go away. To see why it may matter here, consider Figures 1-3 from Wesselbaum's paper:



Notice how all three variables have an upward trend. Economists refer to these time series as being non-stationary (which essentially means that the mean value of the variable is not constant over time). That doesn't mean for certain that there are problems in Wesselbaum's analysis, but it does mean that he should have tested for non-stationarity in the variables. If time series variables are found to be non-stationary, a simple solution can be to take first-differences (so that each variable would then be the difference between its value at time t, and its value at time t-1). Since Wesselbaum doesn't report the tests for stationarity, we have no way of knowing how serious the problems are, and the risk is that the correlation he identifies is simply spurious, and driven entirely by the time trends in the data.
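The spurious correlation problem, and the first-differencing fix, can be illustrated with a small simulation. The sketch below assumes two random walks with upward drift that are generated independently of each other, so by construction neither has any effect on the other. The series, the drift of 1.0, and the 200 periods are all made up for illustration and have nothing to do with Wesselbaum's data.

```python
import math
import random

rng = random.Random(7)  # fixed seed so the simulation is repeatable


def random_walk_with_drift(n_periods, drift):
    """Non-stationary series: each value is the last value plus drift plus noise."""
    level, path = 0.0, []
    for _ in range(n_periods):
        level += drift + rng.gauss(0.0, 1.0)
        path.append(level)
    return path


def correlation(a, b):
    """Pearson correlation coefficient between two equal-length series."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    var_a = sum((u - ma) ** 2 for u in a)
    var_b = sum((v - mb) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)


def first_difference(series):
    """Value at time t minus value at time t-1."""
    return [series[t] - series[t - 1] for t in range(1, len(series))]


# Two series that share nothing except an upward trend
series_a = random_walk_with_drift(200, drift=1.0)
series_b = random_walk_with_drift(200, drift=1.0)

corr_levels = correlation(series_a, series_b)
corr_diffs = correlation(first_difference(series_a), first_difference(series_b))

print(f"correlation in levels:            {corr_levels:.2f}")
print(f"correlation in first differences: {corr_diffs:.2f}")
```

The levels should look very strongly correlated purely because both series trend upward, while the first differences should show a correlation close to zero. In practice, you would run a formal unit root test (such as an augmented Dickey-Fuller test, or a panel equivalent) before deciding whether to difference, rather than relying on a plot.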

This is not the way to analyse these data. However, it does open an opportunity for a good Honours or Masters student to replicate the analysis with a better approach.

