## Saturday, 30 September 2017

### Extrapolating linear time trends... Male teacher edition

I recently read this article from The Conversation, by Kevin McGrath and Penny Van Bergen (both Macquarie University). In the article, the first two sentences read:
> Male teachers may face extinction in Australian primary schools by the year 2067 unless urgent policy action is taken. In government schools, the year is 2054.
>
> This finding comes from our analysis of more than 50 years of national annual workplace data – the first of its kind in any country.
Take a look at The Conversation article. It's hilarious. The authors take time series data on the proportion of teachers in Australia who are male, fit a linear time trend to the data (and, in some cases, a quadratic time trend as well), and then extrapolate. I took a look at the paper, which was just published in the journal Economics of Education Review. Any half-decent second-year quantitative methods student would be able to do the analysis in that paper, but most would not then extrapolate and conclude:
> Looking forward, it is not possible to determine whether the decreasing representation of male teachers in Australia will continue unabated. If so, however, the situation is dire. In primary schools Australia-wide, for example, male teachers were 28.49% of the teaching staff in 1977. Taking the negative linear trend observed in male primary teaching staff and extrapolating forward, it is possible to determine that Australian male primary teachers will reach an ‘extinction point’ in the year 2067. In Government primary schools, where this decline is sharpest, this ‘extinction point’ comes much sooner – in the year 2054.
There is nothing to suggest that a linear time trend will continue into the future. It is especially unlikely for a variable like the proportion of teachers who are male, which is bounded by 0% and 100%: even if it has behaved linearly in past data, there is no reason to expect it to keep behaving linearly as it approaches either extreme. Here's the key data from their paper (there's also a more interactive version at The Conversation):

If you're set on trying out polynomials of time trends, why stop at a quadratic? The primary school data (the lower line in the diagram above) looks like it might be a cubic, since it starts off sloping upward and then turns downward, but at a decreasing rate. I manually scraped their data for male primary teachers from the article in The Conversation, then ran different polynomial time trends through it. The linear time trend had an R-squared of 0.939 (close to the 0.95 they report in the paper), the quadratic also had an R-squared of 0.939, and the cubic increased this to 0.982. In the cubic, all three terms (time, time-squared, and time-cubed) are highly statistically significant. Moreover, this model has a much higher R-squared than their quadratic, so it fits the actual data more closely. In the cubic model (shown below), the forecast shows an increase in male teacher proportions from about now!
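For anyone who wants to try the same kind of comparison, here is a minimal sketch in Python using NumPy. To be clear, the series in it is invented for illustration (a hump-shaped curve with a small wiggle added). It is not the data scraped from The Conversation, and it will not reproduce the R-squared values quoted in this post. The variable names and the 30-years-ahead forecast horizon are also just assumptions for the example.

```python
import numpy as np

# Invented series for illustration only: NOT the scraped male-teacher data.
# t counts years since a hypothetical first observation.
t = np.arange(0, 51, dtype=float)
share = 25 + 0.6 * t - 0.03 * t**2 + 0.0003 * t**3 + 0.3 * np.sin(t)


def r_squared(y, fitted):
    """Share of the variance in y explained by the fitted values."""
    ss_res = np.sum((y - fitted) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return 1.0 - ss_res / ss_tot


results = {}
for degree in (1, 2, 3):
    # Least-squares polynomial time trend of the given degree.
    coeffs = np.polyfit(t, share, degree)
    fitted = np.polyval(coeffs, t)
    # Extrapolate 30 years past the end of the sample (t = 80).
    forecast = float(np.polyval(coeffs, 80.0))
    results[degree] = (r_squared(share, fitted), forecast)

for degree, (r2, forecast) in results.items():
    print(f"degree {degree}: R-squared = {r2:.3f}, forecast at t=80 = {forecast:.1f}%")
```

Two things are worth noticing when you run something like this. Adding a higher-order term can never lower the in-sample R-squared, so a better fit on its own says little. And models that fit the past almost equally well can give wildly different forecasts once you move beyond the sample.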

Now, I'm not about to use my model to suggest that the proportion of male primary school teachers will accelerate to 100%, but I could. (If you extrapolate the cubic model, this happens by 2049, which is sooner than the proportion reaches zero under McGrath and Van Bergen's extrapolation!) And then I would be just as wrong as McGrath and Van Bergen. The British statistician George Box once said: "All models are wrong, but some are useful". In this case, the linear time trend model of McGrath and Van Bergen and the cubic model I've shown here are both wrong and not useful. Hopefully people haven't taken McGrath and Van Bergen's results too seriously. The gender gap in teaching is potentially a problem, but trying to show this by extrapolating time trends from a very simple model is not helpful.
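For what it's worth, the "extinction point" arithmetic amounts to fitting a straight line and solving for the year where it hits zero. The sketch below shows that calculation on four made-up data points. They are not McGrath and Van Bergen's figures, and `fit_line` is just a hypothetical helper written for this example.

```python
# Made-up numbers for illustration only: NOT the data from the paper.
years = [1980, 1990, 2000, 2010]
share = [30.0, 26.0, 22.0, 18.0]  # percent of teachers who are male (invented)


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line(years, share)

# The "extinction point" is where the fitted line crosses 0%.
zero_year = -intercept / slope


def forecast(year):
    """Linear extrapolation; nothing stops it leaving the 0-100% range."""
    return intercept + slope * year


print(f"slope: {slope:.2f} percentage points per year")
print(f"line hits 0% in: {zero_year:.0f}")
print(f"forecast for 2100: {forecast(2100):.1f}%")
```

With these invented numbers the line reaches zero in 2055, and by 2100 it forecasts that about -18% of teachers will be male. The model has no idea that a proportion cannot go below zero, which is the whole problem with reading an "extinction point" off a linear trend.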