Saturday, 7 March 2026

Perceptions of inequality and satisfaction with democracy

Last week, my ECONS101 class covered (among many other things) the faulty causation fallacy. This occurs when we observe two variables that appear to be related to each other (they are correlated), but a change in one of the variables does not actually cause a change in the other variable (there is no causal relationship). We might observe a relationship between two variables (call them A and B), and it might be because a change in A causes a change in B, in which case the relationship is causal. But even if we can tell a really good story explaining why we think a change in A causes a change in B, that in itself doesn't make it true. We might observe that relationship because a change in B causes a change in A (we call this reverse causation). Or, we might observe that relationship because a change in some other variable causes a change in both A and B (we call this confounding). Or, the two variables might be completely unrelated, and the observed relationship happens by chance (we call this spurious correlation).
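
To make confounding concrete, here is a minimal simulated sketch (all numbers hypothetical, and nothing to do with the actual survey data): a third variable C drives both A and B, and A and B end up strongly correlated even though neither one affects the other.

```python
import random

random.seed(42)

def corr(x, y):
    """Pearson correlation coefficient, computed from scratch."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

# C is the confounder: it drives both A and B, but A and B never touch each other
C = [random.gauss(0, 1) for _ in range(10_000)]
A = [c + random.gauss(0, 1) for c in C]
B = [c + random.gauss(0, 1) for c in C]

print(round(corr(A, B), 2))  # a strong correlation, with no causal link at all
```

Regressing A on B here would produce a 'significant' coefficient, but intervening on B would do nothing at all to A, which is exactly why a good story about A causing B is not enough.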

To illustrate this, I'm going to use the example of the research in this 2024 discussion paper by Nicholas Biddle and Matthew Gray (both Australian National University). They also wrote a non-technical summary of their paper on The Conversation. Biddle and Gray look at the relationship between perceptions of income inequality and faith in democratic institutions. To be fair to them, they do say in the paper that "This does not, however, demonstrate a causal relationship from views on inequality to views on democracy". However, most of their interpretations and their policy recommendations assume that the relationship is causal. For example, they conclude that:

The fundamental issue identified in this paper is that the Australian population has identified the income distribution in Australia as being unfair, and that this appears to be impacting views on democracy.

First though, let's take a step back and look at the research. Biddle and Gray use data from Waves 5 and 6 (from 2018 and 2023 respectively) of the Asian Barometer Survey (with a sample size of over a thousand Australian respondents in each wave), as well as from the ANUPoll, a quarterly survey of public opinion run by the Social Research Centre at ANU. For the ANUPoll, they use the January 2024 data, which includes over 4000 respondents.

First, from the Asian Barometer, Biddle and Gray find that there is substantial concern about inequality:

In both waves 5 and 6 of the survey, respondents were asked ‘How fair do you think income distribution is in Australia?’... more Australians think that the income distribution is unfair or very unfair (60.5 per cent) than think it is fair or very fair. This gap has widened slightly since 2018, particularly in terms of those who think the distribution is very unfair as opposed to just unfair.

Second, in the ANUPoll data, they find that:

Combined, 30.3 per cent of Australians were not at all or not very satisfied with democracy in January 2024 (compared to 34.2 per cent in October 2023). This is still well above the January 2023 levels of dissatisfaction (22.9 per cent) and even more so the March 2008 levels (18.6 per cent).

So over time, Australians' perceptions of inequality have gotten worse (they think the income distribution is less fair), and they are less satisfied with democracy. It is reasonable, then, to ask whether those concerns about inequality affect people's faith in democratic institutions. Biddle and Gray next look at that relationship, using the ANUPoll data, and find that:

There is a very strong relationship between views on income inequality in Australia and views on democracy...

Their model (shown in Table 1 in the paper [*]) shows that the most negative views of the income distribution are associated with lower satisfaction with democracy, while more positive views of the income distribution are associated with higher satisfaction with democracy.

So, there is a strong correlation between perceptions of inequality and satisfaction with democracy. But is that just a correlation, or is there a causal relationship? We can tell a good story here (and Biddle and Gray do that). People who are less satisfied with the income distribution may lay some blame on government, and therefore their satisfaction with democracy falls.

Before we conclude that this relationship is causal though, let me lay out some alternatives. First, perhaps people who are less satisfied with democracy become less satisfied in general with many aspects of society, including the income distribution. In this case, there would be reverse causation. Second, perhaps people who are less satisfied with life in general express less satisfaction with many aspects of life and society, so they answer more negatively when asked about their satisfaction with democracy, and they answer more negatively when asked about their views of the income distribution. In this case, there would be confounding. Third, perhaps satisfaction with democracy is declining over time for some reason, and views about the income distribution are becoming more negative for some completely different reason, but the two look related because they are both trending downwards. In this case, there would be a spurious correlation between perceptions of inequality and satisfaction with democracy.
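
The third alternative, spurious correlation from shared trends, is easy to demonstrate with simulated data. In this sketch (hypothetical numbers, not the actual survey series), two series drift downward over time for completely unrelated reasons, and come out strongly correlated anyway:

```python
import random

random.seed(1)

def corr(x, y):
    """Pearson correlation coefficient, computed from scratch."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

# two series that each trend downward over time, for entirely separate reasons
periods = range(100)
satisfaction = [50 - 0.2 * t + random.gauss(0, 2) for t in periods]
fairness = [70 - 0.3 * t + random.gauss(0, 2) for t in periods]

print(round(corr(satisfaction, fairness), 2))  # close to 1, purely from the shared trend
```

Detrending both series first (or working with period-to-period changes) would make most of this apparent relationship vanish, which is one standard check for exactly this problem.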

We can't simply observe two variables that appear to be related, and assume that a change in one of those variables causes a change in the other variable. Economists and other researchers have developed a number of statistical tools and experimental methods to try to tease out when a correlation really does reflect a causal relationship. Biddle and Gray haven't done that. It might be that negative perceptions of inequality reduce satisfaction with democracy. But by itself, this research doesn't allow us to conclude that.

[HT: The Conversation]

*****

[*] Table 1 in the paper actually has an error. The explanatory variable in the table is labelled as satisfaction with democracy, when that is actually the dependent variable. It is perceptions of inequality that is the explanatory variable.

Friday, 6 March 2026

This week in research #116

Here's what caught my eye in research over the past week:

  • Numa and Zahran (with ungated earlier version here) show that W.E.B. Du Bois made enduring contributions to economics (and may be one of the most under-rated economists of the early 20th Century)
  • Federle et al. (with ungated earlier version here) study 150 years of the economic cost of war, and find that a war of average intensity is associated with an output drop of close to 10 percent in the war-site economy, while consumer prices rise by approximately 20 percent
  • Passaro, Kojima, and Pakzad-Hurson (with ungated earlier version here) find that when there are more men than women in a labour market, 'equal pay for similar work' policies increase the gender wage gap
  • Strulik and Trimborn (open access) show analytically that higher world population could causally lead to a lower long-run temperature increase under optimal carbon taxation (though I think that the optimal carbon taxation might be doing a lot of the work there)
  • Arellano-Bover et al. (with ungated earlier version here) look into the initial job-matching of US graduates by major, and find significant variation in callback rate returns to majors, with Biology and Economics majors receiving the highest rate, particularly in occupations involving high intensity of analytical and interpersonal skills
  • Xu et al. develop a geographically weighted autoregressive model with an adaptive spatial weights matrix (a bit pointy-headed for many readers of this blog, but of interest to me!)
  • Li, Liu, and Si find in a meta-analysis that minimum wages actually increase female employment (showing that the question of the employment effects of minimum wages is still not solved)

Wednesday, 4 March 2026

This is not how generative AI should be used in research

I've been using ChatGPT Pro to help with drafting research papers this year, as I noted that I would do in this post from January. It has amped up my productivity a lot, allowing me to finish writing up two papers already, with a third on the way. These were papers where the analysis was already done, but it was the writing that was holding up the process. Having ChatGPT to help with the drafting seems to kickstart my writing, even though I have ended up extensively re-writing everything that ChatGPT produces. I find it a good disciplining tool as much as anything. Several colleagues have asked whether I am disclosing my generative AI use to journal editors when I submit. And I do. I have a standard 'generative AI use statement' that I include in my papers, that notes how it was used, and that I remain responsible for all of the content. You can see an example in this recent working paper.

However, not everyone is as careful with their generative AI use, or as transparent. Consider this example:

That is both infuriating and a sad indictment of the reviewing, editing, and publishing process, not least because, as one Reddit commenter noted, many authors see high-quality work rejected by journals, whereas a paper like this, with obvious flaws, has successfully been published. And it's not an isolated incident. This 2025 article by Artur Strzelecki (University of Economics in Katowice), published in the journal Learned Publishing (open access), catalogues over 1300 instances of likely unacknowledged and frankly stupid use of ChatGPT, up to September 2024.

Strzelecki's approach is to search for text strings that are almost certainly ChatGPT responses to a prompt asking it to generate text. The main example Strzelecki uses, which is in the title of the article, is "as of my last knowledge update". No human author is going to say that in a research paper. Similarly, "as an AI language model", "I don't have access to", and "certainly, here is" are highly indicative of ChatGPT use. There are circumstances where a human might use those phrases in a research paper, but it seems unlikely. Strzelecki screens out papers that mention ChatGPT, and manually checks each paper to ensure the text was not in some way legitimate, and that leaves 1362 articles.
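
The detection approach is simple enough to sketch. The toy version below uses only the example phrases quoted above (not Strzelecki's full search list), and flags any text containing one of the telltale strings:

```python
# example phrases only, taken from the discussion above; a real screen would use
# a longer list and would still need the manual checking step described in the article
TELLTALE_PHRASES = [
    "as of my last knowledge update",
    "as an ai language model",
    "i don't have access to",
    "certainly, here is",
]

def flag_suspect(text):
    """Return the telltale phrases (if any) found in the text."""
    lowered = text.lower()
    return [phrase for phrase in TELLTALE_PHRASES if phrase in lowered]

print(flag_suspect("As of my last knowledge update, the literature suggests..."))
# → ['as of my last knowledge update']
print(flag_suspect("We estimate the model using ordinary least squares."))  # → []
```

The manual-check step matters because a phrase like "certainly, here is" could plausibly appear in legitimate text, for example in a paper that deliberately quotes ChatGPT output.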

How do these articles get published with this content intact? There are lots of stopping points where this could be caught and corrected (or prevented), but these articles have gotten through all of them. Strzelecki outlines the process. First, perhaps only one of the authors (and not all of them) used ChatGPT. If so, why didn't the other co-authors pick it up? Next, the paper is submitted to a journal, and often goes through a text review by the publisher. Then the editor or editors (including associate editors) look at it, and decide whether it should be sent out for peer review. Then the peer reviewers (usually more than one, sometimes four or more) look at the paper in detail and provide comments. Then the editor receives the review reports and makes a decision. The paper may go through more than one round of review and editorial decision. And then, once accepted for publication, the article may be copy-edited. At any of those stages, this text could be picked up. And yet, for over 1300 articles as of September 2024, the ChatGPT-generated text was not picked up.

Strzelecki particularly focuses on 89 articles that have been published in journals indexed by Scopus or Web of Science, which should be the most credible journals. Of these:

...as many as 28 of them are in journals with Scopus percentile values of 90 and above. Two journals have a 99th percentile, indicating that they are the top journals in their field...

In total, 64 articles were found in journals considered to be in Q1, top quartile, recognized as the group of the best journals in their respective fields. Twenty-five articles are in the percentile range between 50 and 75, indicating that the journals in which these articles are found belong to Q2.

So, this phenomenon is not limited to low-ranked 'predatory' journals. In fact, looking at the list, there are several journals published by MDPI and Frontiers (for more on those publishers, see here). However, there are a whole lot published by Elsevier and Springer, publishers that we should expect much better of. Although, those are also publishers that publish a lot of journals, and a lot of articles, so perhaps that accounts for their higher numbers within the 89 articles that Strzelecki focuses on. Fortunately, I don't see any reputable journals in economics in the list, but I could be wrong.

Anyway, the takeaway is not so much that generative AI use is widespread in the write-up of research. It is that authors are using generative AI, not being transparent in their use of it, and that the quality control system at journals, even high-ranking journals, is terrible. Strzelecki makes a good point in the conclusion of his article that 89 out of over 2.5 million articles indexed in Scopus is only about 0.0036 per cent of the total indexed articles. However, this analysis is only picking up the really, really obvious cases. There will be far more use of generative AI that has not been adequately checked or acknowledged by authors, and not picked up in quality control.

I'm not against using generative AI in the write-up of research. Obviously, because I am doing the same thing. What needs to happen is that researchers need to be transparent and honest when they use generative AI, so that editors, reviewers, and the readers of research can see how it was used. That way, the users of research can evaluate for themselves whether they should believe, discount, or discard research depending on the ways and the extent of generative AI use. Without transparency, that important evaluation step is lost.

[HT: Artur Strzelecki]


Monday, 2 March 2026

You can make future population decline disappear just by changing the way you categorise people and fertility

Fertility has been on a long-term declining trajectory worldwide and, apart from the occasional blip, in every country. There seems to be no prospect of a reversal of this trend, and no prospect of fertility returning to the replacement level of approximately 2.1 births per woman. So, when you see a research paper claiming that "high-fertility, high-retention groups persist, gain share, and lead the total population to grow", you should sit up and take notice. That is, at least, until you've carefully thought about the paper in question.

That's what happened to me with this 2025 NBER Working Paper by Sebastian Galiani (University of Maryland, College Park) and Raul Sosa (Universidad de San Andres). They create and calibrate models of fertility based on two different subgroupings (by race, and by religion), taking account of cultural transmission of fertility rates from mothers to daughters. They then use their calibrated models to simulate population change going forward for ten generations. What they find when the population is categorised by race is a decreasing population, as shown in Figure 1 Panel A from the paper:

And when Galiani and Sosa categorise the population by religion, they instead find an increasing population, as shown in Figure 2 Panel A from the paper:

Now, this struck me as really odd. We’re talking about the same country and the same underlying population. If you split that population into subgroups and take a weighted average of what happens in each subgroup, you should get back the outcome for the population as a whole. If you are measuring the same underlying thing consistently, changing the subgroups (race in one analysis, and religion in the other) shouldn’t magically create or destroy population growth in the model. At most, it should change which groups are growing faster and therefore how the composition by group changes over time, with high-fertility groups making up a larger share of the population and lower-fertility groups making up a smaller share. But the headline result here is much stronger than that, with aggregate population growth changing direction entirely depending on the groupings that are employed. Galiani and Sosa use those results to conclude that:
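
The accounting point is easy to verify with a toy example (all numbers hypothetical). If each person carries a fixed growth factor, then splitting the same people one way or the other and growing each group at its mean rate gives exactly the same one-period aggregate, so regrouping alone cannot flip the total:

```python
# the same 100 people, each tagged with two group labels and a personal growth factor
people = (
    [("race-A", "religion-X", 0.9)] * 40
    + [("race-A", "religion-Y", 1.2)] * 10
    + [("race-B", "religion-X", 0.9)] * 30
    + [("race-B", "religion-Y", 1.2)] * 20
)

def aggregate_growth(grouping_index):
    """One-period aggregate growth when each group grows at its mean rate."""
    groups = {}
    for person in people:
        groups.setdefault(person[grouping_index], []).append(person[2])
    next_total = sum(len(rates) * (sum(rates) / len(rates)) for rates in groups.values())
    return next_total / len(people)

print(aggregate_growth(0), aggregate_growth(1))  # identical under either grouping
```

Over many generations the choice of grouping can matter, because applying a group's average rate smooths away within-group variation in fertility; but that subtlety is different from using different fertility measures and calibrations for each grouping.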

...whenever at least one group remains above replacement on the female line and transmits identity effectively, its share rises and turns the aggregate path upward.

The first part of that conclusion makes sense, but the second part stretches credibility. It made me wonder whether the results were being driven by unusual features of the model, or by different modelling choices in the two analyses. 

So, I dug into the paper, which is not an easy task as it is quite theoretical. And there are consequential differences between the two analyses (by race and by religion) that drive the difference in results. First, they use different measures of fertility, with the analysis by race based on the total fertility rate (TFR), while the analysis by religion is based on completed fertility (see this post for a brief discussion on the difference between those two measures). There is a consequential difference between the two measures. By definition, completed fertility can only be observed for women who have finished their childbearing years, so it reflects fertility over the last twenty or more years. In contrast, the total fertility rate that Galiani and Sosa use was measured in 2023, after a long period of fertility decline. By construction then, the analysis using completed fertility (the analysis by religion) will assume higher fertility than the analysis using the total fertility rate (the analysis by race). This is highlighted by Table 1 in the paper, which shows that nearly every racial group has a total fertility rate below replacement (the highest among the large groups is Hispanic, at a TFR of 1.946; only Native Hawaiian and Pacific Islanders, at 2.218, are above replacement), whereas there are several religious groups with completed fertility above replacement (including Mormons at 3.4, and Muslims at 2.4).

Second, their calibration implies much bigger gaps across religious groups than across racial groups. Specifically, they assume greater dispersion in fertility and retention by religion than by race. That means that the forces driving fertility change within population groups are much stronger in the analysis by religion than the analysis by race. So, essentially this doubles down on the effect of higher fertility that arises from the different data sources.
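
That said, the mechanism in their conclusion is real within a model: if even one group stays above replacement, its rising share can eventually turn the aggregate upward. A toy projection (hypothetical growth multipliers, not calibrated to anything in the paper) shows the U-shaped path:

```python
# a large below-replacement group and a small above-replacement group
sizes = {"below": 90.0, "above": 10.0}
growth = {"below": 0.85, "above": 1.20}  # population multiplier per generation

trajectory = [sum(sizes.values())]
for generation in range(10):
    for name in sizes:
        sizes[name] *= growth[name]
    trajectory.append(sum(sizes.values()))

# the aggregate falls at first, then turns upward once the growing group dominates
print([round(total, 1) for total in trajectory])
```

So the qualitative claim is fine; the issue is whether any real group's above-replacement fertility (and retention of members) would actually persist for ten generations, which is where the measurement and calibration choices do the heavy lifting.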

Overall, I don't find the comparison across the two models to be credible. They are employing different measures, taken from different points in time, and applying different modelling assumptions. In contrast, the results within each model showing that the relative group proportions change over time to favour groups that have higher fertility are plausible and are worth taking account of. For instance, Galiani and Sosa conclude that:

Although the objective is not to forecast outcomes for particular groups, our world simulations imply not only a more religious composition but also that, within the horizon we study, Muslims become the largest tradition by share.

That seems like a sensible conclusion to draw based on the evidence, especially as they explicitly note that they aren't trying to forecast outcomes for particular groups. Nevertheless, they do forecast the total population, and their results are not entirely consistent with what is expected to happen. World population is set to start declining later this century, in large part because of declining overall fertility, and their results based on religion suggest that this is suddenly going to reverse course, and remain upward over a time horizon of ten generations. In reality, the long-run trend in fertility has proven very difficult to change, and some complicated economic modelling that appears to overturn the on-the-ground reality is not going to change that.

[HT: Marginal Revolution]
