Friday, 20 March 2026

This week in research #118

Here's what caught my eye in research over the past week (a quiet week, following last week's bumper edition):

  • Rubenstein and Stephenson assess the effect of Taylor Swift’s relationship with Travis Kelce on the Kansas City Chiefs’ television audience, and find that viewership increases by about one-third beginning with Swift’s first time attending a Chiefs game
  • Bussoli and Fattobene (open access) find that Financial Graph Literacy is lower among older adults, those with less education, and lower-income groups, and is significantly associated with a greater likelihood of engaging in proactive financial behaviours such as saving, investing, budgeting, and using digital financial tools

Wednesday, 18 March 2026

How the 'travelling Pope' affected international trade

Pope John Paul II was known as 'the travelling Pope' because of the large number of international trips ('pastoral visits') he undertook (more than 100 between 1979 and 2004). He also had a huge following, as you might expect as the leader of the Catholic Church, but the advent of television meant that the public could follow his travels in a much closer way than ever before. And, through his pastoral visits and his following, he exposed Catholics the world over to new places they would otherwise not have seen or, in some cases, even heard of. What effects did that exposure have?

That is essentially the question addressed in this recent article by Alexander Popov (European Central Bank), published in the Economic Journal (ungated earlier version here). Popov focuses on the impact of the Pope's visits on exports from the visited country, and especially exports to Catholic countries. He employs an event study design - looking at how exports changed between the time before and the time after the Pope's first visit to a country, while controlling for GDP growth, population, the US dollar real exchange rate, and the extent of trade liberalisation and democracy. The key results are summarised in Figure 2(a) from the paper:

The figure shows how exports evolve before and after the Pope's visit. Beforehand, there isn't much evidence of a trend (notice that the red line hovers around zero). However, after the Pope's visit, exports increase (the red line is clearly above zero and trending upwards), and the effect is substantial. Popov notes that:

...the point estimate on Year 3 after the pope’s visit to a country is 0.1152, which implies that exports to the rest of the world are higher by 12.2%, relative to the year of the visit.
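The 12.2% figure is just the standard conversion of a log-point estimate into a percentage change. As a quick check (my own arithmetic, not code from the paper):

```python
import math

beta = 0.1152  # Popov's Year 3 point estimate (a log-point effect)
pct = math.exp(beta) - 1  # convert log points to a percentage change
print(f"{pct:.1%}")  # prints "12.2%"
```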

And the effects are even larger for exports to countries with larger Catholic populations. Specifically:

...exports to a trading partner with 54.3% (75th percentile), relative to a trading partner with 1.1% (25th percentile) Catholics in the population were higher by between 16.5% and 36.9% during years 1 to 5 after a visit by the pope.

Clearly, Catholics were paying attention to where the Pope was visiting. Popov then asks the obvious question: what explains this effect? He examines three hypotheses:

The first one is that during a foreign visit, the pope explicitly encourages Catholics around the world to engage with the host country on economic terms. I analyse 633 speeches given during the pope’s 130 first visits and I find rare occasions when he mentions words like ‘trade’, ‘economic’ or ‘globalisation’.

So, the Pope wasn't explicitly telling Catholics to buy more goods from the countries he was visiting. Then:

The second hypothesis is that, by simply visiting a country, the pope raises its profile, or ‘puts it on the map’ for the global Catholic family, especially if Catholics around the world are for cultural or economic reasons less connected with the visited country. I find that the effect on exports of a pastoral visit to a country is stronger if this country is relatively poor and if it has relatively fewer Catholics and relatively weaker bilateral trade links with the partner country. The third hypothesis is that Catholics around the world are simply buying souvenirs to commemorate the pope’s visit. I analyse data on bilateral trade at the product level, for ten different sectors, and I find that after a pastoral visit, the increase in exports I detect takes place in half of them.

So, the third hypothesis (souvenirs) doesn't have much support. Popov concludes that the second hypothesis is the likely driver of the increase in exports. This evidence is consistent with the Pope raising the profile of the countries he visited, and those countries benefiting from their higher profile among Catholics in the form of higher exports, especially to Catholic countries.

What makes this paper interesting in an economic sense is that it suggests trade flows don't just depend on prices, trade policy, and distance. They also depend on visibility, familiarity, and the ways that cultural influence can affect economic outcomes. Pope John Paul II's visits appear to have increased visibility and familiarity, which may in turn have boosted trade. The 'travelling Pope' may have also been the 'trade-promoting Pope'.

Tuesday, 17 March 2026

Seven decades of change in the demographics and research styles of top economics research

Back in 2013, Daniel Hamermesh (University of Texas at Austin) published this article in the Journal of Economic Literature (ungated earlier version here), which summarised changes in the demographics and research styles of top economics research, based on articles published between 1963 and 2011 in three top journals: the American Economic Review (AER), the Journal of Political Economy (JPE), and the Quarterly Journal of Economics (QJE). A new update last year (open access) from Hamermesh extends the analysis to include articles up to 2024.

In terms of demographics, the trends show a continuation of earlier patterns. On gender, Hamermesh notes that:

The progression that occurred from the 1960s and 1970s, when only a minute fraction of authors were women, to the early twenty-first century has, if anything, accelerated.

This will be welcome news, given the persistent gender gap in economics (see this post and the links at the end of it). It likely reflects the changing demographics of young economists, with a growing proportion of the young 'stars' in economics being women (and noting that it is young stars who often get published in the top journals that Hamermesh is considering).

In terms of the age structure of authors, Hamermesh reports that:

The changes from 2011 to 2024 continued those that started in the 1980s, but the rate of change has not accelerated. Indeed, most noticeable from 2011 to 2024 was a continuing sharp and statistically significant drop in the representation of the youngest group (and a nearly equal sharp rise among those 36–50)...

...the average age of authorship has increased steadily since 1973. 

Can I change my comment above about the young stars in economics? The increasing median age of authors in top journals seems to be a general trend across academia. Hamermesh then turns to research 'style', documenting a continued dramatic rise in the proportion of articles in those journals that are co-authored:

There were no four-authored papers as recently as 1983; today they account for 17 percent of articles. There were no papers with more than four authors in 2003; today nearly 12 percent of articles have five or more authors (with five articles written by six authors each and one by seven authors). Obversely, sole-authored papers are now quite scarce; and even two-authored papers today only account for slightly more than one-fourth of all articles (compared to a majority as recently as 2003).

Unsurprisingly, the increase in the number of co-authored articles means that the age diversity of author collaborations has increased over time as well. In terms of the types of research, he reports that:

The big changes are the continuing rise in empirical work based on original non-laboratory data and the rapid and even accelerating increase in experimental work. Today these two methods, which both involve collecting original data, account for over half of all published papers, compared to less than 4 percent four decades ago...

These trends are not unrelated, of course. Experimental research and the increasing use of large datasets typically both require larger research teams. They also often require more detailed methods, which may involve both larger teams and more experienced researchers. Larger teams might be more likely to include female team members. And larger teams often need someone to lead and coordinate the team, and those leaders tend to be more experienced (and older) academics. So, it would not surprise me, if more detailed analysis were conducted, to see that these trends are interconnected.

Now, the interesting thing will be what happens going forward, given the increasing use of generative AI in research (see here, for example). Since generative AI can now do a lot of the work that research assistants and early career researchers previously did, will the trend towards larger research teams be reversed? How will that interact with the gender gap in research (given that the age of female economists skews younger at the moment)? And how will it affect the age distribution of researchers (given that men, and younger people, are somewhat more likely to use generative AI)? I'll be looking forward to Hamermesh's next update. Hopefully, we don't have to wait another 12 years.

[HT: Marginal Revolution, last year]

Monday, 16 March 2026

Changing their minds could be a good thing for economists

People don't like to change their minds. This may partly be an expression of loss aversion - we really want to avoid losses, including the loss of an idea that we previously thought was true. This leads to status quo bias - we prefer not to change things, and keep them the same, because changing things entails a loss. But what if changing our minds could make us better off? Would we be so reluctant to do so?

This 2025 paper by Matt Knepper (University of Georgia) and Brian Wheaton (UCLA) suggests that economists, at least, should not be afraid to change their minds, because doing so increases the number of citations to their research. Knepper and Wheaton investigate authors who undergo an 'ideological reversal' - previously publishing research that could be considered right-wing, before switching and publishing a paper that draws a left-wing-consistent conclusion, or the reverse (switching from left-wing to right-wing). Their main data source is every economics paper ever published in the top 100 economics journals indexed in Web of Science - some 200,000 articles. They also have a narrower dataset of papers referenced in meta-analyses on policy topics, including:

...the minimum wage, the economics of unions, the taxable income elasticity, the fiscal multiplier, intergenerational transfers, trade and productivity, trade and domestic employment, crowd-out, the gender wage gap, unemployment insurance, disability benefits, universal preschool, childcare and employment, immigration and wages, and more.

Knepper and Wheaton use this narrower dataset to train a machine learning model to categorise the rest of the papers in the dataset, as to how left-wing (or right-wing) the conclusions are. For instance, a paper that concludes that the minimum wage reduces employment is more right-wing, whereas one that concludes that there is no disemployment effect of the minimum wage is more left-wing. Knepper and Wheaton define an author as left-wing if they published more left-wing papers than right-wing ones over the previous five years, and the reverse for right-wing authors. They then use the larger dataset to investigate what happens to each economist who undergoes an 'ideological reversal'. They first outline some descriptive facts based on their dataset, including:

  • Fact #1: The typical author mostly publishes results on one side of the political spectrum.

  • Fact #2: Ideological reversals are not rare; they occur at least once for 40% of authors.

  • Fact #3: Ideological reversals become much more common later in an author’s career, with authors essentially never undergoing a reversal in the first decade of their career.

  • Fact #4: Most ideological reversals do not represent a permanent defection to the other side of the political spectrum, but rather the beginning of repeatedly publishing results on both sides of the spectrum.

  • Fact #5: Ideological reversals occur much more frequently amongst authors who are (initially) classified as right-wing.

That does seem like a surprisingly high proportion of economists who undergo at least one ideological reversal. However, perhaps we should take comfort in that - if the results point in a particular direction, our conclusions should say that, even if that conclusion is inconsistent with our previous conclusions on the same topic.

Do these ideological reversals matter though? Knepper and Wheaton employ a difference-in-differences analysis, comparing the difference in citations (and other metrics) between authors who did, and did not, undergo an ideological reversal, between the time before, and after, the reversal occurred. In other words, they look at whether citation counts rise more for economists who have an ideological reversal than for otherwise similar economists who do not. The results are striking, with:

...a sharp clear increase in citation count following an ideological reversal with essentially no evidence of pre-trends... The citation boost accumulates to approximately 9 over a one-decade period and 30 over a two-decade period.
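The difference-in-differences comparison that produces estimates like this boils down to a two-by-two calculation: the change for authors who reversed, minus the change for similar authors who didn't. A sketch with made-up numbers (purely illustrative, not theirs):

```python
# Hypothetical mean citations per year (illustrative numbers only)
treated_before, treated_after = 5.0, 9.0  # authors who underwent a reversal
control_before, control_after = 5.0, 6.0  # similar authors who did not

# Difference-in-differences: treated change minus control change
did = (treated_after - treated_before) - (control_after - control_before)
print(did)  # prints 3.0 -- the estimated effect of the reversal on citations
```

The control group's change nets out whatever citation growth would have happened anyway, which is why the identifying assumption is that the two groups would have followed parallel trends absent the reversal.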

The results remain consistent when Knepper and Wheaton limit the analysis to papers published before the ideological reversal, and when they limit the analysis to papers in the meta-analysis only (showing that the machine learning approach doesn't drive the results). Knepper and Wheaton also find evidence consistent with no change in the quality of papers before and after the ideological reversal, and that:

Both left-to-right and right-to-left reversals are rewarded by increased citations of roughly the same magnitude. The boost in citations received subsequent to a left-to-right reversal is mostly driven by citations from right-wing authors, and the boost in citations received subsequent to a right-to-left reversal is mostly driven by citations from left-wing authors. Encouragingly, however, the new right-wing (left-wing) audience garnered by a left-to-right (right-to-left) reversal... also engages with and cites the author's previous left-wing (right-wing) papers. This dynamic suggests that ideological reversals help prevent the formation of echo chambers in economics academia and expose authors to opposite ideological findings.

This last result is particularly important, and I believe it allows us to conclude that economists need not fear ideological reversals. By changing their minds, they can attract a new audience from the other side of the ideological spectrum, bringing the two sides closer together. Hopefully, through that, we end up with higher-quality research overall.

[HT: Marginal Revolution, last year]

Saturday, 14 March 2026

Artificial intelligence and the 'age of leisure'

My ECONS101 class covered constrained optimisation last week, and one of the models we looked at was the labour-leisure trade-off for workers. Now artificial intelligence, and in particular generative AI, is likely to have large impacts on the labour-leisure trade-off. As the Financial Times reported last year (paywalled):

The idea that technological progress can enable people to work fewer hours is not outlandish...

But in order to believe a similar trend is going to take hold again, you have to assume three things. First: that AI will deliver a substantial boost to economic productivity...

Second, you have to assume the economic gains will be widely distributed...

Third, you have to believe workers will “cash in” those proceeds in the form of extra leisure, rather than higher income. But will they? In many developed countries, there has been a slowdown in the reduction in working hours in recent decades...

Far from trading income for leisure, it is the people with the highest salaries who tend to work the longest hours.

Will workers trade off higher productivity for more leisure time? Are we about to enter an 'age of leisure'? The constrained optimisation model for the worker (see also this post) can help us clarify the possibilities. In this model, we'll assume that AI increases productivity, and that the increase in productivity is represented by higher wages for workers. [*] The model will then tell us whether workers might respond by consuming more, or less, leisure.

Our model of the worker's decision is outlined in the diagram below. The worker's decision is constrained by the amount of discretionary time available to them. Let's call this their time endowment, E. If they spent every hour of discretionary time on leisure, they would have E hours of leisure, but zero income. That is one end point of the worker's budget constraint, on the x-axis. The x-axis measures leisure time from left to right, but that means that it also measures work time (from right to left, because each one hour less leisure means one hour more of work). The difference between E and the number of leisure hours is the number of work hours.

Next, if the worker spent every hour working, they would have zero leisure, but would have an income equal to W0*E (the wage, W0, multiplied by the whole time endowment, E). That is the other end point of the worker's budget constraint, on the y-axis. The worker's budget constraint joins up those two points, and has a slope that is equal to the wage (more correctly, it is equal to -W0, and it is negative because the budget constraint is downward sloping). The slope of the budget constraint represents the opportunity cost of leisure. Every hour the worker spends on leisure, they give up the wage of W0.

Now, we represent the worker's preferences over leisure and consumption by indifference curves. The worker is trying to maximise their utility, which means that they are trying to get to the highest possible indifference curve that they can, while remaining within their budget constraint. The highest indifference curve they can reach on our diagram is I0. The worker's optimum is the bundle of leisure and consumption where their highest indifference curve meets the budget constraint. This is the bundle A, which contains leisure of L0 (and work hours equal to [E-L0]), and consumption of C0.

Now, let's say that the situation shown above is the situation before the advent of AI. After AI is introduced, productivity increases, and so wages increase (from W0 to W1). This causes the budget constraint to pivot outwards and become steeper (since the slope of the budget constraint is equal to the wage, the slope has increased from -W0 to -W1). The worker can now reach a higher indifference curve, and it is the position of that higher indifference curve that determines the worker's response in terms of whether they consume more leisure or not. If they move to the higher indifference curve I1, then the worker's new optimum is the bundle of leisure and consumption B, which contains leisure of L1 (and work hours equal to [E-L1]), and consumption of C1. For this worker (whose response is shown in red on the diagram), leisure hours decrease as a result of the higher wage. On the other hand, if they move to the higher indifference curve I2, then the worker's new optimum is the bundle of leisure and consumption C, which contains leisure of L2 (and work hours equal to [E-L2]), and consumption of C2. For this worker (whose response is shown in blue on the diagram), leisure hours increase as a result of the higher wage. [**]

Either of these possibilities could happen. In fact, both could happen, with some workers increasing leisure time and others decreasing leisure time. By itself, this model doesn't answer the question of what will happen, but shows that both increased leisure and decreased leisure are possible outcomes.

The key difference here comes down to the size of the income effect of the increase in wages. When wages increase, the opportunity cost of leisure increases. That makes leisure relatively more expensive, and workers should respond by consuming less leisure. That is what we call the substitution effect - workers substitute away from leisure as it becomes more expensive. However, increased wages also lead to an income effect. Leisure is a normal good, which means that as the worker's income increases, they would like to consume more leisure. Notice that the substitution effect and the income effect are working in opposite directions here. For workers who overall decrease their leisure, the substitution effect (which says they should consume less leisure) must be bigger than the income effect (which says they should consume more leisure). For workers who overall increase their leisure, the reverse is true - the substitution effect must be smaller than the income effect.
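Both cases are easy to generate numerically. In this sketch (my own illustrative utility functions, not the ones in the diagram), a quasilinear worker has no income effect on leisure, so leisure falls as the wage rises; a worker with a subsistence level of consumption has a dominant income effect once subsistence is covered, so leisure rises:

```python
import math

E = 16.0  # hypothetical daily time endowment (hours of discretionary time)

def best_leisure(utility, wage, step=0.001):
    """Grid search for the leisure hours that maximise utility, given C = wage*(E - L)."""
    best_L, best_U = 0.0, -math.inf
    for i in range(1, int(E / step)):
        L = i * step
        U = utility(wage * (E - L), L)
        if U > best_U:
            best_U, best_L = U, L
    return best_L

# Worker 1: quasilinear utility, U = C + 40*ln(L). The optimum is L* = 40/w,
# so there is no income effect on leisure and leisure falls as the wage rises.
def u_quasilinear(C, L):
    return C + 40 * math.log(L)

# Worker 2: Stone-Geary utility with subsistence consumption of 50,
# U = ln(C - 50) + ln(L). The optimum is L* = E/2 - 25/w, so once subsistence
# is covered the income effect dominates and leisure rises with the wage.
def u_stone_geary(C, L):
    return math.log(C - 50) + math.log(L) if C > 50 else -math.inf

for name, u in [("quasilinear", u_quasilinear), ("Stone-Geary", u_stone_geary)]:
    print(f"{name}: L* at w=10 is {best_leisure(u, 10):.2f}h; at w=20 is {best_leisure(u, 20):.2f}h")
```

Running this, the quasilinear worker's leisure falls from 4 hours to 2 hours when the wage doubles, while the Stone-Geary worker's leisure rises from 5.5 hours to 6.75 hours. Same wage increase, opposite responses, exactly as in the diagram.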

AI may lead us into an age of leisure. But only if productivity gains lead to higher wages, and the income effect of higher wages more than offsets the substitution effect.

*****

[*] The assumption that productivity gains will lead to higher wages is a strong assumption. Indeed, the FT article questions whether this assumption is valid. If productivity gains don't lead to higher wages, then this model doesn't help us evaluate whether we're about to move into an 'age of leisure', and the impacts might be more macroeconomic than microeconomic. That is, we may end up with leisure, but arising through weaker labour demand, reduced hours, or unemployment rather than through workers voluntarily choosing more leisure as wages increase.

[**] Notice that the indifference curves I1 and I2 are crossing, and indifference curves cannot cross. However, those two indifference curves are for different workers, so there is no problem. I could easily have drawn two different diagrams, one for each worker, but I've kept them both on the same diagram for efficiency.

Friday, 13 March 2026

This week in research #117

Here's what caught my eye in research over the past week:

  • Zhang et al. find that Uber’s entry into a US city significantly reduces crime rates, with larger effects in areas facing greater liquidity constraints (less bank credit supply, fewer local job opportunities, higher personal bankruptcy risk, and greater household financial stress)
  • Sandorf and Navrud (open access) establish convergent validity between a contingent valuation survey and a discrete choice experiment (meaning that both measures are highly correlated), with the example they use being willingness-to-pay to reduce the spread of invasive crabs in Norway
  • Desierto and Koyama (with ungated earlier version here) explain the economics of medieval castles in Europe
  • Ordali and Rapallini (with ungated earlier version here) conduct a meta-analysis of the relationship between age and risk aversion, and confirm that there is a positive relationship in studies using survey data and lotteries
  • Singh and Mukherjee conduct a replication of an earlier study that established 'action bias' among goalkeepers facing a penalty kick, and find that jumping left or right rather than staying in the centre of the goal is not a sub-optimal action for goalkeepers in FIFA World Cup matches, and so the high frequency of jumping is not indicative of action bias (it is good to see a replication study published in a good journal)
  • Lindkvist et al. (open access) investigate attitudes toward research misconduct and questionable research practices among researchers and ethics reviewers across academic fields, and find that researchers and ethics reviewers in medicine, as well as more senior and female researchers and reviewers, took a more negative view of questionable research practices
  • Lei et al. use China’s Compulsory Schooling Law as a quasi-natural experiment to investigate the effect of education on HIV/AIDS, finding that mass education significantly enhances knowledge about HIV/AIDS, and that each additional year of exposure to the law reduces HIV/AIDS and mortality rates by 6.51 percent and 2.15 percent respectively
  • Daoud, Conlin, and Jerzak (open access) study the differential effects of World Bank and Chinese development projects in Africa between 2002 and 2013, using data across 9899 neighbourhoods in 36 African countries, and find that both donors raise wealth, with larger and more consistent gains for Chinese development projects
  • Stoelinga and Tähtinen (open access) find that conflict exposure, on average, increases support for democracy in African countries, but the effects vary by ethnicity and regime type; interestingly, violence increases trust in ruling institutions in autocratic regimes
  • Ruiz et al. (with ungated earlier version here) find that, following the exodus of Cuban doctors from Brazil in 2018, the reduction in doctors was associated with persistent reductions in the care of chronic diseases, while service utilization for conditions requiring immediate care, such as maternal-related services and infections, quickly recovered
  • Geddes and Holz (open access) investigate the effect of rent control on domestic violence in San Francisco, and find that there was a nearly 10 percent decrease in assaults on women for the average ZIP code (some good news for advocates of rent control, but it hardly offsets the bad outcomes)
  • Clemens and Strain (with ungated earlier version here) add further to the literature on the disemployment effects of minimum wages, this time looking at the difference between large and small minimum wage changes, finding that relatively large minimum wage increases reduced usual hours worked per week among individuals with low levels of experience and education by just under one hour per week during the decade prior to the onset of the Covid-19 pandemic, while the effects of smaller minimum wage increases are economically and statistically indistinguishable from zero

Thursday, 12 March 2026

Anticipating higher future petrol prices, consumers actually push up petrol prices now

In his 1984 book The Evolution of Cooperation, Robert Axelrod suggested that people cooperate in repeated games because of 'the shadow of the future'. They alter their behaviour by cooperating now, because they anticipate that will lead to greater gains for them in the future. I really like this analogy of the shadow of the future affecting our decisions now, and not just in the context of game theory and repeated games. In fact, we've seen it play out in a different context this past week, as reported by the New Zealand Herald:

Kiwis are rushing to fill up their cars across the country amid fears of price increases at the pump because of escalating conflict in the Middle East.

Video sent to the Herald of Waitomo Tinakori petrol station in Wellington today showed a queue of cars waiting for fuel, with vehicles spilling out on to the road.

Waitomo Group CEO Simon Parham said there has been a similar increase in demand at stations across the country, with sales increasing by 10-15% this week.

“People are filling up and filling their cars ahead of the price increase that will flow through the market over the coming weeks because of the Iran conflict,” he said.

To see what is going on here, let's consider the retail market for petrol, as shown in the diagram below. Before the current conflict in the Middle East, the equilibrium price of petrol was P0, and Q0 petrol was traded per week. Then the conflict begins. Consumers anticipate that the price of petrol will increase in the future, so they decide to fill up their vehicles now. That increases the demand for petrol from D0 to D1. The equilibrium price of petrol increases to P1, and there is Q1 petrol traded in the week. 
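The mechanics of the demand shift can be sketched with a linear demand and supply model (the numbers here are purely illustrative, not estimates of the actual petrol market):

```python
def equilibrium(a, b, c, d):
    """Equilibrium of linear demand Qd = a - b*P and linear supply Qs = c + d*P."""
    P = (a - c) / (b + d)  # solve a - b*P = c + d*P for P
    return P, a - b * P

P0, Q0 = equilibrium(100, 2, 10, 1)  # before: demand D0
P1, Q1 = equilibrium(115, 2, 10, 1)  # after: demand shifts right to D1 (intercept rises)
print(f"P0={P0:.0f}, Q0={Q0:.0f} -> P1={P1:.0f}, Q1={Q1:.0f}")
```

With these numbers, the rightward demand shift raises the equilibrium price from 30 to 35 and the quantity traded from 40 to 45, mirroring the move from (P0, Q0) to (P1, Q1) in the diagram.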

Notice that by trying to avoid the high petrol price in the future, the consumers cause the price to rise today, which is exactly the outcome they were trying to avoid! In effect, when consumers rush to fill up early, they bring some of the future price pressure forward into the present. Expectations about future prices can cause self-fulfilling prophecies like this, which is a point I will make in my ECONS101 class in several weeks, when we talk about financial markets (where self-fulfilling prophecies are a clear and present danger at all times). The shadow of the future matters - consumers' actions based on trying to avoid future price rises make those price rises happen now instead.

Tuesday, 10 March 2026

Consumers can't tell the difference in audio quality between high-end audio cables and a banana

Consumers are not very good judges of quality. They can't tell the difference between bottled water and tap water. They can't even tell the difference between pâté and dog food. And now, according to this article by Futurism last month, they can't tell the difference between audio cables and a banana:

High-quality cables have long been marketed as a key way to get the most out of high-end equipment, such as expensive studio-grade monitor speaker cables and gold-plated HDMI cables for cutting-edge TVs.

In the high-end audiophile world, which is renowned for eye-bulging prices, cables can cost tens of thousands of dollars for ultra-pure copper with silver plating, specialized insulation, and dozens of individual conductors that manufacturers claim will squeeze the most out of a luxury-grade sound system aimed at the uber-wealthy.

The laws of physics, however, have long dictated that spending that kind of cash on cables simply isn’t worth it in the vast majority of circumstances — as long as you don’t go for the cheapest option from the dollar store, of course.

To put the decades-long debate to the ultimate test, a moderator who goes by Pano at the audiophile enthusiast forum diyAudio conducted an eyebrow-raising experiment back in 2024, which was rediscovered by Headphonesty late last month and Tom’s Hardware last week.

Pano ran high-quality audio through a number of different mediums, including pro audio copper wire, an unripe banana, old microphone cable soldered to pennies, and wet mud. He then challenged his fellow forum members to listen to the resulting clips, which were musical recordings from official CD releases run through the different “cables.”

The results confirmed what most hobbyist audiophiles had already suspected: it was practically impossible to tell the difference.

Consumers are not fully informed about the quality of the products that they buy. When they lack quality information before they buy, but that information is revealed after the consumer buys the good, we say that quality is an experience characteristic (and goods like that are called experience goods). A used car is an example of an experience good - the consumer doesn't really know if it is a high-quality car until they drive it. However, for some goods, the quality isn't revealed even after the good is purchased. In that case, quality is a credence characteristic (and goods like that are called credence goods). Health care is a credence good, because patients don't know for sure what would have happened to them without treatment, so it is impossible to judge the quality of the treatment.

Coming back to using an unripe banana as an audio cable, it appears that the quality of audio cable may also be a credence characteristic. At least, that's what this research tells us.

Why does this matter? The thing about credence goods is that the buyer may be reliant on the seller telling them about the quality. In the case of audio cables, the industry has a strong incentive to convince buyers that a 'high-quality' audio cable matters for sound quality, even if the consumer can't tell the difference. That changes the nature of competition in the industry. When buyers cannot verify quality for themselves, sellers can't compete on quality, and instead rely on reputation, branding, expert language, and their ability to sound convincing. They aren't going to want to sell banana cables, even if the banana cable would produce audio of equivalent quality to a 'fancier' cable. Overall, this is a good reminder that in some markets, what consumers pay for is not better quality, but a more persuasive story about quality.

[HT: Marginal Revolution]

Read more:

Saturday, 7 March 2026

Perceptions of inequality and satisfaction with democracy

Last week, my ECONS101 class covered (among many other things) the faulty causation fallacy. This occurs when we observe two variables that appear to be related to each other (they are correlated), but a change in one of the variables does not actually cause a change in the other variable (there is no causal relationship). We might observe a relationship between two variables (call them A and B), and it might be because a change in A causes a change in B, in which case the relationship is causal. But even if we can tell a really good story explaining why we think a change in A causes a change in B, that in itself doesn't make it true. We might observe that relationship because a change in B causes a change in A (we call this reverse causation). Or, we might observe that relationship because a change in some other variable causes a change in both A and B (we call this confounding). Or, the two variables might be completely unrelated, and the observed relationship happens by chance (we call this spurious correlation).

To illustrate this, I'm going to use the example of the research in this 2024 discussion paper by Nicholas Biddle and Matthew Gray (both Australian National University). They also wrote a non-technical summary of their paper on The Conversation. Biddle and Gray look at the relationship between perceptions of income inequality and faith in democratic institutions. To be fair to them, they do say in the paper that "This does not, however, demonstrate a causal relationship from views on inequality to views on democracy". However, most of their interpretations and their policy recommendations assume that the relationship is causal. For example, they conclude that:

The fundamental issue identified in this paper is that the Australian population has identified the income distribution in Australia as being unfair, and that this appears to be impacting views on democracy.

First though, let's take a step back and look at the research. Biddle and Gray use data from Waves 5 and 6 (from 2018 and 2023 respectively) of the Asian Barometer Survey (with a sample size of over a thousand in each wave for Australia), as well as from ANUPoll, a quarterly survey of public opinion run by the Social Research Centre at ANU. For the ANUPoll, they use the January 2024 data, which includes over 4000 respondents.

First, from the Asian Barometer, Biddle and Gray find that there is substantial concern about inequality:

In both waves 5 and 6 of the survey, respondents were asked ‘How fair do you think income distribution is in Australia?’... more Australians think that the income distribution is unfair or very unfair (60.5 per cent) than think it is fair or very fair. This gap has widened slightly since 2018, particularly in terms of those who think the distribution is very unfair as opposed to just unfair.

Second, in the ANUPoll data, they find that:

Combined, 30.3 per cent of Australians were not at all or not very satisfied with democracy in January 2024 (compared to 34.2 per cent in October 2023). This is still well above the January 2023 levels of dissatisfaction (22.9 per cent) and even more so the March 2008 levels (18.6 per cent).

So over time, Australians' perceptions of inequality have gotten worse (they think the income distribution is less fair), and they are less satisfied with democracy. It is reasonable, then, to ask whether those concerns about inequality affect people's faith in democratic institutions. Biddle and Gray next look at that relationship, using the ANUPoll data, and find that:

There is a very strong relationship between views on income inequality in Australia and views on democracy...

Their model (shown in Table 1 in the paper [*]) shows that the most negative views of the income distribution are associated with lower satisfaction with democracy, while more positive views of the income distribution are associated with higher satisfaction with democracy.

So, there is a strong correlation between perceptions of inequality and satisfaction with democracy. But is that just a correlation, or is there a causal relationship? We can tell a good story here (and Biddle and Gray do that). People who are less satisfied with the income distribution may lay some blame on government, and therefore their satisfaction with democracy falls.

Before we conclude that this relationship is causal though, let me lay out some alternatives. First, perhaps people who are less satisfied with democracy become less satisfied in general with many aspects of society, including the income distribution. In this case, there could be reverse causality. Second, perhaps people who are less satisfied with life in general express less satisfaction with many aspects of life and society, and so they answer more negatively when asked about the satisfaction with democracy, and they answer more negatively when asked about their views of the income distribution. In this case, there would be confounding. Third, perhaps satisfaction with democracy is declining over time for some reason, and views about the income distribution are becoming more negative for some completely different reason. But they look like they are related because they are both trending downwards. In this case, there would be a spurious correlation between perceptions of inequality and satisfaction with democracy.
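Those alternatives are easy to demonstrate with a small simulation of my own (not from Biddle and Gray's paper; the variable names and numbers are purely illustrative). Two variables that share a common driver, or that simply share a downward trend, will be strongly correlated even when neither one causes the other:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 5000

# Confounding: a third variable (say, general life satisfaction) drives
# both satisfaction with democracy (a) and views of the income
# distribution (b), with no causal link between a and b themselves.
life_sat = rng.normal(size=n)
a = life_sat + rng.normal(size=n)
b = life_sat + rng.normal(size=n)
r_confounded = np.corrcoef(a, b)[0, 1]

# Spurious correlation from shared trends: two series that each drift
# downward over time, for completely unrelated reasons, still correlate.
t = np.arange(n)
x = -0.001 * t + rng.normal(size=n)
y = -0.001 * t + rng.normal(size=n)
r_trending = np.corrcoef(x, y)[0, 1]

print(round(r_confounded, 2), round(r_trending, 2))
```

Both correlations come out strong (around 0.5 and 0.7 respectively), yet by construction there is no causal effect of either variable on the other in either case. That is exactly why a strong correlation, on its own, tells us nothing about causation.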

It isn't enough to see two variables that appear to be related, and assume that a change in one of those variables causes a change in the other. Economists and other researchers have developed a number of statistical tools and experimental methods to tease out when a correlation really does reflect a causal relationship. Biddle and Gray haven't done that. It might be that negative perceptions of inequality reduce satisfaction with democracy. But by itself, this research doesn't allow us to conclude that.

[HT: The Conversation]

*****

[*] Table 1 in the paper actually has an error. The explanatory variable in the table is labelled as satisfaction with democracy, when that is actually the dependent variable. It is perceptions of inequality that is the explanatory variable.

Friday, 6 March 2026

This week in research #116

Here's what caught my eye in research over the past week:

  • Numa and Zahran (with ungated earlier version here) show that W.E.B. Du Bois made enduring contributions to economics (and may be one of the most under-rated economists of the early 20th Century)
  • Federle et al. (with ungated earlier version here) study 150 years of the economic cost of war, and find that a war of average intensity is associated with an output drop of close to 10 percent in the war-site economy, while consumer prices rise by approximately 20 percent
  • Passaro, Kojima, and Pakzad-Hurson (with ungated earlier version here) find that when there are more men than women in a labour market, 'equal pay for similar work' policies increase the gender wage gap
  • Strulik and Trimborn (open access) show analytically that higher world population could causally lead to a lower long-run temperature increase under optimal carbon taxation (though I think that the optimal carbon taxation might be doing a lot of the work there)
  • Arellano-Bover et al. (with ungated earlier version here) look into the initial job-matching of US graduates by major, and find significant variation in callback rate returns to majors, with Biology and Economics majors receiving the highest rate, particularly in occupations involving high intensity of analytical and interpersonal skills
  • Xu et al. develop a geographically weighted autoregressive model with an adaptive spatial weights matrix (a bit pointy-headed for many readers of this blog, but of interest to me!)
  • Li, Liu, and Si find in a meta-analysis that minimum wages actually increase female employment (showing that the question of the employment effects of minimum wages is still not solved)

Wednesday, 4 March 2026

This is not how generative AI should be used in research

I've been using ChatGPT Pro to help with drafting research papers this year, as I noted that I would do in this post from January. It has amped up my productivity a lot, allowing me to finish writing up two papers already, with a third on the way. These were papers where the analysis was already done, but it was the writing that was holding up the process. Having ChatGPT to help with the drafting seems to kickstart my writing, even though I have ended up extensively re-writing everything that ChatGPT produces. I find it a good disciplining tool as much as anything. Several colleagues have asked whether I am disclosing my generative AI use to journal editors when I submit. And I do. I have a standard 'generative AI use statement' that I include in my papers, that notes how it was used, and that I remain responsible for all of the content. You can see an example in this recent working paper.

However, not everyone is as careful with their generative AI use, or as transparent. Consider this example:

That is both infuriating and a sad indictment of the reviewing, editing, and publishing process, not least because, as one Reddit commenter noted, many authors see high-quality work rejected by journals, whereas a paper like this, with obvious flaws, has successfully been published. And it's not an isolated incident. This 2025 article by Artur Strzelecki (University of Economics in Katowice), published in the journal Learned Publishing (open access), catalogues over 1300 instances of likely unacknowledged and frankly stupid use of ChatGPT, up to September 2024.

Strzelecki's approach is to search for text strings that are almost certainly ChatGPT responses to a prompt asking it to generate text. The main example Strzelecki uses, which is in the title of the article, is "as of my last knowledge update". No human author is going to say that in a research paper. Similarly, "as an AI language model", "I don't have access to", and "certainly, here is" are highly indicative of ChatGPT use. There are circumstances where a human might use those phrases in a research paper, but it seems unlikely. Strzelecki screens out papers that mention ChatGPT, and manually checks each paper to ensure the text was not in some way legitimate, and that leaves 1362 articles.
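The screening step amounts to a simple text search for those telltale phrases, followed by manual checking. A minimal sketch of the idea (my own illustration, not Strzelecki's code; the phrases are the ones quoted in the article, but the function name is hypothetical):

```python
# Phrases that are highly indicative of unedited ChatGPT output
# (examples quoted in Strzelecki's article).
TELLTALE_PHRASES = [
    "as of my last knowledge update",
    "as an ai language model",
    "i don't have access to",
    "certainly, here is",
]

def flag_suspect_text(text: str) -> list[str]:
    """Return any telltale phrases found in an article's text.

    A hit is only a candidate: Strzelecki then screens out papers that
    openly discuss ChatGPT, and manually checks each remaining paper.
    """
    lowered = text.lower()
    return [phrase for phrase in TELLTALE_PHRASES if phrase in lowered]

sample = ("As of my last knowledge update in September 2021, "
          "the adoption rate was increasing.")
print(flag_suspect_text(sample))
```

The automated search only narrows the field; the manual check is what rules out legitimate uses of those phrases (for example, a paper quoting ChatGPT output as its object of study).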

How do these articles get published with this content intact? There are lots of stopping points where this could be caught and corrected (or prevented), but these articles have gotten through all of them. Strzelecki outlines the process. First, perhaps it is only one of the authors (and not all of them) that used ChatGPT. In which case, why didn't the other co-authors pick it up? Next, the paper is submitted to a journal, and often goes through a text review by the publisher. And then the editor or editors (including associate editors) look at it, and decide whether it should be sent out for peer review. And then the peer reviewers (usually more than one, sometimes four or more) look at the paper in detail and provide comments. Then the editor receives the review reports and makes a decision. The paper may go through more than one round of review and editorial decision. And then, once accepted for publication, the article may be copy-edited. At any of those stages, this text could be picked up. And yet, for over 1300 articles as of September 2024, the ChatGPT-generated text has not been picked up.

Strzelecki particularly focuses on 89 articles that have been published in journals indexed by Scopus or Web of Science, which should be the most credible journals. Of these:

...as many as 28 of them are in journals with Scopus percentile values of 90 and above. Two journals have a 99th percentile, indicating that they are the top journals in their field...

In total, 64 articles were found in journals considered to be in Q1, top quartile, recognized as the group of the best journals in their respective fields. Twenty-five articles are in the percentile range between 50 and 75, indicating that the journals in which these articles are found belong to Q2.

So, this phenomenon is not limited to low-ranked 'predatory' journals. In fact, looking at the list, there are several journals published by MDPI and Frontiers (for more on those publishers, see here). However, there are a whole lot published by Elsevier and Springer, publishers that we should expect much better of. Although, those are also publishers that publish a lot of journals, and a lot of articles, so perhaps that accounts for their higher numbers within the 89 articles that Strzelecki focuses on. Fortunately, I don't see any reputable journals in economics in the list, but I could be wrong.

Anyway, the takeaway is not so much that generative AI use is widespread in the write-up of research. It is that authors are using generative AI, not being transparent in their use of it, and that the quality control system by journals, even high-ranking journals, is terrible. Strzelecki makes a good point in the conclusion of his article that 89 out of over 2.5 million articles indexed in Scopus is only about 0.0036% of the total indexed articles. However, this analysis is only picking up the really, really obvious cases. There will be far more use of generative AI that has not been adequately checked or acknowledged by authors, and not picked up in quality control.

I'm not against using generative AI in the write-up of research. Obviously, because I am doing the same thing. What needs to happen is that researchers need to be transparent and honest when they use generative AI, so that editors, reviewers, and the readers of research can see how it was used. That way, the users of research can evaluate for themselves whether they should believe, discount, or discard research depending on the ways and the extent of generative AI use. Without transparency, that important evaluation step is lost.

[HT: Artur Strzelecki]

Read more:

Monday, 2 March 2026

You can make future population decline disappear just by changing the way you categorise people and fertility

Fertility has been on a long-term declining trajectory worldwide and, apart from the occasional blip, in every country. There seems to be no prospect of a reversal of this trend, and no prospect of fertility returning to the replacement level of approximately 2.1 births per woman. So, when you see a research paper claiming that "high-fertility, high-retention groups persist, gain share, and lead the total population to grow", you should sit up and take notice. That is, at least, until you've carefully thought about the paper in question.

That's what happened to me with this 2025 NBER Working Paper by Sebastian Galiani (University of Maryland, College Park) and Raul Sosa (Universidad de San Andres). They create and calibrate models of fertility based on two different subgroupings (by race, and by religion), taking account of cultural transmission of fertility rates from mothers to daughters. They then use their calibrated models to simulate population change going forward for ten generations. What they find when the population is categorised by race is a decreasing population, as shown in Figure 1 Panel A from the paper:

And when Galiani and Sosa categorise the population by religion, they instead find an increasing population, as shown in Figure 2 Panel A from the paper:

Now, this struck me as really odd. We’re talking about the same country and the same underlying population. If you split that population into subgroups and take a weighted average of what happens in each subgroup, you should get back the outcome for the population as a whole. If you are measuring the same underlying thing consistently, changing the subgroups (race in one analysis, and religion in another) shouldn’t magically create or destroy population growth in the model. At most, it should change which groups are growing faster and therefore how the composition by group changes over time, with high-fertility groups making up a larger share of the population and lower-fertility groups making up a smaller share. But the headline result here is much stronger than that: the aggregate population path changes direction entirely depending on the grouping that is employed. Galiani and Sosa use those results to conclude that:
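The aggregation logic is worth spelling out. In any one generation, the aggregate growth factor is just the population-share-weighted average of the subgroup growth factors, so re-partitioning the same population cannot change the aggregate if fertility is measured consistently. A stylised numerical check (my own illustration, with made-up shares and growth factors, not the paper's calibration):

```python
# The same population of 100, split two different ways. Per-generation
# growth factors are invented purely for illustration.
population = 100.0

# Partition 1 (think "by race"): three groups with shares and growth factors
shares_1 = [0.6, 0.3, 0.1]
growth_1 = [0.95, 0.90, 1.10]

# Partition 2 (think "by religion"): a different split of the SAME people.
# If fertility is measured consistently, the weighted average must match.
shares_2 = [0.5, 0.5]
growth_2 = [0.97, 0.93]

agg_1 = sum(s * g for s, g in zip(shares_1, growth_1))
agg_2 = sum(s * g for s, g in zip(shares_2, growth_2))
print(round(agg_1, 3), round(agg_2, 3))
```

Both partitions give the same aggregate growth factor (0.95 here). So if two analyses of the same population imply aggregate paths going in opposite directions, the subgroup fertility inputs must differ between the analyses, which is exactly what turns out to be going on in the paper.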

...whenever at least one group remains above replacement on the female line and transmits identity effectively, its share rises and turns the aggregate path upward.

The first part of that conclusion makes sense, but the second part stretches credibility. It made me wonder whether the results were being driven by unusual features of the model, or by different modelling choices in the two analyses. 

So, I dug into the paper, which is not an easy task as it is quite theoretical. And there are consequential differences between the two analyses (by race and by religion) that drive the difference in results. First, they use different measures of fertility, with the analysis by race based on the total fertility rate (TFR), while the analysis by religion is based on completed fertility (see this post for a brief discussion on the difference between those two measures). That difference matters. By definition, completed fertility can only be observed for women who have finished their childbearing years, so it reflects fertility over the last twenty or more years. In contrast, the total fertility rate that Galiani and Sosa use was measured in 2023, after a long period of fertility decline. By construction then, the analysis using completed fertility (the analysis by religion) will be assuming higher fertility than the analysis using the total fertility rate (the analysis by race). This is highlighted by Table 1 in the paper, which shows that nearly every racial group has a total fertility rate that is below replacement (Hispanic is highest among the large groups at a TFR of 1.946, while Native Hawaiian and Pacific Islanders have a TFR of 2.218), whereas there are several religious groups with completed fertility rates above replacement (including Mormons at 3.4, and Muslims at 2.4).

Second, their calibration implies much bigger gaps across religious groups than across racial groups. Specifically, they assume greater dispersion in fertility and retention by religion than by race. That means that the forces driving fertility change within population groups are much stronger in the analysis by religion than the analysis by race. So, essentially this doubles down on the effect of higher fertility that arises from the different data sources.

Overall, I don't find the comparison across the two models to be credible. They are employing different measures, taken from different points in time, and applying different modelling assumptions. In contrast, the results within each model showing that the relative group proportions change over time to favour groups that have higher fertility are plausible and are worth taking account of. For instance, Galiani and Sosa conclude that:

Although the objective is not to forecast outcomes for particular groups, our world simulations imply not only a more religious composition but also that, within the horizon we study, Muslims become the largest tradition by share.

That seems like a sensible conclusion to draw from the evidence, especially as they explicitly note that they aren't trying to forecast outcomes for particular groups. Nevertheless, they do forecast the aggregate population, and their results are not consistent with what is widely expected to happen. World population is projected to start declining later this century, in large part because of declining overall fertility, yet their results based on religion suggest that this will suddenly reverse course, with the population growing over a horizon of ten generations. In reality, the long-run decline in fertility has proven very difficult to shift, and complicated economic modelling that appears to overturn that on-the-ground reality should be treated with caution.

[HT: Marginal Revolution]

Read more:

Sunday, 1 March 2026

Why specialist vape retailers may tend to locate in more socially deprived areas

When I first started studying the social impacts of alcohol outlets, one of the things my research team and I were interested in was where alcohol outlets located. We found (see here) that off-licence outlets tended to locate in areas of high deprivation in Manukau City. I've since replicated that analysis a couple of times in unpublished work, for both South Auckland and Hamilton.

I was interested to see that this new article by Robin van der Sanden (Massey University) and co-authors, published in the New Zealand Medical Journal (sorry, no ungated version online, but you can sign up for open access for free), finds very similar results for specialist vape retailers (which are defined here). They used Google Maps and Google Street View data to locate all of the specialist vape retailers across 14 Auckland suburbs, then categorised them into three types: (1) upmarket; (2) budget; and (3) 'store-within-a-store' (located inside or attached to convenience stores, petrol stations, or liquor stores). The main results in terms of the relationship between store numbers and social deprivation are shown in Figure 1 from the paper:

This figure shows the median number of specialist vape retailers (in total and by type) by social deprivation. In their sample, stores are more likely to be located in the most deprived two deciles (9-10), and least likely to be in the least deprived two deciles (1-2). Aside from that, I wouldn't draw too much from the analysis. Because these are median counts per suburb group (not per capita or per unit of land area), differences could reflect population size, commercial zoning, or land area rather than true 'density'. If highly deprived suburbs also tend to have larger populations, or to cover larger areas, then the apparent relationship between social deprivation and the number of specialist vape retailers is confounded. Even so, the broad tendency for more stores in more deprived areas does seem to be there. Van der Sanden et al. worry about this, concluding that:

The concentration of SVRs in high-deprivation suburbs in Auckland may warrant further regulatory responses that better balance the needs of predominately adults to access vaping products as a means to stop smoking with limiting vape products to young people who have never smoked...

However, Van der Sanden et al. don't really explore why specialist vape retailers may locate in areas of high deprivation. I've done quite a bit of exploration and thinking on this in relation to off-licence alcohol outlets, and I suspect that the reasons might be similar. And it doesn't require retailers to be 'targeting' high deprivation communities in some predatory business strategy. I have a few hypothesised reasons why there are more specialist vape retailers in more socially deprived areas, each of which can be explained with some simple economics.

First, if a prospective retailer is looking to run a retail store that maximises profits, one of the aspects that they must consider is the costs of operating the business. Ceteris paribus (all else held constant), a store with lower costs will be more profitable. Areas of high deprivation tend to have lower commercial rents, so a store there is less costly to operate, and will generate higher profits from the same revenue.

The second hypothesis is a little more complex, and involves a bit of economic geography. Each store may have a particular 'catchment area', which is the area from which its customers come to the store. In a low deprivation area, where everyone owns a car, and often commutes a fair distance for work, the catchment area for a store might be quite large. So, stores that are located close together will be in direct competition for consumers, since their 'catchment areas' will substantially overlap. In contrast, in a high deprivation area, fewer people might own cars, or they may not run reliably, or they may only be able to afford to drive them to and from work without long side-quests to buy vapes. So, the 'catchment area' for a store will be much smaller, and stores can be located closer together without being in direct competition for consumers. And so, we might expect to see more vape stores in areas of high deprivation than in areas of low deprivation, because the retailers are trying to minimise competition with other stores (although they may then need to balance a smaller catchment, which has less spending power, against the costs of operating the store).
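The catchment-area argument can be put into rough numbers (a stylised sketch of my own, with made-up figures, not from any study): if stores want non-overlapping catchments to avoid head-to-head competition, the number of stores an area can support scales inversely with the square of the catchment radius.

```python
import math

def max_noncompeting_stores(area_km2: float, catchment_radius_km: float) -> int:
    """Rough upper bound on the number of stores an area can support
    with non-overlapping circular catchments (ignoring packing losses)."""
    catchment_area = math.pi * catchment_radius_km ** 2
    return int(area_km2 // catchment_area)

suburb_km2 = 20.0  # hypothetical suburb

# Low-deprivation area: car-owning customers travel further, so
# catchments are large and very few stores fit without overlap.
print(max_noncompeting_stores(suburb_km2, catchment_radius_km=2.0))

# High-deprivation area: customers travel less, so catchments are
# small and many more stores fit without competing head-to-head.
print(max_noncompeting_stores(suburb_km2, catchment_radius_km=0.5))
```

With these illustrative numbers, halving-and-halving-again the catchment radius (from 2 km to 0.5 km) raises the upper bound from one store to twenty-five, which is the sense in which smaller catchments can support many more stores in the same suburb.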

Finally, the differences may reflect differences in demand. If vaping rates are higher in more socially deprived areas, then demand for vaping products may also be higher in those areas, and attract more vape retailers. I don't really know whether there is a social gradient in vaping, although the New Zealand Health Survey suggests that there is, with more vaping among people living in areas in the most socially deprived quintile. Of course, there is a potential reverse causation problem with the demand-side explanation, because more specialist vape retailers located in socially deprived areas might drive more vaping in those areas.

None of that is to say that having more specialist vape retailers in more socially deprived areas is a desirable outcome (especially if they do indeed drive more vaping). Van der Sanden et al.'s proposed policy response may be appropriate. However, the situation we observe could be explained by some simple economics. So if policymakers want to reduce retail availability of vaping products, they can focus on practical levers (licensing, zoning, proximity rules) without relying on arguments about predatory business practices, or vilifying store owners (both of which I have seen in the case of alcohol retailers).