Friday, 20 March 2026

This week in research #118

Here's what caught my eye in research over the past week (a quiet week, following last week's bumper edition):

  • Rubenstein and Stephenson assess the effect of Taylor Swift’s relationship with Travis Kelce on the Kansas City Chiefs’ television audience, and find that viewership increases by about one-third beginning with Swift’s first time attending a Chiefs game
  • Bussoli and Fattobene (open access) find that Financial Graph Literacy is lower among older adults, those with less education, and lower-income groups, and is significantly associated with a greater likelihood of engaging in proactive financial behaviours such as saving, investing, budgeting, and using digital financial tools

Wednesday, 18 March 2026

How the 'travelling Pope' affected international trade

Pope John Paul II was known as 'the travelling Pope' because of the large number of international trips ('pastoral visits') he undertook (more than 100 between 1979 and 2004). He also had a huge following, as you might expect as the leader of the Catholic Church, but the advent of television meant that the public could follow his travels in a much closer way than ever before. And, through his pastoral visits and his following, he exposed Catholics the world over to new places they would otherwise not have seen or, in some cases, even heard of. What effects did that exposure have?

That is essentially the question addressed in this recent article by Alexander Popov (European Central Bank), published in the Economic Journal (ungated earlier version here). Popov focuses on the impact of the Pope's visits on exports from the visited country, and especially exports to Catholic countries. He employs an event study design - looking at how exports changed between the time before and the time after the Pope's first visit to a country, while controlling for GDP growth, population, the US dollar real exchange rate, and the extent of trade liberalisation and democracy. The key results are summarised in Figure 2(a) from the paper:

The figure shows how exports evolve before and after the Pope's visit. Beforehand, there isn't much evidence of a trend (notice that the red line hovers around zero). However, after the Pope's visit, exports increase (the red line is clearly above zero and trending upwards), and the effect is substantial. Popov notes that:

...the point estimate on Year 3 after the pope’s visit to a country is 0.1152, which implies that exports to the rest of the world are higher by 12.2%, relative to the year of the visit.
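The 12.2% figure is just the standard conversion of a log-point estimate into a percentage change. As a quick check (my own arithmetic, not code from the paper):

```python
import math

beta = 0.1152  # Popov's Year 3 point estimate (a log-point effect)
pct = math.exp(beta) - 1  # convert log points to a percentage change
print(f"{pct:.1%}")  # prints "12.2%"
```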

And the effects are even larger for exports to countries with larger Catholic populations. Specifically:

...exports to a trading partner with 54.3% (75th percentile), relative to a trading partner with 1.1% (25th percentile) Catholics in the population were higher by between 16.5% and 36.9% during years 1 to 5 after a visit by the pope.

Clearly, Catholics were paying attention to where the Pope was visiting. Popov then asks the obvious question: what explains this effect? He examines three hypotheses:

The first one is that during a foreign visit, the pope explicitly encourages Catholics around the world to engage with the host country on economic terms. I analyse 633 speeches given during the pope’s 130 first visits and I find rare occasions when he mentions words like ‘trade’, ‘economic’ or ‘globalisation’.

So, the Pope wasn't explicitly telling Catholics to buy more goods from the countries he was visiting. Then:

The second hypothesis is that, by simply visiting a country, the pope raises its profile, or ‘puts it on the map’ for the global Catholic family, especially if Catholics around the world are for cultural or economic reasons less connected with the visited country. I find that the effect on exports of a pastoral visit to a country is stronger if this country is relatively poor and if it has relatively fewer Catholics and relatively weaker bilateral trade links with the partner country. The third hypothesis is that Catholics around the world are simply buying souvenirs to commemorate the pope’s visit. I analyse data on bilateral trade at the product level, for ten different sectors, and I find that after a pastoral visit, the increase in exports I detect takes place in half of them.

So, the third hypothesis (souvenirs) doesn't have much support. Popov concludes that the second hypothesis is the likely driver of the increase in exports. This evidence is consistent with the Pope raising the profile of the countries he visited, and those countries benefiting from their higher profile among Catholics in the form of higher exports, especially to Catholic countries.

What makes this paper interesting in an economic sense is that it suggests trade flows don't just depend on prices, trade policy, and distance. They also depend on visibility, familiarity, and the ways that cultural influence can affect economic outcomes. Pope John Paul II's visits appear to have increased visibility and familiarity, which may in turn have boosted trade. The 'travelling Pope' may have also been the 'trade-promoting Pope'.

Tuesday, 17 March 2026

Seven decades of change in the demographics and research styles of top economics research

Back in 2013, Daniel Hamermesh (University of Texas at Austin) published this article in the Journal of Economic Literature (ungated earlier version here), which summarised changes in the demographics and research styles of top economics research, based on articles published between 1963 and 2011 in three top journals: the American Economic Review (AER), the Journal of Political Economy (JPE), and the Quarterly Journal of Economics (QJE). A new update last year (open access) from Hamermesh extends the analysis to include articles up to 2024.

In terms of demographics, the trends show a continuation of earlier patterns. On gender, Hamermesh notes that:

The progression that occurred from the 1960s and 1970s, when only a minute fraction of authors were women, to the early twenty-first century has, if anything, accelerated.

This will be welcome news, given the persistent gender gap in economics (see this post and the links at the end of it). It likely reflects the changing demographics of young economists, with a growing proportion of the young 'stars' in economics being women (and noting that it is young stars who often get published in the top journals that Hamermesh is considering).

In terms of the age structure of authors, Hamermesh reports that:

The changes from 2011 to 2024 continued those that started in the 1980s, but the rate of change has not accelerated. Indeed, most noticeable from 2011 to 2024 was a continuing sharp and statistically significant drop in the representation of the youngest group (and a nearly equal sharp rise among those 36–50)...

...the average age of authorship has increased steadily since 1973. 

Can I change my comment above about the young stars in economics? The increasing median age of authors in top journals seems to be a general trend across academia. Hamermesh then turns to research 'style', documenting a continued dramatic rise in the proportion of articles in those journals that are co-authored:

There were no four-authored papers as recently as 1983; today they account for 17 percent of articles. There were no papers with more than four authors in 2003; today nearly 12 percent of articles have five or more authors (with five articles written by six authors each and one by seven authors). Obversely, sole-authored papers are now quite scarce; and even two-authored papers today only account for slightly more than one-fourth of all articles (compared to a majority as recently as 2003).

Unsurprisingly, the increase in the number of co-authored articles means that the age diversity of author collaborations has increased over time as well. In terms of the types of research, he reports that:

The big changes are the continuing rise in empirical work based on original non-laboratory data and the rapid and even accelerating increase in experimental work. Today these two methods, which both involve collecting original data, account for over half of all published papers, compared to less than 4 percent four decades ago...

These trends are not unrelated, of course. Experimental research and the increasing use of large datasets typically both require larger research teams. They also often require more detailed methods, which may involve both larger teams and more experienced researchers. Larger teams might be more likely to include female team members. And larger teams often need someone to lead and coordinate the team, and those leaders tend to be more experienced (and older) academics. So, it would not surprise me, if more detailed analysis were conducted, to see that these trends are interconnected.

Now, the interesting thing will be what happens going forward, given the increasing use of generative AI in research (see here, for example). Since generative AI can now do a lot of the work that research assistants and early career researchers previously did, will the trend towards larger research teams be reversed? How will that interact with the gender gap in research (given that the age of female economists skews younger at the moment)? And how will it affect the age distribution of researchers (given that men, and younger people, are somewhat more likely to use generative AI)? I'll be looking forward to Hamermesh's next update. Hopefully, we don't have to wait another 12 years.

[HT: Marginal Revolution, last year]

Monday, 16 March 2026

Changing their minds could be a good thing for economists

People don't like to change their minds. This may partly be an expression of loss aversion - we really want to avoid losses, including the loss of an idea that we previously thought was true. This leads to status quo bias - we prefer not to change things, and keep them the same, because changing things entails a loss. But what if changing our minds could make us better off? Would we be so reluctant to do so?

This 2025 paper by Matt Knepper (University of Georgia) and Brian Wheaton (UCLA) suggests that economists, at least, should not be afraid to change their minds, because doing so increases the number of citations to their research. Knepper and Wheaton investigate authors who undergo an 'ideological reversal' - previously publishing research that could be considered right-wing, before switching and publishing a paper that draws a left-wing-consistent conclusion, or the reverse (switching from left-wing to right-wing). Their main data source is every economics paper ever published in the top 100 economics journals indexed in Web of Science - some 200,000 articles. They also have a narrower dataset of papers referenced in meta-analyses on policy topics, including:

...the minimum wage, the economics of unions, the taxable income elasticity, the fiscal multiplier, intergenerational transfers, trade and productivity, trade and domestic employment, crowd-out, the gender wage gap, unemployment insurance, disability benefits, universal preschool, childcare and employment, immigration and wages, and more.

Knepper and Wheaton use this narrower dataset to train a machine learning model to categorise the rest of the papers in the dataset, as to how left-wing (or right-wing) the conclusions are. For instance, a paper that concludes that the minimum wage reduces employment is more right-wing, whereas one that concludes that there is no disemployment effect of the minimum wage is more left-wing. Knepper and Wheaton define an author as left-wing if they published more left-wing papers than right-wing ones over the previous five years, and the reverse for right-wing authors. They then use the larger dataset to investigate what happens to each economist who undergoes an 'ideological reversal'. They first outline some descriptive facts based on their dataset, including:

  • Fact #1: The typical author mostly publishes results on one side of the political spectrum.

  • Fact #2: Ideological reversals are not rare; they occur at least once for 40% of authors.

  • Fact #3: Ideological reversals become much more common later in an author’s career, with authors essentially never undergoing a reversal in the first decade of their career.

  • Fact #4: Most ideological reversals do not represent a permanent defection to the other side of the political spectrum, but rather the beginning of repeatedly publishing results on both sides of the spectrum.

  • Fact #5: Ideological reversals occur much more frequently amongst authors who are (initially) classified as right-wing.

That does seem like a surprisingly high proportion of economists who undergo at least one ideological reversal. However, perhaps we should take comfort in that - if the results point in a particular direction, our conclusions should say that, even if that conclusion is inconsistent with our previous conclusions on the same topic.

Do these ideological reversals matter though? Knepper and Wheaton employ a difference-in-differences analysis, comparing the difference in citations (and other metrics) between authors who did, and did not, undergo an ideological reversal, between the time before, and after, the reversal occurred. In other words, they look at whether citation counts rise more for economists who have an ideological reversal than for otherwise similar economists who do not. The results are striking, with:

...a sharp clear increase in citation count following an ideological reversal with essentially no evidence of pre-trends... The citation boost accumulates to approximately 9 over a one-decade period and 30 over a two-decade period.
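The difference-in-differences comparison that produces estimates like this boils down to a two-by-two calculation: the change for authors who reversed, minus the change for similar authors who didn't. A sketch with made-up numbers (purely illustrative, not theirs):

```python
# Hypothetical mean citations per year (illustrative numbers only)
treated_before, treated_after = 5.0, 9.0  # authors who underwent a reversal
control_before, control_after = 5.0, 6.0  # similar authors who did not

# Difference-in-differences: treated change minus control change
did = (treated_after - treated_before) - (control_after - control_before)
print(did)  # prints 3.0 -- the estimated effect of the reversal on citations
```

The control group's change nets out whatever citation growth would have happened anyway, which is why the identifying assumption is that the two groups would have followed parallel trends absent the reversal.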

The results remain consistent when Knepper and Wheaton limit the analysis to papers published before the ideological reversal, and when they limit the analysis to papers in the meta-analysis only (showing that the machine learning approach doesn't drive the results). Knepper and Wheaton also find evidence consistent with no change in the quality of papers before and after the ideological reversal, and that:

Both left-to-right and right-to-left reversals are rewarded by increased citations of roughly the same magnitude. The boost in citations received subsequent to a left-to-right reversal is mostly driven by citations from right-wing authors, and the boost in citations received subsequent to a right-to-left reversal is mostly driven by citations from left-wing authors. Encouragingly, however, the new right-wing (left-wing) audience garnered by a left-to-right (right-to-left) reversal... also engages with and cites the author's previous left-wing (right-wing) papers. This dynamic suggests that ideological reversals help prevent the formation of echo chambers in economics academia and expose authors to opposite ideological findings.

This last result is particularly important, and I believe it allows us to conclude that economists need not fear ideological reversals. By changing their minds, they can attract a new audience from the other side of the ideological spectrum, bringing the two sides closer together. Hopefully, through that, we end up with higher-quality research overall.

[HT: Marginal Revolution, last year]

Saturday, 14 March 2026

Artificial intelligence and the 'age of leisure'

My ECONS101 class covered constrained optimisation last week, and one of the models we looked at was the labour-leisure trade-off for workers. Now artificial intelligence, and in particular generative AI, is likely to have large impacts on the labour-leisure trade-off. As the Financial Times reported last year (paywalled):

The idea that technological progress can enable people to work fewer hours is not outlandish...

But in order to believe a similar trend is going to take hold again, you have to assume three things. First: that AI will deliver a substantial boost to economic productivity...

Second, you have to assume the economic gains will be widely distributed...

Third, you have to believe workers will “cash in” those proceeds in the form of extra leisure, rather than higher income. But will they? In many developed countries, there has been a slowdown in the reduction in working hours in recent decades...

Far from trading income for leisure, it is the people with the highest salaries who tend to work the longest hours.

Will workers trade off higher productivity for more leisure time? Are we about to enter an 'age of leisure'? The constrained optimisation model for the worker (see also this post) can help us clarify the possibilities. In this model, we'll assume that AI increases productivity, and that the increase in productivity is represented by higher wages for workers. [*] The model will then tell us whether workers might respond by consuming more, or less, leisure.

Our model of the worker's decision is outlined in the diagram below. The worker's decision is constrained by the amount of discretionary time available to them. Let's call this their time endowment, E. If they spent every hour of discretionary time on leisure, they would have E hours of leisure, but zero income. That is one end point of the worker's budget constraint, on the x-axis. The x-axis measures leisure time from left to right, but that means that it also measures work time (from right to left, because each one hour less leisure means one hour more of work). The difference between E and the number of leisure hours is the number of work hours.

Next, if the worker spent every hour working, they would have zero leisure, but would have an income equal to W0*E (the wage, W0, multiplied by the whole time endowment, E). That is the other end point of the worker's budget constraint, on the y-axis. The worker's budget constraint joins up those two points, and has a slope that is equal to the wage (more correctly, it is equal to -W0, and it is negative because the budget constraint is downward sloping). The slope of the budget constraint represents the opportunity cost of leisure. Every hour the worker spends on leisure, they give up the wage of W0.

Now, we represent the worker's preferences over leisure and consumption by indifference curves. The worker is trying to maximise their utility, which means that they are trying to get to the highest possible indifference curve that they can, while remaining within their budget constraint. The highest indifference curve they can reach on our diagram is I0. The worker's optimum is the bundle of leisure and consumption where their highest indifference curve meets the budget constraint. This is the bundle A, which contains leisure of L0 (and work hours equal to [E-L0]), and consumption of C0.

Now, let's say that the situation shown above is the situation before the advent of AI. After AI is introduced, productivity increases, and so wages increase (from W0 to W1). This causes the budget constraint to pivot outwards and become steeper (since the slope of the budget constraint is equal to the wage, the slope has increased from -W0 to -W1). The worker can now reach a higher indifference curve, and it is the position of that higher indifference curve that determines the worker's response in terms of whether they consume more leisure or not. If they move to the higher indifference curve I1, then the worker's new optimum is the bundle of leisure and consumption B, which contains leisure of L1 (and work hours equal to [E-L1]), and consumption of C1. For this worker (whose response is shown in red on the diagram), leisure hours decrease as a result of the higher wage. On the other hand, if they move to the higher indifference curve I2, then the worker's new optimum is the bundle of leisure and consumption C, which contains leisure of L2 (and work hours equal to [E-L2]), and consumption of C2. For this worker (whose response is shown in blue on the diagram), leisure hours increase as a result of the higher wage. [**]

Either of these possibilities could happen. In fact, both could happen, with some workers increasing leisure time and others decreasing leisure time. By itself, this model doesn't answer the question of what will happen, but shows that both increased leisure and decreased leisure are possible outcomes.

The key difference here comes down to the size of the income effect of the increase in wages. When wages increase, the opportunity cost of leisure increases. That makes leisure relatively more expensive, and workers should respond by consuming less leisure. That is what we call the substitution effect - workers substitute away from leisure as it becomes more expensive. However, increased wages also lead to an income effect. Leisure is a normal good, which means that as the worker's income increases, they would like to consume more leisure. Notice that the substitution effect and the income effect are working in opposite directions here. For workers who overall decrease their leisure, the substitution effect (which says they should consume less leisure) must be bigger than the income effect (which says they should consume more leisure). For workers who overall increase their leisure, the reverse is true - the substitution effect must be smaller than the income effect.
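Both cases are easy to generate numerically. In this sketch (my own illustrative utility functions, not the ones in the diagram), a quasilinear worker has no income effect on leisure, so leisure falls as the wage rises; a worker with a subsistence level of consumption has a dominant income effect once subsistence is covered, so leisure rises:

```python
import math

E = 16.0  # hypothetical daily time endowment (hours of discretionary time)

def best_leisure(utility, wage, step=0.001):
    """Grid search for the leisure hours that maximise utility, given C = wage*(E - L)."""
    best_L, best_U = 0.0, -math.inf
    for i in range(1, int(E / step)):
        L = i * step
        U = utility(wage * (E - L), L)
        if U > best_U:
            best_U, best_L = U, L
    return best_L

# Worker 1: quasilinear utility, U = C + 40*ln(L). The optimum is L* = 40/w,
# so there is no income effect on leisure and leisure falls as the wage rises.
def u_quasilinear(C, L):
    return C + 40 * math.log(L)

# Worker 2: Stone-Geary utility with subsistence consumption of 50,
# U = ln(C - 50) + ln(L). The optimum is L* = E/2 - 25/w, so once subsistence
# is covered the income effect dominates and leisure rises with the wage.
def u_stone_geary(C, L):
    return math.log(C - 50) + math.log(L) if C > 50 else -math.inf

for name, u in [("quasilinear", u_quasilinear), ("Stone-Geary", u_stone_geary)]:
    print(f"{name}: L* at w=10 is {best_leisure(u, 10):.2f}h; at w=20 is {best_leisure(u, 20):.2f}h")
```

Running this, the quasilinear worker's leisure falls from 4 hours to 2 hours when the wage doubles, while the Stone-Geary worker's leisure rises from 5.5 hours to 6.75 hours. Same wage increase, opposite responses, exactly as in the diagram.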

AI may lead us into an age of leisure. But only if productivity gains lead to higher wages, and the income effect of higher wages more than offsets the substitution effect.

*****

[*] The assumption that productivity gains will lead to higher wages is a strong assumption. Indeed, the FT article questions whether this assumption is valid. If productivity gains don't lead to higher wages, then this model doesn't help us evaluate whether we're about to move into an 'age of leisure', and the impacts might be more macroeconomic than microeconomic. That is, we may end up with leisure, but arising through weaker labour demand, reduced hours, or unemployment rather than through workers voluntarily choosing more leisure as wages increase.

[**] Notice that the indifference curves I1 and I2 are crossing, and indifference curves cannot cross. However, those two indifference curves are for different workers, so there is no problem. I could easily have drawn two different diagrams, one for each worker, but I've kept them both on the same diagram for efficiency.

Friday, 13 March 2026

This week in research #117

Here's what caught my eye in research over the past week:

  • Zhang et al. find that Uber’s entry into a US city significantly reduces crime rates, with larger effects in areas facing greater liquidity constraints (less bank credit supply, fewer local job opportunities, higher personal bankruptcy risk, and greater household financial stress)
  • Sandorf and Navrud (open access) establish convergent validity between a contingent valuation survey and a discrete choice experiment (meaning that both measures are highly correlated), with the example they use being willingness-to-pay to reduce the spread of invasive crabs in Norway
  • Desierto and Koyama (with ungated earlier version here) explain the economics of medieval castles in Europe
  • Ordali and Rapallini (with ungated earlier version here) conduct a meta-analysis of the relationship between age and risk aversion, and confirm that there is a positive relationship in studies using survey data and lotteries
  • Singh and Mukherjee conduct a replication of an earlier study that established 'action bias' among goalkeepers facing a penalty kick, and find that jumping left or right rather than staying in the centre of the goal is not a sub-optimal action for goalkeepers in FIFA World Cup matches, and so the high frequency of jumping is not indicative of action bias (it is good to see a replication study published in a good journal)
  • Lindkvist et al. (open access) investigate attitudes toward research misconduct and questionable research practices among researchers and ethics reviewers across academic fields, and find that researchers and ethics reviewers in medicine, as well as more senior and female researchers and reviewers, took a more negative view of questionable research practices
  • Lei et al. use China’s Compulsory Schooling Law as a quasi-natural experiment to investigate the effect of education on HIV/AIDS, finding that mass education significantly enhances knowledge about HIV/AIDS, and that each additional year of exposure to the law reduces HIV/AIDS and mortality rates by 6.51 percent and 2.15 percent respectively
  • Daoud, Conlin, and Jerzak (open access) study the differential effects of World Bank and Chinese development projects in Africa between 2002 and 2013, using data across 9899 neighbourhoods in 36 African countries, and find that both donors raise wealth, with larger and more consistent gains for Chinese development projects
  • Stoelinga and Tähtinen (open access) find that conflict exposure, on average, increases support for democracy in African countries, but the effects vary by ethnicity and regime type; interestingly, violence increases trust in ruling institutions in autocratic regimes
  • Ruiz et al. (with ungated earlier version here) find that, following the exodus of Cuban doctors from Brazil in 2018, the reduction in doctors was associated with persistent reductions in the care of chronic diseases, while service utilization for conditions requiring immediate care, such as maternal-related services and infections, quickly recovered
  • Geddes and Holz (open access) investigate the effect of rent control on domestic violence in San Francisco, and find that there was a nearly 10 percent decrease in assaults on women for the average ZIP code (some good news for advocates of rent control, but it hardly offsets the bad outcomes)
  • Clemens and Strain (with ungated earlier version here) add further to the literature on the disemployment effects of minimum wages, this time looking at the difference between large and small minimum wage changes, finding that relatively large minimum wage increases reduced usual hours worked per week among individuals with low levels of experience and education by just under one hour per week during the decade prior to the onset of the Covid-19 pandemic, while the effects of smaller minimum wage increases are economically and statistically indistinguishable from zero

Thursday, 12 March 2026

Anticipating higher future petrol prices, consumers actually push up petrol prices now

In his 1984 book The Evolution of Cooperation, Robert Axelrod suggested that people cooperate in repeated games because of 'the shadow of the future'. They alter their behaviour by cooperating now, because they anticipate that will lead to greater gains for them in the future. I really like this analogy of the shadow of the future affecting our decisions now, and not just in the context of game theory and repeated games. In fact, we've seen it play out in a different context this past week, as reported by the New Zealand Herald:

Kiwis are rushing to fill up their cars across the country amid fears of price increases at the pump because of escalating conflict in the Middle East.

Video sent to the Herald of Waitomo Tinakori petrol station in Wellington today showed a queue of cars waiting for fuel, with vehicles spilling out on to the road.

Waitomo Group CEO Simon Parham said there has been a similar increase in demand at stations across the country, with sales increasing by 10-15% this week.

“People are filling up and filling their cars ahead of the price increase that will flow through the market over the coming weeks because of the Iran conflict,” he said.

To see what is going on here, let's consider the retail market for petrol, as shown in the diagram below. Before the current conflict in the Middle East, the equilibrium price of petrol was P0, and Q0 petrol was traded per week. Then the conflict begins. Consumers anticipate that the price of petrol will increase in the future, so they decide to fill up their vehicles now. That increases the demand for petrol from D0 to D1. The equilibrium price of petrol increases to P1, and there is Q1 petrol traded in the week. 
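The mechanics of the demand shift can be sketched with a linear demand and supply model (the numbers here are purely illustrative, not estimates of the actual petrol market):

```python
def equilibrium(a, b, c, d):
    """Equilibrium of linear demand Qd = a - b*P and linear supply Qs = c + d*P."""
    P = (a - c) / (b + d)  # solve a - b*P = c + d*P for P
    return P, a - b * P

P0, Q0 = equilibrium(100, 2, 10, 1)  # before: demand D0
P1, Q1 = equilibrium(115, 2, 10, 1)  # after: demand shifts right to D1 (intercept rises)
print(f"P0={P0:.0f}, Q0={Q0:.0f} -> P1={P1:.0f}, Q1={Q1:.0f}")
```

With these numbers, the rightward demand shift raises the equilibrium price from 30 to 35 and the quantity traded from 40 to 45, mirroring the move from (P0, Q0) to (P1, Q1) in the diagram.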

Notice that by trying to avoid the high petrol price in the future, the consumers cause the price to rise today, which is exactly the outcome they were trying to avoid! In effect, when consumers rush to fill up early, they bring some of the future price pressure forward into the present. Expectations about future prices can cause self-fulfilling prophecies like this, which is a point I will make in my ECONS101 class in several weeks, when we talk about financial markets (where self-fulfilling prophecies are a clear and present danger at all times). The shadow of the future matters - consumers' actions based on trying to avoid future price rises make those price rises happen now instead.

Tuesday, 10 March 2026

Consumers can't tell the difference in audio quality between high-end audio cables and a banana

Consumers are not very good judges of quality. They can't tell the difference between bottled water and tap water. They can't even tell the difference between pâté and dog food. And now, according to this article by Futurism last month, they can't tell the difference between audio cables and a banana:

High-quality cables have long been marketed as a key way to get the most out of high-end equipment, such as expensive studio-grade monitor speaker cables and gold-plated HDMI cables for cutting-edge TVs.

In the high-end audiophile world, which is renowned for eye-bulging prices, cables can cost tens of thousands of dollars for ultra-pure copper with silver plating, specialized insulation, and dozens of individual conductors that manufacturers claim will squeeze the most out of a luxury-grade sound system aimed at the uber-wealthy.

The laws of physics, however, have long dictated that spending that kind of cash on cables simply isn’t worth it in the vast majority of circumstances — as long as you don’t go for the cheapest option from the dollar store, of course.

To put the decades-long debate to the ultimate test, a moderator who goes by Pano at the audiophile enthusiast forum diyAudio conducted an eyebrow-raising experiment back in 2024, which was rediscovered by Headphonesty late last month and Tom’s Hardware last week.

Pano ran high-quality audio through a number of different mediums, including pro audio copper wire, an unripe banana, old microphone cable soldered to pennies, and wet mud. He then challenged his fellow forum members to listen to the resulting clips, which were musical recordings from official CD releases run through the different “cables.”

The results confirmed what most hobbyist audiophiles had already suspected: it was practically impossible to tell the difference.

Consumers are not fully informed about the quality of the products that they buy. When they lack quality information before they buy, but that information is revealed after the consumer buys the good, we say that quality is an experience characteristic (and goods like that are called experience goods). A used car is an example of an experience good - the consumer doesn't really know if it is a high-quality car until they drive it. However, for some goods, the quality isn't revealed even after the good is purchased. In that case, quality is a credence characteristic (and goods like that are called credence goods). Health care is a credence good, because patients don't know for sure what would have happened to them without treatment, so it is impossible to judge the quality of the treatment.

Coming back to using an unripe banana as an audio cable, it appears that the quality of audio cable may also be a credence characteristic. At least, that's what this research tells us.

Why does this matter? The thing about credence goods is that the buyer may be reliant on the seller telling them about the quality. In the case of audio cables, the industry has a strong incentive to convince buyers that a 'high-quality' audio cable matters for sound quality, even if the consumer can't tell the difference. That changes the nature of competition in the industry. When buyers cannot verify quality for themselves, sellers can't compete on quality, and instead rely on reputation, branding, expert language, and their ability to sound convincing. They aren't going to want to sell banana cables, even if the banana cable would produce audio of equivalent quality to a 'fancier' cable. Overall, this is a good reminder that in some markets, what consumers pay for is not better quality, but a more persuasive story about quality.

[HT: Marginal Revolution]

Read more:

Saturday, 7 March 2026

Perceptions of inequality and satisfaction with democracy

Last week, my ECONS101 class covered (among many other things) the faulty causation fallacy. This occurs when we observe two variables that appear to be related to each other (they are correlated), but a change in one of the variables does not actually cause a change in the other variable (there is no causal relationship). We might observe a relationship between two variables (call them A and B), and it might be because a change in A causes a change in B, in which case the relationship is causal. But even if we can tell a really good story explaining why we think a change in A causes a change in B, that in itself doesn't make it true. We might observe that relationship because a change in B causes a change in A (we call this reverse causation). Or, we might observe that relationship because a change in some other variable causes a change in both A and B (we call this confounding). Or, the two variables might be completely unrelated, and the observed relationship happens by chance (we call this spurious correlation).

To illustrate this, I'm going to use the example of the research in this 2024 discussion paper by Nicholas Biddle and Matthew Gray (both Australian National University). They also wrote a non-technical summary of their paper on The Conversation. Biddle and Gray look at the relationship between perceptions of income inequality and faith in democratic institutions. To be fair to them, they do say in the paper that "This does not, however, demonstrate a causal relationship from views on inequality to views on democracy". However, most of their interpretations and their policy recommendations assume that the relationship is causal. For example, they conclude that:

The fundamental issue identified in this paper is that the Australian population has identified the income distribution in Australia as being unfair, and that this appears to be impacting views on democracy.

First though, let's take a step back and look at the research. Biddle and Gray use data from Waves 5 and 6 (from 2018 and 2023 respectively) of the Asian Barometer Survey (with a sample size of over a thousand in each wave for Australia), as well as from ANUPoll, a quarterly survey of public opinion run by the Social Research Centre at ANU. For the ANUPoll, they use the January 2024 data, which includes over 4000 respondents.

First, from the Asian Barometer, Biddle and Gray find that there is substantial concern about inequality:

In both waves 5 and 6 of the survey, respondents were asked ‘How fair do you think income distribution is in Australia?’... more Australians think that the income distribution is unfair or very unfair (60.5 per cent) than think it is fair or very fair. This gap has widened slightly since 2018, particularly in terms of those who think the distribution is very unfair as opposed to just unfair.

Second, in the ANUPoll data, they find that:

Combined, 30.3 per cent of Australians were not at all or not very satisfied with democracy in January 2024 (compared to 34.2 per cent in October 2023). This is still well above the January 2023 levels of dissatisfaction (22.9 per cent) and even more so the March 2008 levels (18.6 per cent).

So over time, Australians' perceptions of inequality have gotten worse (they think the income distribution is less fair), and they are less satisfied with democracy. It is reasonable, then, to ask whether those concerns about inequality affect people's faith in democratic institutions. Biddle and Gray next look at that relationship, using the ANUPoll data, and find that:

There is a very strong relationship between views on income inequality in Australia and views on democracy...

Their model (shown in Table 1 in the paper [*]) shows that the most negative views of the income distribution are associated with lower satisfaction with democracy, while more positive views of the income distribution are associated with higher satisfaction with democracy.

So, there is a strong correlation between perceptions of inequality and satisfaction with democracy. But is that just a correlation, or is there a causal relationship? We can tell a good story here (and Biddle and Gray do that). People who are less satisfied with the income distribution may lay some blame on government, and therefore their satisfaction with democracy falls.

Before we conclude that this relationship is causal though, let me lay out some alternatives. First, perhaps people who are less satisfied with democracy become less satisfied in general with many aspects of society, including the income distribution. In this case, there could be reverse causality. Second, perhaps people who are less satisfied with life in general express less satisfaction with many aspects of life and society, and so they answer more negatively when asked about the satisfaction with democracy, and they answer more negatively when asked about their views of the income distribution. In this case, there would be confounding. Third, perhaps satisfaction with democracy is declining over time for some reason, and views about the income distribution are becoming more negative for some completely different reason. But they look like they are related because they are both trending downwards. In this case, there would be a spurious correlation between perceptions of inequality and satisfaction with democracy.
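Those alternatives are easy to demonstrate with a small simulation of my own (not from Biddle and Gray's paper; the variable names and numbers are purely illustrative). Two variables that share a common driver, or that simply share a downward trend, will be strongly correlated even when neither one causes the other:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 5000

# Confounding: a third variable (say, general life satisfaction) drives
# both satisfaction with democracy (a) and views of the income
# distribution (b), with no causal link between a and b themselves.
life_sat = rng.normal(size=n)
a = life_sat + rng.normal(size=n)
b = life_sat + rng.normal(size=n)
r_confounded = np.corrcoef(a, b)[0, 1]

# Spurious correlation from shared trends: two series that each drift
# downward over time, for completely unrelated reasons, still correlate.
t = np.arange(n)
x = -0.001 * t + rng.normal(size=n)
y = -0.001 * t + rng.normal(size=n)
r_trending = np.corrcoef(x, y)[0, 1]

print(round(r_confounded, 2), round(r_trending, 2))
```

Both correlations come out strong (around 0.5 and 0.7 respectively), yet by construction there is no causal effect of either variable on the other in either case. That is exactly why a strong correlation, on its own, tells us nothing about causation.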

It isn't enough to see two variables that appear to be related, and assume that a change in one of those variables causes a change in the other. Economists and other researchers have developed a number of statistical tools and experimental methods to tease out when a correlation really does reflect a causal relationship. Biddle and Gray haven't done that. It might be that negative perceptions of inequality reduce satisfaction with democracy. But by itself, this research doesn't allow us to conclude that.

[HT: The Conversation]

*****

[*] Table 1 in the paper actually has an error. The explanatory variable in the table is labelled as satisfaction with democracy, when that is actually the dependent variable. It is perceptions of inequality that is the explanatory variable.

Friday, 6 March 2026

This week in research #116

Here's what caught my eye in research over the past week:

  • Numa and Zahran (with ungated earlier version here) show that W.E.B. Du Bois made enduring contributions to economics (and may be one of the most under-rated economists of the early 20th Century)
  • Federle et al. (with ungated earlier version here) study 150 years of the economic cost of war, and find that a war of average intensity is associated with an output drop of close to 10 percent in the war-site economy, while consumer prices rise by approximately 20 percent
  • Passaro, Kojima, and Pakzad-Hurson (with ungated earlier version here) find that when there are more men than women in a labour market, 'equal pay for similar work' policies increase the gender wage gap
  • Strulik and Trimborn (open access) show analytically that higher world population could causally lead to a lower long-run temperature increase under optimal carbon taxation (though I think that the optimal carbon taxation might be doing a lot of the work there)
  • Arellano-Bover et al. (with ungated earlier version here) look into the initial job-matching of US graduates by major, and find significant variation in callback rate returns to majors, with Biology and Economics majors receiving the highest rate, particularly in occupations involving high intensity of analytical and interpersonal skills
  • Xu et al. develop a geographically weighted autoregressive model with an adaptive spatial weights matrix (a bit pointy-headed for many readers of this blog, but of interest to me!)
  • Li, Liu, and Si find in a meta-analysis that minimum wages actually increase female employment (showing that the question of the employment effects of minimum wages is still not solved)

Wednesday, 4 March 2026

This is not how generative AI should be used in research

I've been using ChatGPT Pro to help with drafting research papers this year, as I noted that I would do in this post from January. It has amped up my productivity a lot, allowing me to finish writing up two papers already, with a third on the way. These were papers where the analysis was already done, but it was the writing that was holding up the process. Having ChatGPT to help with the drafting seems to kickstart my writing, even though I have ended up extensively re-writing everything that ChatGPT produces. I find it a good disciplining tool as much as anything. Several colleagues have asked whether I am disclosing my generative AI use to journal editors when I submit. And I do. I have a standard 'generative AI use statement' that I include in my papers, that notes how it was used, and that I remain responsible for all of the content. You can see an example in this recent working paper.

However, not everyone is as careful with their generative AI use, or as transparent. Consider this example:

That is both infuriating and a sad indictment of the reviewing, editing, and publishing process, not least because, as one Reddit commenter noted, many authors see high-quality work rejected by journals, whereas a paper like this, with obvious flaws, has successfully been published. And it's not an isolated incident. This 2025 article by Artur Strzelecki (University of Economics in Katowice), published in the journal Learned Publishing (open access), catalogues over 1300 instances of likely unacknowledged and frankly stupid use of ChatGPT, up to September 2024.

Strzelecki's approach is to search for text strings that are almost certainly ChatGPT responses to a prompt asking it to generate text. The main example Strzelecki uses, which is in the title of the article, is "as of my last knowledge update". No human author is going to say that in a research paper. Similarly, "as an AI language model", "I don't have access to", and "certainly, here is" are highly indicative of ChatGPT use. There are circumstances where a human might use those phrases in a research paper, but it seems unlikely. Strzelecki screens out papers that mention ChatGPT, and manually checks each paper to ensure the text was not in some way legitimate, and that leaves 1362 articles.
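The screening step amounts to a simple text search for those telltale phrases, followed by manual checking. A minimal sketch of the idea (my own illustration, not Strzelecki's code; the phrases are the ones quoted in the article, but the function name is hypothetical):

```python
# Phrases that are highly indicative of unedited ChatGPT output
# (examples quoted in Strzelecki's article).
TELLTALE_PHRASES = [
    "as of my last knowledge update",
    "as an ai language model",
    "i don't have access to",
    "certainly, here is",
]

def flag_suspect_text(text: str) -> list[str]:
    """Return any telltale phrases found in an article's text.

    A hit is only a candidate: Strzelecki then screens out papers that
    openly discuss ChatGPT, and manually checks each remaining paper.
    """
    lowered = text.lower()
    return [phrase for phrase in TELLTALE_PHRASES if phrase in lowered]

sample = ("As of my last knowledge update in September 2021, "
          "the adoption rate was increasing.")
print(flag_suspect_text(sample))
```

The automated search only narrows the field; the manual check is what rules out legitimate uses of those phrases (for example, a paper quoting ChatGPT output as its object of study).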

How do these articles get published with this content intact? There are lots of stopping points where this could be caught and corrected (or prevented), but these articles have gotten through all of them. Strzelecki outlines the process. First, perhaps it is only one of the authors (and not all of them) that used ChatGPT. In which case, why didn't the other co-authors pick it up? Next, the paper is submitted to a journal, and often goes through a text review by the publisher. And then the editor or editors (including associate editors) look at it, and decide whether it should be sent out for peer review. And then the peer reviewers (usually more than one, sometimes four or more) look at the paper in detail and provide comments. Then the editor receives the review reports and makes a decision. The paper may go through more than one round of review and editorial decision. And then, once accepted for publication, the article may be copy-edited. At any of those stages, this text could be picked up. And yet, for over 1300 articles as of September 2024, the ChatGPT-generated text has not been picked up.

Strzelecki particularly focuses on 89 articles that have been published in journals indexed by Scopus or Web of Science, which should be the most credible journals. Of these:

...as many as 28 of them are in journals with Scopus percentile values of 90 and above. Two journals have a 99th percentile, indicating that they are the top journals in their field...

In total, 64 articles were found in journals considered to be in Q1, top quartile, recognized as the group of the best journals in their respective fields. Twenty-five articles are in the percentile range between 50 and 75, indicating that the journals in which these articles are found belong to Q2.

So, this phenomenon is not limited to low-ranked 'predatory' journals. In fact, looking at the list, there are several journals published by MDPI and Frontiers (for more on those publishers, see here). However, there are a whole lot published by Elsevier and Springer, publishers that we should expect much better of. Although, those are also publishers that publish a lot of journals, and a lot of articles, so perhaps that accounts for their higher numbers within the 89 articles that Strzelecki focuses on. Fortunately, I don't see any reputable journals in economics in the list, but I could be wrong.

Anyway, the takeaway is not so much that generative AI use is widespread in the write-up of research. It is that authors are using generative AI, not being transparent in their use of it, and that the quality control system by journals, even high-ranking journals, is terrible. Strzelecki makes a good point in the conclusion of his article that 89 out of over 2.5 million articles indexed in Scopus is only about 0.0036% of the total indexed articles. However, this analysis is only picking up the really, really obvious cases. There will be far more use of generative AI that has not been adequately checked or acknowledged by authors, and not picked up in quality control.

I'm not against using generative AI in the write-up of research. Obviously, because I am doing the same thing. What needs to happen is that researchers need to be transparent and honest when they use generative AI, so that editors, reviewers, and the readers of research can see how it was used. That way, the users of research can evaluate for themselves whether they should believe, discount, or discard research depending on the ways and the extent of generative AI use. Without transparency, that important evaluation step is lost.

[HT: Artur Strzelecki]

Read more:

Monday, 2 March 2026

You can make future population decline disappear just by changing the way you categorise people and fertility

Fertility has been on a long-term declining trajectory worldwide and, apart from the occasional blip, in every country. There seems to be no prospect of a reversal of this trend, and no prospect of fertility returning to the replacement level of approximately 2.1 births per woman. So, when you see a research paper claiming that "high-fertility, high-retention groups persist, gain share, and lead the total population to grow", you should sit up and take notice. That is, at least, until you've carefully thought about the paper in question.

That's what happened to me with this 2025 NBER Working Paper by Sebastian Galiani (University of Maryland, College Park) and Raul Sosa (Universidad de San Andres). They create and calibrate models of fertility based on two different subgroupings (by race, and by religion), taking account of cultural transmission of fertility rates from mothers to daughters. They then use their calibrated models to simulate population change going forward for ten generations. What they find when the population is categorised by race is a decreasing population, as shown in Figure 1 Panel A from the paper:

And when Galiani and Sosa categorise the population by religion, they instead find an increasing population, as shown in Figure 2 Panel A from the paper:

Now, this struck me as really odd. We’re talking about the same country and the same underlying population. If you split that population into subgroups and take a weighted average of what happens in each subgroup, you should get back the outcome for the population as a whole. If you are measuring the same underlying thing consistently, changing the subgroups (race in one analysis, and religion in another) shouldn’t magically create or destroy population growth in the model. At most, it should change which groups are growing faster and therefore how the composition by group changes over time, with high-fertility groups making up a larger share of the population and lower-fertility groups making up a smaller share. But the headline result here is much stronger than that: the aggregate population path changes direction entirely depending on the grouping that is employed. Galiani and Sosa use those results to conclude that:
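The aggregation logic is worth spelling out. In any one generation, the aggregate growth factor is just the population-share-weighted average of the subgroup growth factors, so re-partitioning the same population cannot change the aggregate if fertility is measured consistently. A stylised numerical check (my own illustration, with made-up shares and growth factors, not the paper's calibration):

```python
# The same population of 100, split two different ways. Per-generation
# growth factors are invented purely for illustration.
population = 100.0

# Partition 1 (think "by race"): three groups with shares and growth factors
shares_1 = [0.6, 0.3, 0.1]
growth_1 = [0.95, 0.90, 1.10]

# Partition 2 (think "by religion"): a different split of the SAME people.
# If fertility is measured consistently, the weighted average must match.
shares_2 = [0.5, 0.5]
growth_2 = [0.97, 0.93]

agg_1 = sum(s * g for s, g in zip(shares_1, growth_1))
agg_2 = sum(s * g for s, g in zip(shares_2, growth_2))
print(round(agg_1, 3), round(agg_2, 3))
```

Both partitions give the same aggregate growth factor (0.95 here). So if two analyses of the same population imply aggregate paths going in opposite directions, the subgroup fertility inputs must differ between the analyses, which is exactly what turns out to be going on in the paper.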

...whenever at least one group remains above replacement on the female line and transmits identity effectively, its share rises and turns the aggregate path upward.

The first part of that conclusion makes sense, but the second part stretches credibility. It made me wonder whether the results were being driven by unusual features of the model, or by different modelling choices in the two analyses. 

So, I dug into the paper, which is not an easy task as it is quite theoretical. And there are consequential differences between the two analyses (by race and by religion) that drive the difference in results. First, they use different measures of fertility, with the analysis by race based on the total fertility rate (TFR), while the analysis by religion is based on completed fertility (see this post for a brief discussion on the difference between those two measures). That difference matters. By definition, completed fertility can only be observed for women who have finished their childbearing years, so it reflects fertility over the last twenty or more years. In contrast, the total fertility rate that Galiani and Sosa use was measured in 2023, after a long period of fertility decline. By construction then, the analysis using completed fertility (the analysis by religion) will be assuming higher fertility than the analysis using the total fertility rate (the analysis by race). This is highlighted by Table 1 in the paper, which shows that nearly every racial group has a total fertility rate that is below replacement (Hispanic is highest among the large groups at a TFR of 1.946, while Native Hawaiian and Pacific Islanders have a TFR of 2.218), whereas there are several religious groups with completed fertility rates above replacement (including Mormons at 3.4, and Muslims at 2.4).

Second, their calibration implies much bigger gaps across religious groups than across racial groups. Specifically, they assume greater dispersion in fertility and retention by religion than by race. That means that the forces driving fertility change within population groups are much stronger in the analysis by religion than the analysis by race. So, essentially this doubles down on the effect of higher fertility that arises from the different data sources.

Overall, I don't find the comparison across the two models to be credible. They are employing different measures, taken from different points in time, and applying different modelling assumptions. In contrast, the results within each model showing that the relative group proportions change over time to favour groups that have higher fertility are plausible and are worth taking account of. For instance, Galiani and Sosa conclude that:

Although the objective is not to forecast outcomes for particular groups, our world simulations imply not only a more religious composition but also that, within the horizon we study, Muslims become the largest tradition by share.

That seems like a sensible conclusion to draw from the evidence, especially as they explicitly note that they aren't trying to forecast outcomes for particular groups. Nevertheless, they do forecast the aggregate population, and their results are not consistent with what is widely expected to happen. World population is projected to start declining later this century, in large part because of declining overall fertility, yet their results based on religion suggest that this will suddenly reverse course, with the population growing over a horizon of ten generations. In reality, the long-run decline in fertility has proven very difficult to shift, and complicated economic modelling that appears to overturn that on-the-ground reality should be treated with caution.

[HT: Marginal Revolution]

Read more:

Sunday, 1 March 2026

Why specialist vape retailers may tend to locate in more socially deprived areas

When I first started studying the social impacts of alcohol outlets, one of the things my research team and I were interested in was where alcohol outlets located. We found (see here) that off-licence outlets tended to locate in areas of high deprivation in Manukau City. I've since replicated that analysis a couple of times in unpublished work, for both South Auckland and Hamilton.

I was interested to see that this new article by Robin van der Sanden (Massey University) and co-authors, published in the New Zealand Medical Journal (sorry, no ungated version online, but you can sign up for open access for free), finds very similar results for specialist vape retailers (which are defined here). They used Google Maps and Google Street View data to locate all of the specialist vape retailers across 14 Auckland suburbs, then categorised them into three types: (1) upmarket; (2) budget; and (3) 'store-within-a-store' (located inside or attached to convenience stores, petrol stations, or liquor stores). The main results in terms of the relationship between store numbers and social deprivation are shown in Figure 1 from the paper:

This figure shows the median number of specialist vape retailers (in total and by type) by social deprivation. In their sample, stores are more likely to be located in the most deprived two deciles (9-10), and least likely to be in the least deprived two deciles (1-2). Aside from that, I wouldn't draw too much from the analysis. Because these are median counts per suburb group (not per capita or per unit of land area), differences could reflect population size, commercial zoning, or land area rather than true 'density'. If highly deprived suburbs also tend to have larger populations, or to cover larger areas, then the apparent relationship between social deprivation and the number of specialist vape retailers is confounded. Even so, the broad tendency for more stores in more deprived areas does seem to be there. Van der Sanden et al. worry about this, concluding that:

The concentration of SVRs in high-deprivation suburbs in Auckland may warrant further regulatory responses that better balance the needs of predominately adults to access vaping products as a means to stop smoking with limiting vape products to young people who have never smoked...

However, Van der Sanden et al. don't really explore why specialist vape retailers may locate in areas of high deprivation. I've done quite a bit of exploration and thinking on this in relation to off-licence alcohol outlets, and I suspect that the reasons might be similar. And it doesn't require retailers to be 'targeting' high deprivation communities in some predatory business strategy. I have a few hypothesised reasons why there are more specialist vape retailers in more socially deprived areas, each of which can be explained with some simple economics.

First, if a prospective retailer is looking to run a retail store that maximises profits, one of the aspects that they must consider is the costs of operating the business. Ceteris paribus (all else held constant), a store with lower costs will be more profitable. Areas of high deprivation tend to have lower commercial rents, so a store there is less costly to operate, and will generate higher profits from the same revenue.

The second hypothesis is a little more complex, and involves a bit of economic geography. Each store may have a particular 'catchment area', which is the area from which its customers come to the store. In a low deprivation area, where everyone owns a car, and often commutes a fair distance for work, the catchment area for a store might be quite large. So, stores that are located close together will be in direct competition for consumers, since their 'catchment areas' will substantially overlap. In contrast, in a high deprivation area, fewer people might own cars, or they may not run reliably, or they may only be able to afford to drive them to and from work without long side-quests to buy vapes. So, the 'catchment area' for a store will be much smaller, and stores can be located closer together without being in direct competition for consumers. And so, we might expect to see more vape stores in areas of high deprivation than in areas of low deprivation, because the retailers are trying to minimise competition with other stores (although they may then need to balance a smaller catchment, which has less spending power, against the costs of operating the store).
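The catchment-area argument can be put into rough numbers (a stylised sketch of my own, with made-up figures, not from any study): if stores want non-overlapping catchments to avoid head-to-head competition, the number of stores an area can support scales inversely with the square of the catchment radius.

```python
import math

def max_noncompeting_stores(area_km2: float, catchment_radius_km: float) -> int:
    """Rough upper bound on the number of stores an area can support
    with non-overlapping circular catchments (ignoring packing losses)."""
    catchment_area = math.pi * catchment_radius_km ** 2
    return int(area_km2 // catchment_area)

suburb_km2 = 20.0  # hypothetical suburb

# Low-deprivation area: car-owning customers travel further, so
# catchments are large and very few stores fit without overlap.
print(max_noncompeting_stores(suburb_km2, catchment_radius_km=2.0))

# High-deprivation area: customers travel less, so catchments are
# small and many more stores fit without competing head-to-head.
print(max_noncompeting_stores(suburb_km2, catchment_radius_km=0.5))
```

With these illustrative numbers, halving-and-halving-again the catchment radius (from 2 km to 0.5 km) raises the upper bound from one store to twenty-five, which is the sense in which smaller catchments can support many more stores in the same suburb.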

Finally, the differences may reflect differences in demand. If vaping rates are higher in more socially deprived areas, then demand for vaping products may also be higher in those areas, and attract more vape retailers. I don't really know whether there is a social gradient in vaping, although the New Zealand Health Survey suggests that there is, with more vaping among people living in areas in the most socially deprived quintile. Of course, there is a potential reverse causation problem with the demand-side explanation, because more specialist vape retailers located in socially deprived areas might drive more vaping in those areas.

None of that is to say that having more specialist vape retailers in more socially deprived areas is a desirable outcome (especially if they do indeed drive more vaping). Van der Sanden et al.'s proposed policy response may be appropriate. However, the situation we observe could be explained by some simple economics. So if policymakers want to reduce retail availability of vaping products, they can focus on practical levers (licensing, zoning, proximity rules) without relying on arguments about predatory business practices, or vilifying store owners (both of which I have seen in the case of alcohol retailers).