Saturday, 28 March 2026

More on the toxic environment in Economics Job Market Rumors

The Economics Job Market Rumors (EJMR) website began as a forum for PhD students to discuss the economics job market, but it has long since become notorious for misogyny, racism, and other toxic behaviour (see this post, for example), due in large part to the anonymous nature of the platform. And even though the user community at EJMR has been called out for its behaviour, things don't seem to have gotten much better over time. That is documented in this 2025 article by Florian Ederer (Boston University), Paul Goldsmith-Pinkham, and Kyle Jensen (both Yale University), published in the journal AEA Papers and Proceedings (ungated earlier version here).

Ederer et al. analyse content from EJMR over the period from January 2012 to May 2023, documenting a number of changes. First:

...starting in 2018, EJMR saw an explosion in discussions initiated by references to Twitter posts. This shift mirrors Twitter’s growing importance as a real-time source of information and debate in academic and public policy circles.

Twitter (now X) essentially took over from YouTube as the main source of initial references on EJMR from about 2018, which is around the time of the earlier research on toxicity and misogyny on the platform. There were also surprising declines in Marginal Revolution and NBER links as the starting point for EJMR discussions. Given the predominance of Twitter as a source, Ederer et al. then look in more detail at which Twitter accounts were most referenced, reporting that:

These accounts can be broadly categorized into three main groups: economists, right-wing commentators, and journalists. The group of economists (e.g., Claudia_Sahm, jenniferdoleac, and JustinWolfers) includes academic and professional economists from diverse institutions whose tweets often serve as springboards for debates on research findings, policy implications, and professional conduct. The second group includes polarizing and predominantly conservative commentators and agitators (e.g., realChrisBrunet, RichardHanania, and libsoftiktok) and reflects EJMR’s right-wing slant and engagement with contentious political and social issues. The third group is a collection of news sources and journalistic accounts, many of which have a conservative slant (e.g., visegrad24, disclosetv, and nypost).

Finally, Ederer et al. characterise the posts linking to each Twitter account in terms of 'hate speech', 'negativity', 'misogyny', and 'toxicity' (based on measures from their companion paper here), finding that:

Among the 10 most frequently mentioned Twitter accounts, there are four economists, including three female economists. EJMR posts referencing two of these female economists (Claudia_Sahm and jenniferdoleac) have very high average z-scores of 1.974 and 2.598 for the Misogynistic classifier, indicating that EJMR posters discuss them in strongly misogynistic terms compared to all other Twitter accounts mentioned on EJMR... The only other large average z-score for the Misogynistic measure is for EJMR posts referencing elben (z-score Misogynistic = 0.956), an academic economist who has championed LGBTQ-inclusive policies in the economics profession.
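For readers unfamiliar with z-scores, my reading (the precise construction is in Ederer et al.'s companion paper) is that each account's average classifier score is standardised against the distribution of averages across all mentioned accounts, roughly:

$$z = \frac{x_{\text{account}} - \bar{x}}{s}$$

where $x_{\text{account}}$ is the average Misogynistic score of posts referencing a given account, and $\bar{x}$ and $s$ are the mean and standard deviation of those averages across all accounts. On that reading, a z-score of 2.598 means that posts referencing that economist are about two-and-a-half standard deviations more misogynistic than posts referencing the typical account.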

In other words, since 2018 EJMR has remained a hostile and misogynistic platform, with its toxicity increasingly fed by the same antagonism and culture-war discourse found on Twitter/X. EJMR is not just an academic forum, but has become part of that broader hostile ecosystem.

Economists need places where they can share research in progress, ideas, and practical advice, especially early in their careers. In its early days, EJMR served that purpose. However, it has long since become a space that early career economists are better off avoiding.

[HT: Marginal Revolution, in January last year]


Friday, 27 March 2026

This week in research #119

Here's what caught my eye in research over the past week (another very quiet week, it seems):

  • Clemens et al. analyse the effect of California's $20 fast food minimum wage, which was implemented in 2024, and find that food-away-from-home prices increased by 3.3 to 3.6 percent in areas subject to the minimum wage relative to control areas (so firms passed their cost increase on to consumers)

Tuesday, 24 March 2026

Evidence that artificial intelligence is increasing the impact, but narrowing the scope, of research

There is growing evidence of positive impacts of generative artificial intelligence on productivity. This includes productivity in research (see this post, for example), my own included. However, some have questioned whether that increase in research productivity comes at the cost of narrowing the scope of research.

So, I was interested to read this article by Qianyue Hao (Tsinghua University) and co-authors, published in the prestigious journal Nature (ungated earlier version here) late last year. They look at the impact of AI tools (not limited to generative AI) on the productivity of researchers and the quality of research. Specifically, they look at authors publishing in six representative fields: biology, medicine, chemistry, physics, materials science, and geology, across three 'eras': (1) the 'machine learning era' (from 1980 to 2014); (2) the 'deep learning era' (from 2015 to 2022); and (3) the 'generative AI era' (from 2023 onwards). Hao et al. compare authors who publish 'AI-augmented papers' with those who do not. An 'AI-augmented paper' is one that uses methods such as:

...support vector machines and principal component analysis from the machine learning era, and convolutional neural networks and generative adversarial networks from the deep learning era. Large language models, which have emerged in recent years, also rank among the most frequently used methods...

Using a dataset of over 27 million papers with complete records published between 1980 and 2025, of which about 310,000 were 'AI-augmented', Hao et al. find that:

...annual citations to AI papers are 98.70% higher than those to non-AI papers on average...

So, AI-augmented research gathers more citations, which suggests that authors using AI in their research achieve greater impact. This is reinforced by evidence that AI-augmented papers are published in higher-quality journals (with Q1 journals being the highest ranked). Hao et al. report that:

...the proportion of AI papers in Q1 journals is 18.60% higher than that of non-AI papers in all journals; in Q2 journals, the AI proportion is 1.59% higher; whereas Q3 and Q4 journals hold a relatively lower proportion of papers with AI... These results indicate a heterogeneous distribution of AI-augmented papers across journals, with a higher prevalence in high-impact journals.

And AI appears to make authors more productive, as:

On average, researchers adopting AI annually publish 3.02 times more papers... and garner 4.84 times more citations... than those not adopting AI, with consistency.
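Taking those two multiples at face value (that is, 3.02 times as many papers and 4.84 times as many citations per year), a quick back-of-the-envelope calculation shows that AI adopters don't just publish more, they also attract more citations per paper:

$$\frac{4.84}{3.02} \approx 1.60$$

That is, roughly 60 percent more citations per paper, on average, than non-adopters.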

All of these results seem to hold across all of the disciplines that Hao et al. consider. However, it is not all good news. Hao et al. use machine learning to create a measure of the 'breadth of scholarly attention'. Using that measure, they find that:

Compared with conventional research, AI research is associated with a 4.63% contracted median collective knowledge extent across science, which is consistent across all six disciplines... Moreover, when dividing these disciplines into more than two hundred sub-fields, the contraction of knowledge extent can be observed in more than 70% of them...
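Hao et al. build their 'knowledge extent' measure with machine learning over the full corpus of papers; the details are in the article. As a purely hypothetical illustration of the general idea (not their actual method), one could proxy the breadth of a group of papers by how dissimilar their topics are from one another, for example:

```python
# Hypothetical illustration of a 'breadth of scholarly attention' style measure
# (NOT Hao et al.'s actual method): breadth = average pairwise dissimilarity
# between the topic-keyword sets of a group of papers.
from itertools import combinations

def jaccard_distance(a: set, b: set) -> float:
    """1 - |intersection| / |union|; 0 = identical topics, 1 = no overlap."""
    return 1 - len(a & b) / len(a | b)

def breadth(papers: list[set]) -> float:
    """Average pairwise Jaccard distance across all papers in the group."""
    pairs = list(combinations(papers, 2))
    return sum(jaccard_distance(a, b) for a, b in pairs) / len(pairs)

# Invented keyword sets: one 'narrow' group of papers and one 'broad' group.
narrow = [{"protein folding", "deep learning"},
          {"protein folding", "transformers"},
          {"protein structure", "deep learning"}]
broad = [{"protein folding", "deep learning"},
         {"volcanic soils", "field survey"},
         {"monetary policy", "inflation expectations"}]

print(f"Narrow group breadth: {breadth(narrow):.2f}")
print(f"Broad group breadth:  {breadth(broad):.2f}")
```

A group of AI-augmented papers clustered around a handful of fashionable methods and problems would score lower on a measure like this than a group spread across many distinct topics.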

Of course, some of the differences here may be due to selection, as the types of researchers, and the types of research, involving AI use may be meaningfully different from those that do not involve AI. However, putting the selection issues aside, Hao et al. note that there is a tension between the individual researcher's incentive to produce a greater quantity of higher-impact research, which would suggest greater use of AI, and the social incentive to produce a greater breadth of research.

So, the takeaway from this paper is that we need to consider researcher incentives, not just productivity. Specifically, this research suggests that the use of AI in research is leading to a 'prisoners' dilemma' outcome: each individual researcher acting in their own best interests (and using AI in their research) leads to an outcome that is worse for society overall (less breadth of research and more incremental gains).

Hao et al. conclude that:

The substantial academic benefits of AI use may be a driving force behind its accelerated rate of adoption; however, we also find unintended consequences from the increased prevalence of AI-augmented research. In all fields, AI-augmented research focuses on a narrower scope of scientific topics and reduces the scientific engagement of follow-on research, leading to more overlapping research work that slows the expansion of knowledge. Further, with a greater concentration of collective attention to the same AI papers, the adoption of AI seems to induce authors to converge on the same solutions to known problems rather than create new ones.

So, what is the solution here? Society probably wants research to be higher quality and have a broad scope. But individual researchers' incentives to use AI in their research appear inconsistent with that outcome. The traditional prisoners' dilemma can be played as a repeated game (see here or here, for example), and in a repeated game the players can avoid the worst outcome by cooperating. In this case, the researchers could cooperate by agreeing not to use AI in their research. The problem is that every researcher has an incentive to cheat on that agreement, since if they use AI, then that will be good for their career. Cooperation is also harder to sustain in this dilemma than in the traditional game, because there are not just two players who need to cooperate, but thousands (or millions). Ensuring cooperation in a prisoners' dilemma game with many players, each of whom is far better off cheating than cooperating, is almost impossible (which is why solving the problem of climate change is so difficult).
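To make that many-player logic concrete, here is a minimal sketch in Python of a stylised n-player prisoners' dilemma, where 'defecting' corresponds to adopting AI and 'cooperating' to abstaining. The payoff numbers are purely illustrative assumptions (nothing here is estimated from Hao et al.'s data); the point is only that defection dominates for each individual while universal defection leaves everyone worse off than universal cooperation.

```python
# A stylised n-player prisoners' dilemma: 'defect' = adopt AI, 'cooperate' = abstain.
# All payoff numbers are illustrative assumptions, not estimates from Hao et al.

N = 1000  # number of researchers in the 'game'

def payoff(defects: bool, n_other_defectors: int) -> float:
    """Career payoff for one researcher.

    Adopting AI gives a private boost (more papers, more citations), but each
    adopter slightly narrows the collective research frontier, imposing a small
    cost on everyone.
    """
    private_boost = 5.0 if defects else 0.0
    social_cost_per_defector = 0.01
    total_defectors = n_other_defectors + (1 if defects else 0)
    return private_boost - social_cost_per_defector * total_defectors

# Defection dominates: no matter what the other N-1 researchers do,
# each individual is better off adopting AI...
for others in range(N):
    assert payoff(True, others) > payoff(False, others)

# ...yet if everyone defects, each player ends up worse off than if everyone
# had cooperated.
print("Everyone cooperates:", payoff(False, 0))     # 0.0
print("Everyone defects:   ", payoff(True, N - 1))  # 5.0 - 0.01 * 1000 = -5.0
```

With two players, repetition and reciprocity can sustain cooperation; with thousands of players each facing the same private temptation, it is much harder to see how.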

My own view is that the answer is not to keep AI out of research. That is not realistic, in the same way that it's not realistic to expect students not to use generative AI. The incentives need to be redesigned, but this will be no easy task. As long as universities, research funders, and publishers reward researchers for quantity, citations, and publication in top-ranked outlets, then we should expect more AI-augmented work, with a narrower scope than society might prefer. If we want AI to expand knowledge rather than simply accelerate competition within narrow foci, then we need institutions that also reward novelty, breadth, and the discovery of new questions. That is the economic challenge we must face up to.

[HT: Marginal Revolution]

Monday, 23 March 2026

The relationship between obesity of politicians and corruption is correlation, not causation

Not every correlation between two variables represents a causal relationship. Even if we can tell a compelling story about why a change in one variable might cause a change in another, that doesn't make the relationship causal. Sometimes a correlation actually results from something other than the story you tell. Sometimes the correlation is just random noise (a spurious correlation). So, we should be cautious when interpreting correlations.

I was reminded of this when reading this 2021 article by Pavlo Blavatskyy (University of Montpellier), published in the journal Economics of Transition and Institutional Change (sorry, I don't see an ungated version online). The article even generated a small debate, with a comment by György Márk Kis, and then a reply by Blavatskyy, appearing in the same issue of the journal.

In the original article, Blavatskyy looks at the relationship between the body mass index (BMI) of politicians in a country and the Corruption Perceptions Index by Transparency International. The data Blavatskyy uses is for 2017, and the sample of countries is limited to 15 post-Soviet countries (Armenia, Azerbaijan, Belarus, Estonia, Georgia, Kazakhstan, Kyrgyzstan, Latvia, Lithuania, Moldova, Russia, Tajikistan, Turkmenistan, Ukraine, and Uzbekistan). The argument for why this correlation matters is explained in Blavatskyy's reply to Kis:

One common form of corruption/lobbying is inviting governmental officials to lavish banquets with excessive consumption of food and drinks... Corrupt politicians frequenting such banquets might risk gaining extra weight. This ‘hedonic theory of corruption’ postulates the existence of a positive relationship between median body mass index of public officials and the level of grand political corruption in society.

So, Blavatskyy is able to tell a good story for why greater corruption would cause higher BMI among politicians. However, that doesn't mean that the relationship is causal, even though the correlation between perceived corruption and median politician BMI is clear from Figure 1 in the original paper:

Low numbers in the Corruption Perceptions Index represent higher levels of perceived corruption. So, the figure shows that countries where politicians have a higher median BMI also have higher levels of perceived corruption.

Kis took issue with a number of things in the paper. First, why those 15 countries? Why not all countries? Kis shows that if you separate the 15 countries in Blavatskyy's sample by their geographic location, you get different relationships within each subsample. However, the broader question is not what happens when you look at subsamples, but whether the relationship holds if you add more countries to the sample. Neither Blavatskyy nor Kis answers that question. We should also wonder whether there is something special about 2017 that leads to this correlation. Does it hold in other years?
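To see why the subsample issue matters, here is a minimal simulated sketch in Python (the numbers and regional groupings are invented, not Blavatskyy's or Kis's data) of how a strong pooled correlation across 15 countries can coexist with essentially no relationship within each geographic subsample:

```python
# Simulated illustration (invented numbers, not Blavatskyy's or Kis's data):
# a pooled cross-country correlation can be strong even when the relationship
# within each geographic subsample is pure noise.
import numpy as np

rng = np.random.default_rng(42)

# Regional means differ (invented): within each region, politician BMI and the
# Corruption Perceptions Index score (higher score = less perceived corruption)
# are independent by construction.
regions = {
    "Region A": (26.0, 60.0),  # (mean politician BMI, mean CPI score)
    "Region B": (30.0, 35.0),
    "Region C": (33.0, 25.0),
}

bmi, cpi_score = [], []
for mean_bmi, mean_cpi in regions.values():
    n = 5  # five countries per region, 15 countries in total
    bmi.extend(rng.normal(mean_bmi, 1.0, n))
    cpi_score.extend(rng.normal(mean_cpi, 3.0, n))

pooled_r = np.corrcoef(bmi, cpi_score)[0, 1]
print(f"Pooled correlation across all 15 countries: {pooled_r:+.2f}")

# Within each region, any estimated correlation is just sampling noise.
for i, name in enumerate(regions):
    r = np.corrcoef(bmi[5 * i:5 * i + 5], cpi_score[5 * i:5 * i + 5])[0, 1]
    print(f"Within {name}: {r:+.2f}")
```

In this sketch the pooled correlation comes entirely from differences between regions, which is exactly why the choice of which 15 countries (and which regions) to include matters so much.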

In his reply, Blavatskyy doesn't really address those two points (the narrow sample, and the single year) in a convincing way. Instead, he narrows the sample even further, looking at changes in politician BMI and perceived corruption for just one of the countries in his sample, Ukraine. In that analysis, he again shows a correlation between corruption perceptions and politician BMI, in this case over time for Ukraine. However, that simply raises the question: why Ukraine? Why didn't he look at the other countries in his sample in that way? And just because Ukraine shows a correlation over time, that still doesn't demonstrate a causal relationship.

Kis also takes issue with the machine learning algorithm that Blavatskyy uses to estimate the BMI for politicians in his sample. Kis notes that the accuracy of the algorithm is quite dubious (my words, not Kis's), with:

...errors of at least 5.5 in 21.1% of the time.

That's an error of at least 5.5 BMI points in more than 20 percent of cases. That extent of measurement error would be problematic. To that, I would add that it is unclear whether the sample that the machine learning algorithm was trained on included people from post-Soviet countries. The relationship between facial features and BMI could well be ethnicity-specific, in ways that systematically bias the results. We have no way of knowing. And Blavatskyy didn't address this point in his reply.
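One reason that scale of error matters: to the extent that noise in the individual BMI estimates carries through to the country-level medians, the classical errors-in-variables result (an assumption here, since the errors may be neither mean-zero nor random, as I note above) says that any estimated relationship will be attenuated towards zero:

$$\operatorname{plim}\,\hat{\beta} = \beta \cdot \frac{\sigma^2_{BMI}}{\sigma^2_{BMI} + \sigma^2_{error}}$$

Random noise of that kind would tend to mask a true relationship rather than create a spurious one; the more worrying possibility is the systematic, ethnicity-specific bias just described, which could push the estimated correlation in either direction.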

Now, the point of this post is the distinction between correlation and causation. From what I have seen, this seems a likely candidate for confounding. There are any number of variables that might increase both politician BMI and corruption, without corruption being a cause of higher politician BMI. As one example, a country with high inequality might simultaneously have high corruption (with petty officials willing to take bribes to supplement their low incomes) and high politician BMI (since politicians would likely be among the wealthy class in society). Blavatskyy doesn't consider confounding variables such as inequality, differences in age distribution, differences in average BMI in the population, or regional differences in diet, in his analysis.
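Here is a minimal simulation in Python of that inequality story (all numbers are invented, purely to illustrate confounding): inequality drives both corruption and politician BMI, so the two end up strongly correlated even though neither causes the other, and the correlation largely disappears once inequality is controlled for.

```python
# Toy illustration of confounding (all numbers invented): inequality raises both
# corruption and politician BMI, so the two are correlated even though neither
# causes the other.
import numpy as np

rng = np.random.default_rng(0)
n_countries = 15

inequality = rng.uniform(0.3, 0.6, n_countries)  # a Gini-style index (hypothetical)

# Each outcome depends on inequality plus independent noise; by construction
# there is NO direct causal link between corruption and BMI.
corruption = 20 + 100 * inequality + rng.normal(0, 3, n_countries)  # higher = more corrupt
median_bmi = 20 + 20 * inequality + rng.normal(0, 1, n_countries)

r = np.corrcoef(median_bmi, corruption)[0, 1]
print(f"Raw correlation between politician BMI and corruption: {r:+.2f}")

# Controlling for the confounder: correlate the residuals from regressing each
# variable on inequality. The association largely vanishes.
resid_corruption = corruption - np.poly1d(np.polyfit(inequality, corruption, 1))(inequality)
resid_bmi = median_bmi - np.poly1d(np.polyfit(inequality, median_bmi, 1))(inequality)
partial_r = np.corrcoef(resid_bmi, resid_corruption)[0, 1]
print(f"Partial correlation, controlling for inequality: {partial_r:+.2f}")
```

Of course, this is only one of many possible confounding stories; the point is simply that a strong correlation across 15 countries is entirely consistent with there being no causal relationship at all.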

Now, to be fair to Blavatskyy, he doesn't adopt a causal interpretation of his results (except in his response to Kis, as I quoted above). Instead, Blavatskyy argues that, if BMI and perceived corruption are correlated, then we might infer how much corruption is being experienced in a country by looking at the median BMI of its politicians. However, even that inference is problematic, and Blavatskyy should know why. He gives the example of Swiss watches in China as a proxy for corruption, but then notes that:

...the rise of social media and Internet anti-corruption platforms in 2011–2012 made it no longer possible to measure grand political corruption through visible luxury Swiss watches. Luxury Swiss watches could still be a popular expenditure of corrupt governmental officials, but these officials are now more careful not to reveal their Swiss watches to the general public.

When politicians realised that their Swiss watches were giving away their corruption, they stopped showing them off in public. If politicians realised that their expanding waistlines were giving away their corruption, wouldn't they invest more in personal trainers (or liposuction)? As soon as the correlation was used for inference, it would likely start to break down. This again illustrates the limited usefulness of such proxies.

Correlation does not imply causation. And sometimes, correlation today does not imply correlation in the future. We need to be much more cautious when considering analyses like this one.