Tuesday, 3 June 2025

How prevalent is large language model use in the write-up of economics research?

Back in January, I poked fun at a paper on students' acceptance of ChatGPT that had parts that were clearly written by generative AI. And I recently read a working paper that was great (and I'll blog on it sometime soon), up until the Conclusion section, which was clearly written by generative AI. But how common is this? Academics worry about how often students are using AI to write assignments or essays, but how often are we doing so?

That is essentially the question addressed in this new article by Maryam Feyzollahi and Nima Rafizadeh (both University of Massachusetts Amherst), published in the journal Economics Letters (sorry, I don't see an ungated version online). Feyzollahi and Rafizadeh investigate the top 25 economics journals over the period from 2001 to 2024, and basically look for word choices that are characteristic of large language models (LLMs). As they explain:

We construct two equally-sized word sets for our analysis: treatment words that are characteristically associated with LLM-assisted writing, and control words that represent traditional academic writing patterns... The treatment words are selected based on two criteria. First, we analyze a large corpus of confirmed LLM-generated academic text to identify words that appear with systematically higher frequency compared to human writing. Second, we cross-reference our selections with existing literature on language model patterns... to validate our choices... The control words are selected based on two criteria. First, these words represent established economic and econometric concepts that have maintained consistent usage patterns in academic writing over our sample period. Second, they are semantically unrelated to our treatment words, ensuring that any potential changes in treatment word frequencies do not spillover to or correlate with control word usage through meaning associations.

I know that you're wondering about the word list, and it is provided in Table 2 from the paper:

That list seems more nuanced (I swear that ChatGPT did not write this sentence!) than the word choices that have previously been highlighted as signals of LLM use, like "rich tapestry", "realm", or "mosaic". However, some old favourites like "delve" and "foster" do appear in the list, so clearly LLMs haven't completely evolved to avoid their characteristic phrases.

Feyzollahi and Rafizadeh compare the relative frequency of the treatment and control words. Their results are quite well illustrated in Figure 1 (a) from the paper, which shows how the use of the words "intricate" (a treatment word) and "coefficient" (a control word) has changed over time:
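The comparison underlying that figure is simple to sketch. Here is a minimal illustration of computing relative word frequencies in a piece of text; the word sets below are just the examples mentioned in this post ("intricate", "delve", and "foster" as treatment words, "coefficient" as a control word), not the paper's full Table 2 lists, and the sample abstract is invented:

```python
import re
from collections import Counter

# Illustrative word sets only; the paper's full lists are in its Table 2.
TREATMENT = {"intricate", "delve", "foster"}
CONTROL = {"coefficient"}

def relative_frequency(text, word_set):
    """Occurrences of words from word_set, per 1,000 tokens of text."""
    tokens = re.findall(r"[a-z]+", text.lower())
    if not tokens:
        return 0.0
    counts = Counter(tokens)
    hits = sum(counts[w] for w in word_set)
    return 1000 * hits / len(tokens)

# A made-up, LLM-flavoured abstract for illustration.
abstract = ("We delve into the intricate relationship between X and Y. "
            "The estimated coefficient on X is positive.")
print(relative_frequency(abstract, TREATMENT))  # treatment-word rate
print(relative_frequency(abstract, CONTROL))    # control-word rate
```

Tracking rates like these across journals and years, rather than raw counts, is what lets the paper compare word use even as the volume of published text changes.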

Maybe starting from 2023, research became more intricate, or there were more intricacies in the findings of research? Or more likely, LLMs suddenly started to play an increasing role in the write-up of research. Generalising from that comparison of just two words, Feyzollahi and Rafizadeh use a simple regression model and find:

...compelling evidence of increasing LLM adoption over time. When considering both post-treatment years... the analysis documents a significant increase of 4.76 percentage points in the frequency of LLM-associated terms, with the effect maintaining remarkable stability across all specifications.

And then when comparing 2023 and 2024, Feyzollahi and Rafizadeh find:

...an accelerating pattern of LLM adoption. The initial impact in 2023... shows an increase of 2.85 percentage points, while the effect more than doubles to 6.67 percentage points in 2024...
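The paper's exact regression specification isn't quoted above, but a treatment-versus-control comparison before and after the arrival of ChatGPT is essentially a difference-in-differences design, which can be sketched as below. The data here are entirely synthetic, and the coefficient sizes are arbitrary, chosen only so the estimate is easy to check:

```python
import numpy as np

# Synthetic per-word-group, per-year frequencies; illustrative only.
rng = np.random.default_rng(0)
years = np.arange(2019, 2025)
rows = []
for treated in (0, 1):
    for year in years:
        post = int(year >= 2023)  # post-ChatGPT indicator
        # In this toy data, treated (LLM-associated) words jump by 2.0
        # after 2023, on top of a common post-period shift of 0.3.
        freq = (1.0 + 0.5 * treated + 0.3 * post
                + 2.0 * treated * post + rng.normal(0, 0.05))
        rows.append((treated, post, freq))

# OLS with an intercept, group and period dummies, and their interaction.
X = np.array([[1, t, p, t * p] for t, p, _ in rows], dtype=float)
y = np.array([f for _, _, f in rows])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
# beta[3] is the diff-in-diff estimate: the extra post-2023 increase
# in frequency for treatment words, relative to control words.
print(round(beta[3], 2))
```

The interaction coefficient is the analogue of the 4.76 percentage point effect reported in the paper: the change in treatment-word frequency over and above whatever happened to control words in the same period.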

So, LLM use is small, but growing quickly in academic economics. And there are many reasons to believe that these results understate the true use of LLMs in the write-up of research in economics. It takes some time for research to get published, so there will likely be far more papers in the 'publication pipeline' that have used LLMs. Authors can re-write text that was drafted by an LLM in order to mask the LLM's contribution. LLMs may be getting better at writing in an 'academic style' that avoids the use of phrases that signal the use of an LLM (no more delving!).

Overall though, it is clear that LLMs are increasingly being used to write up research for publication. A relevant question to ask is: does LLM use reduce the quality of the underlying research? Personally, when I read a paper where an LLM has clearly been used in the writing, I chuckle to myself. However, I haven't yet had cause to disbelieve the underlying results of the research. That said, my reaction doesn't necessarily reflect the views of academics in general. Regardless, when an LLM is used, that use should be transparently disclosed by the authors (indeed, John List recently reported results of a quick survey of his followers on LinkedIn, where only 14 percent of them suggested that the use of Claude in a research paper should not have been disclosed).

We should not be surprised that researchers are using LLMs. LLMs can increase our productivity by helping us to write up our research more quickly. In that sense, we face similar incentives to students, who are trying to complete their assignments and essays more quickly. Both students and researchers, though, should at the very least acknowledge their use of these tools.
