Wednesday, 4 March 2026

This is not how generative AI should be used in research

I've been using ChatGPT Pro to help with drafting research papers this year, as I noted I would in this post from January. It has amped up my productivity a lot, allowing me to finish writing up two papers already, with a third on the way. These were papers where the analysis was already done, but the writing was holding up the process. Having ChatGPT to help with the drafting seems to kickstart my writing, even though I have ended up extensively re-writing everything that ChatGPT produces. I find it a good disciplining tool as much as anything. Several colleagues have asked whether I am disclosing my generative AI use to journal editors when I submit. And I do. I have a standard 'generative AI use statement' that I include in my papers, which notes how it was used, and that I remain responsible for all of the content. You can see an example in this recent working paper.

However, not everyone is as careful with their generative AI use, or as transparent. Consider this example:

That is both infuriating and a sad indictment of the reviewing, editing, and publishing process, not least because, as one Reddit commenter noted, many authors see high-quality work rejected by journals, whereas a paper like this, with obvious flaws, has successfully been published. And it's not an isolated incident. This 2025 article by Artur Strzelecki (University of Economics in Katowice), published in the journal Learned Publishing (open access), catalogues over 1300 instances of likely unacknowledged and frankly stupid use of ChatGPT, up to September 2024.

Strzelecki's approach is to search for text strings that are almost certainly ChatGPT responses to a prompt asking it to generate text. The main example Strzelecki uses, which is in the title of the article, is "as of my last knowledge update". No human author is going to say that in a research paper. Similarly, "as an AI language model", "I don't have access to", and "certainly, here is" are highly indicative of ChatGPT use. There are circumstances where a human might use those phrases in a research paper, but it seems unlikely. Strzelecki screens out papers that mention ChatGPT, and manually checks each remaining paper to rule out legitimate uses of those phrases, and that leaves 1362 articles.
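Screening at this scale is easily automatable. Here is a minimal sketch of the kind of phrase search Strzelecki describes; the phrase list simply repeats the examples quoted above, and the sample texts are invented for illustration, not drawn from his corpus:

```python
# A minimal sketch of phrase-based screening for likely ChatGPT output.
# The phrase list repeats the examples quoted above; the sample texts
# are invented for illustration.

TELLTALE_PHRASES = [
    "as of my last knowledge update",
    "as an ai language model",
    "i don't have access to",
    "certainly, here is",
]

def flag_suspect_text(text):
    """Return any telltale phrases found in a lowercased copy of the text."""
    lowered = text.lower()
    return [phrase for phrase in TELLTALE_PHRASES if phrase in lowered]

clean = "We estimate the model using 2021 census data."
suspect = "As of my last knowledge update, the literature suggests that..."

print(flag_suspect_text(clean))    # → []
print(flag_suspect_text(suspect))  # → ['as of my last knowledge update']
```

The manual check that Strzelecki adds after the string search is the important part: a paper that studies ChatGPT could quote one of these phrases entirely legitimately, so raw matches overstate the count.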

How do these articles get published with this content intact? There are lots of stopping points where this could be caught and corrected (or prevented), but these articles have gotten through all of them. Strzelecki outlines the process. First, perhaps it is only one of the authors (and not all of them) that used ChatGPT. In which case, why didn't the other co-authors pick it up? Next, the paper is submitted to a journal, and often goes through a text review by the publisher. And then the editor or editors (including associate editors) look at it, and decide whether it should be sent out for peer review. And then the peer reviewers (usually more than one, sometimes four or more) look at the paper in detail and provide comments. Then the editor receives the review reports and makes a decision. The paper may go through more than one round of review and editorial decision. And then, once accepted for publication, the article may be copy-edited. At any of those stages, this text could be picked up. And yet, for over 1300 articles as of September 2024, the ChatGPT-generated text was not picked up.

Strzelecki particularly focuses on 89 articles that have been published in journals indexed by Scopus or Web of Science, which should be the most credible journals. Of these:

...as many as 28 of them are in journals with Scopus percentile values of 90 and above. Two journals have a 99th percentile, indicating that they are the top journals in their field...

In total, 64 articles were found in journals considered to be in Q1, top quartile, recognized as the group of the best journals in their respective fields. Twenty-five articles are in the percentile range between 50 and 75, indicating that the journals in which these articles are found belong to Q2.

So, this phenomenon is not limited to low-ranked 'predatory' journals. In fact, looking at the list, there are several journals published by MDPI and Frontiers (for more on those publishers, see here). However, there are a whole lot published by Elsevier and Springer, publishers that we should expect much better of. Although, those are also publishers that publish a lot of journals, and a lot of articles, so perhaps that accounts for their higher numbers within the 89 articles that Strzelecki focuses on. Fortunately, I don't see any reputable journals in economics in the list, but I could be wrong.

Anyway, the takeaway is not so much that generative AI use is widespread in the write-up of research. It is that authors are using generative AI, not being transparent in their use of it, and that the quality control system at journals, even high-ranking journals, is terrible. Strzelecki makes a good point in the conclusion of his article that 89 out of over 2.5 million articles indexed in Scopus is a vanishingly small share of the total (about 0.0036 percent). However, this analysis is only picking up the really, really obvious cases. There will be far more use of generative AI that has not been adequately checked or acknowledged by authors, and not picked up in quality control.

I'm not against using generative AI in the write-up of research. Obviously, because I am doing the same thing. What needs to happen is that researchers need to be transparent and honest when they use generative AI, so that editors, reviewers, and the readers of research can see how it was used. That way, the users of research can evaluate for themselves whether they should believe, discount, or discard research depending on the ways and the extent of generative AI use. Without transparency, that important evaluation step is lost.

[HT: Artur Strzelecki]

Read more:

Monday, 2 March 2026

You can make future population decline disappear just by changing the way you categorise people and fertility

Fertility has been on a long-term declining trajectory worldwide and, apart from the occasional blip, in every country. There seems to be no prospect of a reversal of this trend, and no prospect of fertility returning to the replacement level of approximately 2.1 births per woman. So, when you see a research paper claiming that "high-fertility, high-retention groups persist, gain share, and lead the total population to grow", you should sit up and take notice. That is, at least, until you've carefully thought about the paper in question.

That's what happened to me with this 2025 NBER Working Paper by Sebastian Galiani (University of Maryland, College Park) and Raul Sosa (Universidad de San Andres). They create and calibrate models of fertility based on two different subgroupings (by race, and by religion), and taking account of cultural transmission of fertility rates from mothers to daughters. They then use their calibrated models to simulate population change going forward for ten generations. What they find when the population is categorised by race is a decreasing population, as shown in Figure 1 Panel A from the paper:

And when Galiani and Sosa categorise the population by religion, they instead find an increasing population, as shown in Figure 2 Panel A from the paper:

Now, this struck me as really odd. We’re talking about the same country and the same underlying population. If you split that population into subgroups and take a weighted average of what happens in each subgroup, you should get back the outcome for the population as a whole. If you are measuring the same underlying thing consistently, changing the subgroups (race in one analysis, and religion in another) shouldn’t magically create or destroy population growth in the model. At most, it should change which groups are growing faster and therefore how the composition by group changes over time, with high-fertility groups making up a larger share of the population and lower-fertility groups making up a smaller share. But the headline result here is much stronger than that, with aggregate population growth reversing direction entirely depending on the groupings that are employed. Galiani and Sosa use those results to conclude that:

...whenever at least one group remains above replacement on the female line and transmits identity effectively, its share rises and turns the aggregate path upward.

The first part of that conclusion makes sense, but the second part stretches credibility. It made me wonder whether the results were being driven by unusual features of the model, or by different modelling choices in the two analyses. 
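The aggregation point can be made concrete with a toy calculation. If each woman's fertility is held fixed, regrouping the same women (by 'race' or by 'religion') cannot change the aggregate number of daughters in the next generation. All numbers below are invented for illustration; this is not Galiani and Sosa's model:

```python
# Toy check that regrouping a fixed population cannot change its aggregate
# trajectory when each woman's fertility is held fixed. All numbers here
# are invented for illustration; this is not Galiani and Sosa's model.

women = [
    # (race label, religion label, expected daughters per woman)
    ("A", "X", 0.75),
    ("A", "Y", 1.25),
    ("B", "X", 1.0),
    ("B", "Y", 1.0),
]

def next_generation(population, group_index):
    """Total daughters in the next generation, computed by first summing
    within subgroups and then across subgroups."""
    subgroup_totals = {}
    for woman in population:
        key = woman[group_index]
        subgroup_totals[key] = subgroup_totals.get(key, 0.0) + woman[2]
    return sum(subgroup_totals.values())

by_race = next_generation(women, 0)      # group by the first label
by_religion = next_generation(women, 1)  # group by the second label
print(by_race, by_religion)  # → 4.0 4.0 (the grouping is irrelevant)
```

Whatever labels are used, the subgroup totals must sum back to the same aggregate, so any divergence between the two analyses has to come from differences in inputs or assumptions, not from the grouping itself.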

So, I dug into the paper, which is not an easy task as it is quite theoretical. And there are consequential differences between the two analyses (by race and by religion) that drive the difference in results. First, they use different measures of fertility: the analysis by race is based on the total fertility rate (TFR), while the analysis by religion is based on completed fertility (see this post for a brief discussion of the difference between those two measures). By definition, completed fertility can only be observed for women who have finished their childbearing years, so it reflects fertility over the last twenty or more years. In contrast, the total fertility rate that Galiani and Sosa use was measured in 2023, after a long period of fertility decline. By construction then, the analysis using completed fertility (the analysis by religion) will assume higher fertility than the analysis using the total fertility rate (the analysis by race). This is highlighted by Table 1 in the paper, which shows that nearly every racial group has a total fertility rate that is below replacement (Hispanic is highest among the large groups at a TFR of 1.946, while Native Hawaiian and Pacific Islanders have a TFR of 2.218), whereas there are several religious groups with completed fertility rates above replacement (including Mormons at 3.4, and Muslims at 2.4).
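A stylised example shows why, when fertility has been falling, completed fertility will sit above the current-period TFR. The rate schedule below is invented; only the direction of the gap matters:

```python
# Stylised illustration of why, when fertility is falling, completed (cohort)
# fertility exceeds the current period TFR. The rate schedule is invented.

def asfr(age, year):
    """Hypothetical age-specific fertility rate: a flat age profile over
    ages 20-39, scaled down by 1% per calendar year after 2000."""
    base = 0.1 if 20 <= age <= 39 else 0.0
    return base * (0.99 ** (year - 2000))

ages = range(15, 50)

# Period TFR in 2023: sum the 2023 rates across all ages.
tfr_2023 = sum(asfr(a, 2023) for a in ages)

# Completed fertility of the cohort aged 49 in 2023: each age was reached
# in an earlier (higher-fertility) calendar year.
completed = sum(asfr(a, 2023 - (49 - a)) for a in ages)

print(round(tfr_2023, 2), round(completed, 2))  # → 1.59 1.93
```

The cohort measure is mechanically higher because it averages over years when fertility was higher, which is exactly the wedge between the paper's two data sources.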

Second, their calibration implies much bigger gaps across religious groups than across racial groups. Specifically, they assume greater dispersion in fertility and retention by religion than by race. That means that the forces driving fertility change within population groups are much stronger in the analysis by religion than the analysis by race. So, essentially this doubles down on the effect of higher fertility that arises from the different data sources.

Overall, I don't find the comparison across the two models to be credible. They are employing different measures, taken from different points in time, and applying different modelling assumptions. In contrast, the results within each model showing that the relative group proportions change over time to favour groups that have higher fertility are plausible and are worth taking account of. For instance, Galiani and Sosa conclude that:

Although the objective is not to forecast outcomes for particular groups, our world simulations imply not only a more religious composition but also that, within the horizon we study, Muslims become the largest tradition by share.

That seems like a sensible conclusion to draw based on the evidence, especially as they explicitly note that they aren't trying to forecast outcomes for particular groups. Nevertheless, their simulations do amount to a population forecast, and the results are not entirely consistent with what is expected to happen. World population is set to start declining later this century, in large part because of declining overall fertility, and their results based on religion suggest that this is suddenly going to reverse course, and remain upward over a time horizon of ten generations. In reality, the long-run decline in fertility has proven very difficult to shift, and complicated economic modelling that appears to overturn that on-the-ground reality should be treated with a healthy dose of scepticism.

[HT: Marginal Revolution]

Read more:

Sunday, 1 March 2026

Why specialist vape retailers may tend to locate in more socially deprived areas

When I first started studying the social impacts of alcohol outlets, one of the things my research team and I were interested in was where alcohol outlets located. We found (see here) that off-licence outlets tended to locate in areas of high deprivation in Manukau City. I've since replicated that analysis a couple of times in unpublished work, for both South Auckland and Hamilton.

I was interested to see that this new article by Robin van der Sanden (Massey University) and co-authors, published in the New Zealand Medical Journal (sorry, no ungated version online, but you can sign up for open access for free), finds very similar results for specialist vape retailers (which are defined here). They used Google Maps and Google Street View data to locate all of the specialist vape retailers across 14 Auckland suburbs, then categorised them into three types: (1) upmarket; (2) budget; and (3) 'store-within-a-store' (which are located inside or attached to convenience stores, petrol stations, or liquor stores). The main results in terms of the relationship between store numbers and social deprivation are shown in Figure 1 from the paper:

This figure shows the median number of specialist vape retailers (in total and by type) by social deprivation. In their sample, stores are most likely to be located in the most deprived two deciles (9-10), and least likely to be in the least deprived two deciles (1-2). Aside from that, I wouldn't draw too much from the analysis here. Because these are median counts per suburb group (not per capita or per land area), differences could reflect population size, commercial zoning, or land area rather than ‘density’. So if high deprivation suburbs also tend to have higher populations, or to be larger in area, then the apparent relationship between social deprivation and the number of specialist vape retailers is confounded. However, the broad tendency for stores to cluster in the most deprived deciles does seem real. Van der Sanden et al. worry about this, concluding that:

The concentration of SVRs in high-deprivation suburbs in Auckland may warrant further regulatory responses that better balance the needs of predominately adults to access vaping products as a means to stop smoking with limiting vape products to young people who have never smoked...
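Before getting to the 'why', a note on the measurement point above: normalising store counts per capita is straightforward, and would address the confound. The suburb figures below are invented for illustration, not taken from the paper:

```python
# Why raw store counts can mislead: a hypothetical high-deprivation suburb
# with twice the population can have twice the stores, yet the same number
# of stores per capita. All figures here are invented for illustration.

suburbs = [
    # (name, deprivation decile, store count, population)
    ("LowDep", 2, 3, 30_000),
    ("HighDep", 9, 6, 60_000),
]

for name, decile, stores, population in suburbs:
    per_10k = stores / population * 10_000
    print(f"{name} (decile {decile}): {stores} stores, "
          f"{per_10k:.1f} per 10,000 residents")
# Both suburbs work out to 1.0 store per 10,000 residents, despite the
# raw count being twice as high in the high-deprivation suburb.
```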

However, Van der Sanden et al. don't really explore why specialist vape retailers may locate in areas of high deprivation. I've done quite a bit of exploration and thinking on this in relation to off-licence alcohol outlets, and I suspect that the reasons might be similar. And it doesn't require retailers to be 'targeting' high deprivation communities in some predatory business strategy. I have a few hypothesised reasons why there are more specialist vape retailers in more socially deprived areas, each of which can be explained with some simple economics.

First, if a prospective retailer is looking to run a retail store that maximises profits, one of the aspects that they must consider is the cost of operating the business. Ceteris paribus (all else held constant), a store with lower costs will be more profitable. Areas of high deprivation tend to have lower commercial rents, so a store located there is less costly to operate, and will generate higher profits from the same revenue.

The second hypothesis is a little more complex, and involves a bit of economic geography. Each store may have a particular 'catchment area', which is the area from which its customers come to the store. In a low deprivation area, where everyone owns a car, and often commutes a fair distance for work, the catchment area for a store might be quite large. So, stores that are located close together will be in direct competition for consumers, since their 'catchment areas' will substantially overlap. In contrast, in a high deprivation area, fewer people might own cars, or they may not run reliably, or they may only be able to afford to drive them to and from work without long side-quests to buy vapes. So, the 'catchment area' for a store will be much smaller, and stores can be located closer together without being in direct competition for consumers. And so, we might expect to see more vape stores in areas of high deprivation than in areas of low deprivation, because the retailers are trying to minimise competition with other stores (although they may then need to balance a smaller catchment, which has less spending power, against the costs of operating the store).
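The catchment argument can be given a back-of-the-envelope geometric form. With invented catchment radii and suburb area, the number of non-overlapping circular catchments that fit in a suburb rises sharply as the radius shrinks:

```python
# Back-of-the-envelope version of the catchment argument: the number of
# stores a suburb can support without overlapping catchments scales with
# one over the catchment area. All figures here are invented.
import math

SUBURB_AREA_KM2 = 20.0

def max_non_overlapping_stores(catchment_radius_km):
    """Crude upper bound: suburb area divided by one circular catchment."""
    return int(SUBURB_AREA_KM2 // (math.pi * catchment_radius_km ** 2))

low_dep = max_non_overlapping_stores(2.0)   # car-owning customers, wide catchment
high_dep = max_non_overlapping_stores(0.8)  # walk-in customers, narrow catchment
print(low_dep, high_dep)  # → 1 9
```

Shrinking the catchment radius from 2km to 0.8km lets roughly six times as many stores coexist without directly competing, which is the intuition behind expecting more stores in areas where customers travel shorter distances.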

Finally, the differences may reflect differences in demand. If vaping rates are higher in more socially deprived areas, then demand for vaping products may also be higher in those areas, and attract more vape retailers. I don't really know whether there is a social gradient in vaping, although the New Zealand Health Survey suggests that there is, with more vaping among people living in areas in the most socially deprived quintile. Of course, there is a potential reverse causation problem with the demand-side explanation, because more specialist vape retailers located in socially deprived areas might drive more vaping in those areas.

None of that is to say that having more specialist vape retailers in more socially deprived areas is a desirable outcome (especially if they do indeed drive more vaping). Van der Sanden et al.'s proposed policy response may be appropriate. However, the situation we observe could be explained by some simple economics. So if policymakers want to reduce retail availability of vaping products, they can focus on practical levers (licensing, zoning, proximity rules) without relying on arguments about predatory business practices, or vilifying store owners (both of which I have seen in the case of alcohol retailers).

Friday, 27 February 2026

This week in research #115

Here's what caught my eye in research over the past week (another slow week):

  • Mortágua gets deep into the theoretical weeds on the question of whether crypto-assets are money
  • Carpenter et al. (with ungated earlier version here) use 2021 Canadian Census data to look at earnings disparities experienced by nonbinary people, and find that nonbinary individuals assigned male at birth, transgender men, transgender women, and cisgender women all earn significantly less than comparable cisgender men

Also new from the Waikato working papers series:

  • Valera, Lubangco, and Holmes propose a new measure of revisions to consumer inflation expectations that uses repeated cross-sections rather than panel data, and show that individual inflation expectations are sensitive to price changes across 14 food and energy goods