Friday, 28 November 2025

This week in research #103

Here's what caught my eye in research over the past week (clearly a very quiet week!):

  • Jurkat, Klump, and Schneider (with ungated earlier version here) report on a meta-analysis of 55 papers containing 2,468 estimates of the impact of industrial robots on wages, finding that the overall effect is close to zero and statistically insignificant
  • Chekenya and Dzingirai find, using African data from 1997 to 2014, that migration significantly increases conflict incidence, with effects concentrated in countries and regions in Africa with weak governance and economic stress
  • Cafferata, Dominguez, and Scartascini (with ungated earlier version here) find that overconfident individuals (in the US and Latin America) are more willing to accept the use of guns and more likely to declare their willingness to use guns
  • Bucher-Koenen et al. (with ungated earlier version here) find that financial advisors in Germany offer more self-serving advice to women, while men are more likely to receive sales fee rebates and less likely to be recommended expensive in-house multi-asset funds

And the latest paper from my own research (or, more accurately, from the thesis research of my successful PhD student Jayani Wijesinghe, on which I am a co-author along with Susan Olivia and Les Oxley):

  • Our new article (online early version, open access) in the journal Economics and Human Biology describes the patterns of lifespan inequality at the state level in the United States between 1959 and 2018, and identifies the state-level demographic and socioeconomic factors that are associated with lifespan inequality

Wednesday, 26 November 2025

Shots fired at the end of a debate on contingent valuation

I have written a number of posts about debates on the contingent valuation method (most recently here, but see the links at the end of this post for more). A 2016 debate that I blogged about here was picked up again in 2020 (but I didn't blog about it then because I was kind of busy trying to manage the COVID lockdown online-teaching debacle). So, what happened? The first of two 2020 articles published in the journal Ecological Economics (sorry, I don't see an ungated version online) is by John Whitehead (Appalachian State University), a serial participant in contingent valuation debates.

This part of the debate centres on 'adding up tests', which essentially test for scope problems. To reiterate (from this post):

Scope problems arise when you think about a good that is made up of component parts. If you ask people how much they are willing to pay for Good A and how much they are willing to pay for Good B, the sum of those two WTP values often turns out to be much more than what people would tell you they are willing to pay for Good A and Good B together. This issue is one I encountered early in my research career, in joint work with Ian Bateman and Andreas Tsoumas (ungated earlier version here).
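To make the scope problem concrete, here is a minimal sketch with made-up WTP numbers (purely illustrative, not taken from any of the papers discussed below):

```python
# Hypothetical mean willingness-to-pay (WTP) values, in dollars (made up for illustration)
wtp_good_a = 40.0    # WTP to protect Good A alone
wtp_good_b = 35.0    # WTP to protect Good B alone
wtp_a_and_b = 50.0   # WTP to protect Good A and Good B together

sum_of_parts = wtp_good_a + wtp_good_b

# A scope problem shows up when the sum of the parts greatly exceeds the whole
print(f"Sum of parts: ${sum_of_parts:.2f} vs. whole: ${wtp_a_and_b:.2f}")
if sum_of_parts > wtp_a_and_b:
    print("Scope problem: respondents value the parts at more than the whole package.")
```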

An 'adding up test' tests whether the willingness to pay for the global good (Good A and Good B together) is equal to the sum of the willingness to pay for Good A alone and the willingness to pay for Good B alone. In relation to this particular debate, Whitehead summarises where we are up to:

Desvousges et al. (2012) reinterpret the two-scenario scope test in Chapman et al. (2009) as a three-scenario adding-up test. They then assert that the implicit third willingness-to-pay estimate is not of adequate size. Whitehead (2016) critiques the notion of the adding-up test as an adequacy test and proposes a measure to assess the economic significance of the scope test: scope elasticity. Chapman et al. (2016) argue that Desvousges et al. (2012) misinterpret their scope test. Desvousges et al. (2016) reply that they did not misinterpret the Chapman et al. (2009) scope test and assert that their adding-up test in Desvousges et al. (2015) demonstrates one of their points.

Desvousges et al. (2015) field the Chapman et al. (2009) survey with new sample data collected with a different survey sample mode than that used by Chapman et al. (2009) and three additional scenarios. Desvousges et al. (2015) conduct an adding-up test and argue that willingness-to-pay (WTP) for the whole should be equal to willingness-to-pay for the sum of four parts (the first, second, third and fourth increment scenarios). Desvousges et al. (2015) find that “The sum of the four increments … is about three times as large as the value of the whole” (p. 566).

Whitehead joins the debate on the side of Chapman et al., defending them by re-examining Desvousges et al.'s analysis and arguing that it actually does pass an 'adding up test', which would imply that there are no scope problems in the original Chapman et al. paper. Whitehead concludes that there are a number of problems in the Desvousges et al. analysis:

First, they do not elicit WTP estimates explicitly consistent with the theory of the adding-up test. Their survey design suggests that a one-tailed test be conducted where the sum of the WTP parts is expected to be greater than the WTP whole. Second, there are several data quality problems: non-monotonicity, flat portions over wide ranges of the bid function and fat tails. Each of these data problems leads to high variability in mean WTP across estimation approach and larger standard errors than those associated with nonparametric estimators that rely on smoothed data.

I'm not going to get into the weeds here, because what I want to highlight is the response by William Desvousges, Kristy Mathews (both independent consultants), and Kenneth Train (University of California - Berkeley), also published in the journal Ecological Economics (and also no ungated version available). The response is only two pages long, and is a very effective takedown of Whitehead. Along the way, Desvousges et al. note that Whitehead:

...made numerous mistakes in his calculations... When these errors are corrected, adding-up fails for each theoretically valid parametric model that Whitehead used.

One example of Whitehead's errors is:

He used medians for the tests instead of means, assuming – incorrectly – that the sum of medians is the median of the sum.
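To see why that assumption fails, here is a quick sketch with made-up, skewed WTP draws: the sum of the medians is generally not the median of the sum.

```python
import numpy as np

rng = np.random.default_rng(42)

# Made-up, right-skewed WTP distributions for two programme increments
wtp_a = rng.lognormal(mean=3.0, sigma=1.0, size=10_000)
wtp_b = rng.lognormal(mean=2.5, sigma=1.2, size=10_000)

sum_of_medians = np.median(wtp_a) + np.median(wtp_b)
median_of_sum = np.median(wtp_a + wtp_b)

# With skewed distributions, these two quantities can differ substantially
print(f"Sum of medians: {sum_of_medians:.1f}")
print(f"Median of sum:  {median_of_sum:.1f}")
```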

That's a fair criticism. However, Desvousges et al. are not satisfied with leaving it at that. Instead, they go on the attack:

Also, we examined the papers authored or co-authored by Whitehead that are cited in the recent reviews... These papers provide 15 CV datasets. Each of the three problems that Whitehead identified for our paper is evidenced in these datasets:

  • Non-monotonicity: 12 of the 15 datasets exhibit non-monotonicity.
  • Flat portions of the response curve: All 15 datasets have flat areas for at least half of the possible adjacent prompts, and 4 datasets have flat areas for all adjacent prompts.
  • Fat tails: In our data, the yes-share at the highest cost prompt ranged from 15 to 45%, depending on the program increment. In Whitehead's studies, the share ranged from 14 to 53%.

If Whitehead's data are no worse than typical CV studies, then his papers indicate the pervasiveness of these problems in CV studies.

Ouch! That seems to have ended that particular debate. My takeaway (apart from not messing with Desvousges et al.) is that the contingent valuation method is far from perfect. In particular, it is vulnerable to scope problems, which my own research with Ian Bateman and Andreas Tsoumas (ungated earlier version here) showed some years ago. Ironically, the message that contingent valuation has particular problems is one that John Whitehead himself has also argued (see here).
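As a footnote for anyone who does want a peek into the weeds, here is a rough sketch of the three data-quality checks being argued about (non-monotonicity, flat portions of the bid function, and fat tails), using made-up yes-shares rather than data from any of these studies, and an arbitrary 15% benchmark for the fat-tail check:

```python
import numpy as np

# Made-up dichotomous-choice CV data: bid levels and the share of respondents saying 'yes'
bids = np.array([5, 10, 20, 40, 80, 150])
yes_share = np.array([0.82, 0.70, 0.72, 0.55, 0.55, 0.45])  # illustrative only

# Non-monotonicity: yes-shares should fall as the bid rises
non_monotonic = np.any(np.diff(yes_share) > 0)

# Flat portions: adjacent bids with (near-)identical yes-shares
flat_steps = np.sum(np.isclose(np.diff(yes_share), 0, atol=0.01))

# Fat tails: a high yes-share at the highest bid means mean WTP is poorly pinned down
fat_tail = yes_share[-1] > 0.15

print(f"Non-monotonic bid function: {non_monotonic}")
print(f"Flat adjacent steps: {flat_steps} of {len(bids) - 1}")
print(f"Fat tail at highest bid (> 15% yes): {fat_tail}")
```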

Read more:

Tuesday, 25 November 2025

The economics of fertility in high-income countries

Earlier this year, Melissa Kearney and Phillip Levine released an NBER Working Paper on the economics of fertility in high-income countries. In part, this paper is a follow-up on their 2022 article on cohort effects and fertility (which I discussed here), as well as building on this theoretical and empirical review (ungated here) by Doepke et al. (which I discussed here).

Kearney and Levine first review the trends and patterns in fertility in high-income countries, focusing in particular on cohort-based measures. This exercise re-establishes the by-now well-known trend of declining fertility across the six example countries that they selected (Canada, Japan, Netherlands, Norway, Portugal, and the US).

Kearney and Levine then turn their attention to why fertility has declined, as well as why various policies and incentives have mostly failed to arrest the declining fertility trends. Taking an economic perspective that builds from Gary Becker's work on the economics of the family, but broadens its consideration (as shown by Doepke et al.), Kearney and Levine state that:

...the evidence points us to the view that the recent decline in fertility is likely less about changes in current constraints and more about cumulative cultural and economic forces that influence fertility decisions over time. Generally, economists are loathe to rely on changes in preferences to explain behavior because that can explain virtually anything. But there are reasons to believe that the lifestyle, broadly defined, that is consistent with having a child or multiple children is becoming less desirable for many adults.

Kearney and Levine point out several times (as in the quote above) how much economists dislike resorting to changes in preferences as an explanation, because changes in preferences can be used to explain essentially anything (which renders models basically worthless). However, they acknowledge that in this context, and based on the evidence from many studies, it is likely that "shifting priorities" (a convenient alternative name for changing preferences) are at play. These "shifting priorities":

...refer broadly to changes in individual values, which potentially reflect evolving opportunities and constraints, changing norms and expectations about work, parenting, and gender roles, and social and cultural factors.

However, Kearney and Levine still want to avoid letting changes in preferences take over. That leads them to note that:

...changes in preferences may not be generated randomly and it is important to consider the forces that might have led to such changes. In our review of empirical evidence below, we highlight a number of potential social and cultural factors that might have altered preferences for and attitudes toward childbearing in recent decades, including peer effects, media and social media influences, the role of religion and religious messaging, and changing norms around parenting and gender roles in the home and society.

For me, the key contributions of the paper are not the review sections, but the theoretical and empirical implications. For example, in terms of theory, Kearney and Levine suggest that economic modelling of family decisions needs to change. Specifically:

We propose that it is now more appropriate to consider and model labor force participation as the default option, and fertility as the discretionary activity. This reflects a major shift in societal norms and practices over the past several decades. Women in earlier cohorts were more likely to have children and less likely to work. Back then, it is reasonable to consider having children as a widespread priority for women, perhaps reflecting societal norms and expectations, and sustained participation in the paid labor force as the more “optional” choice.

That presumptive ranking quite possibly has reversed. If market work is now the norm, the labor market norms and practices, including the expectations of “greedy jobs” as described by Goldin (2014), may alter fertility behavior. The tradeoff between market work and childbearing is now about the tension between a lifetime career and the way motherhood interrupts or alters that lifetime career progression, rather than about whether women work at all after they are married or have had their first child.

In terms of empirical implications, Kearney and Levine note that economists could learn a lot from demographers, in particular in relation to recognising cohort effects. They also note that:

...a challenge for economic research going forward is that the empirical methods we often rely on for causal identification are not particularly well-suited for studying changes across cohorts, nor the impact of widespread social and cultural changes... The statistical demands on the data for causal identification often lead to a focus on the immediate impact of period-specific factors. But as noted throughout this paper, the key questions that remain to be answered in this area are about cohort-level changes and the role of less immediate and discrete changes.

In addition, a typical approach to identifying period-specific effects might generate misleading or limited policy lessons. Consider an intervention that relaxes some constraints on having a child at a point-in-time. Younger women—say, 18-year-olds—may incorporate that change into their long-term decision making, but they may not respond immediately. Meanwhile, women in their early 30s may be less responsive, having already made many related life choices (regarding careers, relationships, lifestyle, etc.). In such cases, we might observe little to no immediate effect, even if the policy ultimately influences lifetime fertility...

A policy change may lead women to move up the timing of a birth to respond to some incentive, but to have the same number of children over their childbearing years. Our methods may conclude that this policy “worked,” even though completed fertility was unaffected. 

It is important for economists to recognise where the current widely used empirical methods are likely to lead to incorrect conclusions being drawn, and Kearney and Levine have provided some important cautions here. Fertility decline is topical, and many economists will be working on research questions related to this, especially as policy initiatives are rolled out by governments trying to return to above-replacement fertility. This review by Kearney and Levine is both timely and very helpful.
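To make that last point about timing concrete, here is a toy simulation (with entirely made-up numbers): a policy that brings births forward produces a temporary spike in period births, even though no cohort's completed fertility changes.

```python
import numpy as np

n_years = 40
policy_year = 20
cohort_size = 1000   # women per birth cohort (made-up numbers throughout)

births = np.zeros(n_years)

# Each woman has two children: the first at age 28, the second at age 31.
# From policy_year onwards, an incentive brings the second birth forward to age 30,
# but the total number of children per woman (completed fertility) stays at 2.
for cohort in range(-40, n_years):   # cohort index = mother's (relative) year of birth
    first = cohort + 28
    second = cohort + 30 if cohort + 30 >= policy_year else cohort + 31
    for year in (first, second):
        if 0 <= year < n_years:
            births[year] += cohort_size

print("Births per calendar year around the policy change:")
for year in range(policy_year - 2, policy_year + 3):
    print(year, int(births[year]))
# The policy year shows a one-off spike in period births (two cohorts' second births
# land in the same calendar year), after which births return to their old level,
# even though no cohort's completed fertility has changed.
```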

[HT: Marginal Revolution]

Read more:

Sunday, 23 November 2025

The misery of diversity?

I just finished reading this 2024 NBER Working Paper by Resul Cesur (University of Connecticut) and Sadullah Yıldırım (Marmara University), provocatively titled "The Misery of Diversity". They look at whether greater genetic diversity is associated with subjective wellbeing (SWB, measured as happiness, or life satisfaction, or affect balance), and find that:

...diversity lowers human SWB, measured by cognitive life evaluations and hedonic assessments of emotional states.

Cesur and Yıldırım demonstrate these results using data on genetic diversity that comes from this 2013 article by Ashraf and Galor (ungated version here). As Cesur and Yıldırım explain:

Population geneticists demonstrate that the dispersal of anatomically modern humans via migratory routes determined within-ethnic genetic heterogeneity. As one moves away from Ethiopia via migratory tracts, genetic diversity, defined as the likelihood of two randomly picked individuals having dissimilar genetic material, decreases...

Our diversity measure impacts the outcomes of interest through social ecology, which, over many generations, likely has influenced cultural evolution. In particular, interpersonal diversity determines the endowment of genetic variation, a measure of social diversity, capturing within-group interpersonal differences across the globe...

This measure of social diversity performs better than conventional diversity indicators, such as the indices of fractionalization and polarization, in capturing the true extent of diversity... In particular, these authors show that while interpersonal population diversity has a substantial and precisely estimated impact on intrastate conflict, fractionalization, and polarization indices fail to explain it.

Underlying data for this index is the expected heterozygosity measures of 53 indigenous human populations genotyped at 780 microsatellite loci as a part of the Human Genome Diversity Project (HGDP–CEPH). It captures the probability that two randomly selected individuals within an ethnic group differ in genetic makeup. In light of the Out of Africa hypothesis, Ashraf and Galor (2013a) constructed predicted genetic diversity for each country by using the coefficient estimate of the impact of migratory distance to Addis Ababa on genetic diversity in the sample of indigenous ethnic groups across the world...
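The expected heterozygosity calculation itself is straightforward: at each locus it is one minus the sum of squared allele frequencies (the probability that two randomly drawn alleles differ), averaged across loci. A minimal sketch with made-up allele frequencies (not the HGDP data):

```python
import numpy as np

# Made-up allele frequencies for three microsatellite loci within one ethnic group
loci = [
    np.array([0.50, 0.30, 0.20]),        # locus 1: three alleles
    np.array([0.70, 0.30]),              # locus 2: two alleles
    np.array([0.25, 0.25, 0.25, 0.25]),  # locus 3: four alleles
]

# Expected heterozygosity at each locus: probability that two randomly drawn
# alleles differ, i.e. 1 minus the sum of squared allele frequencies
het_per_locus = [1 - np.sum(p ** 2) for p in loci]

# The diversity measure averages this across loci
expected_heterozygosity = np.mean(het_per_locus)
print(f"Expected heterozygosity: {expected_heterozygosity:.3f}")
```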

Using this measure, with an instrumental variables analysis, Cesur and Yıldırım show that genetic diversity causally decreases subjective wellbeing at both the country level and the individual level (using data from the World Values Survey and the World Happiness Report). Their results are robust to excluding countries that experienced large migrations after 1500 (such as countries in North America and Oceania), and to various other modelling choices. Cesur and Yıldırım dig into the mechanisms for lower subjective wellbeing, and conclude that:

...the misery of diversity is an evolutionary trap caused by the mismatch it creates between the ancestral and current social environments via reduced social cohesion, retarded state capacity, elevated mistrust, and increased inequality of economic opportunities.

So, it seems like this is good evidence that genetic diversity decreases subjective wellbeing. However, there are a couple of problems. First, when most people think about diversity, they are thinking about between-group diversity, not within-group diversity. Between-group diversity is what you get when people from different ethnic groups are together. Within-group diversity is what you get when people from the same ethnic group differ genetically from each other. Cesur and Yıldırım's measure is heavily weighted towards within-group diversity. And indeed, they find that it is within-group diversity that matters the most: when they split their measure into within-group and between-group diversity, within-group diversity has a statistically significant (and negative) effect on subjective wellbeing measures, while between-group diversity does not.

So, Cesur and Yıldırım's analysis might be correct, but at the same time kind of misses the point. Between-group diversity is something that has potential policy levers (migration policy), whereas within-group genetic diversity is not something that is amenable to policy change. At least, not without eugenics (and, to be clear, I am not advocating for that). 

The second problem comes from the analysis of first-generation and second-generation immigrants in Europe and the US, where Cesur and Yıldırım find that:

...while home country diversity continues to hurt the SWB of first-generation immigrants, such effects weaken among the second-generation, suggesting that long-run improvements in the social environment can mitigate the misery of diversity over generations.

These results are not well-explained. If a person is born in one country, and then moves to a new country, shouldn't it matter how long they are exposed to the genetic diversity in the country of birth, and how long they are exposed to the genetic diversity in the destination country, in terms of the impact on subjective wellbeing? Cesur and Yıldırım don't show any dose-response relationship here. And there should be little or no effect at all on the second generation (which is roughly what they find), because for second-generation immigrants, the genetic diversity they have been exposed to is that of the country of their own birth, not that of their parents' country of birth. However, that is only a small problem in an otherwise interesting paper.

Overall, I think Cesur and Yıldırım need to engage a bit more with why anyone should care about genetic diversity, given that it is not amenable to policy change. Until they can do that, this paper can be filed under the interesting, but unhelpful category.

[HT: Marginal Revolution, last year]

Friday, 21 November 2025

This week in research #102

Here's what caught my eye in research over the past week (clearly a very quiet week!):

  • Buckles et al. (open access) describe the Census Tree database, which links records across historical US censuses between 1850 and 1940 (a very valuable resource!)

Thursday, 20 November 2025

The impact of a lower drink-driving limit on bars and pubs

New Zealand decreased the drink-driving limit from 0.08 to 0.05 percent BAC (blood alcohol concentration) in December 2014. In the lead-up to the change, there were worries that bars and pubs would lose business (for example, see here). Similarly, after the changes came in, there were a number of news stories about negative impacts (for example, see here and here). When my research team was doing fieldwork in 2019, a local bar owner we talked to complained about how dead the Hamilton CBD was during the week (from our observations, the weekends were still pretty busy! [*]).

So, I was interested to read this 2020 article by Colin Sumpter (NHS Forth Valley) and co-authors, published in the journal Drug and Alcohol Review (open access). They interviewed bar and pub managers and owners in 2018, over three years after Scotland introduced the same reduction in the drink-driving limit that New Zealand did [**]. Sumpter et al. note that before the law change, there was a lot of opposition from within the industry, similar to what we saw in New Zealand around the same time. However, Sumpter et al.'s results, based on qualitative analysis of in-depth interviews with 16 bar or pub owners or managers, show that the picture is more nuanced. First:

Most participants reported that prior to the limit change, there was little concern about the potential impact the change would have on their own business, although many felt it would impact on the hospitality industry as a whole. Post-limit change, most participants felt there had been no overall impact on their profits. A few reported a short-term impact that had lasted six to 12 months, but had seen profits return to normal after this period. A small minority reported a significant and persisting financial impact on their business and a similar number reported a smaller persisting financial impact. Rural pubs were more likely to report a negative economic impact while urban food-led establishments were less likely to report this as customers had continued to eat out while switching alcohol for soft drinks.

The perceived impacts on drinkers were interesting, and essentially what public health advocates would have hoped for:

Participants described three groups of drinkers that were particularly affected by the limit change. First and most commonly mentioned was the ‘after-work drinker’ group, which mainly comprised of men who would have dropped in on the way home from work. Participants reported that this behaviour had declined and attributed this to a public perception that the limit had changed from a ‘two-pint limit’ to a ‘no pint limit’...

The second affected group comprised of the ‘next morning driver’. Participants had observed that these people were now finishing drinking earlier on most nights, and particularly Sundays...

The third affected group comprised of the ‘lunchtime drinker’, although these were reportedly less affected by the limit change. In food-led establishments, it was often female customers who would previously have shared a bottle of wine, or had single glasses, but who now preferred to either have a designated driver or drink only soft drinks.

Finally, businesses adapted (or, at least, those businesses that were still around three years later had adapted!):

The major change in practice was around the provision of alternatives to alcohol. While participants from drink-only venues reported that their main income still came from alcoholic drinks, others described a growing trend in customer demand for no/low-alcohol drinks, and the range and quality of these drinks on offer from manufacturers. Whereas previously only one no/low-alcohol alternative would have been sold (other than soft drinks), examples were given of no/low-alcohol ranges intended to mimic the experience of drinking alcohol. This trend was primarily for beer but also present in cider and wine.

Don't forget mocktails! I think I rarely saw a premium mocktail on a menu prior to 2014, but now they are standard fare for most pubs and bars (in New Zealand, at least). The research participants also noted incentives for designated drivers, such as free soft drinks, but, like in New Zealand, those were often available before the drink-driving limit was reduced. Overall, it appears that the pre-law worries about negative impacts on bars and pubs were not borne out. In fact, Sumpter et al. find that:

Overall, despite the reservations of participants (regardless of premise type or location), there was broad acceptance of the limit change, disapproval of drink-driving, and little suggestion that the reduction should be reversed.

If even bar and pub owners and managers approve of the change afterwards, then it was clearly a positive change overall. 

*****

[*] You can read about that research here and here, or our earlier research just after the drink-driving limit changed here and here (ungated version here).

[**] Incidentally, Scotland's new drink-driving limit came into force just four days after New Zealand's new limit (5 December 2014 vs. 1 December 2014).

Tuesday, 18 November 2025

In wildfires, people prefer to save people rather than endangered species

If you were an incident controller who needed to deploy firefighting resources in a wildfire, how would you decide where to distribute those resources? If there are not enough firefighting resources to cover all areas at once, which areas should receive priority? Saving human lives seems like it should be a priority, but what about animal lives? What about preserving biodiversity, or saving endangered species from the fire? What about built infrastructure? What about important cultural artifacts? Some of these questions may seem easy to resolve, but there are real trade-offs, and understanding those trade-offs is important.

That is where this 2024 article by John Woinarski, Stephen Garnett, and Kerstin Zander (all Charles Darwin University), published in the journal Conservation Biology (open access, with non-technical summary on The Conversation), comes in. They surveyed a sample of over 2000 Australians, asking them to repeatedly make best-worst choices among five different alternatives (of eleven total). As they explain:

...respondents are asked to state which item among a set of items they consider as best and worst... In our survey, best meant the asset the respondent most wanted to save and worst meant the asset the respondent least wanted to save.

By getting the research participants to repeat this task many times (eleven times, in fact), with different sets of items to choose from, Woinarski et al. develop a good picture of the relative ranking of each of the eleven items, both for each research participant and for the sample overall. This best-worst scaling (BWS) method is a form of non-market valuation, since it essentially works out the relative value (in terms of ranking) of the different options that research participants are presented with. [*]
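Woinarski et al. likely estimate a formal choice model on top of the raw choices, but the intuition of best-worst scaling can be seen with simple count-based scores (times chosen best minus times chosen worst, divided by times shown). A sketch with made-up choice data and hypothetical item labels:

```python
from collections import defaultdict

# Made-up best-worst tasks: each shows 5 items; the respondent picks one 'best'
# (most wants to save) and one 'worst' (least wants to save)
tasks = [
    {"shown": ["person_unwarned", "koalas", "house", "snails", "shed"],
     "best": "person_unwarned", "worst": "shed"},
    {"shown": ["koalas", "sheep", "rock_art", "house", "snails"],
     "best": "koalas", "worst": "house"},
    {"shown": ["person_unwarned", "rock_art", "sheep", "shed", "snails"],
     "best": "person_unwarned", "worst": "shed"},
]

counts = defaultdict(lambda: {"best": 0, "worst": 0, "shown": 0})
for task in tasks:
    for item in task["shown"]:
        counts[item]["shown"] += 1
    counts[task["best"]]["best"] += 1
    counts[task["worst"]]["worst"] += 1

# Best-worst score: (best picks - worst picks) / times shown; higher = more preferred
scores = {item: (c["best"] - c["worst"]) / c["shown"] for item, c in counts.items()}
for item, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{item:16s} {score:+.2f}")
```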

The eleven options that research participants were ranking overall were:

  1. A person with a car stuck behind a fallen tree, whom you know had not received advice to evacuate;
  2. A person with a car stuck behind a fallen tree, whom you know had ignored repeated advice to evacuate beforehand;
  3. A house that you know has no people in it;
  4. A farm shed with some hay bales and a tractor;
  5. A flock of 50 sheep—a few of which will be killed by fire, but survivors are likely to be badly injured;
  6. A population of 50 koalas—a few of which will be killed by fire, but survivors are likely to be badly injured;
  7. The last population of a native snail species for which the fire will kill all individuals, thereby causing the species’ extinction;
  8. The last population of a small native shrub, for which the fire will kill all plants, thereby causing the species’ extinction;
  9. One of only two populations of a rare wallaby for which the fire will kill all individuals of one of the populations (but not affect the other), thereby making it more endangered;
  10. Ancient rock art that will be destroyed if fire gets into the weeds now growing in the rock shelter; and
  11. An old tree with an ancient Aboriginal carving on the trunk.

The results are interesting, if not terribly surprising:

In terms of relative importance, saving a person who ignored evacuation advice was rated 57% as important as saving a person who had not received warnings... Saving the koala population was rated slightly lower (56% as important as saving a person who had not received warnings). Saving the wallaby population was 45% as important as saving a person who was not warned. Saving the house and shed had the lowest rankings (14% and 9%, respectively, as important as saving a person who was not warned).

For completeness, compared with saving a person who had not received warnings, saving the shrub was rated as 25% as important, saving the sheep as 26% as important, saving the snails as 25% as important, saving the ancient rock art as 15% as important, and saving the carved tree as 12% as important. Woinarski et al. bemoan that no one loves snails, but I think the loss of the cultural artifacts would be a tragedy as well. I guess that reflects that each of us would place different weightings on things, and come out with different rankings. And that is what Woinarski et al. look at next, finding that:

Female respondents placed higher importance than male respondents on the protection of the rare wallaby population, the koala population, the sheep, and the tree carving and lower importance than male respondents on the protection of the house, shed, native shrub, and rock art... Older respondents (>65 years) rated protecting people more highly than younger respondents, but rated the tree carving less highly than younger respondents...

Respondents who self-identified as Indigenous placed a higher score on protecting the rock art and tree carvings than those identifying as non-Indigenous.

Those differences may not come as a surprise either. Now, in my ECONS102 class, when we discuss non-market valuation (specifically in the context of estimating the value of a statistical life), I point out that personal experience of the risk makes a difference. And that is true in this case as well. Woinarski et al. find that:

Survey respondents affected by wildfires and those assessing themselves as being prepared for wildfires were less likely to save a person who had not received warnings... Those who rated themselves as prepared for wildfire were also less likely to save a person who ignored warnings, whereas those who had been affected by wildfire were more likely to do so.

It is interesting to consider what the differences mean here. If a person has personal experience of wildfires, then they know how devastating they can be, and how unpredictable and fast-moving. In my mind, that should make them more likely to want to save a person who has not received warnings, but instead they are less likely. On the other hand, it does make sense that they would be more likely to save someone who ignored warnings. Woinarski et al. don't provide a good explanation for that result (although, to be fair, they are focused on the results related to conservation, rather than humans!). On the other hand, people who are well prepared being less willing to help those who ignored warnings makes some sense.

The takeaway message from this paper, though, is that people prefer to save people, rather than endangered species. Especially snails.

*****

[*] If one of the options had been monetary, Woinarski et al. could have used their results to work out the rough monetary value of each option.

Monday, 17 November 2025

Population diversity and economic growth

Population diversity has a theoretically ambiguous effect on economic growth. On the one hand, having a more diverse population makes it more difficult for people to agree on things like spending on public goods (e.g. see this post), it can open the door to policies that favour certain ethnic groups, and lead to conflict over resources and the management of public services. On the other hand, having a more diverse population brings people together with different (and complementary) skills, experiences, and ways of thinking, which can boost innovation and productivity, as well as fostering connections with different communities (and other countries), which may increase international trade and investment.

Many studies have tested the relationship between population diversity and economic growth, with little consensus. That makes the literature ripe for meta-analysis, where the results of many studies are combined in order to estimate an overall relationship. That is the approach in this new article by Andreas Sintos (University of Luxembourg), published in the Journal of Economic Surveys (open access). Sintos collates the results from 83 studies, with 1537 estimates of the relationship between some measure of population diversity and some measure of economic growth.

First, Sintos establishes that there is a small publication bias overall, with studies that find a negative relationship between diversity and growth being more likely to be published than would be expected given the overall distribution of results. Then, after adjusting for publication bias and methodological quality of the studies, he finds that:

...while ethnic and linguistic diversity demonstrates a small and statistically insignificant positive effect on economic growth, the remaining dimensions of diversity—religious, genetic, birthplace, and the residual category—demonstrate a significant positive impact on economic growth, with effect sizes spanning from moderate to large.

So, population diversity (specifically religious, genetic, and birthplace diversity, as well as a residual category that captures other forms of diversity) is positively associated with economic growth. Places that have more diversity of those types (but not places that have more ethnic or linguistic diversity) grow faster. A 'moderate to large' effect here means that each standard deviation higher diversity is associated with 0.1 to 0.4 standard deviations higher economic growth. That is not to be sneezed at.
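Going back to the publication-bias adjustment mentioned above: meta-analyses in this literature typically test and correct for publication bias with a funnel-asymmetry style (FAT-PET) regression of each estimate on its standard error. I don't know the exact specification Sintos uses, but a minimal sketch of the idea, with simulated estimates, looks something like this:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 500

# Simulated meta-analysis data: true effect of 0.1, estimates with varying precision
true_effect = 0.1
se = rng.uniform(0.02, 0.4, size=n)
estimate = true_effect + rng.normal(0, se)

# FAT-PET regression: estimate_i = b0 + b1 * se_i + error, weighted by precision.
# b1 picks up funnel asymmetry (publication bias); b0 is the bias-corrected effect.
X = sm.add_constant(se)
model = sm.WLS(estimate, X, weights=1 / se**2).fit()
print(model.params)   # [bias-corrected effect, publication-bias term]
```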

What Sintos isn't able to do, though, is explore the mechanisms that underlie that positive relationship. So, while meta-analysis can give us an overall estimate of the relationship, it can't tell us why that relationship exists. To do that, we would need to go and look at the individual studies, particularly those that found a positive relationship between diversity and growth, and see if they explored the mechanisms.

Finally, this article made me chuckle, as it is clear that substantial portions of it were written by generative AI. No human uses the word "elucidate" 14 times in a research paper, and quantitative papers rarely refer to the "scholarly discourse". I should really have been alerted to this when the first paragraph included the LLM-ese sentence: "The significance of population diversity within the economic sphere is multifaceted". Perhaps diversity's significance is multifaceted. This article doesn't tell us that though. All it tells us is that the relationship between diversity (by some measures) and economic growth is positive. More diverse places tend to grow faster.

Sunday, 16 November 2025

Andrew Leigh on big data vs. randomised controlled trials

'Big data' has become the catchcry of many data scientists and researchers in recent years. It's also become increasingly used in economics. However, by itself the analysis of big data doesn't provide anything more than correlations. Even when big datasets are available, there is still a place for randomised controlled trials (RCTs). That is the essence of this new article by Andrew Leigh (Parliament of Australia), published in the journal Australian Economic Review (sorry, I don't see an ungated version online).

It should come as no surprise that Leigh is pro-RCT. After all, he is the author of the book Randomistas (which I reviewed here), which was essentially a tribute to RCTs. Leigh clearly sees the rise of big data, and its increasing use as a substitute for RCTs, as a threat to good research. In the article, he takes great pains to point out instances where big data draws the wrong conclusions, compared with RCTs on the same topic. For example:

Randomised trials have demonstrated a strongly beneficial effect of statins on reducing cardiovascular mortality. Yet when they analysed a database covering the entire Danish population, researchers found that the chance of death from cardiovascular causes was one‐quarter higher among those who took statins than among those who did not. The explanation is straightforward: people who were prescribed statins were at elevated risk of having a heart attack. Yet even when researchers made statistical adjustments, using all the variables available in the database, they were unable to reproduce the well‐known finding that statins have a beneficial effect on cardiovascular mortality.

Analysis of the Danish database also suggested that the relative risk of cancer was 15% lower among patients who took statins, an effect that remained statistically significant even after controlling for other observed factors about the patients. Yet this result is at odds with the evidence from randomised trials. A meta‐analysis of randomised trials, covering more than 10,000 cases of cancer, found no effects of statins on the incidence of cancer, nor on deaths from cancer...

The observational data was doubly wrong. Observational data failed to replicate the well‐known finding that statins improve heart health. And observational data wrongly suggested that statins reduce the risk of cancer. Randomised trials, which were not biased by selection effects, provided the correct answer. 
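A toy simulation of the selection problem Leigh describes (with made-up numbers, nothing to do with the Danish data) shows how a genuinely beneficial treatment can look harmful in a naive observational comparison when sicker people are more likely to be treated:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

# Underlying cardiovascular risk (partly unobserved in real data)
risk = rng.uniform(0, 1, size=n)

# Sicker people are much more likely to be prescribed the treatment
treated = rng.random(n) < 0.1 + 0.8 * risk

# True effect: the treatment REDUCES the probability of death
p_death = 0.05 + 0.20 * risk - 0.03 * treated
death = rng.random(n) < p_death

# Naive observational comparison: treated patients die more often, even though
# the treatment is beneficial, because of selection on underlying risk
print(f"Death rate, treated:   {death[treated].mean():.3f}")
print(f"Death rate, untreated: {death[~treated].mean():.3f}")
```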

That is only one example of many in the article. However, while Leigh is pro-RCT, he is not anti-big-data. He notes that:

Large data sets are a valuable complement to randomised trials. But big data is not a substitute for randomisation.

If we take anything away from Leigh's article, it should be that point. Big data is incredibly useful. However, it must be analysed using the tools of causal inference (of which randomised controlled trials are just one example) if we want to move beyond finding correlations. The problem with big data is compounded by a focus on statistical significance (as Ziliak and McCloskey noted in their book The Cult of Statistical Significance, which I reviewed here). Big datasets will find statistically significant correlations even when the size of the relationship is very small. That is an asset when causal methods are applied, but is very much a liability when big data are analysed without consideration of causality. RCTs are one way of disciplining our research approach in order to ensure that the effects we estimate are causal, and as Leigh notes:

While correlations in large data sets do not necessarily indicate causation, administrative data can be enormously helpful in ensuring the precision of estimates from randomised trials.

The article finishes with high-level strategies that policy makers and practitioners can use to ensure that RCTs are embedded within the analysis of public policy:

I advocate five approaches. Encourage curiosity in yourself and those you lead. Seek simple trials, especially at the outset. Ensure experiments are ethically grounded. Foster institutions that push people towards more rigorous evaluation. Collaborate internationally to share best practice and identify evidence gaps.

Those all sound like good approaches. I would add a sixth: Employ analysts with a thorough grounding in causal inference methods generally, if not RCTs specifically. We need more policy analysis that establishes causal evidence of impact.

Friday, 14 November 2025

This week in research #101

Here's what caught my eye in research over the past week:

  • Kuehn (open access) discusses the under-recognised contributions of W.E.B. Du Bois to marginalist wage theory
  • Sintos (open access) provides a meta-regression analysis of the effects of population diversity on economic growth, finding that ethnic and linguistic diversity exhibit a small and statistically insignificant positive effect on economic growth, while religious, genetic, birthplace, and other forms of diversity exert a significant positive impact on growth, with effect sizes ranging from moderate to large
  • Baltrunaite, Casarico, and Rizzica (with ungated earlier version here) study gender differences in reference letters for graduate students in economics and finance, and find that men are described more often as standout and women as grindstone, i.e., hardworking and diligent, that these differences are mainly driven by male letter writers, especially more senior ones, and that standout characteristics relate positively to subsequent career outcomes whereas grindstone characteristics relate negatively to subsequent career outcomes
  • Asquith and Mast (with ungated earlier version here) study county-level population decline in the US, and find that falling fertility has caused migration rates that used to generate growth to instead result in decline, and that only 10 percent of counties would have declined during the 2010s if fertility had remained at its initial levels

Wednesday, 12 November 2025

The wicked problem of generative AI and assessment

Schools, universities, and teachers at all levels are having to grapple with the challenges of student use of generative AI. This new article by Thomas Corbin and colleagues (all from Deakin University), published in the journal Assessment and Evaluation in Higher Education (open access), describes it as a 'wicked problem':

Wicked problems, as originally conceptualised by Rittel and Webber (1973), describe challenges that defy simple solutions... Unlike their counterpart ‘tame’ problems, which have clear definitions and measurable solutions, wicked problems lack definitive formulations, and their solutions are not true or false but rather better or worse, requiring judgment, compromise, and adaptation. This distinction is key because it disrupts the assumption that there is a ‘correct’ policy, assessment method, or institutional response waiting to be discovered. Instead, every approach carries trade-offs, is shaped by context, and must be continually reassessed in response to evolving conditions. For those tasked with navigating wicked problems, this reality has a significant personal toll; every decision feels provisional, every choice open to criticism, and the pressure to find the ‘right’ solution persists even when no such solution exists...

I'm sure that description resonates with many teachers, when they think about generative AI and assessment. Corbin et al. back up their assertion that this is a wicked problem with qualitative research, based on interviews with 20 'Unit Chairs', who are responsible for running a subject. It would have been interesting if they had interviewed lecturers as well, since they are on the front lines in dealing with students' use of generative AI, but I suspect the results would not have differed too much.

The results make for interesting reading. Corbin et al. work their way through all of the criteria that Rittel and Webber used to define a 'wicked problem' in their 1973 article (ungated version here). I don't agree with them on all criteria, so I'm going to use this post to push back on a few things. However, I think that their paper does provide some good talking points, starting with:

The first defining feature of wicked problems is that they cannot be clearly or conclusively defined. Unlike technical problems where stakeholders can in theory agree on what needs fixing, wicked problems mean different things to different people and these varying definitions pull solutions in contradictory directions. Without agreement on what the problem is, a singular, cohesive response becomes impossible.

This pretty much captures things I think:

Consider for example the frustration of the teacher who stated: ‘I’ve spent so much fucking time on developing this stuff. They’re really good as units, things that I’m proud of. Now I’m looking at what AI can do, and I’m like, what the fuck do I do? I’m really at a loss, to be honest’. (T10).

We are all just trying to find our way in the era of generative AI. But no one agrees on what should be done, or even what the problem is (see yesterday's post as one example!). Second:

The second defining characteristic of a wicked problem is that it has no stopping rule – that is, there are no clear criteria for knowing when you have reached ‘the solution’...

When asked about determining success, one teacher responded: ‘How do we actually tell? You can’t’ (T15).

I guess we just do what we can in the moment. However, all of us are looking around at what other people are trying, and constantly wondering if we can do better. I have a solution for my papers. I don't think it is the solution, and certainly it isn't a one-size-fits-all solution for every paper. It seems to be working all right for now, at least. But coming up with a solution that has some benefits means trading off against other things that we have to give up. And that is the third characteristic of a wicked problem:

Technical problems have correct answers that can be verified. Wicked problems, on the other hand, have only trade-offs, where every response sacrifices something valuable...

Another unit chair worried: ‘We can make assessments more AI-proof, but if we make them too rigid, we just test compliance rather than creativity’ (T3).

These types of statements illustrate how moves toward assessment security sacrifice something else, be it authenticity, creativity, or real-world relevance.

In my case, we assess knowledge and comprehension and application (which are low on Bloom's taxonomy), but by adopting in-person tests we forego the ability to authentically assess higher-level skills such as analysis and synthesis and evaluation (which, to be fair, shouldn't necessarily be assessed in a first-year paper anyway!).

On the fourth criterion, Corbin et al. note that:

...wicked problems lack clear metrics for testing whether solutions have succeeded...

Several unit chairs expressed uncertainty about whether their assessment adaptations were effective. When asked about determining success, one stated simply: ‘If a student uses AI appropriately for brainstorming, we might never know. If they use it inappropriately, we also might never know’ (T18).

Again, this one definitely depends on assessment style, and in some cases, you can tell whether your approach has succeeded. In my case, I am fairly confident that I am able to assess my students' learning in the test environment, and that the use of AI tutors is, if anything, improving that learning (more to come on that point though, as I will be reporting on the actual evaluation in the next month). And that means that I also disagree with Corbin et al.'s next point, which is:

The fifth characteristic of wicked problems is that solutions cannot be found through experimenting with solutions because every attempt has real consequences.

I think you still can try things, and see if they work (and if I didn't think that, then I probably wouldn't try things in the first place!). Yes, there are consequences. But there are also consequences to not experimenting with finding a solution. The era of generative AI is not going to pause so that we can just keep doing what we always have done. We have to embrace the uncertainty! And that links to the next point that Corbin et al. raise, which is that:

...wicked problems present limitless possible approaches with no way to determine if all options have been considered.

Yes, but to be fair that was probably true before generative AI as well. If there was a single silver bullet solution to teaching and learning, we all would have been doing it already. All teachers have their own pedagogical approaches, which hopefully leverage their strengths as teachers and academics, while mitigating their weaknesses. And that means that there isn't one approach that will work in all circumstances for all teachers. In fact, I adopt different approaches in different papers, given what I hope will work (and experimenting, while testing whether my approach is successful). And that links to the next point:

The appeal of standardized solutions - whether "best practice" templates or institutional mandates - assumes that similar-looking problems can be solved with similar approaches. But wicked problems resist this logic because each instance emerges from an irreducibly specific context.

Yes, but not necessarily for the reasons that Corbin et al. outline (or, not only for the reasons that they outline). As I noted above, every teacher has different strengths and weaknesses, and so what is best practice for one teacher need not be best practice for everyone else.

The next criterion is:

Wicked problems do not exist in isolation but instead emerge from and reveal deeper structural issues.

Several participants saw AI vulnerabilities as symptoms of institutional business models. One teacher argued: ‘A university like [the one in which I work], which is based on a business model, which is online-based, where you cannot incentivize students to come in person, and all the assessments are based on tasks you ask students to do at home in their own time, this model is the most vulnerable to fraud in an age of AI’ (T9).

Generative AI is not operating in a vacuum, so of course it intersects with other issues. Online assessment was already a problem before generative AI came on the scene. How quickly have we all forgotten about Chegg, the bane of online assessment during the lockdowns? Moving on:

The ninth characteristic of a wicked problem is that the way the problem is framed shapes which solutions become possible. This relies on the claim that how we define a problem constrains what kinds of responses can be imagined or pursued. In other words, how we frame the AI and assessment challenge predetermines which solutions appear reasonable and which remain invisible...

When teachers framed AI as a threat to academic integrity, they favoured control-based solutions. One stated: ‘I know I would still prefer exams to come back on campus because it would be the only piece of assessment that we can truly say this is their own work’ (T4)... Those who framed AI as a professional necessity proposed integration: ‘I think GenAI is going to stay, right? It’s already part of the workforce, like us as well. Students need to be able to use it efficiently. The part of their skills they will need to learn would be to use GenAI efficiently’ (T17).

This is definitely an issue. I know of colleagues from both ends of this spectrum. The worst part is that I have sympathy for both views (as regular readers of this blog will probably recognise)! But again, there need not be a one-size-fits-all solution here, and while AI might be a threat in some papers, it might be an integral part of the teaching and learning and assessment in another. Both of those things can be true at the same time. Finally:

The tenth characteristic of wicked problems is that decision-makers bear full responsibility for the consequences of their choices. Unlike theoretical problems where errors have no real-world impact, those addressing wicked problems are, as Rittel and Webber (1973, 167) note, ‘liable for the consequences of the solutions they generate’...

One teacher worried about graduating unprepared professionals: ‘How many are we missing? Are we in fact sending students out into the workforce who can get through an interview, but when they start doing the job, they can’t?’ (T11). The personal vulnerability this created was articulated starkly: ‘I feel very, very vulnerable within the university running assessments like this because I know that there are pockets of the university management who would really like to just see us do traditional, detached, academic assessments that don’t threaten to push students’ (T6).

As teachers, we do bear some responsibility. The problem here, highlighted in the second quote above, is where the university creates an environment in which teachers' ability to ensure students have met learning objectives is undermined by institutional practices. And too often, teachers are finding themselves in that position. As noted in yesterday's post, Simas Kucinskas made the point that "take-home assignments are obsolete". Our assessments need to reflect that fact, and universities shouldn't be putting teaching staff in a position where they are forced to adopt assessment practices that are no longer fit for purpose. Of course, this would still be an issue even if generative AI and assessment wasn't a wicked problem.

While I'm not convinced by all elements of Corbin et al.'s argument, I do agree that generative AI and assessment is a wicked problem. That doesn't mean that we should give up. There are solutions out there, but there is unlikely to be one solution that will work for all teachers and in all circumstances. We need to keep experimenting, and sharing our learnings. That is the only way that we will move forward, in ensuring that student learning is still assessed in a meaningful way.

Read more:

Tuesday, 11 November 2025

Simas Kucinskas on AI, university education, and the 'mushy middle'

Simas Kucinskas has an interesting Substack post on university education in the age of AI. His TL;DR summary of the post is:

AI now solves university assignments perfectly in minutes. Students often use LLMs as a crutch rather than as a tutor, getting answers without understanding. To address these problems, I propose a barbell strategy: pure fundamentals (no AI) on one end, full-on AI projects on the other, with no mushy middle. Universities should focus on fundamentals.

Kucinskas starts by making the point that "take-home assignments are obsolete", and that students are outsourcing too much of their learning to generative AI. I have to agree. When ChatGPT can write an essay, solve problem sets, draft reports, and answer online test questions, the options for assessment that provides a genuine evaluation of whether students have met particular learning outcomes narrow significantly. That's why, in my classes, we've moved back to predominantly in-person assessment (or oral examinations online). They're not bulletproof assessments, but they are better than the alternatives that are far more vulnerable to generative AI.

Kucinskas's solution is what he terms the "barbell strategy":

One end of the barbell: courses that are deliberately non-AI. Work through proofs by hand. Read academic papers. Write essays without AI. It’s hard, but you build mental strength.

The other end of the barbell: embrace AI fully for applied projects. Attend vibecoding hackathons. Build apps with Cursor. Use Veo to create videos. Master these tools effectively.

Kucinskas dismisses the "mushy middle":

...where students “use AI responsibly” or instructors teach basic prompting as an afterthought. That’s the worst of both worlds. Students don’t build thinking skills, but they also don’t learn the full potential of AI.

Here, I differ with Kucinskas. I agree about the starting point. We need the basic courses that teach the fundamentals of a discipline to be designed to be AI-free, at least in terms of the assessment (AI can still be a useful learning tool, such as the AI tutors in my papers). And I agree with Kucinskas about the end point. We need students to be embracing AI fully for applied projects by the end of their degree. Where we differ is how we get students from the starting point to the end point, and I prefer a much more scaffolded approach (as I outlined briefly in this post).

The problem is that Kucinskas has a rosy view of how self-directed students will be in learning how best to use generative AI. Highly self-directed (and/or tech-savvy) students will be fine without any direction from universities or lecturing staff. They will spend the time and effort to figure it out themselves, and will excel because of the learning that they engage in along the way. Those are the students that Kucinskas is thinking about. However, not all students are like that. Some (perhaps many) won't know what they are doing, may fail more than they succeed, and may eventually try to outsource the applied projects wholesale to AI. This is exactly what Kucinskas is worried about for university education. His approach doubles down on what is happening already, for the students who are least self-directed.

Students who are less self-directed by definition require a more directive approach from lecturing staff. These students need to be scaffolded through the process of recognising the value of generative AI, learning to use generative AI within a narrowly-scoped set of activities, and gradually building their skills with prompting and learning from each other and from the generative AI, before being let loose on the applied projects that are the end-point of the learning journey.

So, there is definitely a role for the 'mushy middle' in university education. However, by making it more directive we can hopefully reduce the degree of mushiness.

[HT: Marginal Revolution]

Sunday, 9 November 2025

Book review: In This Economy

Kyla Scanlon rose to some prominence during and after the pandemic, through her short explanatory videos about the economy, money, and finance. She may not have been the first, but she is certainly one of the most prominent members of the #EconTok community on TikTok (as well as being active on other social media). She has developed a large following, particularly among younger people. So, I was really interested to read her 2024 book, In This Economy.

I have to say that I was quite disappointed, though. On the plus side, Scanlon plays to her strengths, and the early parts of the book are strong on exploring the role of vibes in the economy (Scanlon coined the term 'vibecession', to mean "a period of temporary vibe decline during which economic data such as trade and industrial activity are okay-ish"). Those chapters are generally good (although see my later comments). However, significant parts of the book are less explainers about "how money and markets really work", which is the subtitle of the book, and more a commentary on current US policy on housing, immigration, clean energy, and the like. This is not just apparent in the final chapter, which is supposed to be more policy focused. The parts of the book where Scanlon held forth on her views were far less compelling to me, because the role of vibes was largely forgotten. It would have been more interesting to know how vibes might play a role in housing policy, or immigration policy, and whether a change in vibes might change policy. The book could have been tightened up significantly, and it could have made an interesting contribution that other authors are less well equipped to make.

What put me off most, though, were the inaccuracies in the book. The worst offence (to a New Zealand economist) was this, about inflation targeting:

That's because the 2 percent figure is sort of random. The idea originally came from Arthur Grimes, the Labour Party finance minster [sic] of New Zealand in the 1980s. He went on TV and said, "Two percent should be our inflation target," and now everybody goes after that magic number.

Arthur Grimes was never an MP, let alone finance minister (I checked this with him!). Scanlon might owe Arthur an apology for confusing him with Roger Douglas. One of my colleagues ventured that perhaps ChatGPT wrote those sentences. It is the sort of hallucination we might expect from an LLM, but who knows if that was the source. Sadly, it is indicative of the inaccuracies in the book. Consider this one:

In one example of the extremity of market moves, the yield on thirty-year U.K. inflation-linked bonds jumped by more than 250% (meaning that they fell 250% in price) after the Bank made the announcement that it was not going to intervene.

If something falls in price by more than 100 percent, that means the seller is paying the buyer to take it off their hands. The correct figure here should be 60 percent, I think, not 250 percent. Similarly:

So when news headlines say, "Inflation Rate Falls to 3 Percent," that doesn't mean that prices fell three percent; it just means that the rate of change of price increases fell three percent.

No, it means that the rate of change of prices fell to three percent (from whatever it was before). Unfortunately, there is a lot of this sort of inattention to detail. At one point, Scanlon provides an estimate of GDP for the 'Gingerbread Yeti economy', then converts it to 'real nominal GDP' by dividing by one plus the current year's inflation rate. First, there's no such thing as 'real nominal GDP'. There is 'nominal GDP' and there is 'real GDP'. And second, the calculation does provide a measure of real GDP, measured in terms of dollars from the year before. However, the calculation that is presented gives the impression that dividing by one plus the current year's inflation rate is the standard way of calculating real GDP. It isn't. It's not just the current year's inflation that matters in calculating real GDP, but the inflation in every year between the current year and the base year. The base year matters, and the base year is not always the year before the current year.
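
To make the distinction concrete, here is a minimal sketch with made-up numbers (none of them come from the book), showing the difference between deflating by only the latest year's inflation and deflating all the way back to a base year:

```python
# A minimal sketch with hypothetical numbers (not from the book), showing why
# dividing by one plus only the current year's inflation rate is not, in general,
# how real GDP is calculated when the base year is further back.
nominal_gdp = 1000.0              # current-year nominal GDP (made up)
inflation = [0.03, 0.05, 0.02]    # annual inflation in each year since the base year (made up)

# Dividing by one plus only the latest year's inflation expresses GDP in last year's dollars
real_gdp_last_years_dollars = nominal_gdp / (1 + inflation[-1])

# Real GDP in base-year dollars requires deflating by the inflation in every year
# between the base year and the current year
deflator = 1.0
for annual_rate in inflation:
    deflator *= (1 + annual_rate)
real_gdp_base_year_dollars = nominal_gdp / deflator

print(round(real_gdp_last_years_dollars, 2))   # 980.39
print(round(real_gdp_base_year_dollars, 2))    # 906.51
```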

Despite my grumpiness, there are some good aspects to the book. Scanlon does have a good way with words that I think connects with younger people (and that much is clear from her success on social media). She also provides some interesting examples to illustrate her explanations, such as the 'economics kingdom' (which illustrates how parts of the economy are related), the 'cake of uncertainty' (which relates expectations, theory, and reality), and the aforementioned 'Gingerbread Yeti economy'. Scanlon also refers to a lot of memes, probably many more than I would recognise. And yet I found the explanation of how 'meme stocks' worked to be a bit underdone.

Sadly, I don't think I can recommend this book, even to my younger students who might connect with the contemporary material more than they would with earlier pop economics books. There are simply too many bits where I worry that the book would steer them wrong. Normally, I find that Tyler Cowen makes excellent book recommendations. In this case, I'm really not seeing whatever he saw in this one.

Saturday, 8 November 2025

Survey evidence on the labour market impacts of generative AI

A picture of the labour market impacts of generative AI is slowly emerging. At this stage, there is little consensus on what the impacts will be. I just stumbled across this working paper, by Jonathan Hartley (Stanford University) and co-authors, which I had put aside to read earlier this year. Unlike some of the research I have discussed in recent posts (linked at the end of this post), Hartley et al. make use of a nationally representative survey of US workers.

The survey has had three waves in the US (plus one Canadian wave), and the first US wave had over 4200 respondents (Hartley et al. don't report how many respondents there were for the other waves). The results make for interesting reading. First, in terms of who is using generative AI, they report that:

...LLM adoption at work among U.S. survey respondents above 18 has increased rapidly from 30.1% as of December 2024, to 43.2% as of March/April 2025, and to 45.9% as of June/July 2025...

Conditional on using Generative AI at work, about 33% of workers use Generative AI five days per week at work (every weekday). Roughly 12% of Generative AI users use such tools at work only 1 day at work. About 17% and 18% of Generative AI users use Generative AI tools at work two and three days per week respectively...

That is a lot of people using generative AI for work, and using it often when they do. It is interesting to sit these results alongside those of Chatterji et al. (whose paper I discussed in this post). They found growth in both work-related and non-work-related ChatGPT messages over time.

Who is using generative AI at work, though? Hartley et al. find that:

...Generative AI tools like large language models (LLMs) are most commonly used in the labor force by younger individuals, more highly educated individuals, higher income individuals, and those in particular industries such as customer service, marketing and information technology.

These results are similar to those of Chatterji et al., except that Hartley et al. also report gender differences (with greater use of generative AI by men), whereas Chatterji et al. report that the gender gap that was apparent among early adopters of ChatGPT has closed completely.

Hartley et al. then move on to estimating the productivity gains from generative AI. Given that this is survey-based, and not observational or experimental, we should take these results with a very large grain of salt. Hartley et al. ask their respondents how long it takes them to complete various tasks with and without generative AI. The results are summarised in Figure 12 in the paper.

Notice that every task is reported to take less time with generative AI (the green dots) than without (the blue dots). The productivity gains are different for different tasks. However, I find this figure and the data to be very fishy. How could generative AI create a huge decrease in time on 'Persuasion' tasks, or on 'Repairing' (which shows one of the biggest productivity gains)? Also, notice that almost every task is reported to take between 25 and 39 minutes with generative AI. I strongly suspect that the research participants are anchoring their responses to this question on 30 minutes with GenAI, for some reason. Without seeing the particular questions that were asked, though, it is hard to tell why. [*]

Hartley et al. then try to estimate the impact of generative AI on job postings, employment, and wages, using a difference-in-differences research design. They find no impact on job postings or employment, but significant impacts on wages. However, here things get strange. The coefficients that they report in Tables 6 and 7 of the paper are clearly negative, and yet Hartley et al. write that:

Our estimated coefficients... imply economically meaningful wage effects: a one-standard deviation increase in occupational Generative AI exposure corresponds to a significant increase in median annual wages...

Going back to their regression equations, their 'exposure to generative AI' variable is larger when exposure is higher, so a negative coefficient should imply that more exposure to generative AI is associated with lower wages. I must be missing something?
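
To see why the sign matters, consider a stylised difference-in-differences wage equation of the general form below (this is my own sketch for illustration, not necessarily the exact specification in the paper):

```latex
\log(w_{o,t}) = \alpha_o + \delta_t + \beta \, (\text{Exposure}_o \times \text{Post}_t) + \varepsilon_{o,t}
```

If Exposure increases with occupational exposure to generative AI and Post switches on after adoption, then a negative estimate of β implies that wages in more exposed occupations grew by less (or fell by more) after adoption, not that they increased.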

Given the deficiencies in the data and the regression modelling, I don't think that this paper really adds much to our understanding of the labour market effects of generative AI. Which is disappointing, because survey-based evidence would provide us with a complementary data source that would help us to triangulate with the results from other data sources and methods.

[HT: Marginal Revolution]

*****

[*] On a slightly more technical note, we might expect there to be as much variation (in relative terms) in the 'with GenAI' data as in the 'without GenAI' data. However, the coefficient of variation (the standard deviation expressed as a proportion of the mean) is 0.109 for the 'with GenAI' data, but 0.226 for the 'without GenAI' data. So, there is less than half as much relative variation in the reported task times with GenAI as without. Again, that suggests that this data is fishy.
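
For anyone who wants to run the same comparison on their own numbers, here is a minimal sketch (the task times below are made up purely for illustration; they are not the paper's data):

```python
# A minimal sketch of the coefficient of variation comparison, using hypothetical
# task times in minutes (these are made-up numbers, not the paper's data).
import statistics

with_genai = [28, 30, 31, 29, 33, 27]      # hypothetical 'with GenAI' task times
without_genai = [45, 70, 55, 90, 40, 65]   # hypothetical 'without GenAI' task times

def coefficient_of_variation(values):
    # the standard deviation expressed as a proportion of the mean
    return statistics.stdev(values) / statistics.mean(values)

print(round(coefficient_of_variation(with_genai), 3))     # relative variation with GenAI
print(round(coefficient_of_variation(without_genai), 3))  # relative variation without GenAI
```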

Read more:

  • ChatGPT and the labour market
  • More on ChatGPT and the labour market
  • The impact of generative AI on contact centre work
  • Some good news for human accountants in the face of generative AI
  • Good news, bad news, and students' views about the impact of ChatGPT on their labour market outcomes
  • Swiss workers are worried about the risk of automation
  • How people use ChatGPT, for work and not
  • Generative AI and entry-level employment

Friday, 7 November 2025

This week in research #100

Here's what caught my eye in research over the past week:

  • Barr and Castleman (with ungated earlier version here) demonstrate that intensive advising during high school and college significantly increases bachelor’s degree attainment among lower-income students, primarily driven by improvements in initial enrolment quality (enrolling in a four-year programme rather than a two-year programme)
  • Nyarko and Pozen (open access) find that joining Twitter increases citation counts by an average of 22% per year and improves article placements by up to 10 ranks for law professors, relative to a synthetic control group
  • Matusiewicz finds, using data on European countries, that while GDP per capita correlates positively with the human development index and income equality, it does not guarantee higher life satisfaction
  • Branilović and Rutar (open access) find, using data from 2011 to 2022, that increases in both neoliberalism and globalization are associated with increases in democracy, and that it is freedom of international trade, modesty of regulation, legal system and property rights, and social globalisation that drive the relationship at the aggregate level

Thursday, 6 November 2025

The economics of maps

I have always liked maps. When I was growing up, one of my favourite books was my Rand McNally atlas. I may even still have it, tucked away with its spine held together by masking tape (after years of overuse by my primary-school-aged self). When I'm reading some fantasy novel that has a map on the inside cover, I can find myself lost in the map before even getting to read the book, and then flicking back to the map any time some new location is mentioned. Right next to my laptop while I'm writing this is a sepia-toned desk globe that, in truth, takes up too much space on the desk but will not be foregone.

Given my interest in maps, I've been planning to read this 2020 article by Abhishek Nagaraj (University of California at Berkeley) and Scott Stern (MIT), published in the Journal of Economic Perspectives (open access), for some time (like many articles that have sat in my digital to-be-read pile for a long time). Nagaraj and Stern explain the economics of maps. This isn't the economics that uses maps, such as in the field of economic geography, but two other aspects. First, they review the economic and social consequences of maps. Second, they review the economics of mapmaking. Most of the article is devoted to the latter, and that's what I want to focus on as well.

First, though, what is a map? In my classes, I use maps as an example of a model - an abstraction or simplification of reality. Nagaraj and Stern note that maps are composed of two elements: (1) spatial data; and (2) a design. As they explain:

At its core, a map takes selected attributes attached to a specific positional indicator (spatial data) and pairs it with a graphical illustration or visualization (design)...

Having separated a map into its constituent elements, Nagaraj and Stern then look at the economics of spatial data, and the economics of design. On data, they note that:

...mapping data is in many respects a classical public good. Almost by definition, mapping data is non-rival insofar as the use of data for a map by any one person does not preclude its use by others; moreover, the information underlying a given database is non-excludable because copyright law does not protect the copying of factual information. While the precise expression included within a database can be protected through copyright, the underlying geographical facts reflected in the database cannot be protected.

And just like most other public goods:

The combination of non-rivalry and non-excludability of mapping data makes its production prone to private underinvestment, providing a rationale for government support. Indeed, many of the most widely used maps rely on publicly funded geospatial data, including US Geological Survey topographical maps, Census demographic information, and local land-use and zoning maps.

On the other hand:

...there are important cases where mapping data is in fact excludable, either through secrecy or contract... Mapping data that allows for excludability exhibits properties more akin to a club good than a traditional public good. Specifically, the significant fixed costs of data collection combined with relatively cheap reproducibility creates entry barriers that supports natural monopolies or oligopolistic competition. It may be efficient for only a single firm to engage in data collection and for the industry to simply license these data (under agreed-upon contractual terms) from this monopoly provider.

Now, even when spatial data is protected and excludable:

...in the absence of perfect price discrimination, private entities may only provide mapping data at a high price (relative to near-zero marginal cost), reducing efficient access. Beyond pricing, the private provision of mapping data may additionally be concentrated in locations with high demand (such as urban areas) to the exclusion of less concentrated regions.

And that all accords with what we see. There are free sources of spatial data, which are public goods supported by governments or universities, alongside proprietary spatial databases that are club goods and only available at relatively high cost (to the dismay of researchers such as me!).

Turning to map designs, Nagaraj and Stern note that:

Like data, designs are also a knowledge good in that multiple individuals can use a particular map design (and so a design is non-rival) and the degree of excludability for a given design may vary with the institutional and intellectual property environment. With that said, a striking feature of a map design is that, almost by construction, a map is created for the purpose of visual inspection, and it is much easier to copy than a database (which might be protected by secrecy or contract). One consequence of this is that there may be underinvestment in high-quality and distinct designs for a given body of geospatial data.

They use this to explain why there is a lot of competition in the provision of map designs, which is why so many maps for particular purposes look the same. As Nagaraj and Stern explain:

A potential consequence of the non-excludability of mapping data and designs is inefficient overproduction of mapping products that compete with each other. Once a given map is produced for a particular location and application (say, a city-level tourist map), copycat maps can be produced at a lower sunk cost; because demand for maps of a given quality and granularity is largely fixed, free entry based on a given map involves significant business-stealing...

Taking both spatial data and map designs together, the role of intellectual property protection is important:

On the one hand, an absence of formal intellectual property protection leads to underinvestment in mapping data and high-quality map design, but inefficient entry by copycat mapmakers. On the other hand, a high level of formal intellectual property protection can shift the basis of competition away from imitation and towards duplicative investment. For example, over the past two decades, no less than four different organizations—including Google Street View, Microsoft StreetSide, OpenStreetCam project, and TomTom—have undertaken comprehensive and qualitatively similar initiatives to gather street-level imagery and mapping coordinates for the entire US surface road system.

So that explains why there are multiple Street View clones available. The firms are over-investing in goods that are protected by intellectual property. Do we really need multiple copycats of Google Street View? Also, in terms of intellectual property protection, I found this interesting:

In addition to employing copyright, firms often invest in additional strategies to protect their intellectual property. In particular, mapmakers have devised the idea of inserting fictional “paper towns” or “trap streets” in maps... This strategy allows them to detect rivals who might copy their data (rather than collecting similar data through an original survey) and thereby protect costly investment in original data collection. Such strategies are commonly deployed by mapmakers to this day for factual data...

Does that help to explain why people have been caught out following roads that don't exist, or trying to find towns that are misplaced? I guess that 'trap streets' or 'paper towns' are a good idea on a paper map, which requires a certain amount of attention to follow, but less suitable for digital maps that people follow blindly.

Nagaraj and Stern's article opens our eyes to the economics of maps, as well as their consequences. And now, I'm going to search my garage for my beloved Rand McNally atlas. If only I had a map to guide me as to where it is hiding!

Tuesday, 4 November 2025

Generative AI and seniority-biased technological change

Skill-biased technological change occurs when technology increases the productivity of, and hence the value created by, workers with higher skills, compared to those with lower skills. One of the canonical examples is computers, which have made skilled white-collar workers more productive, but automated away the jobs of some lower-skilled workers.

As noted in yesterday's post, something similar may be happening with AI. In this case, though, generative AI is making more experienced (senior) workers more productive, while at the same time automating away the jobs of entry-level (junior) workers. Think of this as seniority-biased technological change. At least, that's what this new working paper by Seyed Hosseini and Guy Lichtinger (both Harvard University) calls it. Their explanation goes like this:

In many such [high-skill, white-collar] jobs, workers begin at the bottom of the career ladder performing intellectually mundane tasks, i.e., routine yet cognitively demanding activities such as debugging code or reviewing legal documents, which are likely to be especially exposed to recent advances in GenAI. As these workers gain experience, they typically move up the career ladder to more senior roles that involve more complex problem-solving or managerial responsibilities... If GenAI disproportionately substitutes for entry-level tasks, the lower rungs of these career ladders may be eroding...

Hosseini and Lichtinger use data from Revelio Labs, which is drawn from public LinkedIn profiles. Importantly:

A key feature of the dataset is the standardized seniority level variable for each position, constructed by Revelio through an ensemble modeling approach based on multiple sources of information.

Hosseini and Lichtinger group the standardised positions into juniors (Entry and Junior levels) and seniors (Associate and above). They also use data on job postings from Revelio. The resulting dataset is huge, and:

...covers 284,974 firms that were successfully matched to both employee position data and job postings, and that were actively hiring between January 2021 and March 2025... For these firms, we observe 156,765,776 positions dating back to 2015 and 198,773,384 job postings since 2021...

The raw data shows a pattern that is very similar to the pattern from the Brynjolfsson et al. paper I discussed yesterday. Figure 1 from the Hosseini and Lichtinger paper charts the change in employment compared with January 2015, for juniors and seniors (and overall).

Notice that junior and senior employment follow similar trends until 2020, at which point they diverge, with senior employment continuing to grow, while junior employment does not (and even starts to decline after 2023). Brynjolfsson et al. showed that entry-level employment started declining from 2022, so these results are similar (although the point of departure is somewhat different).

Brynjolfsson et al. weren't able to show definitively that generative AI was the cause of the divergence (although they were able to eliminate general trends in their robustness checks). Hosseini and Lichtinger use the raw description data for each job posting, and identify "GenAI integrator" positions - "those reflecting an active attempt to recruit workers tasked with adopting or implementing GenAI in the firm’s workflows". They then:

...define a firm as a GenAI adopter if it has posted at least one GenAI integrator vacancy. By this criterion, 10,599 firms qualify as adopters. While they make up only 3.72 percent of the 284,974 firms in our sample, adopters are disproportionately large... and account for 17.3 percent of the employment (positions) in our dataset.

Hosseini and Lichtinger then compare employment changes between GenAI adopter firms and 'non-adopters', between the period before the first quarter of 2023 and the period after, in a 'difference-in-differences' analysis. They find that:

...junior employment in adopting firms fell by 7.7 percent relative to controls six quarters after the diffusion of generative AI. By contrast, coefficients for senior workers show a persistent upward trajectory throughout the sample, suggesting that adopting firms expanded senior employment more strongly than non-adopters over the last decade.

Hosseini and Lichtinger then extend their analysis to a triple-difference-in-differences analysis, comparing the difference in employment between juniors and seniors, in GenAI adopter and non-adopter firms, before and after the first quarter of 2023. In this more stringent analysis, they find that:

Aside from a brief dip in early 2021, the coefficients are essentially flat through 2022Q4. Starting in 2023Q1, however, the coefficients decline sharply, reaching roughly a 10 percent drop after six quarters.

That means that, comparing the period before the first quarter of 2023 with the period after, employment of juniors relative to seniors fell by about 10 percent more in GenAI-adopting firms than in non-adopter firms (phew!). This is strong evidence in favour of seniority-biased technological change arising from the adoption of generative AI.
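
One way to see what that triple difference is capturing is to write it out as a comparison of changes in employment (a stylised version for illustration, not the exact regression specification in the paper):

```latex
\text{DDD} = \left(\Delta E^{\text{junior}}_{\text{adopter}} - \Delta E^{\text{senior}}_{\text{adopter}}\right) - \left(\Delta E^{\text{junior}}_{\text{non-adopter}} - \Delta E^{\text{senior}}_{\text{non-adopter}}\right)
```

where each ΔE is the change in (log) employment for that group from before to after the first quarter of 2023, and the estimate of DDD is roughly minus 10 percent.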

Hosseini and Lichtinger go on to show similar effects using an event study research design, and similar effects when comparing juniors in occupations that are more exposed to generative AI with juniors in less exposed occupations. The latter is similar in nature to the results of Brynjolfsson et al.

Hosseini and Lichtinger then look at whether the change arises from a decrease in hiring of junior employees, an increase in job separations, or a change in promotion, finding that:

...that the sharp contraction in junior employment among adopters is driven primarily by a slowdown in hiring, rather than by increased exits. Specifically, the coefficient on Hiring implies that, relative to non-adopters, GenAI-adopting firms hired on average 5.0 fewer junior workers per quarter after 2023Q1... For senior employees, by contrast, hiring shows little change, while separations rise modestly, leading to a small net decline in senior headcount.

So, again the news is not good for new graduates moving into the workforce. Firms that have adopted generative AI are employing fewer junior employees, and that's because they are hiring fewer junior employees. The effects for new graduates are somewhat heterogeneous though, as Hosseini and Lichtinger also find that:

Juniors from tier-3 and tier-4 universities experienced the steepest relative declines in employment, while juniors from tiers 1, 2 and 5 also saw reductions, but of smaller magnitude.

'Tier-3 universities' are "strong national or regional institutions", while 'tier-4 universities' are "lower-tier but standard institutions". It's easy to see why they might be more affected than 'tier-1 universities' (the Ivy League and elite universities) and 'tier-2 universities' (highly respected international institutions). Signals still matter in education. However, it is hard to see why 'tier-5 universities', which are "weak or diploma-mill-type institutions", are less affected. Perhaps students from the lowest-quality universities select into occupations that are less likely to be affected by generative AI? Hosseini and Lichtinger don't control for the specific occupation in that analysis, but that might provide an answer.

Unlike Brynjolfsson et al., Hosseini and Lichtinger don't try to end their paper on a positive note. Instead, they conclude that:

GenAI adoption appears to shift work away from entry-level tasks, narrowing the bottom rungs of internal career ladders. Because early-career jobs are central to lifetime wage growth and mobility, such shifts may have lasting consequences for inequality and the college wage premium. Taken together, our evidence suggests that GenAI diffusion constitutes a form of seniority-biased technological change, with far-reaching implications for how careers begin, how firms cultivate talent, and how the gains from new technologies are distributed.

I prefer the more upbeat conclusion of Brynjolfsson et al., which is that the labour market will eventually adjust, and the workers who are disadvantaged now will end up redeployed into other jobs that open up as a result of generative AI. I guess we will find out as this technological change plays out in real time!

[HT: Marginal Revolution]

Read more:

  • ChatGPT and the labour market
  • More on ChatGPT and the labour market
  • The impact of generative AI on contact centre work
  • Some good news for human accountants in the face of generative AI
  • Good news, bad news, and students' views about the impact of ChatGPT on their labour market outcomes
  • Swiss workers are worried about the risk of automation
  • Generative AI and entry-level employment