Friday, 5 December 2025

This week in research #104

This week I hosted the ANZRSAI (Australia New Zealand section of the Regional Science Association International) conference in Hamilton. Hosting a conference keeps you pretty busy, but I still managed to attend some sessions, and here are some of the highlights I found from the conference:

  • Bruce Newbold kicked off the conference with an excellent keynote on the mobility of older people in Canada, and I was surprised how many older people move from other provinces to Alberta, and how few move to the Atlantic provinces
  • Bill Cochrane presented on residential segregation by occupation (a proxy for socioeconomic status or class), showing that the population became more segregated between 2013 and 2018, but not between 2018 and 2023 (and Hamilton was an outlier on various aspects of the analysis, possibly because the satellite towns of Te Awamutu and Cambridge were not included)
  • Michelle Thompson-Fawcett's keynote showed how urban design can incorporate Mātauranga Māori, and in my view this talk pointed to one way of considering Indigenous regional science and urban planning
  • Robert Tanton showed how he created a synthetic population equivalent to the Australian Census, to explore access by older people to doctors (but the use cases for a synthetic census population are far wider than that)
  • Iain White closed the conference with an excellent keynote on urban growth and climate change resilience, drawing on many of his previous research projects

Aside from the conference, here's what caught my eye in research over the past week:

  • Smith and Grimes (open access) explore the impact of income measurement issues on the estimated relationship between income and life satisfaction
  • Prince and Wallsten (with ungated earlier version here) find that there is not a strong preference for data to be stored locally, except for data types where privacy is already of high value, such as financial and biometric data, and home address and phone number
  • Berens, Henao, and Schneider (with ungated earlier version here) find that abolishing moderate tuition fees in Germany led students to reduce their academic effort, by postponing graduation and withdrawing from registered exams, and that the number of 'ghost students' increased
  • Hua and Humphreys find that new players whose careers started at the time of the cancelled 2004-05 NHL season experienced shorter careers than those not exposed (including European players)
  • Banerjee et al. (open access) conduct an experiment on a major international online freelancing platform, and find that, while both men and women prefer flexible work hours, the elasticity of response for women is twice that for men
  • Guelmamen, Garcia, and Mayol (open access) find that while inter-municipal cooperation in water supply in France is associated with higher water prices, these increased tariffs are offset by better network performance, as indicated by lower water loss indices and improved water quality (seems important given the trajectory of change in New Zealand right now)
  • Palacios-Huerta (with ungated earlier version here) reviews the beauty that is using sports as a setting for testing models and hypotheses

Thursday, 4 December 2025

The decline in high school economics in Australia, and what should be done about it

Since I became a lecturer in economics some twenty years ago, one thing that has become apparent has been the decline in the number of students studying economics at high school. When I taught my first classes, nearly half of students (most of whom were management students, or social science students) had taken at least one economics class at high school. Now, it is down to less than one quarter. One of the contributing factors to that decline is that many schools replaced economics as a subject with business studies.

So, I was interested to read this new article by Tanya Livermore and Mike Major (both Reserve Bank of Australia), published in the journal Australian Economic Papers (ungated earlier version here). They find similar results for Australia, which are summarised in Figure 1 of the paper:

The figure shows economics enrolments in Year 12 (the final year of high school in Australia) declining by around two-thirds between the mid-1990s and 2023. At the same time, the gender split in economics enrolments has grown, with the male share of economics high school enrolments increasing from near-parity to over two-thirds by 2023. Neither trend is a good thing.

Livermore and Major look to understand why these trends have occurred. First, they refer to qualitative evidence collected from educators, which concludes that a number of factors are to blame:

First, too few educators are equipped to teach Economics and too little relevant Australian economics content is available, providing school leaders with limited incentive to offer (or promote) the subject. Second, it has been reported that many students do not select Economics because they do not understand what it is and how it might be relevant to them... Third, the introduction of Business Studies to the NSW Higher School Certificate (HSC) in the early 1990s saw a large number of students take up the subject instead of Economics, with reports that Business Studies, which is more vocationally oriented, is perceived as being easier to learn and more helpful for employment...

If Livermore and Major had written the same sentences in relation to New Zealand, I expect they would be equally valid. To further explore the factors associated with taking (or not taking) economics at high school, Livermore and Major then rely on a survey of students in NSW in Years 10 to 12. The survey was undertaken at 51 schools in 2019, and the sample size is over 4600 students. Of the Years 11-12 students, less than 10% were studying any economics. Livermore and Major also look at the school-level factors that are associated with whether or not a school offers economics (and for this, they have a sample of 768 schools). In this school-level analysis, they find that:

...schools are significantly more likely to teach Economics if they have a higher ICSEA score, a larger Year 12 cohort, teach a larger variety of subjects, or are all boys.

Again, these results would not come as a surprise if they were described as being for New Zealand schools. Larger schools, and those that teach a larger variety of subjects, will be more likely to offer economics. ICSEA is the Index of Community Socio-Educational Advantage, so more advantaged schools are more likely to offer economics. And single-sex boys' schools are more likely to offer economics (which raises the question of what girls' schools are offering instead of economics). Turning to the proportion of students who study economics (and accounting for the fact that not all schools offer economics), Livermore and Major find that:

...being at a school with a higher ICSEA score is associated with increased demand for Economics amongst students... Non-government schools experience lower demand for Economics relative to government schools, holding ICSEA and other characteristics constant. Relative to co-ed schools, all-boys schools are associated with greater student demand for Economics, and all-girls schools are associated with less.

I would have thought that non-government schools (most of which are likely to be religious schools) would have been more likely to offer economics. At least, that is my impression of New Zealand schools, but that might be a key difference with NSW schools. Anyway, turning to individual students' subject choice, Livermore and Major find that:

...males are more likely to choose Economics than females, even when controlling for school characteristics...

ICSEA is also significantly associated with taking economics at the student level, with students from more advantaged schools more likely to study economics. Turning to the survey results, Livermore and Major first identify the positive and negative perceptions that students have about economics, finding that:

What positive perceptions do students have about Economics? Students typically believe that economics can be used for social good, is not all about money, and that an economics degree leads to a wide range of career options...

What negative perceptions do students have about Economics? Students generally do not perceive Economics as interesting and have little desire to know more about it. Economics is perceived as having a heavier workload than most other Year 11 and 12 subjects. Although Economics is seen as providing skills and tools for everyday life, students generally indicated they prefer to study Business Studies because they think it will be more useful for their future and more interesting... Although students perceive an economics degree to lead to a wide range of career opportunities, students are less likely to have a clear understanding of Economics (the subject) or the careers available if they were to choose Economics (as a subject).

The sheer weight of perceptions is clearly to the negative side, and that is a worry. Does it explain the gender difference in enrolments? Livermore and Major find that:

...females were less likely than males to ‘have a good understanding of what Economics is’, ‘find Economics interesting as a subject’ or ‘want to know more about Economics’... Females were also more likely than males to perceive Business Studies as easier, more useful and more interesting than Economics. In terms of career development, females were less likely to have clear or positive perceptions of career opportunities from studying economics.

Economics at high school clearly has an image problem. I am convinced this is true in New Zealand as well, based on the number of students in my first-year economics classes who express how different economics is from what they thought it was going to be (I take that as a compliment!). If we want to increase student enrolments in economics, and narrow the gender gap, then the image problem needs to be addressed. Livermore and Major conclude that:

...one possible intervention to address diversity deficits in Economics is to improve students' understanding of what Economics entails.

Yes, but how? Sadly, Livermore and Major have pointed out the problem and identified the source, but have not offered much in the way of solutions. Unfortunately, it is not as simple as providing students with information (see this post, and the links at the end of that post). We may need to look at fundamentally changing the way that economics is taught at high school. Some of us have made substantial changes to the teaching of university economics, and I believe it has had a positive effect (at least, there is evidence of gender parity at the top of our introductory economics classes, and in economics majors at Waikato). Those changes (or related changes) to teaching need to be allowed to propagate down to high school. And that is especially important given that we are finding that studying high school economics causally improves students' performance in introductory economics at university (I'll have much more to say on that research, with one of my Masters students, in a future post).

If the NSW evidence travels to New Zealand (and my classes suggest it does), then economics doesn’t have a demand problem so much as an information and curriculum design problem. High school students are choosing business studies instead of economics because business studies looks clearer, closer to jobs, and less risky. Fixing economics doesn’t mean dumbing the subject down. It means teaching the really interesting stuff earlier and better. Ditch the abstract mathematics, and focus on real-world applications, especially those with a more social focus. That's what we have done in introductory economics at Waikato. In my experience, that approach keeps the students, and the teachers, more engaged, and will hopefully allow economics to regain some of its lost ground.

Wednesday, 3 December 2025

The lifespan benefit of being elected to the MLB Hall of Fame

There is a clear difference in life expectancy between the rich and the poor (see this post, for example). However, disentangling how much of the life expectancy differential is a causal effect of socioeconomic status on mortality is difficult, because there are so many things that affect both socioeconomic status and mortality. This recent article by Chengyuan Hua and Brad Humphreys (both West Virginia University), published in the journal Economics Letters (sorry, I don't see an ungated version online), takes an interesting approach to answering the question.

Hua and Humphreys look at the lifespan of professional baseball players, comparing those who have been elected to the MLB Hall of Fame with those who narrowly missed out on election. The idea is that election to the Hall of Fame increases socioeconomic status, so comparing those who were elected with those who were not, but are otherwise similar, isolates the difference in lifespan that is attributable to the change in socioeconomic status alone.

In relation to election to the Hall of Fame, Hua and Humphreys note that:

Baseball players elected to the HoF must appear on 75% of the annual ballots cast, get removed from the ballot after appearing on fewer than 5% of ballots, and can only appear on a limited number of consecutive ballots...

The exogenous 75% election threshold permits a fuzzy regression discontinuity design (RDD) to identify the causal effect of HoF election on longevity.
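To make the fuzzy RDD idea concrete, here is a minimal sketch on simulated data. Everything in it is an assumption for illustration: the vote shares, the election probabilities on either side of the cutoff, the built-in two-year lifespan effect, the bandwidth, and the simple local linear fit are all made up, and this is not the estimator or the data that Hua and Humphreys use. The estimate is the jump in the outcome at the 75% cutoff divided by the jump in the probability of being elected.

```python
import numpy as np

def side_intercept(x, y):
    """Fit y = a + b*x by least squares and return a, the fitted value at x = 0."""
    X = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[0]

def fuzzy_rd(running, treated, outcome, cutoff, bandwidth):
    """Fuzzy RD estimate: jump in the outcome at the cutoff over jump in treatment probability."""
    x = running - cutoff  # centre the running variable on the cutoff
    below = (x < 0) & (x >= -bandwidth)
    above = (x >= 0) & (x <= bandwidth)
    jump_outcome = side_intercept(x[above], outcome[above]) - side_intercept(x[below], outcome[below])
    jump_treated = side_intercept(x[above], treated[above]) - side_intercept(x[below], treated[below])
    return jump_outcome / jump_treated

# Simulated (hypothetical) candidates: none of these numbers come from the paper.
rng = np.random.default_rng(0)
n = 20_000
vote_share = rng.uniform(50, 100, n)
# 'Fuzzy' because crossing 75% raises the chance of election without making it certain.
p_elected = np.where(vote_share >= 75, 0.9, 0.1)
elected = (rng.uniform(size=n) < p_elected).astype(float)
# Assumed true effect of election on lifespan: 2 years.
lifespan = 70 + 0.05 * (vote_share - 75) + 2.0 * elected + rng.normal(0, 1, n)

estimate = fuzzy_rd(vote_share, elected, lifespan, cutoff=75, bandwidth=10)
print(f"Estimated effect of election on lifespan: {estimate:.2f} years")
```

With a sample this large, the estimate should land close to the two-year effect that was built into the simulation. A real application would also need to choose the bandwidth carefully and report standard errors.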

Their dataset:

...includes the universe of candidates eligible for HoF induction from 1936 to 2024. We divide the sample into two groups: a treatment group of 131 players voted into the HoF while alive and a control group of 1067 players nominated by the BBWAA but not inducted.

Comparing the two groups, Hua and Humphreys find that:

...HoF members live 1.97 years longer than HoF nominees.

Hua and Humphreys go on to look at possible mechanisms that might explain the lifespan benefit of Hall of Fame election. They find that:

...HoFers are 5.8 p.p. more likely to become an MLB manager... MLB managers lived 2.86 years longer than their counterparts. We interpret this as evidence that HoFers are more likely to become MLB managers, a high-paying occupation.

In other words, Hua and Humphreys argue that the mechanism is that higher socioeconomic status leads to a better paying occupation, which in turn leads to longer lifespan. Of course, it could be that healthier players are more likely to become managers, so the RDD approach isn't as clean in terms of identifying the mechanism. Nevertheless, it is plausible.

Now, what these results tell us more broadly about socioeconomic status and lifespan is unclear. Baseball players are very different from the general population. The sample here is both unusually affluent and unusually healthy, before we even consider the effect of raising their socioeconomic status. At best, these results tell us something about groups at a similar prior level of affluence and health.

Nevertheless, the implications for professional baseball players are clear. It's Hall of Fame or bust (two years earlier)!

Friday, 28 November 2025

This week in research #103

Here's what caught my eye in research over the past week (clearly a very quiet week!):

  • Jurkat, Klump, and Schneider (with ungated earlier version here) report on a meta-analysis of 55 papers containing 2,468 estimates of the impact of industrial robots on wages, finding that the overall effect is close to zero and statistically insignificant
  • Chekenya and Dzingirai find, using African data from 1997 to 2014, that migration significantly increases conflict incidence, with effects concentrated in countries and regions in Africa with weak governance and economic stress
  • Cafferata, Dominguez, and Scartascini (with ungated earlier version here) find that overconfident individuals (in the US and Latin America) are more willing to accept the use of guns and more likely to declare their willingness to use guns
  • Bucher-Koenen et al. (with ungated earlier version here) find that financial advisors in Germany offer more self-serving advice to women, while men are more likely to receive sales fee rebates and less likely to be recommended expensive in-house multi-asset funds

And the latest paper from my own research (or, more accurately, from the thesis research of my successful PhD student Jayani Wijesinghe, on which I am a co-author along with Susan Olivia and Les Oxley):

  • Our new article (online early version, open access) in the journal Economics and Human Biology describes the patterns of lifespan inequality at the state level in the United States between 1959 and 2018, and identifies the state-level demographic and socioeconomic factors that are associated with lifespan inequality

Wednesday, 26 November 2025

Shots fired at the end of a debate on contingent valuation

I have written a number of posts about debates on the contingent valuation method (most recently here, but see the links at the end of this post for more). A 2016 debate that I blogged about here was picked up again in 2020 (but I didn't blog about it then because I was kind of busy trying to manage the COVID lockdown-online teaching debacle). So, what happened? The first of two 2020 articles published in the journal Ecological Economics (sorry, I don't see an ungated version online) is by John Whitehead (Appalachian State University), a serial participant in contingent valuation debates.

This part of the debate centres on 'adding up tests', which essentially test for scope problems. To reiterate (from this post):

Scope problems arise when you think about a good that is made up of component parts. If you ask people how much they are willing to pay for Good A and how much they are willing to pay for Good B, the sum of those two WTP values often turns out to be much more than what people would tell you they are willing to pay for Good A and Good B together. This issue is one I encountered early in my research career, in joint work with Ian Bateman and Andreas Tsoumas (ungated earlier version here).

An 'adding up test' tests whether the willingness to pay for the global good (Good A and Good B together) equals the sum of the willingness to pay for Good A alone and the willingness to pay for Good B alone. In relation to this particular debate, Whitehead summarises where we are up to:

Desvousges et al. (2012) reinterpret the two-scenario scope test in Chapman et al. (2009) as a three-scenario adding-up test. They then assert that the implicit third willingness-to-pay estimate is not of adequate size. Whitehead (2016) critiques the notion of the adding-up test as an adequacy test and proposes a measure to assess the economic significance of the scope test: scope elasticity. Chapman et al. (2016) argue that Desvousges et al. (2012) misinterpret their scope test. Desvousges et al. (2016) reply that they did not misinterpret the Chapman et al. (2009) scope test and assert that their adding-up test in Desvousges et al. (2015) demonstrates one of their points.

Desvousges et al. (2015) field the Chapman et al. (2009) survey with new sample data collected with a different survey sample mode than that used by Chapman et al. (2009) and three additional scenarios. Desvousges et al. (2015) conduct an adding-up test and argue that willingness-to-pay (WTP) for the whole should be equal to willingness-to-pay for the sum of four parts (the first, second, third and fourth increment scenarios). Desvousges et al. (2015) find that “The sum of the four increments … is about three times as large as the value of the whole” (p. 566).

Whitehead joins the debate on the side of Chapman et al., defending them by examining Desvousges et al.'s analysis and showing that it actually does meet an 'adding up test', thereby showing that there are no scope problems in the original Chapman et al. paper. Whitehead concludes that there are a number of problems in the Desvousges et al. analysis:

First, they do not elicit WTP estimates explicitly consistent with the theory of the adding-up test. Their survey design suggests that a one-tailed test be conducted where the sum of the WTP parts is expected to be greater than the WTP whole. Second, there are several data quality problems: non-monotonicity, flat portions over wide ranges of the bid function and fat tails. Each of these data problems leads to high variability in mean WTP across estimation approach and larger standard errors than those associated with nonparametric estimators that rely on smoothed data.

I'm not going to get into the weeds here, because what I want to highlight is the response by William Desvousges, Kristy Mathews (both independent consultants), and Kenneth Train (University of California - Berkeley), also published in the journal Ecological Economics (and also no ungated version available). The response is only two pages long, and is a very effective takedown of Whitehead. Along the way, Desvousges et al. note that Whitehead:

...made numerous mistakes in his calculations... When these errors are corrected, adding-up fails for each theoretically valid parametric model that Whitehead used.

One example of Whitehead's errors is:

He used medians for the tests instead of means, assuming – incorrectly – that the sum of medians is the median of the sum.
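A tiny example shows why that matters. The willingness-to-pay numbers below are hypothetical (three made-up respondents, two made-up parts of a good), chosen only to show that medians do not add up in general, whereas means always do:

```python
from statistics import mean, median

# Hypothetical WTP responses (in dollars) from three respondents for two parts of a good.
wtp_part_a = [1, 2, 9]
wtp_part_b = [9, 2, 1]
# Each respondent's WTP for both parts together, assuming it is just the sum of the parts.
wtp_both = [a + b for a, b in zip(wtp_part_a, wtp_part_b)]  # [10, 4, 10]

sum_of_medians = median(wtp_part_a) + median(wtp_part_b)  # 2 + 2 = 4
median_of_sum = median(wtp_both)                           # 10

sum_of_means = mean(wtp_part_a) + mean(wtp_part_b)  # 4 + 4 = 8
mean_of_sum = mean(wtp_both)                        # 24 / 3 = 8

print(sum_of_medians, median_of_sum)  # 4 10
print(sum_of_means, mean_of_sum)      # 8 8
```

Here, every respondent's WTP for the whole is exactly the sum of their WTP for the parts, and yet a comparison based on medians would suggest that the whole is worth far more than the parts.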

That's a fair criticism. However, Desvousges et al. are not satisfied leaving it at that. Instead, they go on the attack:

Also, we examined the papers authored or co-authored by Whitehead that are cited in the recent reviews... These papers provide 15 CV datasets. Each of the three problems that Whitehead identified for our paper is evidenced in these datasets:

  • Non-monotonicity: 12 of the 15 datasets exhibit non-monotonicity.
  • Flat portions of the response curve: All 15 datasets have flat areas for at least half of the possible adjacent prompts, and 4 datasets have flat areas for all adjacent prompts.
  • Fat tails: In our data, the yes-share at the highest cost prompt ranged from 15 to 45%, depending on the program increment. In Whitehead's studies, the share ranged from 14 to 53%.

If Whitehead's data are no worse than typical CV studies, then his papers indicate the pervasiveness of these problems in CV studies.

Ouch! That seems to have ended that particular debate. My takeaway (apart from not messing with Desvousges et al.) is that the contingent valuation method is far from perfect. In particular, it is vulnerable to scope problems (which my own research with Ian Bateman and Andreas Tsoumas (ungated earlier version here) showed some years ago). Ironically, that contingent valuation has particular problems is a message that John Whitehead himself has also argued (see here).

Read more:

Tuesday, 25 November 2025

The economics of fertility in high-income countries

Earlier this year, Melissa Kearney and Phillip Levine released an NBER Working Paper on the economics of fertility in high-income countries. In part, this paper is a follow-up on their 2022 article on cohort effects and fertility (which I discussed here), as well as building on this theoretical and empirical review (ungated here) by Doepke et al. (which I discussed here).

Kearney and Levine first review the trends and patterns in fertility in high-income countries, focused in particular on cohort-based measures. This exercise re-establishes the by now well-known trend of declining fertility, across the six example countries that they selected (Canada, Japan, Netherlands, Norway, Portugal, and the US).

Kearney and Levine then turn their attention to why fertility has declined, as well as why various policies and incentives have mostly failed to arrest the declining fertility trends. Taking an economic perspective that builds from Gary Becker's work on the economics of the family, but broadens its consideration (as shown by Doepke et al.), Kearney and Levine state that:

...the evidence points us to the view that the recent decline in fertility is likely less about changes in current constraints and more about cumulative cultural and economic forces that influence fertility decisions over time. Generally, economists are loathe to rely on changes in preferences to explain behavior because that can explain virtually anything. But there are reasons to believe that the lifestyle, broadly defined, that is consistent with having a child or multiple children is becoming less desirable for many adults.

Kearney and Levine point out several times (as in the quote above) how much economists dislike resorting to changes in preferences as an explanation, because changes in preferences can be used to explain essentially anything (which renders models basically worthless). However, they acknowledge that in this context, and based on the evidence from many studies, it is likely that "shifting priorities" (a convenient alternative name for changing preferences) are at play. These "shifting priorities":

...refer broadly to changes in individual values, which potentially reflect evolving opportunities and constraints, changing norms and expectations about work, parenting, and gender roles, and social and cultural factors.

However, Kearney and Levine still want to avoid letting changes in preferences take over. That leads them to note that:

...changes in preferences may not be generated randomly and it is important to consider the forces that might have led to such changes. In our review of empirical evidence below, we highlight a number of potential social and cultural factors that might have altered preferences for and attitudes toward childbearing in recent decades, including peer effects, media and social media influences, the role of religion and religious messaging, and changing norms around parenting and gender roles in the home and society.

For me, the key contributions of the paper are not the review sections, but the theoretical and empirical implications. For example, in terms of theory, Kearney and Levine suggest that economic modelling of family decisions needs to change. Specifically:

We propose that it is now more appropriate to consider and model labor force participation as the default option, and fertility as the discretionary activity. This reflects a major shift in societal norms and practices over the past several decades. Women in earlier cohorts were more likely to have children and less likely to work. Back then, it is reasonable to consider having children as a widespread priority for women, perhaps reflecting societal norms and expectations, and sustained participation in the paid labor force as the more “optional” choice.

That presumptive ranking quite possibly has reversed. If market work is now the norm, the labor market norms and practices, including the expectations of “greedy jobs” as described by Goldin (2014), may alter fertility behavior. The tradeoff between market work and childbearing is now about the tension between a lifetime career and the way motherhood interrupts or alters that lifetime career progression, rather than about whether women work at all after they are married or have had their first child.

In terms of empirical implications, Kearney and Levine note that economists could learn a lot from demographers, in particular in relation to recognising cohort effects. They also note that:

...a challenge for economic research going forward is that the empirical methods we often rely on for causal identification are not particularly well-suited for studying changes across cohorts, nor the impact of widespread social and cultural changes... The statistical demands on the data for causal identification often lead to a focus on the immediate impact of period-specific factors. But as noted throughout this paper, the key questions that remain to be answered in this area are about cohort-level changes and the role of less immediate and discrete changes.

In addition, a typical approach to identifying period-specific effects might generate misleading or limited policy lessons. Consider an intervention that relaxes some constraints on having a child at a point-in-time. Younger women—say, 18-year-olds—may incorporate that change into their long-term decision making, but they may not respond immediately. Meanwhile, women in their early 30s may be less responsive, having already made many related life choices (regarding careers, relationships, lifestyle, etc.). In such cases, we might observe little to no immediate effect, even if the policy ultimately influences lifetime fertility...

A policy change may lead women to move up the timing of a birth to respond to some incentive, but to have the same number of children over their childbearing years. Our methods may conclude that this policy “worked,” even though completed fertility was unaffected. 
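The timing point can be illustrated with a toy example. This is a deliberately stylised sketch, not anything from Kearney and Levine's paper: it assumes that every woman has exactly one child, at age 30 by default, and that a hypothetical policy introduced in 2020 simply brings that birth forward to age 29.

```python
from collections import Counter

POLICY_YEAR = 2020  # hypothetical year in which the policy is introduced

def birth_year_of_child(cohort):
    """Each woman has exactly one child: at age 29 if the policy is in force by then, else at age 30."""
    if cohort + 29 >= POLICY_YEAR:
        return cohort + 29
    return cohort + 30

cohorts = range(1980, 2000)  # women's birth years (one woman per cohort, for simplicity)
births_by_year = Counter(birth_year_of_child(c) for c in cohorts)

# Period view: births in the policy year double, because two cohorts give birth in 2020.
print(births_by_year[2019], births_by_year[2020], births_by_year[2021])  # 1 2 1

# Cohort view: completed fertility is one child per woman, with or without the policy.
print(sum(births_by_year.values()) / len(cohorts))  # 1.0
```

A before-and-after comparison of births around 2020 would conclude that the policy raised fertility, even though no cohort ends up with more children.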

It is important for economists to recognise where the current widely used empirical methods are likely to lead to incorrect conclusions being drawn, and Kearney and Levine have provided some important cautions here. Fertility decline is topical, and many economists will be working on research questions related to this, especially as policy initiatives are rolled out by governments trying to return to above-replacement fertility. This review by Kearney and Levine is both timely and very helpful.

[HT: Marginal Revolution]

Read more:

Sunday, 23 November 2025

The misery of diversity?

I just finished reading this 2024 NBER Working Paper by Resul Cesur (University of Connecticut) and Sadullah Yıldırım (Marmara University), provocatively titled "The Misery of Diversity". They look at whether greater genetic diversity is associated with subjective wellbeing (SWB, measured as happiness, or life satisfaction, or affect balance), and find that:

...diversity lowers human SWB, measured by cognitive life evaluations and hedonic assessments of emotional states.

Cesur and Yıldırım demonstrate these results using data on genetic diversity that comes from this 2013 article by Ashraf and Galor (ungated version here). As Cesur and Yıldırım explain:

Population geneticists demonstrate that the dispersal of anatomically modern humans via migratory routes determined within-ethnic genetic heterogeneity. As one moves away from Ethiopia via migratory tracts, genetic diversity, defined as the likelihood of two randomly picked individuals having dissimilar genetic material, decreases...

Our diversity measure impacts the outcomes of interest through social ecology, which, over many generations, likely has influenced cultural evolution. In particular, interpersonal diversity determines the endowment of genetic variation, a measure of social diversity, capturing within-group interpersonal differences across the globe...

This measure of social diversity performs better than conventional diversity indicators, such as the indices of fractionalization and polarization, in capturing the true extent of diversity... In particular, these authors show that while interpersonal population diversity has a substantial and precisely estimated impact on intrastate conflict, fractionalization, and polarization indices fail to explain it.

Underlying data for this index is the expected heterozygosity measures of 53 indigenous human populations genotyped at 780 microsatellite loci as a part of the Human Genome Diversity Project (HGDP–CEPH). It captures the probability that two randomly selected individuals within an ethnic group differ in genetic makeup. In light of the Out of Africa hypothesis, Ashraf and Galor (2013a) constructed predicted genetic diversity for each country by using the coefficient estimate of the impact of migratory distance to Addis Ababa on genetic diversity in the sample of indigenous ethnic groups across the world. Although...
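For readers unfamiliar with expected heterozygosity, the standard population-genetics definition at a single locus is one minus the sum of squared allele frequencies, which is the probability that two alleles drawn at random differ. The sketch below uses made-up allele frequencies, and averaging across loci is my assumption about how a population-level figure would be built, not a description of Ashraf and Galor's exact procedure.

```python
def expected_heterozygosity(allele_freqs):
    """Probability that two alleles drawn at random at one locus differ: 1 - sum(p_i^2)."""
    if abs(sum(allele_freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    return 1.0 - sum(p * p for p in allele_freqs)

def mean_heterozygosity(loci):
    """Average expected heterozygosity across loci (an assumed way of aggregating)."""
    return sum(expected_heterozygosity(freqs) for freqs in loci) / len(loci)

# Hypothetical allele frequencies at two loci.
even_locus = [0.5, 0.5]    # two equally common alleles: maximum diversity for two alleles
skewed_locus = [0.9, 0.1]  # one dominant allele: low diversity

print(round(expected_heterozygosity(even_locus), 2))                # 0.5
print(round(expected_heterozygosity(skewed_locus), 2))              # 0.18
print(round(mean_heterozygosity([even_locus, skewed_locus]), 2))    # 0.34
```

The more evenly spread the alleles are within a population, the higher the measure, which is the sense in which it captures within-group diversity.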

Using this measure, with an instrumental variables analysis, Cesur and Yıldırım show that genetic diversity causally decreases subjective wellbeing at both the country level and the individual level (using data from the World Values Survey and the World Happiness Report). Their results are robust to excluding countries that experienced large migrations after 1500 (such as countries in North America and Oceania), and to various other modelling choices. Cesur and Yıldırım dig into the mechanisms for lower subjective wellbeing, and conclude that:

...the misery of diversity is an evolutionary trap caused by the mismatch it creates between the ancestral and current social environments via reduced social cohesion, retarded state capacity, elevated mistrust, and increased inequality of economic opportunities.

So, it seems like this is good evidence that genetic diversity decreases subjective wellbeing. However, there are a couple of problems. First, when most people think about diversity, they are thinking about between-group diversity, not within-group diversity. Between-group diversity is what you get when people from different ethnic groups are together. Within-group diversity is what you get when people from the same ethnic group differ genetically from each other. Cesur and Yıldırım's measure is heavily weighted towards within-group diversity. And indeed, in one of their analyses they find that it is within-group diversity that matters the most. When they split their measure into within-group and between-group diversity, within-group diversity has a statistically significant (and negative) effect on subjective wellbeing measures, while between-group diversity is statistically insignificant.

So, Cesur and Yıldırım's analysis might be correct, but at the same time kind of misses the point. Between-group diversity is something that has potential policy levers (migration policy), whereas within-group genetic diversity is not something that is amenable to policy change. At least, not without eugenics (and, to be clear, I am not advocating for that). 

The second problem comes from the analysis of first-generation and second-generation immigrants in Europe and the US, where Cesur and Yıldırım find that:

...while home country diversity continues to hurt the SWB of first-generation immigrants, such effects weaken among the second-generation, suggesting that long-run improvements in the social environment can mitigate the misery of diversity over generations.

These results are not well-explained. If a person is born in one country, and then moves to a new country, shouldn't it matter how long they are exposed to the genetic diversity in the country of birth, and how long they are exposed to the genetic diversity in the destination country, in terms of the impact on subjective wellbeing? Cesur and Yıldırım don't show any dose-response relationship here. And there should be no effects at all on the second generation (which is what they find), because for the second-generation immigrants, the genetic diversity they have been exposed to is the country of their own birth, not the country of birth of their parents. However, that is only a small problem in an otherwise interesting paper.

Overall, I think Cesur and Yıldırım need to engage a bit more with why anyone should care about genetic diversity, given that it is not amenable to policy change. Until they can do that, this paper can be filed under the interesting, but unhelpful category.

[HT: Marginal Revolution, last year]

Friday, 21 November 2025

This week in research #102

Here's what caught my eye in research over the past week (clearly a very quiet week!):

  • Buckles et al. (open access) describe the Census Tree database, which links records across historical US censuses between 1850 and 1940 (a very valuable resource!)

Thursday, 20 November 2025

The impact of a lower drink-driving limit on bars and pubs

New Zealand decreased the drink-driving limit from 0.08 to 0.05 percent BAC (blood alcohol concentration) in December 2014. In the lead-up to the change, there were worries that bars and pubs would lose business (for example, see here). Similarly, after the changes came in, there were a number of news stories about negative impacts (for example, see here and here). When my research team was doing fieldwork in 2019, a local bar owner we talked to complained about how dead the Hamilton CBD was during the week (from our observations, the weekends were still pretty busy! [*]).

So, I was interested to read this 2020 article by Colin Sumpter (NHS Forth Valley) and co-authors, published in the journal Drug and Alcohol Review (open access). They interviewed bar and pub managers and owners in 2018, over three years after Scotland introduced the same reduction in drink-driving limit that New Zealand did [**]. Sumpter et al. note that before the law change, there was a lot of opposition from within the industry, similar to what we saw in New Zealand around the same time. However, Sumpter et al.'s results, based on qualitative analysis of in-depth interviews with 16 bar or pub owners or managers, show that the picture is more nuanced. First:

Most participants reported that prior to the limit change, there was little concern about the potential impact the change would have on their own business, although many felt it would impact on the hospitality industry as a whole. Post-limit change, most participants felt there had been no overall impact on their profits. A few reported a short-term impact that had lasted six to 12 months, but had seen profits return to normal after this period. A small minority reported a significant and persisting financial impact on their business and a similar number reported a smaller persisting financial impact. Rural pubs were more likely to report a negative economic impact while urban food-led establishments were less likely to report this as customers had continued to eat out while switching alcohol for soft drinks.

The perceived impacts on drinkers were interesting, and essentially what public health advocates would have hoped for:

Participants described three groups of drinkers that were particularly affected by the limit change. First and most commonly mentioned was the ‘after-work drinker’ group, which mainly comprised of men who would have dropped in on the way home from work. Participants reported that this behaviour had declined and attributed this to a public perception that the limit had changed from a ‘two-pint limit’ to a ‘no pint limit’...

The second affected group comprised of the ‘next morning driver’. Participants had observed that these people were now finishing drinking earlier on most nights, and particularly Sundays...

The third affected group comprised of the ‘lunchtime drinker’, although these were reportedly less affected by the limit change. In food-led establishments, it was often female customers who would previously have shared a bottle of wine, or had single glasses, but who now preferred to either have a designated driver or drink only soft drinks.

Finally, businesses adapted (or, at least, those businesses that were still around three years later had adapted!):

The major change in practice was around the provision of alternatives to alcohol. While participants from drink-only venues reported that their main income still came from alcoholic drinks, others described a growing trend in customer demand for no/low-alcohol drinks, and the range and quality of these drinks on offer from manufacturers. Whereas previously only one no/low-alcohol alternative would have been sold (other than soft drinks), examples were given of no/low-alcohol ranges intended to mimic the experience of drinking alcohol. This trend was primarily for beer but also present in cider and wine.

Don't forget mocktails! I think I rarely saw a premium mocktail on a menu prior to 2014, but now they are standard fare for most pubs and bars (in New Zealand, at least). The research participants also noted incentives for designated drivers, such as free soft drinks, but like in New Zealand, those were often available before the drink-driving limit was reduced. Overall, it appears that the pre-law worries about the negative impacts on bars and pubs were not borne out. In fact, Sumpter et al. find that:

Overall, despite the reservations of participants (regardless of premise type or location), there was broad acceptance of the limit change, disapproval of drink-driving, and little suggestion that the reduction should be reversed.

If even bar and pub owners and managers approve of the change afterwards, then it was clearly a positive change overall. 

*****

[*] You can read about that research here and here, or our earlier research just after the drink-driving limit changed here and here (ungated version here).

[**] Incidentally, Scotland's new drink-driving limit came into force just four days after New Zealand's new limit (5 December 2014 vs. 1 December 2014).

Tuesday, 18 November 2025

In wildfires, people prefer to save people rather than endangered species

If you were an incident controller who needed to deploy firefighting resources in a wildfire, how would you decide where to distribute those resources? If there are not enough firefighting resources to cover all areas at once, which areas should receive priority? Saving human lives seems like it should be a priority, but what about animal lives? What about preserving biodiversity, or saving endangered species from the fire? What about built infrastructure? What about important cultural artifacts? Some of these questions may seem easy to resolve, but there are important trade-offs, and understanding those trade-offs is important.

That is where this 2024 article by John Woinarski, Stephen Garnett, and Kerstin Zander (all Charles Darwin University), published in the journal Conservation Biology (open access, with non-technical summary on The Conversation), comes in. They surveyed a sample of over 2000 Australians, asking them to repeatedly make best-worst choices among five different alternatives (of eleven total). As they explain:

...respondents are asked to state which item among a set of items they consider as best and worst... In our survey, best meant the asset the respondent most wanted to save and worst meant the asset the respondent least wanted to save.

By getting the research participants to repeat this task many times (eleven times, in fact), with different sets of items to choose from, Woinarski et al. develop a good picture of the relative ranking of each of the eleven items, both for each research participant and for the sample overall. This best-worst scaling (BWS) method is a form of non-market valuation, since it essentially works out the relative value (in terms of ranking) of the different options that research participants are presented with. [*]
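To give a feel for how best-worst responses turn into a ranking, here is a minimal sketch using simple best-minus-worst counts. This is the basic counting approach to BWS, not necessarily the estimation method that Woinarski et al. used, and the responses and the function name are invented for illustration.

```python
from collections import Counter


def best_worst_scores(tasks):
    """Return (best count - worst count) / times shown, for each item."""
    shown, best, worst = Counter(), Counter(), Counter()
    for items, best_pick, worst_pick in tasks:
        if best_pick not in items or worst_pick not in items:
            raise ValueError("picks must come from the items shown")
        shown.update(items)
        best[best_pick] += 1
        worst[worst_pick] += 1
    return {item: (best[item] - worst[item]) / shown[item] for item in shown}


# Hypothetical responses: (items shown, most want to save, least want to save)
tasks = [
    (["person", "koalas", "house", "snail", "shed"], "person", "shed"),
    (["person", "sheep", "rock art", "snail", "house"], "person", "house"),
    (["koalas", "sheep", "shed", "rock art", "snail"], "koalas", "shed"),
]

scores = best_worst_scores(tasks)
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

Each score runs from -1 (always chosen as least important when shown) to +1 (always chosen as most important when shown), so items that are never picked either way sit in the middle at zero.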

The eleven options that research participants were ranking overall were:

  1. A person with a car stuck behind a fallen tree, whom you know had not received advice to evacuate;
  2. A person with a car stuck behind a fallen tree, whom you know had ignored repeated advice to evacuate beforehand;
  3. A house that you know has no people in it;
  4. A farm shed with some hay bales and a tractor;
  5. A flock of 50 sheep—a few of which will be killed by fire, but survivors are likely to be badly injured;
  6. A population of 50 koalas—a few of which will be killed by fire, but survivors are likely to be badly injured;
  7. The last population of a native snail species for which the fire will kill all individuals, thereby causing the species’ extinction;
  8. The last population of a small native shrub, for which the fire will kill all plants, thereby causing the species’ extinction;
  9. One of only two populations of a rare wallaby for which the fire will kill all individuals of one of the populations (but not affect the other), thereby making it more endangered;
  10. Ancient rock art that will be destroyed if fire gets into the weeds now growing in the rock shelter; and
  11. An old tree with an ancient Aboriginal carving on the trunk.

The results are interesting, if not terribly surprising:

In terms of relative importance, saving a person who ignored evacuation advice was rated 57% as important as saving a person who had not received warnings... Saving the koala population was rated slightly lower (56% as important as saving a person who had not received warnings). Saving the wallaby population was 45% as important as saving a person who was not warned. Saving the house and shed had the lowest rankings (14% and 9%, respectively, as important as saving a person who was not warned).

For completeness, compared with saving a person who had not received warnings, saving the sheep was rated as 26% as important, saving the shrub and saving the snails were each rated as 25% as important, saving the ancient rock art was rated as 15% as important, and saving the carved tree was rated as 12% as important. Woinarski et al. bemoan that no one loves snails, but I think the loss of the cultural artifacts would be a tragedy as well. I guess that reflects that each of us would place different weightings on things, and come out with different rankings. And that is what Woinarski et al. look at next, finding that:

Female respondents placed higher importance than male respondents on the protection of the rare wallaby population, the koala population, the sheep, and the tree carving and lower importance than male respondents on the protection of the house, shed, native shrub, and rock art... Older respondents (>65 years) rated protecting people more highly than younger respondents, but rated the tree carving less highly than younger respondents...

Respondents who self-identified as Indigenous placed a higher score on protecting the rock art and tree carvings than those identifying as non-Indigenous.

Those differences may not come as a surprise either. Now, in my ECONS102 class, when we discuss non-market valuation (specifically in the context of estimating the value of a statistical life), I point out that personal experience of the risk makes a difference. And that is true in this case as well. Woinarski et al. find that:

Survey respondents affected by wildfires and those assessing themselves as being prepared for wildfires were less likely to save a person who had not received warnings... Those who rated themselves as prepared for wildfire were also less likely to save a person who ignored warnings, whereas those who had been affected by wildfire were more likely to do so.

It is interesting to consider what the differences mean here. If a person has personal experience of wildfires, then they know how devastating they can be, and how unpredictable and fast-moving. In my mind, that should make them more likely to want to save a person who has not received warnings, but instead they are less likely. On the other hand, it does make sense that they would be more likely to save someone who ignored warnings. Woinarski et al. don't provide a good explanation for that result (although, to be fair, they are focused on the results related to conservation, rather than humans!). Meanwhile, it makes some sense that people who are well prepared would be less willing to help those who ignored warnings.

The takeaway message from this paper, though, is that people prefer to save people, rather than endangered species. Especially snails.

*****

[*] If one of the options had been monetary, Woinarski et al. could have used their results to work out the rough monetary value of each option.

Monday, 17 November 2025

Population diversity and economic growth

Population diversity has a theoretically ambiguous effect on economic growth. On the one hand, having a more diverse population makes it more difficult for people to agree on things like spending on public goods (e.g. see this post), it can open the door to policies that favour certain ethnic groups, and lead to conflict over resources and the management of public services. On the other hand, having a more diverse population brings people together with different (and complementary) skills, experiences, and ways of thinking, which can boost innovation and productivity, as well as fostering connections with different communities (and other countries), which may increase international trade and investment.

Many studies have tested the relationship between population diversity and economic growth, with little consensus. That makes the literature ripe for meta-analysis, where the results of many studies are combined in order to estimate an overall relationship. That is the approach in this new article by Andreas Sintos (University of Luxembourg), published in the Journal of Economic Surveys (open access). Sintos collates the results from 83 studies, with 1537 estimates of the relationship between some measure of population diversity and some measure of economic growth.
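As a rough illustration of what 'combining' estimates means, here is a minimal sketch of textbook fixed-effect inverse-variance pooling. To be clear, this is not Sintos's method (a meta-regression across hundreds of estimates involves a lot more than this), and the three estimates and standard errors below are made up.

```python
import math


def pooled_estimate(estimates, std_errors):
    """Fixed-effect inverse-variance weighted mean and its standard error."""
    if not estimates or len(estimates) != len(std_errors):
        raise ValueError("need matching, non-empty lists")
    # More precise estimates (smaller standard errors) get more weight
    weights = [1.0 / se ** 2 for se in std_errors]
    total_weight = sum(weights)
    mean = sum(w * b for w, b in zip(weights, estimates)) / total_weight
    return mean, math.sqrt(1.0 / total_weight)


# Hypothetical study results (not taken from the 83 studies)
estimates = [0.30, -0.10, 0.20]
std_errors = [0.10, 0.20, 0.10]

pooled, pooled_se = pooled_estimate(estimates, std_errors)
print(f"pooled estimate: {pooled:.3f} (SE {pooled_se:.3f})")
```

The imprecise negative estimate pulls the pooled value down only a little, because it carries a quarter of the weight of each of the other two.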

First, Sintos establishes that there is a small publication bias overall, with studies that find a negative relationship between diversity and growth being more likely to be published than would be expected given the overall distribution of results. Then, after adjusting for publication bias and methodological quality of the studies, he finds that:

...while ethnic and linguistic diversity demonstrates a small and statistically insignificant positive effect on economic growth, the remaining dimensions of diversity—religious, genetic, birthplace, and the residual category—demonstrate a significant positive impact on economic growth, with effect sizes spanning from moderate to large.

So, population diversity (specifically religious, genetic, and birthplace diversity, as well as a residual category that captures other forms of diversity) is positively associated with economic growth. Places that have more diversity of those types (but not places that have more ethnic or linguistic diversity) grow faster. A 'moderate to large' effect here means that each standard deviation higher diversity is associated with 0.1 to 0.4 standard deviations higher economic growth. That is not to be sneezed at.

What Sintos isn't able to do, though, is explore the mechanisms that underlie that positive relationship. So, while meta-analysis can give us an overall estimate of the relationship, it can't tell us why that relationship exists. To do that, we would need to go and look at the individual studies, particularly those that found a positive relationship between diversity and growth, and see if they explored the mechanisms.

Finally, this article made me chuckle, as it is clear that substantial portions of it were written by generative AI. No human uses the word "elucidate" 14 times in a research paper, and quantitative papers rarely refer to the "scholarly discourse". I should really have been alerted to this when the first paragraph included the LLM-ese sentence: "The significance of population diversity within the economic sphere is multifaceted". Perhaps diversity's significance is multifaceted. This article doesn't tell us that though. All it tells us is that the relationship between diversity (by some measures) and economic growth is positive. More diverse places tend to grow faster.

Sunday, 16 November 2025

Andrew Leigh on big data vs. randomised controlled trials

'Big data' has become the catchcry of many data scientists and researchers in recent years. It's also become increasingly used in economics. However, by itself the analysis of big data doesn't provide anything but big data correlations. Even when big datasets are available, there is still a place for randomised controlled trials (RCTs). That is the essence of this new article by Andrew Leigh (Parliament of Australia), published in the journal Australian Economic Review (sorry, I don't see an ungated version online).

It should come as no surprise that Leigh is pro-RCT. After all, he is the author of the book Randomistas (which I reviewed here), which was essentially a tribute to RCTs. Leigh clearly sees the rise of big data, and its increasing use as a substitute for RCTs, as a threat to good research. In the article, he takes great pains to point out instances where big data draws the wrong conclusions, compared with RCTs on the same topic. For example:

Randomised trials have demonstrated a strongly beneficial effect of statins on reducing cardiovascular mortality. Yet when they analysed a database covering the entire Danish population, researchers found that the chance of death from cardiovascular causes was one‐quarter higher among those who took statins than among those who did not. The explanation is straightforward: people who were prescribed statins were at elevated risk of having a heart attack. Yet even when researchers made statistical adjustments, using all the variables available in the database, they were unable to reproduce the well‐known finding that statins have a beneficial effect on cardiovascular mortality.

Analysis of the Danish database also suggested that the relative risk of cancer was 15% lower among patients who took statins, an effect that remained statistically significant even after controlling for other observed factors about the patients. Yet this result is at odds with the evidence from randomised trials. A meta‐analysis of randomised trials, covering more than 10,000 cases of cancer, found no effects of statins on the incidence of cancer, nor on deaths from cancer...

The observational data was doubly wrong. Observational data failed to replicate the well‐known finding that statins improve heart health. And observational data wrongly suggested that statins reduce the risk of cancer. Randomised trials, which were not biased by selection effects, provided the correct answer. 
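The selection effect at work in the statins example can be sketched with a small simulation. Everything here is an assumption for illustration: the risk levels, the prescribing probabilities, and the 25 percent true benefit are invented, and have nothing to do with the actual Danish data.

```python
import random


def simulate(n, randomised, seed=0):
    """Return death rates (treated, untreated) under a hypothetical model."""
    rng = random.Random(seed)
    deaths = {True: 0, False: 0}
    counts = {True: 0, False: 0}
    for _ in range(n):
        high_risk = rng.random() < 0.3
        if randomised:
            # A coin flip decides treatment, regardless of risk
            treated = rng.random() < 0.5
        else:
            # Doctors prescribe mostly to high-risk patients
            treated = rng.random() < (0.8 if high_risk else 0.1)
        risk = 0.20 if high_risk else 0.05
        if treated:
            risk *= 0.75  # assumed true benefit of treatment
        counts[treated] += 1
        if rng.random() < risk:
            deaths[treated] += 1
    return deaths[True] / counts[True], deaths[False] / counts[False]


obs_treated, obs_untreated = simulate(200_000, randomised=False)
rct_treated, rct_untreated = simulate(200_000, randomised=True)
print(f"observational: treated {obs_treated:.3f}, untreated {obs_untreated:.3f}")
print(f"randomised:    treated {rct_treated:.3f}, untreated {rct_untreated:.3f}")
```

In this toy model, the treatment always helps, but the naive observational comparison makes it look harmful because treated patients were sicker to begin with. Here the risk factor is a single known variable, so it could be adjusted for; in real data, much of what drives prescribing is not recorded in the database.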

That is only one example of many in the article. However, while Leigh is pro-RCT, he is not anti-big-data. He notes that:

Large data sets are a valuable complement to randomised trials. But big data is not a substitute for randomisation.

If we take anything away from Leigh's article, it should be that point. Big data is incredibly useful. However, it must be analysed using the tools of causal inference (of which randomised controlled trials are just one example) if we want to move beyond finding correlations. The problem with big data is compounded by a focus on statistical significance (as Ziliak and McCloskey noted in their book The Cult of Statistical Significance, which I reviewed here). Big datasets will find statistically significant correlations even when the size of the relationship is very small. That is an asset when causal methods are applied, but is very much a liability when big data are analysed without consideration of causality. RCTs are one way of disciplining our research approach in order to ensure that the effects we estimate are causal, and as Leigh notes:

While correlations in large data sets do not necessarily indicate causation, administrative data can be enormously helpful in ensuring the precision of estimates from randomised trials.
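The point that big datasets will turn up statistically significant correlations even when the relationship is tiny can be seen from the standard t-statistic for a correlation coefficient. This is a minimal sketch, and the correlation of 0.01 and the two sample sizes are arbitrary values chosen for illustration.

```python
import math


def correlation_t_stat(r, n):
    """t-statistic for testing whether a correlation r differs from zero."""
    if n <= 2 or not -1 < r < 1:
        raise ValueError("need n > 2 and -1 < r < 1")
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


r = 0.01  # hypothetical, very weak correlation
t_small = correlation_t_stat(r, 1_000)
t_big = correlation_t_stat(r, 1_000_000)
print(f"n = 1,000: t = {t_small:.2f}; n = 1,000,000: t = {t_big:.2f}")
```

The same correlation, which explains just 0.01 percent of the variation, is nowhere near the conventional 1.96 threshold with a thousand observations, but clears it easily with a million.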

The article finishes with high-level strategies that policy makers and practitioners can use to ensure that RCTs are embedded within the analysis of public policy:

I advocate five approaches. Encourage curiosity in yourself and those you lead. Seek simple trials, especially at the outset. Ensure experiments are ethically grounded. Foster institutions that push people towards more rigorous evaluation. Collaborate internationally to share best practice and identify evidence gaps.

Those all sound like good approaches. I would add a sixth: Employ analysts with a thorough grounding in causal inference methods generally, if not RCTs specifically. We need more policy analysis that establishes causal evidence of impact.

Friday, 14 November 2025

This week in research #101

Here's what caught my eye in research over the past week:

  • Kuehn (open access) discusses the under-recognised contributions of W.E.B. Du Bois to marginalist wage theory
  • Sintos (open access) provides a meta-regression analysis of the effects of population diversity on economic growth, finding that ethnic and linguistic diversity exhibit a small and statistically insignificant positive effect on economic growth, while religious, genetic, birthplace, and other forms of diversity exert a significant positive impact on growth, with effect sizes ranging from moderate to large
  • Baltrunaite, Casarico, and Rizzica (with ungated earlier version here) study gender differences in reference letters for graduate students in economics and finance, and find that men are described more often as standout and women as grindstone, i.e., hardworking and diligent, that these differences are mainly driven by male letter writers, especially more senior ones, and that standout characteristics relate positively to subsequent career outcomes whereas grindstone characteristics relate negatively to subsequent career outcomes
  • Asquith and Mast (with ungated earlier version here) study county-level population decline in the US, and find that falling fertility has caused migration rates that used to generate growth to instead result in decline, and that only 10 percent of counties would have declined during the 2010s if fertility had remained at its initial levels

Wednesday, 12 November 2025

The wicked problem of generative AI and assessment

Schools, universities, and teachers at all levels are having to grapple with the challenges of student use of generative AI. This new article by Thomas Corbin and colleagues (all from Deakin University), published in the journal Assessment and Evaluation in Higher Education (open access), describes it as a 'wicked problem':

Wicked problems, as originally conceptualised by Rittel and Webber (1973), describe challenges that defy simple solutions... Unlike their counterpart ‘tame’ problems, which have clear definitions and measurable solutions, wicked problems lack definitive formulations, and their solutions are not true or false but rather better or worse, requiring judgment, compromise, and adaptation. This distinction is key because it disrupts the assumption that there is a ‘correct’ policy, assessment method, or institutional response waiting to be discovered. Instead, every approach carries trade-offs, is shaped by context, and must be continually reassessed in response to evolving conditions. For those tasked with navigating wicked problems, this reality has a significant personal toll; every decision feels provisional, every choice open to criticism, and the pressure to find the ‘right’ solution persists even when no such solution exists...

I'm sure that description resonates with many teachers, when they think about generative AI and assessment. Corbin et al. back up their assertion that this is a wicked problem with qualitative research, based on interviews with 20 'Unit Chairs', who are responsible for running a subject. It would have been interesting if they had interviewed lecturers as well, since they are on the front lines in dealing with students' use of generative AI, but I suspect the results would not have differed too much.

The results make for interesting reading. Corbin et al. work their way through all of the criteria that Rittel and Webber used to define a 'wicked problem' in their 1973 article (ungated version here). I don't agree with them on all criteria, so I'm going to use this post to push back on a few things. However, I think that their paper does provide some good talking points, starting with:

The first defining feature of wicked problems is that they cannot be clearly or conclusively defined. Unlike technical problems where stakeholders can in theory agree on what needs fixing, wicked problems mean different things to different people and these varying definitions pull solutions in contradictory directions. Without agreement on what the problem is, a singular, cohesive response becomes impossible.

This pretty much captures things, I think:

Consider for example the frustration of the teacher who stated: ‘I’ve spent so much fucking time on developing this stuff. They’re really good as units, things that I’m proud of. Now I’m looking at what AI can do, and I’m like, what the fuck do I do? I’m really at a loss, to be honest’. (T10).

We are all just trying to find our way in the era of generative AI. But no one agrees on what should be done, or even what the problem is (see yesterday's post as one example!). Second:

The second defining characteristic of a wicked problem is that it has no stopping rule – that is, there are no clear criteria for knowing when you have reached ‘the solution’...

When asked about determining success, one teacher responded: ‘How do we actually tell? You can’t’ (T15).

I guess we just do what we can in the moment. However, all of us are looking around at what other people are trying, and constantly wondering if we can do better. I have a solution for my papers. I don't think it is the solution, and certainly it isn't a one-size-fits-all solution for every paper. It seems to be working all right for now, at least. But in coming up with a solution that has some benefits, it trades off with other things that we have to give up. And that is the third characteristic of a wicked problem:

Technical problems have correct answers that can be verified. Wicked problems, on the other hand, have only trade-offs, where every response sacrifices something valuable...

Another unit chair worried: ‘We can make assessments more AI-proof, but if we make them too rigid, we just test compliance rather than creativity’ (T3).

These types of statements illustrate how moves toward assessment security sacrifice something else, be it authenticity, creativity, or real-world relevance.

In my case, we assess knowledge, comprehension, and application (which are low on Bloom's taxonomy), but by adopting in-person tests we forgo the ability to authentically assess higher-level skills such as analysis, synthesis, and evaluation (which, to be fair, shouldn't necessarily be assessed in a first-year paper anyway!).

On the fourth criterion, Corbin et al. note that:

...wicked problems lack clear metrics for testing whether solutions have succeeded...

Several unit chairs expressed uncertainty about whether their assessment adaptations were effective. When asked about determining success, one stated simply: ‘If a student uses AI appropriately for brainstorming, we might never know. If they use it inappropriately, we also might never know’ (T18).

Again, this one definitely depends on assessment style, and in some cases, you can tell whether your approach has succeeded. In my case, I am fairly confident that I am able to assess my students' learning in the test environment, and that the use of AI tutors is, if anything, improving that learning (more to come on that point though, as I will be reporting on the actual evaluation in the next month). And that means that I also disagree with Corbin et al.'s next point, which is:

The fifth characteristic of wicked problems is that solutions cannot be found through experimenting with solutions because every attempt has real consequences.

I think you still can try things, and see if they work (and if I didn't think that, then I probably wouldn't try things in the first place!). Yes, there are consequences. But there are also consequences to not experimenting with finding a solution. The era of generative AI is not going to pause so that we can just keep doing what we always have done. We have to embrace the uncertainty! And that links to the next point that Corbin et al. raise, which is that:

...wicked problems present limitless possible approaches with no way to determine if all options have been considered.

Yes, but to be fair that was probably true before generative AI as well. If there was a single silver bullet solution to teaching and learning, we all would have been doing it already. All teachers have their own pedagogical approaches, which hopefully leverage their strengths as teachers and academics, while mitigating their weaknesses. And that means that there isn't one approach that will work in all circumstances for all teachers. In fact, I adopt different approaches in different papers, given what I hope will work (and experimenting, while testing whether my approach is successful). And that links to the next point:

The appeal of standardized solutions - whether "best practice" templates or institutional mandates - assumes that similar-looking problems can be solved with similar approaches. But wicked problems resist this logic because each instance emerges from an irreducibly specific context.

Yes, but not necessarily for the reasons that Corbin et al. outline (or, not only for the reasons that they outline). As I noted above, every teacher has different strengths and weaknesses, and so what is best practice for one teacher need not be best practice for everyone else.

The next criterion is:

Wicked problems do not exist in isolation but instead emerge from and reveal deeper structural issues.

Several participants saw AI vulnerabilities as symptoms of institutional business models. One teacher argued: ‘A university like [the one in which I work], which is based on a business model, which is online-based, where you cannot incentivize students to come in person, and all the assessments are based on tasks you ask students to do at home in their own time, this model is the most vulnerable to fraud in an age of AI’ (T9).

Generative AI is not operating in a vacuum, so of course it intersects with other issues. Online assessment was already a problem before generative AI came on the scene. How quickly have we all forgotten about Chegg, the bane of online assessment during the lockdowns? Moving on:

The ninth characteristic of a wicked problem is that the way the problem is framed shapes which solutions become possible. This relies on the claim that how we define a problem constrains what kinds of responses can be imagined or pursued. In other words, how we frame the AI and assessment challenge predetermines which solutions appear reasonable and which remain invisible...

When teachers framed AI as a threat to academic integrity, they favoured control-based solutions. One stated: ‘I know I would still prefer exams to come back on campus because it would be the only piece of assessment that we can truly say this is their own work’ (T4)... Those who framed AI as a professional necessity proposed integration: ‘I think GenAI is going to stay, right? It’s already part of the workforce, like us as well. Students need to be able to use it efficiently. The part of their skills they will need to learn would be to use GenAI efficiently’ (T17).

This is definitely an issue. I know of colleagues from both ends of this spectrum. The worst part is that I have sympathy for both views (as regular readers of this blog will probably recognise)! But again, there need not be a one-size-fits-all solution here, and while AI might be a threat in some papers, it might be an integral part of the teaching and learning and assessment in another. Both of those things can be true at the same time. Finally:

The tenth characteristic of wicked problems is that decision-makers bear full responsibility for the consequences of their choices. Unlike theoretical problems where errors have no real-world impact, those addressing wicked problems are, as Rittel and Webber (1973, 167) note, ‘liable for the consequences of the solutions they generate’...

One teacher worried about graduating unprepared professionals: ‘How many are we missing? Are we in fact sending students out into the workforce who can get through an interview, but when they start doing the job, they can’t?’ (T11). The personal vulnerability this created was articulated starkly: ‘I feel very, very vulnerable within the university running assessments like this because I know that there are pockets of the university management who would really like to just see us do traditional, detached, academic assessments that don’t threaten to push students’ (T6).

As teachers, we do bear some responsibility. The problem here, and this is highlighted in the second quote above, is where the university creates an environment where teachers' ability to ensure students have met learning objectives is undermined by institutional practices. And too often, teachers are finding themselves in that position. As noted in yesterday's post, Simas Kucinskas made the point that "take-home assignments are obsolete". Our assessments need to reflect that fact, and universities shouldn't be putting teaching staff in a position where they are forced to adopt assessment practices that are no longer fit for purpose. Of course, this would still be an issue even if generative AI and assessment wasn't a wicked problem.

While I'm not convinced by all elements of Corbin et al.'s argument, I do agree that generative AI and assessment is a wicked problem. That doesn't mean that we should give up. There are solutions out there, but there is unlikely to be one solution that will work for all teachers and in all circumstances. We need to keep experimenting, and sharing our learnings. That is the only way that we will move forward, in ensuring that student learning is still assessed in a meaningful way.

Tuesday, 11 November 2025

Simas Kucinskas on AI, university education, and the 'mushy middle'

Simas Kucinskas has an interesting Substack post on university education in the age of AI. His TL;DR summary of the post is:

AI now solves university assignments perfectly in minutes. Students often use LLMs as a crutch rather than as a tutor, getting answers without understanding. To address these problems, I propose a barbell strategy: pure fundamentals (no AI) on one end, full-on AI projects on the other, with no mushy middle. Universities should focus on fundamentals.

Kucinskas starts by making the point that "take-home assignments are obsolete", and that students are outsourcing too much of their learning to generative AI. I have to agree. When ChatGPT can write an essay, solve problem sets, draft reports, and answer online test questions, the options for assessment that provides a genuine evaluation of whether students have met particular learning outcomes narrow significantly. That's why, in my classes, we've moved back to predominantly in-person assessment (or oral examinations online). They're not bulletproof assessments, but they are better than the alternatives that are far more vulnerable to generative AI.

Kucinskas's solution is what he terms the "barbell strategy":

One end of the barbell: courses that are deliberately non-AI. Work through proofs by hand. Read academic papers. Write essays without AI. It’s hard, but you build mental strength.

The other end of the barbell: embrace AI fully for applied projects. Attend vibecoding hackathons. Build apps with Cursor. Use Veo to create videos. Master these tools effectively.

Kucinskas dismisses the "mushy middle":

...where students “use AI responsibly” or instructors teach basic prompting as an afterthought. That’s the worst of both worlds. Students don’t build thinking skills, but they also don’t learn the full potential of AI.

Here, I differ with Kucinskas. I agree about the starting point. We need the basic courses that teach the fundamentals of a discipline to be designed to be AI-free, at least in terms of the assessment (AI can still be a useful learning tool, such as the AI tutors in my papers). And I agree with Kucinskas about the end point. We need students to be embracing AI fully for applied projects by the end of their degree. Where we differ is how we get students from the starting point to the end point, and I prefer a much more scaffolded approach (as I outlined briefly in this post).

The problem is that Kucinskas has a rosy view of how self-directed students will be in learning how best to use generative AI. Highly self-directed (and/or tech-savvy) students will be fine without any direction from universities or lecturing staff. They will spend the time and effort to figure it out themselves, and will excel because of the learning that they engage in along the way. Those are the students that Kucinskas is thinking about. However, not all students are like that. Some (perhaps many) won't know what they are doing, may fail more than they succeed, and eventually try to wholesale outsource the applied projects to AI. This is exactly what Kucinskas is worried about for university education. His approach doubles down on what is happening already, for students who are least self-directed.

Students who are less self-directed by definition require a more directive approach from lecturing staff. These students need to be scaffolded through the process of recognising the value of generative AI, learning to use generative AI within a narrowly-scoped set of activities, and gradually building their skills with prompting and learning from each other and from the generative AI, before being let loose on the applied projects that are the end-point of the learning journey.

So, there is definitely a role for the 'mushy middle' in university education. However, by making it more directive we can hopefully reduce the degree of mushiness.

[HT: Marginal Revolution]

Sunday, 9 November 2025

Book review: In This Economy

Kyla Scanlon rose to some prominence during and after the pandemic, through her short explanatory videos about the economy, money, and finance. She may not have been the first, but she is certainly one of the most prominent members of the #EconTok community on TikTok (as well as being active on other social media). She has developed a large following, particularly among younger people. So, I was really interested to read her 2024 book, In This Economy.

I have to say that I was quite disappointed though. On the plus side, Scanlon plays to her strengths, and the early parts of the book are strong on exploring the role of vibes on the economy (Scanlon coined the term 'vibecession', to mean "a period of temporary vibe decline during which economic data such as trade and industrial activity are okay-ish"). Those chapters are generally good (although see my later comments). However, significant parts of chapters are less explainers about "how money and markets really work", which is the subtitle for the book, and more a commentary on current US policy on housing, immigration, clean energy, and the like. This is not just apparent in the final chapter, which is supposed to be more policy focused. The parts of the book where Scanlon held forth on her views were far less compelling to me, because the role of vibes was largely forgotten. It would have been more interesting to know how vibes may play a role in housing policy, or immigration policy, and whether a change in vibes might change policy. The book could have been tightened up significantly, and made an interesting contribution that other authors are less well equipped to make.

What put me off most, though, were the inaccuracies in the book. The worst offence (to a New Zealand economist) was this, about inflation targeting:

That's because the 2 percent figure is sort of random. The idea originally came from Arthur Grimes, the Labour Party finance minster [sic] of New Zealand in the 1980s. He went on TV and said, "Two percent should be our inflation target," and now everybody goes after that magic number.

Arthur Grimes was never an MP, let alone finance minister (I checked this with him!). Scanlon might owe Arthur an apology for confusing him with Roger Douglas. One of my colleagues ventured that perhaps ChatGPT wrote those sentences. It is the sort of hallucination we might expect from an LLM, but who knows if that was the source. Sadly, it is indicative of the inaccuracies in the book. Consider this one:

In one example of the extremity of market moves, the yield on thirty-year U.K. inflation-linked bonds jumped by more than 250% (meaning that they fell 250% in price) after the Bank made the announcement that it was not going to intervene.

If something falls in price by more than 100 percent, that means that the seller pays the buyer to buy it from them. The correct figure here should be 60 percent I think, not 250 percent. Similarly:

So when news headlines say, "Inflation Rate Falls to 3 Percent," that doesn't mean that prices fell three percent; it just means that the rate of change of price increases fell three percent.

No, it means that the rate of change of prices fell to three percent (from whatever it was before). There is unfortunately a lot of this sort of lack of attention to detail. At one point, Scanlon provides an estimate of GDP for the 'Gingerbread Yeti economy', then converts it to 'real nominal GDP' by dividing by one plus the current year's inflation rate. First, there's no such thing as 'real nominal GDP'. There is 'nominal GDP' and there is 'real GDP'. And second, the calculation does provide a measure of real GDP, measured in terms of dollars from the year before. However, the calculation that is presented gives the impression that dividing by one plus the current year's inflation rate is the standard way of calculating real GDP. It isn't. It's not just the current year's inflation that matters in calculating real GDP, but the inflation in every year between the current year and the base year. The base year matters, and the base year is not always the year before the current year.
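To make the base-year point concrete, here is a minimal Python sketch. The nominal GDP figure and the inflation rates are made up for illustration (they are not the numbers from Scanlon's 'Gingerbread Yeti economy'), and the function name `real_gdp` is my own. It simply deflates nominal GDP by the cumulative inflation between the base year and the current year, which is a simplification of how statistical agencies actually construct real GDP:

```python
def real_gdp(nominal_gdp, inflation_rates):
    """Deflate nominal GDP by cumulative inflation since the base year.

    inflation_rates holds one annual rate (as a proportion) for every
    year between the base year and the current year.
    """
    deflator = 1.0
    for rate in inflation_rates:
        deflator *= 1.0 + rate
    return nominal_gdp / deflator


# Hypothetical numbers, for illustration only.
nominal = 1000.0
inflation_since_base = [0.02, 0.03, 0.05]  # oldest year first, current year last

# Base year three years ago: every year's inflation matters.
real_base_year = real_gdp(nominal, inflation_since_base)

# Base year is last year: only the current year's inflation matters.
# This is the special case that the book's calculation corresponds to.
real_prior_year = real_gdp(nominal, inflation_since_base[-1:])

print(f"Real GDP in base-year dollars:  {real_base_year:.2f}")
print(f"Real GDP in prior-year dollars: {real_prior_year:.2f}")
```

The two results differ because the deflator compounds over every year back to the base year. Dividing by one plus the current year's inflation rate only gives the right answer when the base year happens to be the year before.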

Despite my grumpiness, there are some good aspects to the book. Scanlon does have a good way with words that I think connects with younger people (and that much is clear from her success on social media). She also provides some interesting examples to illustrate her explanations, such as the 'economics kingdom' (which illustrates how parts of the economy are related), the 'cake of uncertainty' (which relates expectations, theory, and reality), and the aforementioned 'Gingerbread Yeti economy'. Scanlon also refers to a lot of memes, probably many more than I would recognise. And yet I found the explanation of how 'meme stocks' worked to be a bit underdone.

Sadly, I don't think I can recommend this book, even to my younger students who might connect with the contemporary material more than they would with earlier pop economics books. There are simply too many bits where I worry that the book would steer them wrong. Normally, I find that Tyler Cowen makes excellent book recommendations. In this case, I'm really not seeing whatever he saw in this one.

Saturday, 8 November 2025

Survey evidence on the labour market impacts of generative AI

A picture of the labour market impacts of generative AI is slowly emerging. At this stage, there is little consensus on what the impacts will be. I just stumbled across this working paper, by Jonathan Hartley (Stanford University) and co-authors, which I had put aside to read earlier this year. Unlike some of the research I have discussed in recent posts (linked at the end of this post), Hartley et al. make use of a nationally representative survey of US workers.

The survey has had three waves in the US (plus one Canadian wave), and the first US wave had over 4200 respondents (Hartley et al. don't report how many respondents there were for the other waves). The results make for interesting reading. First, in terms of who is using generative AI, they report that:

...LLM adoption at work among U.S. survey respondents above 18 has increased rapidly from 30.1% as of December 2024, to 43.2% as of March/April 2025, and to 45.9% as of June/July 2025...

Conditional on using Generative AI at work, about 33% of workers use Generative AI five days per week at work (every weekday). Roughly 12% of Generative AI users use such tools at work only 1 day at work. About 17% and 18% of Generative AI users use Generative AI tools at work two and three days per week respectively...

That is a lot of people using generative AI for work, and using it often when they do. It is interesting to sit these results alongside those of Chatterji et al. (whose paper I discussed in this post). They found growth in both work-related and non-work-related ChatGPT messages over time.

Who is using generative AI at work, though? Hartley et al. find that:

...Generative AI tools like large language models (LLMs) are most commonly used in the labor force by younger individuals, more highly educated individuals, higher income individuals, and those in particular industries such as customer service, marketing and information technology.

These results are similar to those of Chatterji et al., except that Hartley et al. also report gender differences (with greater use of generative AI by men), whereas Chatterji et al. report that the gender gap that was apparent among early adopters of ChatGPT has closed completely.

Hartley et al. then move on to estimating the productivity gains from generative AI. Given that this is survey-based, and not observational or experimental, we should take these results with a very large grain of salt. Hartley et al. ask their respondents how long it takes them to complete various tasks with and without generative AI. The results are summarised in Figure 12 in the paper:

Notice that every task is reported to take less time with generative AI (the green dots) than without (the blue dots). The productivity gains are different for different tasks. However, I find this figure and the data to be very fishy. How could generative AI create a huge decrease in time on 'Persuasion' tasks? Or 'Repairing' (which has one of the biggest productivity gains)? Also, notice how almost every task takes between 25 and 39 minutes with generative AI. I strongly suspect that the research participants are anchoring their responses to this question on 30 minutes with GenAI for some reason. Without seeing the particular questions that are being asked though, it is hard to tell why. [*]

Hartley et al. then try to estimate the impact of generative AI on job postings, employment, and wages, using a difference-in-differences research design. They find no impact on job postings or employment, but significant impacts on wages. However, here things get strange. The coefficients that they report in Tables 6 and 7 of the paper are clearly negative, and yet Hartley et al. write that:

Our estimated coefficients... imply economically meaningful wage effects: a one-standard deviation increase in occupational Generative AI exposure corresponds to a significant increase in median annual wages...

Going back to their regression equations, their 'exposure to generative AI' variable is more positive when exposure is high, so a negative coefficient should imply that more exposure to generative AI is associated with lower wages. I must be missing something?
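The sign logic can be illustrated with a stylised sketch in Python. To be clear, this is not Hartley et al.'s specification or data: the occupations, exposure scores, and wages below are invented, and I am assuming a simple two-period setup where the difference-in-differences coefficient is just the slope from regressing each occupation's wage change on its exposure score:

```python
def ols_slope(x, y):
    """Slope from a simple least-squares regression of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    return cov_xy / var_x


# Hypothetical occupations: a higher score means more exposure to generative AI.
exposure = [-1.0, 0.0, 1.0, 2.0]
wage_pre = [50.0, 60.0, 70.0, 80.0]    # before generative AI (in $000s)
wage_post = [53.0, 62.0, 71.0, 80.0]   # after generative AI (in $000s)

# First difference: the change in wages within each occupation.
wage_change = [post - pre for pre, post in zip(wage_pre, wage_post)]

# Second difference: how the wage change varies with exposure.
did_coefficient = ols_slope(exposure, wage_change)

print(f"Coefficient on exposure: {did_coefficient:.2f}")
```

In this made-up example, wages grow by less in the more exposed occupations, so the coefficient is negative. With exposure coded so that higher values mean more exposure, a negative coefficient corresponds to relatively lower wages for more exposed occupations, not higher wages.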

Given the deficiencies in the data and the regression modelling, I don't think that this paper really adds much to our understanding of the labour market effects of generative AI. Which is disappointing, because survey-based evidence would provide us with a complementary data source that would help us to triangulate with the results from other data sources and methods.

[HT: Marginal Revolution]

*****

[*] On a slightly more technical note, we might expect there to be as much variation (in relative terms) in the 'with GenAI' data as in the 'without GenAI' data. However, the coefficient of variation (the standard deviation expressed as a proportion of the mean) is 0.109 for the 'with GenAI' data, but 0.226 for the 'without GenAI' data. So, there is less than half as much relative variation in the reported task times with GenAI as without. Again, that suggests that this data is fishy.
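For anyone who wants to run the same check on their own data, here is a minimal Python sketch of the coefficient of variation calculation. The task times are invented for illustration (they are not the values from Figure 12 of the paper), and I am assuming the sample standard deviation is the appropriate one to use:

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Sample standard deviation as a proportion of the mean."""
    return stdev(values) / mean(values)


# Hypothetical reported task times in minutes, for illustration only.
minutes_with_genai = [25, 28, 30, 32, 35]
minutes_without_genai = [40, 55, 70, 90, 120]

cv_with = coefficient_of_variation(minutes_with_genai)
cv_without = coefficient_of_variation(minutes_without_genai)

print(f"CV with GenAI:    {cv_with:.3f}")
print(f"CV without GenAI: {cv_without:.3f}")
```

Because the coefficient of variation scales the standard deviation by the mean, it allows a comparison of relative variation across two sets of times that have very different averages.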

Read more:

  • ChatGPT and the labour market
  • More on ChatGPT and the labour market
  • The impact of generative AI on contact centre work
  • Some good news for human accountants in the face of generative AI
  • Good news, bad news, and students' views about the impact of ChatGPT on their labour market outcomes
  • Swiss workers are worried about the risk of automation
  • How people use ChatGPT, for work and not
  • Generative AI and entry-level employment