Monday, 15 December 2025

Grade inflation at New Zealand universities, and what can be done about it

Grade inflation at New Zealand universities has been in the news recently. This is a delayed reaction to this report from the New Zealand Initiative, released back in August and authored by James Kierstead. He collected data on grade distributions from all eight New Zealand universities (via Official Information Act requests), and looked at how those distributions have changed over time. The results are a clear demonstration of grade inflation, which is most clearly illustrated in Figure 2.1 from the report.

Over the period from the mid-2000s to 2024, the proportion of New Zealand university students receiving a grade in the A range has increased at every New Zealand university, and by more than ten percentage points overall. Kierstead notes that:

Overall, the median proportion of A-grades grew by 13 percentage points, from 22% to 35%... The largest increases occurred at Lincoln, where the proportion of As grew by 24 percentage points between 2010 and 2024 (from 15% to 39%), more than doubling, and Massey, where they grew by 17 percentage points (from 19% to 36%) from 2006 to 2023.

A similar pattern of increases, although not as striking, is seen for pass rates, which in 2024 were above 90 percent at every university except Auckland. The results are also apparent across different disciplines, as shown in Figure 2.4 from the report.

Of course, this sort of grade inflation is common across other countries as well, and Kierstead provides a comparison that shows that New Zealand grade inflation is not dissimilar from grade inflation in the US, UK, Australia, and Canada.

Kierstead then turns his attention to why there has been grade inflation. He first dismisses some possible explanations such as better incoming students (NCEA results have not improved, although even if they had that might be due to grade inflation as well), more female students (the proportion of female students has been flat over the past ten years, while grades have continued to increase), better funding (bwahahahaha - in fact, funding per student has declined in real terms since 2019, while grades have continued to increase), and student-staff ratios (which have declined over time, but the student-academic ratio, which is the one that should matter most, has barely changed).

So, what has caused grade inflation? Kierstead describes it as a collective action problem, akin to the tragedy of the commons first described by Garrett Hardin in 1968:

It is our contention that grade inflation is the product of a dynamic that is not dissimilar to the tragedy of the commons. Just like Hardin’s villagers, academics pursue a good (in this case high student numbers) in a rational way (in this case by awarding more high grades). And just as with Hardin’s villagers, negative consequences ensue, with a common resource (sound grading) being depleted, to the cost of every individual academic as well as others...

In the grade inflation game, the good that academics want to maximize is student numbers. Individual academics, on the whole, want to have as many students in their courses as possible. This suggests that they are popular teachers and can help get them promoted (and hence gain more money and prestige). It can also help make sure the courses they want to teach stay on the menu.

I like this general framing of the problem, where 'sound grading' is a common resource - a good that is rival and non-excludable. However, I would change it slightly, by thinking about the common resource as being A grades generally, which are depleted when the credibility of those grades reduces. In my slightly different framing, awarding A grades is rival in the sense that one person awarding more A grades reduces the credibility of A grades awarded by others. Awarding A grades is non-excludable in the sense that if anyone can award A grades, everyone can award A grades (while it is possible to prevent academics from awarding A grades, universities would probably prefer not to do so because that would reduce student satisfaction). So, while the social incentive for all academics collectively is to reduce the award of A grades to keep the credibility of those grades high, the private incentive for each academic individually is to increase the proportion of A grades awarded, leading to fame and fortune (or, more likely, leading to fewer awkward conversations with their Head of School as to why their grade distribution is too low, as well as better student evaluations - see here and here, for example). Essentially then, the incentives are for academics to inflate grades. The universities have few incentives to act to reduce grade inflation, since higher grades increase student satisfaction and lead to greater enrolments.

However, there is a problem. As Kierstead notes, grade inflation is well-termed because its effects are similar to the inflation that economists are more familiar with:

If universities hand out more and more As in a way that isn’t justified by student performance, the value of an A will go down. The same job opportunities will ‘cost’ more As as As flood the market. Students who worked hard will see the value of their As decrease over time, just as workers in the economy see their savings decrease in value due to monetary inflation.

So, what to do? Kierstead offers a few solutions in the report, including moderation of grades, reporting grades differently on transcripts, calculating grades differently, making post-hoc adjustments to grade point averages, having national standardised exams by discipline, changing the way that universities are funded to reduce the incentive to inflate grades, changing the culture of academics, and giving out prizes for 'sound grading'. I'm not going to dig into those different solutions, because sometimes the simplest one is the best one. With that in mind, I pick this:

Perhaps the simplest addition that could be made to student transcripts alongside letter grades is the rank that students achieved out of the total number of students on the course. So a student’s transcript might read, for example, ‘Classics 106: Ancient Civilizations: A- (27th of 252).’...

Adding ranking information restores some of the signalling value of grades without needing to reverse grade inflation itself. To see why, consider an example. If an employer has the transcripts of two students, one of whom got an A- grade in econometrics and ranked 17th out of 22 students, while the other student got a B grade and ranked 3rd out of 29 students, it's pretty clear that the grade might not be capturing the full picture of the students' relative merit. Kierstead worries about this simple solution because:

A limitation of rank-ordering is that it might suggest that students who achieved only a lowly ranking had performed badly, whereas they might well have performed very well in an especially difficult course.

Possibly, but the key point is not how well students did in the course, but how well they did relative to the other students in the class, which is exactly what the ranking provides. The benefit of this approach is that providing a ranking alongside the grade would reduce the incentives for students to cherry pick easy papers that award high grades, because a high grade on its own would not necessarily lead to a good ranking within the class.

Of course, there are potential problems with the simple solution. One such problem is that comparisons across different cohorts of students might not be fair. Taking the example of the two students I gave earlier, perhaps the student who got an A- grade and ranked 17/22 completed the paper in a cohort that was particularly smart, while the student who got a B grade and ranked 3/29 completed the paper in a cohort that was less smart. In that case, the grade without the ranking might be a better measure.

Kierstead's more complex solutions don't really deal well with the problem of between-cohort comparisons, and suffer from being more complicated for non-specialists to understand. A simple ranking, or a percentile ranking, is relatively easy for HR managers to interpret. Having said that, the between-cohort comparisons issue might not be too much of a problem in any case. My experience, though, is that for classes of a sufficiently large size (30 or more students), grade distributions do not differ materially between cohorts (and if they do, it is usually because of the teaching or the assessment, not the students).
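To make the proposal concrete, here is a minimal sketch (in Python, with made-up marks and hypothetical letter-grade cut-offs, since the report doesn't specify any implementation) of how within-class rank and percentile could be computed from final marks and appended to a transcript line:

```python
# Minimal sketch: deriving within-class rank and percentile for transcript lines.
# The course, marks, and letter-grade cut-offs below are all hypothetical.

def to_grade(mark):
    # Hypothetical letter-grade cut-offs, for illustration only
    cutoffs = [(90, "A+"), (85, "A"), (80, "A-"), (75, "B+"), (70, "B"),
               (65, "B-"), (60, "C+"), (55, "C"), (50, "C-"), (0, "D")]
    for cutoff, letter in cutoffs:
        if mark >= cutoff:
            return letter

def transcript_lines(course, marks):
    n = len(marks)
    # Sort students by mark, highest first (ties are ignored for simplicity)
    ranked = sorted(marks.items(), key=lambda kv: kv[1], reverse=True)
    lines = []
    for rank, (student, mark) in enumerate(ranked, start=1):
        percentile = 100 * (n - rank) / (n - 1) if n > 1 else 100
        lines.append(f"{student} - {course}: {to_grade(mark)} "
                     f"(ranked {rank} of {n}, percentile {percentile:.0f})")
    return lines

marks = {"Student A": 88, "Student B": 73, "Student C": 91, "Student D": 67}
for line in transcript_lines("ECONS101", marks):
    print(line)
```

Nothing more than the final marks and the class size is needed, which is part of the appeal of the approach.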

I can see some incentive issues, though. Would students start to choose papers that they suspect many weak students complete? Good students might anticipate that this would lead to a higher grade and a better ranking, which would look better on their transcript. On the other hand, is that really any worse than what students are doing now, if they choose papers that give out easy grades?

There are also potential issues with stigmatising students who end up near the bottom of a large class (how dispiriting would it be to have your transcript say you got a grade of E, and ranked 317th out of 319 students?). Of course, that could be solved to some extent by only providing ranking information for students with passing grades. And consideration would also be needed for how to deal with very small classes (is a ranking of 4th out of 5 students meaningful?).

Grade inflation is clearly a problem. It's not just nostalgia to say that an A grade is not what it used to be. Grade inflation has real consequences for employers, because the signalling value of high grades is reduced (see here for more on signalling in education). This means that there are also real consequences for high-quality students, who find it more difficult to differentiate themselves from average students. Solving this problem shouldn't involve government intervention to change university funding formulas, or trying to change academic culture. It shouldn't involve complicated statistical manipulations of grades. It really could be as simple as reporting students' within-class ranking on their academic transcripts.

The question now is whether any university would take it on themselves to do so. The credibility of university grades depends on it.

[HT: Josh McNamara, earlier in the year]

Read more:

Sunday, 14 December 2025

Online and blended learning lead to similar outcomes on average, at lower cost but lower student satisfaction

It's been a while since I've written about online or blended learning, which may seem surprising given the ample opportunities for us to learn about online learning during the pandemic. Perhaps I'm still dealing with the trauma of that, or perhaps I have just pivoted more to understanding the emerging role of AI in education. Nevertheless, I recently dipped my toes back into the research on online and blended learning, reading this 2020 article by Igor Chirikov (University of California, Berkeley) and co-authors, published in the journal Science Advances (open access).

Chirikov et al. evaluate a large multisite randomised controlled trial of online and blended learning in engineering, across three universities in Russia. As they explain:

In the 2017–2018 academic year, we selected two required semester-long STEM courses [Engineering Mechanics (EM) and Construction Materials Technology (CMT)] at three participating, resource-constrained higher education institutions in Russia. These courses were available in-person at the student’s home institution and alternatively online through OpenEdu. We randomly assigned students to one of three conditions: (i) taking the course in-person with lectures and discussion groups with the instructor who usually teaches the course at the university, (ii) taking the same course in the blended format with online lectures and in-person discussion groups with the same instructor as in the in-person modality, and (iii) taking the course fully online.

The course content (learning outcomes, course topics, required literature, and assignments) was identical for all students.

Their sample is made up of 325 second-year university students, with 101 randomly assigned to in-person, 100 to blended, and 124 to online. All students then completed the same final examination. Looking at student performance, Chirikov et al. find:

...minimal evidence that final exam scores differ by condition (F = 0.26, P = 0.77)... The average assessment score varied significantly by condition (F = 3.24, P = 0.039): Students under the in-person and blended conditions have similar average assessment scores (t = 0.26, P = 0.80), but those under the online condition scored 7.2 percentage points higher (t = 2.52, P = 0.012). This effect is likely an artifact of the more lenient assessment submission policy for online students, who were permitted three attempts on the weekly assignments.

The lack of a difference in average student performance across different learning modes is a common feature of the literature (see the links at the end of this post). It would have been interesting if Chirikov et al. had undertaken a heterogeneity analysis to see whether online and blended modes advantage the more able and engaged students while disadvantaging the less able and engaged students, which is also a common finding in this literature, and one I've discussed many times before (see the links below for more).
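For what it's worth, that kind of heterogeneity analysis would not be hard to set up. Here is a minimal sketch (with hypothetical column names like exam_score, condition, and baseline_gpa, which are my own labels rather than anything from the paper) of how the treatment effect could be allowed to vary with prior ability:

```python
# Sketch of a heterogeneity analysis: do online/blended effects differ by prior ability?
# Column names (exam_score, condition, baseline_gpa) are hypothetical, not from the paper.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("students.csv")  # hypothetical student-level dataset

# Standardise baseline ability so the interaction coefficients are easy to read
df["baseline_z"] = (df["baseline_gpa"] - df["baseline_gpa"].mean()) / df["baseline_gpa"].std()

# Interact the randomly assigned condition (in-person as the base category) with ability
model = smf.ols(
    "exam_score ~ C(condition, Treatment('in_person')) * baseline_z", data=df
).fit(cov_type="HC1")  # heteroskedasticity-robust standard errors
print(model.summary())

# The interaction terms tell us whether the online/blended arms help stronger students
# more (positive interaction) or hurt weaker students more (negative interaction),
# relative to in-person teaching.
```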

Chirikov et al. then look at student satisfaction. Despite their claim that "we find minimal evidence that student satisfaction differs by condition", Table 3 in the paper shows that students in the online mode report satisfaction about five percentage points lower than in-person students (a statistically significant difference), while students in the blended mode report satisfaction about 2-2.5 percentage points lower than in-person students, although the latter difference is not statistically significant.

Finally, Chirikov et al. evaluate the effect on the cost of education, finding that:

Compared to the instructor compensation cost of in-person instruction, blended instruction lowers the per-student cost by 19.2% for EM and 15.4% for CMT; online instruction lowers it by 80.9% for EM and 79.1% for CMT...

These cost savings can fund increases in STEM enrollment with the same state funding. Conservatively assuming that all other costs per student besides instructor compensation at each university remain constant, resource-constrained universities could teach 3.4% more students in EM and 2.5% more students in CMT if they adopted blended instruction. If universities relied on online instruction, then they could teach 18.2% more students in EM and 15.0% more students in CMT.
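The arithmetic linking those cost savings to enrolment is worth making explicit. As a rough back-of-envelope (my own sketch, not the authors' calculation): if instructor compensation makes up a share s of total per-student cost, and a delivery mode cuts that compensation by a fraction c, then a fixed budget stretches to cover 1/(1 - s*c) - 1 more students. The paper's figures imply that the instructor-compensation share is modest (somewhere around a fifth of per-student cost), which is why an 80 percent cut in instructor cost buys only an 18 percent increase in enrolment:

```python
# Back-of-envelope: how much extra enrolment do instructor-compensation savings buy?
# s = instructor compensation as a share of total per-student cost (assumed value)
# c = proportional cut in instructor compensation from changing the delivery mode

def extra_enrolment(s, c):
    # Per-student cost falls from 1 to (1 - s*c); a fixed budget then covers
    # 1 / (1 - s*c) times as many students
    return 1 / (1 - s * c) - 1

# Using the paper's online-instruction saving for Engineering Mechanics (80.9%),
# an assumed instructor-compensation share of ~19% reproduces the reported 18.2%
print(f"{extra_enrolment(0.19, 0.809):.1%}")  # ~18.2%
```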

I don't think it will come as a surprise to anyone that online and blended learning are more cost-effective. There is little doubt that this cost advantage has factored into some of the push towards online and blended learning across higher education over time.

Given that, in this study, both online and blended learning lead to similar outcomes on average, one might be tempted to suggest that they are good value for money from the university’s or funder's perspective. For cash-strapped institutions (or governments), the temptation to expand online provision on the back of such numbers is obvious. However, we should be cautious about drawing that conclusion. The lower student satisfaction in the blended and (especially) online modes should be a worry (at least to those who care about student satisfaction). And, as alluded to earlier, the average student performance can hide important heterogeneity between more engaged and less engaged students.

The real question here isn’t whether online and blended learning can be as effective on average, but whether we are comfortable trading lower satisfaction and potential for harms to less engaged students for lower cost of delivery and higher enrolments.

Read more:

Saturday, 13 December 2025

This Kansas City Chiefs conspiracy theory article is a mess

I have to admit to experiencing a non-trivial amount of schadenfreude this year, as the Kansas City Chiefs find themselves with a losing record in December for the first time in a decade. My mild animosity towards the Chiefs is based entirely on their supreme performance over that decade. After they've had a few losing seasons, I won't care anymore (which is how I feel about the Patriots right about now). However, there are plenty of people who have griped about the Chiefs, and claimed that the Chiefs receive favourable referee calls.

I'd label that a conspiracy theory, but it has apparently caught the attention of researchers. This recent article by Spencer Barnes (University of Texas at El Paso), Ted Dischman (an independent researcher), and Brandon Mendez (University of South Carolina), published in the journal Financial Review (sorry, I don't see an ungated version online), explicitly tests whether the Kansas City Chiefs receive favourable referee calls. Specifically, Barnes et al.:

...compare penalty calls benefiting the Mahomes-era Kansas City Chiefs (from 2018 to 2023) and the Brady-era New England Patriots (2015–2019) across the regular and postseason...

Barnes et al. argue that:

...financial pressures, particularly those related to TV revenue (the primary source of revenue for the NFL), serve as the underlying mechanism.

In other words, Barnes et al. claim that the NFL has a strong financial incentive to bias officiating in favour of the 2018-2023 Kansas City Chiefs, to a greater extent than any bias in favour of the 2015-2019 New England Patriots. As we’ll see, the empirical strategy is poorly chosen, parts of the results are misinterpreted, and the proposed TV-revenue mechanism is implausible. All up, you shouldn't believe this paper's results.

What did they do? Barnes et al. use play-by-play data covering the 2015 to 2023 seasons. They restrict their attention to defensive penalties only, which gives them a sample of 13,136 penalties across 2435 games. They apply a fairly simple linear regression model to the data.

Here we find the first problem with their analysis. If you want to show that the Mahomes-era Kansas City Chiefs benefited from more defensive penalties than other teams, you should be running a difference-in-differences analysis. Essentially, you compare the difference between the Chiefs and other teams, between the period before and the period after Patrick Mahomes started playing. In other words, you should test whether the Chiefs’ advantage in penalties grows after Mahomes started playing, compared with their earlier advantage and with other teams over the same period. Barnes et al. simply test for a level difference between the Chiefs and other teams during that time (using the 'Dynasty' variable), but fail to account for whether the Chiefs might already benefit from more defensive penalties before Mahomes became the starting quarterback (in 2018). Indeed, Figure 1 in the paper shows that the Chiefs did benefit from more defensive penalties per game before 2018:

That difference prior to 2018 should be controlled for. Having said that, the difference from the rest of the NFL teams looks bigger from 2018 onwards (but mostly concentrated in 2018-19, and in 2023), so if they had used the more correct difference-in-differences model (or, when comparing regular and post-season, a triple-differences model), they might still have found a statistically significant effect.

There is a further, albeit more minor, issue with the analysis. Barnes et al. control for 'defensive team fixed effects', which they argue controls "for differences in how opposing teams play defense and how frequently they are penalized". However, teams change the way they play defence, particularly when the defensive coordinator changes. So really, they should have used defensive-team-by-season fixed effects there, which would allow the way a team plays (and gets penalised) to vary from season to season, and control for that.
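To illustrate what that would look like, here is a minimal sketch of a difference-in-differences specification with defensive-team-by-season fixed effects (in Python, with hypothetical column names like penalty_yards, chiefs_offense, post_2018, and game_id; this is my own illustration of the design, not the authors' code):

```python
# Sketch of a difference-in-differences specification for defensive penalty calls.
# Column names (penalty_yards, chiefs_offense, post_2018, defense_team, season,
# game_id) are hypothetical; the paper's actual variables are not reproduced here.
import pandas as pd
import statsmodels.formula.api as smf

plays = pd.read_csv("penalties.csv")  # one row per defensive penalty (hypothetical file)

did = smf.ols(
    "penalty_yards ~ chiefs_offense * post_2018"   # level, period, and DiD interaction
    " + C(defense_team):C(season)",                # defensive-team-by-season fixed effects
    data=plays,
).fit(cov_type="cluster", cov_kwds={"groups": plays["game_id"]})

# The coefficient on chiefs_offense:post_2018 is the DiD estimate: the change in the
# Chiefs' penalty-yard advantage after Mahomes became the starter, relative to the
# change for the rest of the league over the same period.
print(did.params.filter(like="chiefs_offense"))
```

A triple-differences version for the regular-season versus postseason comparison would simply add a postseason indicator and its interactions.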

Barnes et al. look at the effect on several outcome variables:

Our primary dependent variables capture different dimensions of officiating decisions. The first is Penalty Yards, which measures the total yards gained or lost due to penalty calls. If the NFL or its officials favor a particular team, we expect them to benefit from potentially more penalty yards assessed against their opponents. The second variable, First Down, is a binary indicator that takes a value of 1 if a penalty call results in an automatic first down. Because first downs have a direct impact on a team’s ability to sustain drives and score points, this measure captures whether penalties disproportionately help a team advance the ball. The third variable, Subjective, is a binary indicator equal to 1 if the defensive penalty falls into a category requiring referee discretion...

The 'Subjective' variable is described in the appendix to the paper, and appears to be far too inclusive since it includes penalties like 'Face Mask' and 'Horse Collar Tackle' that seem to me not to be particularly subjective (and those two categories alone made up 6 percent of all penalties, and a much higher proportion of the 'subjective' penalties).

Putting aside the issues with the analysis for a moment, Barnes et al. find that:

...penalties against Kansas City during the regular season result in 2.02 fewer yards (𝑝 < 0.01), are 8 percentage points less likely to have a penalty call that results in a first down (𝑝 < 0.01), and are 7 percentage points less likely to have subjective penalties (𝑝 < 0.05) compared to the rest of the NFL. This pattern is decisively reversed in postseason contests, where penalties against the Chiefs offense yield 2.36 more yards (𝑝 < 0.05), are 23 percentage points more likely to have a penalty call that results in a first down (𝑝 < 0.01), and are 28 percentage points more likely to have subjective calls (𝑝 < 0.01) compared to the rest of the NFL in the playoffs.

Barnes et al. have explained this incorrectly. Notice their wording suggests the penalties are called on Kansas City (i.e. hurting the Chiefs). Their analysis actually shows that penalties against Kansas City Chiefs' opponents result in 2.02 fewer yards during the regular season, and penalties against Kansas City Chiefs' opponents (not the Chiefs offense) yield 2.36 more yards in the postseason. At least, that is according to the notes to their Table 3, which says:

The dependent variable in Columns (1) and (4) is the realized yardage for the offensive team resulting from a penalty on the defensive team... The independent variable of interest, Kansas City Chiefs, is a binary indicator variable that equals 1 if the offensive team is the Kansas City Chiefs and 0 otherwise.

So, the correct way of interpreting those results is penalties against the opposing defence, not penalties against Kansas City. Barnes et al. then turn to applying the same analysis to the 2015-2019 New England Patriots, and find effects that are mostly statistically insignificant (and small). For other teams that might arguably be called a 'dynasty' (with a sufficiently low bar for what constitutes a dynasty), Barnes et al. find no evidence of differences in defensive penalty calls. That sample includes the Philadelphia Eagles (2017-2023), the Los Angeles Rams (2018-2023), and the San Francisco 49ers (2019-2023).

At this point, the problem with the mechanism starts to become clear. Barnes et al. start to look at TV viewership, and argue that:

If certain teams, particularly those associated with high-profile players, systematically attract larger audiences, then maintaining the success or visibility of those teams may align with the league’s broader financial interests.

If the NFL wanted to attract a larger audience, and aimed to do so by biasing officiating in favour of a particular team, why on earth would they choose a small-market team like Kansas City? Surely they would want to boost a large-market team? According to this ranking, Kansas City is only the 35th-largest sports media market in the US. Now, Patrick Mahomes is a star quarterback (he was the 10th overall pick in the 2017 NFL draft), so maybe it's the combination of star quarterback and media market that matters. However, Tom Brady was also a star quarterback, and Boston is the 10th-largest sports media market. So, why weren't the Patriots getting favourable calls in 2015-2019? If, as Barnes et al. seem to argue, the NFL was going through some particular challenges in 2016, then Kansas City is still not the obvious choice for biased officiating. They should have favoured the LA Rams (in the second-largest sports media market, with star quarterback Jared Goff, the first overall pick in the 2016 NFL draft).

Barnes et al.'s argument falls apart. Their TV viewership analysis does show that:

...the Chiefs’ emergence as a marquee team coincided with a material increase in viewership interest, consistent with the broader financial incentives we hypothesize.

However, that analysis also has issues, because they don't control for the win/loss record of the teams in each game (and winning teams likely attract more TV viewers). And all it really tells you is that Patrick Mahomes attracts a big TV audience. He is a star player, and attracting viewers is what star players do. Higher ratings for teams with star players are not evidence that referees are biased. As noted above, if the NFL thought that way, they should have preferred biasing the officiating towards the LA Rams instead, and Barnes et al.'s analysis shows that didn't happen.

As a final point, there is a real risk that the analysis in this paper gets causality backwards. Did the Chiefs get favourable referee calls because they are a dynasty, or did they become a dynasty because they received favourable referee calls at key moments? Barnes et al. never consider the possibility of reverse causality. Overall, the paper does much more to flatter an existing conspiracy theory than to seriously test it. Even if we take their estimates at face value, nothing in the paper convincingly links referee calls to incentives to increase NFL TV viewership.

[HT: Marginal Revolution]

Friday, 12 December 2025

This week in research #105

Here's what caught my eye in research over the past week:

  • Gillespie et al. (with ungated earlier version here) find evidence of landlord exit from the rental market, specifically after rent controls were tightened in 2021 in Ireland, meaning that rent controls are associated with more sale listings and fewer rental listings/registrations
  • Pagani and Pica (open access) find that exposure to a higher share of same-gender math high achievers is related to better academic performance among Italian primary school children, for both boys and girls, three years later
  • Dutta, Gandhi, and Green (open access) find, using data from India, that relaxing rent control leads to higher rents and decreases rural-urban migration, while easing eviction laws increases the conversion of rental units into owner-occupied housing and increases the prevalence of 'marriage migrants'
  • Couture and Smit find no evidence that Federal Open Market Committee officials in the US select securities that earn abnormal returns
  • Bergvall et al. (open access) find, using Swedish data, that following the start of their PhD studies, psychiatric medication use among PhD students increases substantially, continuing throughout their studies to the point that by the fifth year medication use has increased by 40 percent compared to pre-PhD levels (more reason to worry about the mental health of PhD students)
  • Bagues and Villa (open access) find that, after Spanish regions increased the minimum legal drinking age from 16 to 18 years, alcohol consumption among adolescents aged 14-17 decreased by 7 to 17 percent and exam performance improved by 4 percent of a standard deviation
  • Fan, Tang, and Zhang find, using data on university relocations in China in the 1950s, that there were substantial effects on total employment, firm numbers, and productivity in industries technologically related to the relocated departments
  • Chikish and Humphreys find that surgical repair of UCL injuries extends post-injury MLB pitcher careers by roughly 1.3 seasons relative to matched uninjured pitchers, and that post-injury and treatment pitcher performance improves by roughly 8 percent
  • Chegere et al. (open access) conduct an experiment investigating how regular sports bettors in urban Tanzania value sports bets and form expectations about winning probabilities and find that people assign higher certainty equivalents and winning probabilities to sports bets than to urn-and-balls lotteries with identical odds, even though, in fact, they are not more likely to win
  • Seak et al. (with ungated earlier version here) find that experimental choices by both humans and monkeys violated the independence axiom across a broad range of reward probabilities (both monkeys and humans are not purely rational decision-makers)

Thursday, 11 December 2025

Do men and women pitch science proposals differently, and does it matter for funding outcomes?

If male academics and female academics write academic papers and grant proposals differently, does that lead to different outcomes by gender? Past studies have worried about whether grant funding decisions are affected by gender bias (see here, for example), and differences in writing style may contribute to that. However, the article I discussed in this post from earlier this year concluded that there was little evidence of bias in grant funding, at least since 2000 in the US.

Nevertheless, I thought it would be interesting to read this 2020 article by Julian Kolev (Southern Methodist University), Yuly Fuentes-Medel, and Fiona Murray (both MIT), published in the journal AEA Papers and Proceedings (ungated here), because it not only looks at the grant funding decisions, but also at writing style. Kolev et al. focus on grant applications submitted to the Bill and Melinda Gates Foundation and the National Institutes of Health (NIH) over the period from 2008 to 2016. The sample includes 6931 Gates Foundation applications and 12,589 NIH applications.

Kolev et al. first apply textual analysis to the abstract of each application, evaluating the positivity of the text (and the extent to which the word "novel" is used), the readability (using the Flesch reading ease score), the concreteness of the language (as opposed to abstractness), and three measures of how narrow or broad the abstract is. In this textual analysis, they find that:

...female applicants are less likely to present their research using positive vocabulary, they are more likely to write with high readability, and they prefer concrete language. Moving to our final three measures, we find an interesting dichotomy: even as female applicants use fewer broad words and more narrow words in their abstracts, we find that their research is characterized by lower MeSH concentrations, meaning that they cover a wider range of medical subjects in their work, at least within the NIH sample. Effect sizes are relatively small: the impact of gender ranges from approximately 0.04 to 0.08 standard deviations for our significant effects.
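As an aside, the Flesch reading ease score mentioned above is just a linear combination of average sentence length and average word length (in syllables), using the standard published weights; a minimal sketch:

```python
# Flesch reading ease: higher scores indicate easier-to-read text
def flesch_reading_ease(total_words, total_sentences, total_syllables):
    words_per_sentence = total_words / total_sentences
    syllables_per_word = total_syllables / total_words
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word

# Example: a 120-word abstract with 6 sentences and 210 syllables
print(round(flesch_reading_ease(120, 6, 210), 1))  # 38.5 (fairly difficult to read)
```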

So, there are small but statistically significant differences in writing style between male academics and female academics in these funding proposal abstracts. Does that translate into differences in outcome? Kolev et al. test for whether the measures of writing style correlate with funding outcome, while controlling for:

...calendar time and application topic fixed effects, controls for total word count and the count of relevant words for dictionary-based metrics, and applicant publication history and gender.

In this analysis, Kolev et al. find that:

For Gates applicants, high levels of concreteness tend to improve the odds of funding; by contrast, at the NIH, we find a strong positive impact for MeSH concentration and marginal effects for both broad and narrow words.

So, the evidence is weak that writing style matters, or that writing style differences between the genders affect the success of funding applications. However, not so fast. There is a key problem with this analysis. If you read the quote above about the control variables in this second analysis, you may note that they control for gender. That might sound sensible, but if you want to evaluate whether writing style differences between the genders affect funding outcomes, you don't want to control for both writing style and gender. What Kolev et al. have actually tested is whether writing style differences within each gender affect funding application outcomes, finding little evidence that they do. As an example, their analysis doesn't answer the question of whether readability differences between male and female academics affect funding outcomes; rather, it answers the question of whether readability differences matter overall (and it appears they don't), controlling for the average difference in funding outcomes between men and women. Those are quite different questions.

In other words, we probably want to know whether style mediates the effect of gender on funding outcomes, but this analysis doesn't do that. Instead, they should either run the analysis with gender as the main explanatory variable, then add the style variables and see if the coefficient on the gender variable shrinks, or run the analysis with interactions between gender and the style variables.
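In practical terms, the first of those options amounts to comparing two regressions and watching what happens to the gender coefficient. A minimal sketch (with hypothetical column names like funded, female, and the style measures; this is my own illustration, not the authors' specification) would be:

```python
# Does writing style mediate the gender gap in funding outcomes?
# Compare the coefficient on female with and without the style measures.
# Column names are hypothetical placeholders.
import pandas as pd
import statsmodels.formula.api as smf

apps = pd.read_csv("applications.csv")  # one row per grant application (hypothetical file)

controls = "word_count + pub_count + C(topic) + C(year)"
base = smf.logit(f"funded ~ female + {controls}", data=apps).fit()
with_style = smf.logit(
    f"funded ~ female + positivity + readability + concreteness + {controls}",
    data=apps,
).fit()

# If the coefficient on female shrinks appreciably once the style measures are added,
# then style differences plausibly carry part of the gender effect on funding.
print(base.params["female"], with_style.params["female"])
```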

The results are, on the one hand, surprising, because the quality of writing should matter. On the other hand, stylistic differences should matter less than quality. However, the quality of the proposed research should matter even more than the quality or style of writing. And this study wasn't even evaluating the quality or style of all of the writing (or the quality of the proposed research), only the style of writing in the abstract of the proposal. That, along with the issue with the second analysis above, makes this paper of limited use for understanding whether there is a gender difference in funding outcomes (and, if there is, whether writing style differences contribute to the difference). The difference in writing style is an interesting result in itself, but we need to know more.

Read more:

Wednesday, 10 December 2025

People care about whether their data are shared, but not so much where their data are stored

There has been a substantial policy movement in favour of the localisation of data storage over the past decade (for example, see here). Policymakers often justify data localisation policies by appealing to consumers' supposed preference for having their data stored locally. In particular, they refer to privacy concerns, lack of trust in data handling practices in other countries, and a preference for supporting local data storage firms. However, the evidence that consumers have strong preferences for data localisation is very thin. In fact, this new article by Jeffrey Prince (Indiana University) and Scott Wallsten (Technology Policy Institute), published in the journal Information Economics and Policy (ungated earlier version here), may represent the first attempt to really evaluate consumers' preferences for data storage.

Prince and Wallsten use a discrete choice survey to evaluate preferences for localisation for different types of data. Specifically:

We constructed five different survey structures, one each centered on the respondent’s smartphone, financial institution, healthcare app, smart home device, and social media. The data types we consider include home address, phone number, income, financial activity, health status and activity, biometrics, music preferences, location, networks, and communications. Across the five survey structures and range of data types, we measure the relative value of full privacy (no data sharing) versus sharing only domestically (localization), sharing domestically and internationally (no localization), and sharing domestically and internationally excluding China and Russia (no localization but with limits). We administered each of these five different surveys across seven different countries: the United States, the United Kingdom, South Korea, Japan, Italy, India, and France

Their sample, drawn from Dynata's online panel, is 11,375 respondents, with 325 completed surveys for each of the five survey structures, for each of the seven countries. Each respondent was shown ten different discrete choice questions. In each question, respondents would have been shown hypothetical alternative scenarios about how their data could be stored and shared, and had to pick their preferred alternative. However, the article doesn't make clear how many alternatives the respondent was choosing from in each choice task, nor whether they simply chose the best of the alternatives, or provided a full ranking of all of the alternatives. Those are issues that are consequential for the analysis, but probably don't bias the results in any way.
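For readers unfamiliar with how discrete choice data like these become (dis)utility estimates, here is a minimal conditional (McFadden) logit sketch on simulated data, assuming each question offers a small set of alternatives described by sharing attributes (the setup and numbers are entirely my own illustration, not the paper's):

```python
# Minimal conditional logit sketch: recover (dis)utility weights on data-sharing
# attributes from repeated choices among alternatives. All data here are simulated.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
n_questions, n_alts, n_attrs = 2000, 3, 3     # choice tasks, alternatives, attributes
X = rng.integers(0, 2, size=(n_questions, n_alts, n_attrs)).astype(float)
true_beta = np.array([-1.0, -0.5, -0.2])      # negative = disutility from sharing
utility = X @ true_beta + rng.gumbel(size=(n_questions, n_alts))
chosen = utility.argmax(axis=1)               # respondents pick the highest-utility option

def neg_log_lik(beta):
    v = X @ beta                               # systematic utility of each alternative
    v = v - v.max(axis=1, keepdims=True)       # stabilise the softmax
    log_probs = v - np.log(np.exp(v).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(n_questions), chosen].sum()

result = minimize(neg_log_lik, x0=np.zeros(n_attrs), method="BFGS")
print(result.x)  # estimates should be close to true_beta
```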

In terms of data localisation, Prince and Wallsten distinguish between data not being shared at all, and data being stored and shared domestically only, internationally, or internationally while excluding China and Russia. The latter is included because consumers may be more concerned about their data being stored in China or Russia than being stored in other countries. First, Prince and Wallsten find that:

...virtually all of our parameter estimates are highly significant. As these are estimates of (dis) utility from sharing data in one of three ways (domestically only, internationally, internationally except China and Russia) versus not sharing, the consistent, negative and statistically significant estimates imply that respondents across all of our countries are averse to sharing their data.

In other words, people really don't like their data being shared, regardless of how or where it would be shared. However, in terms of data localisation, Prince and Wallsten find that:

...it is evident that there are just a handful of data types for which we find any notable data localization premium: bank balance, facial recognition, home address, and phone number, all with multiple instances, and voiceprint, with one instance.

Interpreting these results, Prince and Wallsten note that:

...the data types for which we find a data localization premium are also the data types for which citizens find the most value in having no sharing of any kind... citizens across our seven countries, by and large, place little to no value in data localization requirements, despite placing value on full privacy for these data (i.e., no domestic or international sharing)...

In addition, there are no differences between sharing internationally, and sharing internationally while excluding China and Russia. If anything, there is some weak evidence that respondents in South Korea and Japan preferred to have their data shared with China and Russia. That’s striking given how often policymakers highlight the dangers of data flowing to China and Russia. Prince and Wallsten conclude that:

Our findings have several implications. First, they suggest that the use of privacy concerns as motivation for data localization laws may be overstated, although there may be some gross welfare gains for some types of data. Our findings also indicate that if international sharing is allowed, restricting prominent authoritarian countries such as China and Russia appears to have little impact on consumer value, at least for a number of highly populated countries...

...our findings do provide a counterweight to any claim that citizens find value from imposing constraints on international data sharing.

It may still be worthwhile for policymakers to insist on data localisation, for reasons other than consumer preferences. Of course, this is just one study (albeit the first study) using survey data from an online panel, so we should be cautious about overgeneralising. Nevertheless, based on this study, the argument that data localisation reflects consumers' preferences for data storage does not hold up to scrutiny. If they want to keep pushing data localisation, policymakers will need to lean on geopolitical or protectionist arguments instead.

Sunday, 7 December 2025

Opportunity costs may lead attractive people to play video games less than unattractive people

There is a stereotype that gamers are physically unattractive compared with non-gamers. However, it is unlikely that gaming causes people to become less attractive; more likely, causality runs in the other direction (more attractive people are less likely to game than less attractive people). And that is one of the findings of this 2024 NBER Working Paper by Andy Chung (University of Reading) and co-authors. However, Chung et al. aren't only interested in identifying a descriptive relationship between attractiveness and gaming; they also look at the mechanism. They propose the following:

Given that physical attractiveness confers advantages in face-to-face interactions within social or leisure activities, individuals deemed more physically attractive will face a higher opportunity cost of engaging in video-gaming. Consequently, we hypothesize a negative relationship between beauty and gaming time, suggesting that individuals considered more attractive are likely to spend less time gaming. In other words, good-looking gamers will be relatively scarce because of the higher cost of gaming that they face.

They test their hypotheses using data from the National Longitudinal Study of Adolescent to Adult Health (Add Health), which has:

...a representative sample of American adolescents spanning grades 7 through 12 (generally ages 12-18) during the 1994-95 school year, with four follow-up waves, the most recent collected between 2016 and 2018.

Chung et al. use data from Wave I (from 1994-95) and Wave IV (from 2008), representing teenagers and adults respectively. Interestingly:

In each wave, at the end of each interview, the field interviewer rated the physical attractiveness of the respondent...

This gives a rating of 1-5 (1 being 'very attractive' and 5 being 'very unattractive') for each respondent. Given the small numbers of ratings of 4 and 5, Chung et al. combine those two categories. They also create a binary indicator for 'attractive', equal to one if the rating is 1 or 2, and equal to zero otherwise. In terms of gaming:

we consider the time spent video gaming based on the interviewees’ responses to the following question:...

In the past seven days, how many hours did you spend playing video or computer games, or using a computer? Do not count internet use for work or school.

Looking at attractiveness and teen gaming, Chung et al. find that:

...the estimated coefficient for being “attractive or very attractive as a teen” is negative and statistically significant at the 10% level, while the coefficient for “unattractive or very unattractive as a teen” is negative but essentially zero statistically... The effects of looks are not negligible; for example, the difference in the incidence of gaming between attractive and unattractive teens is 2.9 percentage points, compared to a mean incidence of 54 percent.

So, Chung et al. confirm the hypothesis that more attractive teens are less likely to game than less attractive teens. Interestingly, though, the effect seems to be limited to the extensive margin (whether a person games or not) and does not extend to the intensive margin (how many hours they spend gaming), since the latter shows a statistically insignificant relationship with attractiveness. Turning to attractiveness and adult gaming, Chung et al. find that:

As with teens, good-looking adults are less likely than others to game, while the few adults rated as bad-looking are more likely to engage in gaming (although not significantly more so than average-looking adults)... Moving from the bad-looking 8 percent of adults to the good-looking 44 percent reduces the likelihood of gaming by over 10 percentage points, which is about 26 percent of the average incidence.

In the case of adult gaming, there are also significant effects on the intensive margin, since:

 ...good-looking adults who do game spend significantly less time doing it than average-looking adults, whose gaming time is, albeit insignificantly, less than that of the small group of bad-looking adults...

As at the intensive margin, the impact of differences in looks on gaming hours is substantial. Compared to bad-looking adults, good-looking adults who game spend, on average, 2.05 hours fewer doing so per week, which represents 27 percent of the mean conditional gaming time.
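The extensive/intensive margin distinction maps onto a simple two-part model. A minimal sketch (with hypothetical column names; my own illustration, not the authors' specification) would be:

```python
# Two-part model: attractiveness and gaming at the extensive and intensive margins.
# Column names (any_gaming, gaming_hours, attractive, unattractive, and the controls)
# are hypothetical placeholders.
import pandas as pd
import statsmodels.formula.api as smf

adults = pd.read_csv("addhealth_wave4.csv")  # hypothetical extract of the survey data

# Extensive margin: does the respondent game at all?
extensive = smf.logit(
    "any_gaming ~ attractive + unattractive + age + female + educ", data=adults
).fit()

# Intensive margin: weekly gaming hours, conditional on gaming at all
gamers = adults[adults["any_gaming"] == 1]
intensive = smf.ols(
    "gaming_hours ~ attractive + unattractive + age + female + educ", data=gamers
).fit(cov_type="HC1")

print(extensive.params[["attractive", "unattractive"]])
print(intensive.params[["attractive", "unattractive"]])
```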

Next, in terms of their proposed mechanism, Chung et al. look at the effect of attractiveness on the number of reported friends, and find that:

Physically attractive/very attractive adults have about 0.4 more close friends (on a mean of 4.9) than those who are of average attractiveness. Conversely, the small fraction of those whose looks are rated unattractive/very unattractive claims about 0.7 fewer close friends than those with average attractiveness. The gap in the average number of close friends between the good- and the bad-looking is thus 22% of the overall mean.

Chung et al. also expend a good deal of effort on checking for reverse causality - that is, whether gaming reduces attractiveness. However, it does appear from their results that gaming as a teenager does not make adults less attractive, so Chung et al. are able to conclude that:

The relationship between looks and gaming does not arise because gaming makes people bad-looking: the causation appears to go from looks to gaming, not vice-versa.

Overall, if we believe the mechanism that Chung et al. propose, being more attractive raises the opportunity cost of gaming. Chung et al. note that:

...the better-looking have a higher opportunity cost of gaming as they have a comparative advantage in social interactions as an alternative leisure activity that is evidenced by more close friends.

And, as I teach in the first week of my ECONS101 and ECONS102 classes, when the opportunity cost of doing something increases, we tend to do less of it. So, attractive people game less because of their higher opportunity cost of gaming.

UPDATE: Just a couple of days after I posted this, the paper was published here (open access), in the Journal of Economic Behavior and Organization.

[HT: Marginal Revolution, last year]

Friday, 5 December 2025

This week in research #104

This week I hosted the ANZRSAI (Australia and New Zealand section of the Regional Science Association International) conference in Hamilton. Hosting a conference keeps you pretty busy, but I still managed to attend some sessions, and here are some of the highlights I found from the conference:

  • Bruce Newbold kicked off the conference with an excellent keynote on the mobility of older people in Canada, and I was surprised how many older people move from other provinces to Alberta, and how few move to the Atlantic provinces
  • Bill Cochrane presented on residential segregation by occupation (a proxy for socioeconomic status or class), showing that the population became more segregated between 2013 and 2018, but not between 2018 and 2023 (and Hamilton was an outlier on various aspects of the analysis, possibly because the satellite towns of Te Awamutu and Cambridge were not included)
  • Michelle Thompson-Fawcett's keynote showed how urban design can incorporate Mātauranga Māori, and in my view this talk pointed to one way of considering Indigenous regional science and urban planning
  • Robert Tanton showed how he created a synthetic population equivalent to the Australian Census, to explore access by older people to doctors (but the use cases for a synthetic census population are far wider than that)
  • Iain White closed the conference with an excellent keynote on urban growth and climate change resilience, drawing on many of his previous research projects

Aside from the conference, here's what caught my eye in research over the past week:

  • Smith and Grimes (open access) explore the impact of income measurement issues on the estimated relationship between income and life satisfaction
  • Prince and Wallsten (with ungated earlier version here) find that there is not a strong preference for data to be stored locally, except for data types where privacy is already of high value, such as financial and biometric data, and home address and phone number
  • Berens, Henao, and Schneider (with ungated earlier version here) find that abolishing moderate tuition fees in Germany led students to reduce their academic effort, by postponing graduation and withdrawing from registered exams, and that the number of 'ghost students' increased
  • Hua and Humphreys find that new players whose careers started at the time of the cancelled 2004-05 NHL season experienced shorter careers than those not exposed (including European players)
  • Banerjee et al. (open access) conduct an experiment on a major international online freelancing platform, and find that, while both men and women prefer flexible work hours, the elasticity of response for women is twice that for men
  • Guelmamen, Garcia, and Mayol (open access) find that while inter-municipal cooperation in water supply in France is associated with higher water prices, these increased tariffs are offset by better network performance, as indicated by lower water loss indices and improved water quality (seems important given the trajectory of change in New Zealand right now)
  • Palacios-Huerta (with ungated earlier version here) reviews the beauty that is using sports as a setting for testing models and hypotheses

Thursday, 4 December 2025

The decline in high school economics in Australia, and what should be done about it

Since I became a lecturer in economics some twenty years ago, one thing that has become apparent has been the decline in the number of students studying economics at high school. When I taught my first classes, nearly half of my students (most of whom were management students or social science students) had taken at least one economics class at high school. Now, it is down to less than one quarter. One of the contributing factors to that decline is that many schools replaced economics as a subject with business studies.

So, I was interested to read this new article by Tanya Livermore and Mike Major (both Reserve Bank of Australia), published in the journal Australian Economic Papers (ungated earlier version here). They find similar results for Australia, which are summarised in Figure 1 of the paper.

The figure shows economics enrolments in Year 12 (the final year of high school in Australia) declining by around two-thirds between the mid-1990s and 2023. At the same time, the gender gap in economics enrolments has widened, with the male share of high school economics enrolments increasing from near-parity to over two-thirds by 2023. Neither trend is a good thing.

Livermore and Major look to understand why these trends have occurred. First, they refer to qualitative evidence collected from educators, which concludes that a number of factors are to blame:

First, too few educators are equipped to teach Economics and too little relevant Australian economics content is available, providing school leaders with limited incentive to offer (or promote) the subject. Second, it has been reported that many students do not select Economics because they do not understand what it is and how it might be relevant to them... Third, the introduction of Business Studies to the NSW Higher School Certificate (HSC) in the early 1990s saw a large number of students take up the subject instead of Economics, with reports that Business Studies, which is more vocationally oriented, is perceived as being easier to learn and more helpful for employment...

If Livermore and Major had written the same sentences in relation to New Zealand, I expect they would be equally valid. To further explore the factors associated with taking (or not taking) economics at high school, Livermore and Major then rely on a survey of students in NSW in Years 10 to 12. The survey was undertaken at 51 schools in 2019, and the sample size is over 4600 students. Of the Years 11-12 students, less than 10% were studying any economics. Livermore and Major also look at the school-level factors that are associated with whether or not a school offers economics (and for this, they have a sample of 768 schools). In this school-level analysis, they find that:

...schools are significantly more likely to teach Economics if they have a higher ICSEA score, a larger Year 12 cohort, teach a larger variety of subjects, or are all boys.

Again, these results would not come as a surprise if they were described as being for New Zealand schools. Larger schools, and those that teach a larger variety of subjects, will be more likely to offer economics. ICSEA is the Index of Community Socio-Educational Advantage, so more advantaged schools are more likely to offer economics. And single-sex boys' schools are more likely to offer economics (which raises the question of what girls' schools are offering instead of economics). Turning to the proportion of students who study economics (and accounting for the fact that not all schools offer economics), Livermore and Major find that:

...being at a school with a higher ICSEA score is associated with increased demand for Economics amongst students... Non-government schools experience lower demand for Economics relative to government schools, holding ICSEA and other characteristics constant. Relative to co-ed schools, all-boys schools are associated with greater student demand for Economics, and all-girls schools are associated with less.

I would have thought that non-government schools (most of which are likely to be religious schools) would have been more likely to offer economics. At least, that is my impression of New Zealand schools, but that might be a key difference with NSW schools. Anyway, turning to individual students' subject choice, Livermore and Major find that:

...males are more likely to choose Economics than females, even when controlling for school characteristics...

ICSEA is also significantly associated with taking economics at the student level, with students from more advantaged schools more likely to study economics. Turning to the survey results, Livermore and Major first identify the positive and negative perceptions that students have about economics, finding that:

What positive perceptions do students have about Economics? Students typically believe that economics can be used for social good, is not all about money, and that an economics degree leads to a wide range of career options...

What negative perceptions do students have about Economics? Students generally do not perceive Economics as interesting and have little desire to know more about it. Economics is perceived as having a heavier workload than most other Year 11 and 12 subjects. Although Economics is seen as providing skills and tools for everyday life, students generally indicated they prefer to study Business Studies because they think it will be more useful for their future and more interesting... Although students perceive an economics degree to lead to a wide range of career opportunities, students are less likely to have a clear understanding of Economics (the subject) or the careers available if they were to choose Economics (as a subject).

The sheer weight of perceptions clearly falls on the negative side, and that is a worry. Does this explain the gender difference in enrolments? Livermore and Major find that:

...females were less likely than males to ‘have a good understanding of what Economics is’, ‘find Economics interesting as a subject’ or ‘want to know more about Economics’... Females were also more likely than males to perceive Business Studies as easier, more useful and more interesting than Economics. In terms of career development, females were less likely to have clear or positive perceptions of career opportunities from studying economics.

Economics at high school clearly has an image problem. I am convinced this is true in New Zealand as well, based on the number of students in my first-year economics classes who remark on how different economics is from what they thought it was going to be (I take that as a compliment!). If we want to increase student enrolments in economics, and narrow the gender gap, then the image problem needs to be addressed. Livermore and Major conclude that:

...one possible intervention to address diversity deficits in Economics is to improve students' understanding of what Economics entails.

Yes, but how? Sadly, Livermore and Major have pointed out the problem and identified its source, but not offered much in the way of solutions. Unfortunately, it is not as simple as providing students with information (see this post, and the links at the end of that post). We may need to look at fundamentally changing the way that economics is taught at high school. Some of us have made substantial changes to the teaching of university economics, and I believe those changes have had a positive effect (at least, there is evidence of gender parity at the top of our introductory economics classes, and in economics majors at Waikato). Those changes (or related changes) to teaching need to be allowed to propagate down to high school. And that is especially important given that we are finding that studying high school economics causally improves students' performance in introductory economics at university (I'll have much more to say on that research, with one of my Masters students, in a future post).

If the NSW evidence travels to New Zealand (and my classes suggest it does), then economics doesn’t have a demand problem so much as an information and curriculum design problem. High school students are choosing business studies instead of economics because business studies looks clearer, closer to jobs, and less risky. Fixing economics doesn’t mean dumbing the subject down. It means teaching the really interesting stuff earlier and better. Ditch the abstract mathematics, and focus on real-world applications, especially those with a more social focus. That's what we have done in introductory economics at Waikato. In my experience, that approach keeps the students, and the teachers, more engaged, and will hopefully allow economics to regain some of its lost ground.

Wednesday, 3 December 2025

The lifespan benefit of being elected to the MLB Hall of Fame

There is a clear difference in life expectancy between the rich and the poor (see this post, for example). However, disentangling how much of the life expectancy differential is a causal effect of socioeconomic status on mortality is difficult, because there are so many things that affect both socioeconomic status and mortality. This recent article by Chengyuan Hua and Brad Humphreys (both West Virginia University), published in the journal Economics Letters (sorry, I don't see an ungated version online), takes an interesting approach to answering the question.

Hua and Humphreys look at the lifespan of professional baseball players, comparing those who have been elected to the MLB Hall of Fame with those who narrowly missed out on election. The idea is that election to the Hall of Fame increases socioeconomic status, so comparing those who were elected with otherwise similar players who were not allows the difference attributable just to the change in socioeconomic status to be identified.

In relation to election to the Hall of Fame, Hua and Humphreys note that:

Baseball players elected to the HoF must appear on 75% of the annual ballots cast, get removed from the ballot after appearing on fewer than 5% of ballots, and can only appear on a limited number of consecutive ballots...

The exogenous 75% election threshold permits a fuzzy regression discontinuity design (RDD) to identify the causal effect of HoF election on longevity.
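To make the identification strategy concrete, here is a minimal sketch of how a fuzzy RDD around a 75% vote-share threshold could be estimated, using entirely made-up data and a simple two-stage least squares setup. This is my own illustration of the general technique, not the authors' code or data, and all the numbers (the probability jump at the threshold, the assumed two-year effect, the bandwidth) are assumptions for the example.

# Minimal fuzzy RDD sketch around a 75% vote-share threshold, on simulated data.
# This illustrates the general technique only; it is not the authors' code or data.
import numpy as np

rng = np.random.default_rng(42)
n = 2000

# Running variable: highest vote share achieved on the ballot
vote_share = rng.uniform(0.40, 0.95, n)
above = (vote_share >= 0.75).astype(float)

# Fuzzy treatment: crossing the threshold sharply raises (but does not fully
# determine) the probability of being elected to the HoF while alive
hof = rng.binomial(1, 0.05 + 0.80 * above)

# Outcome: lifespan, with an assumed true causal effect of +2 years from election
lifespan = 75 + 10 * (vote_share - 0.75) + 2.0 * hof + rng.normal(0, 5, n)

# Keep a window around the threshold (a crude bandwidth choice)
bw = 0.15
keep = np.abs(vote_share - 0.75) <= bw
x = vote_share[keep] - 0.75   # centred running variable
z = above[keep]               # instrument: above the threshold
d = hof[keep]                 # treatment: elected to the HoF
y = lifespan[keep]            # outcome: lifespan

def ols(X, y):
    """Least-squares coefficients for design matrix X."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

# Two-stage least squares: instrument HoF election with the threshold indicator,
# allowing separate linear trends in the running variable on each side
controls = np.column_stack([np.ones_like(x), x, x * z])
Z = np.column_stack([z, controls])                 # instrument plus controls
d_hat = Z @ ols(Z, d)                              # first stage: predicted election
beta = ols(np.column_stack([d_hat, controls]), y)  # second stage
print(f"Estimated effect of HoF election on lifespan: {beta[0]:.2f} years")

With real data one would also want appropriate standard errors and checks on sensitivity to the bandwidth, but the structure of the estimator is the same.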

Their dataset:

...includes the universe of candidates eligible for HoF induction from 1936 to 2024. We divide the sample into two groups: a treatment group of 131 players voted into the HoF while alive and a control group of 1067 players nominated by the BBWAA but not inducted.

Comparing the two groups, Hua and Humphreys find that:

...HoF members live 1.97 years longer than HoF nominees.

Hua and Humphreys go on to look at possible mechanisms that might explain the lifespan benefit of Hall of Fame election. They find that:

...HoFers are 5.8 p.p. more likely to become an MLB manager... MLB managers lived 2.86 years longer than their counterparts. We interpret this as evidence that HoFers are more likely to become MLB managers, a high-paying occupation.

In other words, Hua and Humphreys argue that the mechanism is that higher socioeconomic status leads to a better-paying occupation, which in turn leads to longer lifespan. Of course, it could be that healthier players are more likely to become managers, so the RDD approach isn't as clean when it comes to identifying the mechanism. Nevertheless, it is plausible.

Now, what these results tell us more broadly about socioeconomic status and lifespan is unclear. Baseball players are very different from the general population. The sample here is both unusually affluent and unusually healthy, before we even consider the effect of raising their socioeconomic status. At best, these results tell us something about groups at a similar prior level of affluence and health.

Nevertheless, the implications for professional baseball players are clear. It's Hall of Fame or bust (two years earlier)!

Friday, 28 November 2025

This week in research #103

Here's what caught my eye in research over the past week (clearly a very quiet week!):

  • Jurkat, Klump, and Schneider (with ungated earlier version here) report on a meta-analysis of 55 papers containing 2,468 estimates of the impact of industrial robots on wages, finding that the overall effect is close to zero and statistically insignificant
  • Chekenya and Dzingirai find, using African data from 1997 to 2014, that migration significantly increases conflict incidence, with effects concentrated in countries and regions in Africa with weak governance and economic stress
  • Cafferata, Dominguez, and Scartascini (with ungated earlier version here) find that overconfident individuals (in the US and Latin America) are more willing to accept the use of guns and more likely to declare their willingness to use guns
  • Bucher-Koenen et al. (with ungated earlier version here) find that financial advisors in Germany offer more self-serving advice to women, while men are more likely to receive sales fee rebates and less likely to be recommended expensive in-house multi-asset funds

And the latest paper from my own research (or, more accurately, from the thesis research of my successful PhD student Jayani Wijesinghe, on which I am a co-author along with Susan Olivia and Les Oxley):

  • Our new article (online early version, open access) in the journal Economics and Human Biology describes the patterns of lifespan inequality at the state level in the United States between 1959 and 2018, and identifies the state-level demographic and socioeconomic factors that are associated with lifespan inequality

Wednesday, 26 November 2025

Shots fired at the end of a debate on contingent valuation

I have written a number of posts about debates on the contingent valuation method (most recently here, but see the links at the end of this post for more). A 2016 debate that I blogged about here was picked up again in 2020 (but I didn't blog about it then because I was kind of busy trying to manage the COVID lockdown-online teaching debacle). So, what happened? The first of two 2020 articles published in the journal Ecological Economics (sorry, I don't see an ungated version online) is by John Whitehead (Appalachian State University), a serial participant in contingent valuation debates.

This part of the debate centres on 'adding up tests', which essentially test for scope problems. To reiterate (from this post):

Scope problems arise when you think about a good that is made up of component parts. If you ask people how much they are willing to pay for Good A and how much they are willing to pay for Good B, the sum of those two WTP values often turns out to be much more than what people would tell you they are willing to pay for Good A and Good B together. This issue is one I encountered early in my research career, in joint work with Ian Bateman and Andreas Tsoumas (ungated earlier version here).

An 'adding up test' checks whether the willingness to pay for the global good (Good A and Good B together) equals the sum of the willingness to pay for Good A alone and the willingness to pay for Good B alone (more precisely, for Good B given that Good A is already provided). In relation to this particular debate, Whitehead summarises where we are up to:

Desvousges et al. (2012) reinterpret the two-scenario scope test in Chapman et al. (2009) as a three-scenario adding-up test. They then assert that the implicit third willingness-to-pay estimate is not of adequate size. Whitehead (2016) critiques the notion of the adding-up test as an adequacy test and proposes a measure to assess the economic significance of the scope test: scope elasticity. Chapman et al. (2016) argue that Desvousges et al. (2012) misinterpret their scope test. Desvousges et al. (2016) reply that they did not misinterpret the Chapman et al. (2009) scope test and assert that their adding-up test in Desvousges et al. (2015) demonstrates one of their points.

Desvousges et al. (2015) field the Chapman et al. (2009) survey with new sample data collected with a different survey sample mode than that used by Chapman et al. (2009) and three additional scenarios. Desvousges et al. (2015) conduct an adding-up test and argue that willingness-to-pay (WTP) for the whole should be equal to willingness-to-pay for the sum of four parts (the first, second, third and fourth increment scenarios). Desvousges et al. (2015) find that “The sum of the four increments … is about three times as large as the value of the whole” (p. 566).
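In my own rough notation (neither paper writes it exactly this way), the adding-up condition at issue is:

\[ \text{WTP}(\text{whole}) = \text{WTP}(\text{part 1}) + \text{WTP}(\text{part 2}\mid\text{part 1 provided}) + \dots + \text{WTP}(\text{part 4}\mid\text{parts 1--3 provided}) \]

The test fails when the right-hand side turns out to be much larger than the left-hand side, which is the Desvousges et al. (2015) result quoted above.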

Whitehead joins the debate on the side of Chapman et al., defending them by examining Desvousges et al.'s analysis and showing that it actually does pass an 'adding up test', which would imply that there are no scope problems in the original Chapman et al. paper. Whitehead concludes that there are a number of problems in the Desvousges et al. analysis:

First, they do not elicit WTP estimates explicitly consistent with the theory of the adding-up test. Their survey design suggests that a one-tailed test be conducted where the sum of the WTP parts is expected to be greater than the WTP whole. Second, there are several data quality problems: non-monotonicity, flat portions over wide ranges of the bid function and fat tails. Each of these data problems leads to high variability in mean WTP across estimation approach and larger standard errors than those associated with nonparametric estimators that rely on smoothed data.

I'm not going to get into the weeds here, because what I want to highlight is the response by William Desvousges, Kristy Mathews (both independent consultants), and Kenneth Train (University of California - Berkeley), also published in the journal Ecological Economics (and also with no ungated version available). The response is only two pages long, and is a very effective takedown of Whitehead. Along the way, Desvousges et al. note that Whitehead:

...made numerous mistakes in his calculations... When these errors are corrected, adding-up fails for each theoretically valid parametric model that Whitehead used.

One example of Whitehead's errors is:

He used medians for the tests instead of means, assuming – incorrectly – that the sum of medians is the median of the sum.
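To see why that assumption fails, here is a tiny numerical example of my own (purely illustrative, and nothing to do with either paper's data):

# The sum of medians is not, in general, the median of the sum
import numpy as np

wtp_a = np.array([1, 2, 9])   # three respondents' WTP for part A
wtp_b = np.array([8, 3, 1])   # the same respondents' WTP for part B

print(np.median(wtp_a) + np.median(wtp_b))  # 2 + 3 = 5
print(np.median(wtp_a + wtp_b))             # median of [9, 5, 10] = 9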

That's a fair criticism. However, Desvousges et al. are not satisfied leaving it at that. Instead, they go on the attack:

Also, we examined the papers authored or co-authored by Whitehead that are cited in the recent reviews... These papers provide 15 CV datasets. Each of the three problems that Whitehead identified for our paper is evidenced in these datasets:

  • Non-monotonicity: 12 of the 15 datasets exhibit non-monotonicity.
  • Flat portions of the response curve: All 15 datasets have flat areas for at least half of the possible adjacent prompts, and 4 datasets have flat areas for all adjacent prompts.
  • Fat tails: In our data, the yes-share at the highest cost prompt ranged from 15 to 45%, depending on the program increment. In Whitehead's studies, the share ranged from 14 to 53%.

If Whitehead's data are no worse than typical CV studies, then his papers indicate the pervasiveness of these problems in CV studies.
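As an aside, the three diagnostics in that list are straightforward to compute for any dichotomous-choice contingent valuation dataset. Here is a minimal sketch using made-up (and deliberately well-behaved) data, so it shows what the checks look like rather than the pathologies themselves. This is my own illustration, not the data or code from either side of the debate, and the 0.02 'flatness' cut-off is just an assumption for the example.

# Data-quality checks for a made-up dichotomous-choice CV dataset:
# non-monotonicity, flat portions of the bid function, and fat tails.
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)

# Made-up survey: each respondent sees one cost prompt (bid) and answers yes/no
bids = np.array([5, 10, 20, 50, 100, 200])
df = pd.DataFrame({"bid": rng.choice(bids, size=1200)})
wtp = rng.lognormal(mean=3.5, sigma=1.0, size=len(df))  # underlying WTP
df["yes"] = (wtp >= df["bid"]).astype(int)

# Yes-share at each cost prompt
shares = df.groupby("bid")["yes"].mean().sort_index()
print(shares)

# 1. Non-monotonicity: the yes-share should fall as the cost prompt rises
diffs = shares.diff().dropna()
print("Non-monotonic:", bool((diffs > 0).any()))

# 2. Flat portions: adjacent prompts with (almost) no change in the yes-share
flat = diffs.abs() < 0.02
print(f"Flat adjacent prompt pairs: {int(flat.sum())} of {len(diffs)}")

# 3. Fat tails: a high yes-share at the highest cost prompt suggests the bid
#    range fails to cover the upper tail of the WTP distribution
print(f"Yes-share at highest prompt: {shares.iloc[-1]:.0%}")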

Ouch! That attack seems to have ended that particular debate. My takeaway (apart from not messing with Desvousges et al.) is that the contingent valuation method is far from perfect. In particular, it is vulnerable to scope problems, as my own research with Ian Bateman and Andreas Tsoumas (ungated earlier version here) showed some years ago. Ironically, the message that contingent valuation has particular problems is one that John Whitehead himself has also made (see here).


Tuesday, 25 November 2025

The economics of fertility in high-income countries

Earlier this year, Melissa Kearney and Phillip Levine released an NBER Working Paper on the economics of fertility in high-income countries. In part, this paper is a follow-up on their 2022 article on cohort effects and fertility (which I discussed here), as well as building on this theoretical and empirical review (ungated here) by Doepke et al. (which I discussed here).

Kearney and Levine first review the trends and patterns in fertility in high-income countries, focusing in particular on cohort-based measures. This exercise re-establishes the by-now well-known trend of declining fertility across the six example countries that they selected (Canada, Japan, Netherlands, Norway, Portugal, and the US).

Kearney and Levine then turn their attention to why fertility has declined, as well as why various policies and incentives have mostly failed to arrest the declining fertility trends. Taking an economic perspective that builds from Gary Becker's work on the economics of the family, but broadens the range of factors considered (as in Doepke et al.), Kearney and Levine state that:

...the evidence points us to the view that the recent decline in fertility is likely less about changes in current constraints and more about cumulative cultural and economic forces that influence fertility decisions over time. Generally, economists are loathe to rely on changes in preferences to explain behavior because that can explain virtually anything. But there are reasons to believe that the lifestyle, broadly defined, that is consistent with having a child or multiple children is becoming less desirable for many adults.

Kearney and Levine point out several times (as in the quote above) how much economists dislike resorting to changes in preferences as an explanation, because changes in preferences can be used to explain essentially anything (which renders models basically worthless). However, they acknowledge that in this context, and based on the evidence from many studies, it is likely that "shifting priorities" (a convenient alternative name for changing preferences) are at play. These "shifting priorities":

...refer broadly to changes in individual values, which potentially reflect evolving opportunities and constraints, changing norms and expectations about work, parenting, and gender roles, and social and cultural factors.

However, Kearney and Levine still want to avoid letting changes in preferences take over. That leads them to note that:

...changes in preferences may not be generated randomly and it is important to consider the forces that might have led to such changes. In our review of empirical evidence below, we highlight a number of potential social and cultural factors that might have altered preferences for and attitudes toward childbearing in recent decades, including peer effects, media and social media influences, the role of religion and religious messaging, and changing norms around parenting and gender roles in the home and society.

For me, the key contributions of the paper are not the review sections, but the theoretical and empirical implications. For example, in terms of theory, Kearney and Levine suggest that economic modelling of family decisions needs to change. Specifically:

We propose that it is now more appropriate to consider and model labor force participation as the default option, and fertility as the discretionary activity. This reflects a major shift in societal norms and practices over the past several decades. Women in earlier cohorts were more likely to have children and less likely to work. Back then, it is reasonable to consider having children as a widespread priority for women, perhaps reflecting societal norms and expectations, and sustained participation in the paid labor force as the more “optional” choice.

That presumptive ranking quite possibly has reversed. If market work is now the norm, the labor market norms and practices, including the expectations of “greedy jobs” as described by Goldin (2014), may alter fertility behavior. The tradeoff between market work and childbearing is now about the tension between a lifetime career and the way motherhood interrupts or alters that lifetime career progression, rather than about whether women work at all after they are married or have had their first child.

In terms of empirical implications, Kearney and Levine note that economists could learn a lot from demographers, in particular in relation to recognising cohort effects. They also note that:

...a challenge for economic research going forward is that the empirical methods we often rely on for causal identification are not particularly well-suited for studying changes across cohorts, nor the impact of widespread social and cultural changes... The statistical demands on the data for causal identification often lead to a focus on the immediate impact of period-specific factors. But as noted throughout this paper, the key questions that remain to be answered in this area are about cohort-level changes and the role of less immediate and discrete changes.

In addition, a typical approach to identifying period-specific effects might generate misleading or limited policy lessons. Consider an intervention that relaxes some constraints on having a child at a point-in-time. Younger women—say, 18-year-olds—may incorporate that change into their long-term decision making, but they may not respond immediately. Meanwhile, women in their early 30s may be less responsive, having already made many related life choices (regarding careers, relationships, lifestyle, etc.). In such cases, we might observe little to no immediate effect, even if the policy ultimately influences lifetime fertility...

A policy change may lead women to move up the timing of a birth to respond to some incentive, but to have the same number of children over their childbearing years. Our methods may conclude that this policy “worked,” even though completed fertility was unaffected. 
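A toy simulation makes that timing-versus-completed-fertility point concrete. This is entirely my own construction, not anything from the paper: the Poisson(1.6) fertility distribution and the one-year timing response are arbitrary assumptions for illustration.

# Toy simulation: a policy that shifts birth timing but not completed fertility
# still shows up as a 'spike' in period births. My own illustration only.
import numpy as np

rng = np.random.default_rng(7)
n_women = 100_000
POLICY_YEAR = 2025

# Each woman has a fixed number of children (unaffected by the policy),
# each with a planned birth year
completed = rng.poisson(1.6, n_women)
planned_years = [rng.integers(2015, 2035, size=k) for k in completed]

def births_per_year(planned_years, policy=False):
    """Count births by calendar year. If policy is True, births planned for
    the year after the policy are brought forward to the policy year (a pure
    timing response; the total number of births is unchanged)."""
    counts = {}
    for years in planned_years:
        for y in years:
            if policy and y == POLICY_YEAR + 1:
                y = POLICY_YEAR
            counts[int(y)] = counts.get(int(y), 0) + 1
    return counts

without = births_per_year(planned_years, policy=False)
with_policy = births_per_year(planned_years, policy=True)

print("year  no-policy  with-policy")
for year in range(2023, 2029):
    print(year, without.get(year, 0), with_policy.get(year, 0))

# Completed (cohort) fertility is identical in both scenarios
print("Completed fertility per woman:", completed.mean())

A period-based comparison of births just before and just after the policy picks up the spike at 2025 and the dip at 2026, even though no woman's completed fertility changed, which is exactly the trap Kearney and Levine describe.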

It is important for economists to recognise where the empirical methods currently in wide use are likely to lead to incorrect conclusions, and Kearney and Levine have provided some important cautions here. Fertility decline is topical, and many economists will be working on research questions related to this, especially as policy initiatives are rolled out by governments trying to return to above-replacement fertility. This review by Kearney and Levine is both timely and very helpful.

[HT: Marginal Revolution]

Read more: