Wednesday, 18 February 2026

People's offsetting behaviour thwarts well-intentioned interventions in social media and smartphone use

People lead complicated lives. They have many competing goals, and have to trade off between those goals. Economists assume that people choose their actions with the overall aim of maximising their utility (satisfaction, or happiness). However, those many competing goals can sometimes thwart well-intentioned interventions. For example, when seatbelts were made compulsory, driving fast became safer for the driver, and people responded by driving faster, and therefore less safely overall (for related examples, see here and here). Economists refer to that as offsetting behaviour.

Two recent examples of this arose in research papers I read this week. The first is this NBER Working Paper by Hunt Allcott (Stanford University) and a long list of co-authors, who investigated the impact of people temporarily deactivating Facebook or Instagram on their emotional state. Working with Meta (where some of the co-authors work), they:

...recruited 19,857 Facebook users and 15,585 Instagram users who spent at least 15 minutes per day on the respective platform. We randomly assigned 27 percent of participants to a treatment group that was offered payment for deactivating their accounts for the six weeks before the election. The remaining participants formed a control group that was paid to deactivate for just the first of those six weeks.

They then compare the difference in emotional state between before and after the deactivation for the treatment group (who deactivated for six weeks) and the control group (who deactivated for one week), and find that:

...users in the Facebook deactivation group reported a 0.060 standard deviation improvement in an index of happiness, anxiety, and depression, relative to control users...

...users in the Instagram deactivation group reported a 0.041 standard deviation improvement in the emotional state index relative to control.

Those effects are quite small in comparison to other interventions, and in comparison to changes in emotional state over time, and:

Under the approximation that emotional state index is normally distributed, the estimated effects of Facebook or Instagram deactivation would move the median user from the 50th percentile to the 52.4th or 51.6th percentile, respectively.
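That percentile arithmetic is just the standard normal CDF applied to the reported effect sizes. A quick check (my own calculation, using only the effect sizes quoted above):

```python
# An effect of d standard deviations moves the median of a normally
# distributed index from the 50th percentile to the Phi(d) percentile,
# where Phi is the standard normal CDF.
from math import erf, sqrt

def percentile_after_shift(d):
    """Percentile reached by the former median after a d-SD shift."""
    return 100 * 0.5 * (1 + erf(d / sqrt(2)))

print(round(percentile_after_shift(0.060), 1))  # Facebook: 52.4
print(round(percentile_after_shift(0.041), 1))  # Instagram: 51.6
```

Both figures match the paper's quote, which shows just how small these effects are in practical terms.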

Why was the effect so small? Users who deactivated Facebook or Instagram spent more of their newly-freed-up time on other apps. Those who deactivated Facebook increased their use of Instagram, but also:

Facebook and Instagram deactivation both increased use of Twitter, Snapchat, TikTok, YouTube, web browsers, other social media apps, and other non-categorized apps by a few minutes per day.

It's little wonder that deactivating Facebook or Instagram had such small effects, given the offsetting behaviour of the users pivoting to using other apps, including other social media apps, instead. None of this is to say that the intervention made the users worse off, but it probably didn't make them better off overall either.

The second example is this NBER Working Paper by Billur Aksoy (Rensselaer Polytechnic Institute), Lester Lusher (University of Pittsburgh), and Scott Carrell (University of Texas at Austin), which looked at the effects of the app 'Pocket Points' at Texas A&M University. Specifically:

Pocket Points is marketed as a soft commitment device and provides incentives for students to stay off of their phones. In particular, Pocket Points rewards students with “points” for staying off their phones during class: Students open the app, lock their phone, and start accumulating points, all while the app verifies through GPS coordinates that the student is indeed in class. These points can then be used to get discounts at participating local and online businesses.

One thousand Texas A&M students were invited to participate in the experiment in 2017, and half were randomised to treatment, where they were instructed to download the Pocket Points app and create an account. Aksoy et al. then compare the treatment and control students. They also distinguish effects between those who used the app at least once, and those who used the app more than once a week (based on survey results). First, Aksoy et al. report that:

...treatment students were about 25 percentage points more likely to download the app... and over 31 percentage points more likely to use the app... than control students. Additionally, treatment students were 13 percentage points more likely to use the app more than once a week...

So, the treatment worked in encouraging students to use Pocket Points. But did the app improve outcomes? Aksoy et al. find some positive effects in the classroom, such as:

...Pocket Points usage is associated with a 0.42 standard deviation reduction in phone distraction rate in the classroom... we observe increases in student satisfaction with their academic performance for the semester: Students who used the app more than once a week experienced more than a one standard deviation increase in satisfaction...

That seems promising. However, when they look at student grades (from their official TAMU transcripts), Aksoy et al. find that:

...students who used the app more than once a week experienced a 0.50 unit increase in GPA. These estimates, however, are statistically insignificant...

So, even though the Pocket Points app reduced in-class distractions, it had no statistically significant effect on students' grades. That may be because there were also:

...significant decreases in time spent studying on campus... treated students spent approximately 18.2 hours/week studying, 12.0 of which were on campus, whereas control students spent 20.3 hours/week studying, 14.1 of which were on campus. Thus, it appears that the increased learning and attendance in the classroom came with a reduction in time spent studying.

It's little wonder that there was no effect on students' grades, given the offsetting behaviour of students spending less time studying, perhaps because they believed (perhaps rightly) that their in-class study time was more effective without phone distractions. None of this is to say that the app made the students worse off, but it probably didn't make them better off overall either.

When we implement an intervention that we hope will lead to better outcomes, such as improved emotional state due to less time spent on social media, or improved student performance due to more focused studying in class, we need to be prepared for the offsetting behaviour of the people affected by the intervention. Their lives are complicated, and they are trading off between competing goals. Just because we want to make one of their goals easier to achieve, that doesn't mean that they will focus extra energy on that goal. As we have seen from the two examples above, they may simply re-focus their energies elsewhere, leaving the outcome that we want to improve unchanged.

[HT: Marginal Revolution, last year]

Tuesday, 17 February 2026

Can fertility return to replacement levels?

Many countries, including almost all developed countries and many developing countries, are now experiencing below-replacement fertility, with fertility rates having declined substantially over the past decade or more. That means that each generation will be progressively smaller than the last, and almost inevitably that leads to a declining population (in the absence of offsetting migration flows). Can countries reverse the trend of declining fertility, and return to replacement levels? Two new articles suggest that might be difficult.

The first is this article by Michael Geruso and Dean Spears (both University of Texas at Austin), published in the Journal of Economic Perspectives (open access). They look explicitly at the question of whether persistently low fertility can be reversed, but first they do a great job of setting the scene:

Fertility is low or falling across the world: among high-, middle-, and low-income countries; among secular and religious populations; and in economies where the state is large and where it is small. Birth rates have been falling not only for decades, but for centuries. They have been falling for as long as there are good historical records to document them...

The TFR [total fertility rate] has fallen from a global average that was a little under five in 1950 to a global average that is a little over two in 2025...

The 115 richest countries in the world together have an average total fertility rate of 1.5... A birth rate of 1.5 would lead to a decline of 44 percent in generation size over two generations...
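The 44 percent figure follows from simple arithmetic: each generation is roughly TFR/2 times the size of the last (two parents per child). A quick check, using a replacement level of 2.0 as a round-number approximation (mortality and the sex ratio at birth push true replacement closer to 2.1):

```python
# Each generation is about (TFR / replacement) times the size of the
# previous one; compound that over two generations.
tfr, replacement = 1.5, 2.0
two_generation_ratio = (tfr / replacement) ** 2
decline_percent = 100 * (1 - two_generation_ratio)
print(decline_percent)  # 43.75, i.e. roughly the 44 percent in the quote
```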

Geruso and Spears then look at the trends in some detail, focusing attention on completed cohort fertility (CCF), which captures the average number of lifetime births for women born in a particular place and year. That is a better measure than the TFR, because it is not affected by the timing of births - a woman having two children at ages 25 and 34, instead of at ages 25 and 27, would not change the CCF, but would affect the TFR in the years in which she gave birth (increasing the TFR in the year she was aged 34, and decreasing it in the year she was aged 27). In any case though, the trends in the two measures (CCF and TFR) are broadly similar, with both showing declining fertility over time across all of the countries that Geruso and Spears consider (with the exception of the US in the 1980s to 2000s, where there was a modest increase in fertility).
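A toy example (my own numbers, not from the paper) makes the CCF-versus-TFR point concrete:

```python
# Two birth schedules for the same cohort of women: average births per
# woman, keyed by the mother's age. Only the timing of the second
# birth differs.
early = {25: 1.0, 27: 1.0}   # second birth at age 27
late = {25: 1.0, 34: 1.0}    # second birth delayed to age 34

# Completed cohort fertility (lifetime births per woman) is identical:
ccf_early = sum(early.values())
ccf_late = sum(late.values())
print(ccf_early, ccf_late)  # 2.0 2.0

# But for a cohort of women born in year 0, the age-a births register
# in calendar year a, so delaying the second birth shifts 1.0 births
# per woman out of the year-27 period TFR and into the year-34 TFR:
print(early.get(27, 0.0), late.get(27, 0.0))  # 1.0 0.0
```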

Geruso and Spears then use global data from the Human Fertility Database (HFD), and Indian data from the National Family Health Survey, and explore the contribution of childlessness to the overall decline in fertility. In both datasets, they find that the majority of the decline in fertility is due to a decline in the number of children among women who have at least one child, rather than an increase in childlessness. For countries in the HFD, childlessness accounts for 37 percent of the decline in fertility between the cohort of mothers born in 1956 and the cohort born in 1976, while in India, childlessness accounts for just 9 percent of the differences in fertility across districts.

Finally, Geruso and Spears turn to the prospects for a reversal of the fertility trend. On this, they start by noting that in the HFD:

...there have been 24 countries in which cohort fertility ever fell below 1.9. In none of these cases have subsequent cohorts from the same country ever had fertility as high as 2.1...

Aside from the post-WWII Baby Boom, there are no significant episodes of increasing fertility, and the Baby Boom was the result of a unique set of circumstances that are (hopefully) unlikely to be repeated. Geruso and Spears then look at the microeconomic and programme evaluation literature, and note that:

...the clear-cut bottom line is that whatever impacts pro-natal policies and broader changes might have caused, none has caused low birth rates to reverse enduringly back to replacement levels.

Even a particularly strict programme in Romania that "banned abortion and made modern contraception effectively inaccessible" had only a short-term effect on the total fertility rate, and no effect on completed cohort fertility. So, this paper gives no reason to believe that declining fertility can be reversed. Geruso and Spears conclude that:

To put it bluntly, history offers no examples of societies recognizing very low birth rates as a social priority and then responding with effective changes that restore, and sustain, replacement-level fertility.

The second article is this one by Kimberly Babiarz (Stanford University), Paul Ma (University of Minnesota), Grant Miller (Stanford University), and Shige Song (City University of New York), forthcoming in the journal Review of Economics and Statistics (ungated earlier version here). They don't look explicitly at fertility decline, but they do look in detail at fertility in China, and in particular at the impact of the Wan Xi Shao (Later, Longer, Fewer, or LLF) campaign, which predated the One Child Policy. That campaign:

...aimed to limit fertility by promoting older age at marriage (“Later”), longer intervals between births (“Longer”), and fewer births per couple (“Fewer”).

The campaign was very successful, with the total fertility rate falling from 6 to about 2.75 over the course of the 1970s. The One Child Policy began in 1980, so by the time it was instituted, China's fertility had already fallen almost to replacement level. Babiarz et al. aren't the first to note this, but they extend the analysis further, exploiting differences in the timing of implementation of the policy across Chinese provinces to investigate how much of the decline in fertility was due to the policy, how it affected fertility decisions within Chinese families, and how many 'missing girls' are attributable to the policy implementation in combination with a societal preference for sons.

Specifically, the LLF policy aimed to:

...reduce crude annual birth rates in rural areas to 15 per 1,000 population via three primary mechanisms: (1) later marriage—delaying marriage to ages 23 and 25 (for rural women and men respectively); (2) longer birth intervals—increasing birth intervals to a minimum of four years; and (3) fewer lifetime births—limiting couples to 2–3 children in total...

The policy was implemented differently in urban areas, and about 87 percent of births in the sample occurred in rural areas, so Babiarz et al. focus attention on births in rural areas. Their main data source is the 1988 Two-per-Thousand National Survey of Fertility, which was a nationally representative survey that included around 400,000 women living in rural areas. I'm not going to go into detail on their methods (you should read the paper), but using an event study design, they find that the policy:

...reduced China’s total fertility rate by almost one birth per woman, accounting for about 30.6% of China’s overall fertility decline prior to 1980, or approximately 18.2 million averted births... Decomposing this TFR change into “quantum” and “tempo” effects, we show that, although the policy raised mothers’ median age at first birth by 5.2 months, the decline in TFR was largely the result of fewer lifetime births rather than changes in the timing of births.
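For readers unfamiliar with the quantum/tempo distinction, the Bongaarts-Feeney tempo adjustment gives the flavour (this is a standard illustration, not necessarily the decomposition that Babiarz et al. use): if the mean age at childbearing is rising by r years per calendar year, the observed period TFR understates the underlying quantum of fertility by a factor of (1 - r). The numbers below are toy values, not the paper's estimates:

```python
def tempo_adjusted_tfr(observed_tfr, r):
    """Bongaarts-Feeney tempo adjustment, where r is the annual change
    (in years per year) in the mean age at childbearing."""
    return observed_tfr / (1 - r)

# Toy values: an observed TFR of 3.0 while the mean age at
# childbearing rises by 0.05 years per year implies a slightly
# higher underlying quantum of fertility:
print(round(tempo_adjusted_tfr(3.0, 0.05), 3))  # 3.158
```

The paper's finding is that, unlike in this toy example, the tempo effect in China was small: the decline in the TFR mostly reflected fewer lifetime births.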

They also find that:

...the LLF policy led directly to an increase in the use of both male-biased fertility-stopping rules and postnatal selection (via neglect or possible infanticide). Although postnatal selection was relatively rare, our results imply that the LLF policy resulted in about 180,000 additional missing girls, or approximately 19% of all missing girls during the 1970s.

So, the policy was quite successful in reducing Chinese fertility faster than it would otherwise have fallen. However, this came with the unintended consequence of fewer female births relative to male births, and the phenomenon of 180,000 'missing girls' (girls who would otherwise have been born, or survived, had the policy not been in place).

How does this relate to the Geruso and Spears article, and what does it tell us about changing fertility? The Babiarz et al. article shows what it takes to move fertility quickly, but only in one direction (downwards). The LLF policy was dramatic, and successful, but it took a concerted government effort, supported by severe penalties, to achieve its aim. And this was in an environment where fertility was already declining. That's a very different challenge from trying to engineer a sustained increase in fertility back to replacement levels.

So, where does that leave us? If we have essentially no historical examples of societies successfully and sustainably reversing very low fertility, then the practical policy question shifts to planning for a future with progressively smaller age cohorts and older populations. That may mean reconsidering institutions that rely on a foundation of population growth (retirement and superannuation, and health and long-term care), as well as family-friendly policies and immigration settings. Policy proposals that treat women as a demographic instrument (like this one) aren’t a solution - they’re a warning sign that we’re asking policy to do something it may not be able to do.

[HT: Marginal Revolution, for the Babiarz et al. article]

Monday, 16 February 2026

Book review: Economists in the Cold War

In 2024, I reviewed Alan Bollard's book Economists at War, noting that it sat awkwardly in-between being a biography and an economic history. I just finished reading Bollard's 2023 book Economists in the Cold War, which follows a similar approach.

This book is basically a sequel to the earlier book, and adopts a similar format, focusing on seven economists: Harry Dexter White, Oskar Lange, John von Neumann, Ludwig Erhard, Joan Robinson, Saburo Okita, and Raul Prebisch. Each chapter is devoted to the life and works (and times) of one of these eminent economists. This book differs from the earlier volume by setting each of the seven economists against one of their contemporaries, respectively: John Maynard Keynes, Friedrich Hayek, Leonid Kantorovich, Jean Monnet, Paul Samuelson, Zhou En-lai, and Walt Rostow.

There is a bit of overlap with the earlier book, which features Keynes, Kantorovich, and von Neumann. However, there is plenty of new material in this book, and I especially appreciated the chapters on Lange, Erhard, Okita, and Prebisch, whom I knew little about. I also really enjoyed the chapter on Joan Robinson, which helped me to solve the mystery (to me, at least) of why she never won the Nobel Prize in Economics. On that point, Bollard writes that:

Once more, Robinson had no compunction about forming strong public views from limited evidence on contentious issues... It has been suggested that the polemical content of these writings may have cost Joan Robinson the Nobel Prize in economics which her mainstream contributions might otherwise have earned... She never saw the need to separate her economic findings and her political opinions.

Bollard has a good way of bringing in anecdotes, even though he is adamant that he is not writing a biography of each economist. On Oskar Lange, Bollard tells us that:

...he was once invited to lunch by Al Capone the famous gangster, who he found to be self-educated and well-read with a good knowledge of politics and economics. They had a most interesting conversation, and at the end Capone offered: 'Professor, if you ever have a problem, anything at all, please do not hesitate to call me!'...

It is not just any economist who can call on such support! On the negative side, there is a fair amount of repetition, both between this book and the earlier volume, and even within the book itself. For instance, Bollard twice tells us, within the span of 17 pages, that British government economist Alec Cairncross's brother John was a spy for the Soviets. This, and several other similar instances, is a minor point in an otherwise excellent book, but I found it quite distracting.

Overall, I rate this book at least as highly as the earlier volume, although, as I noted at the beginning, it suffers from a similar flaw: in trying to be both biography and economic history, it ends up awkwardly caught in-between. Perhaps my views on that have softened somewhat in the last couple of years, or perhaps it was that this book covered a lot of new ground for me, but I thought that overall this was the better of the two books. Like Bollard's earlier book, I recommend this one for anyone interested in the key players and in the development of 20th Century economics.

Sunday, 15 February 2026

Déjà vu: It's not a tax, it's a levy

In 2018, I mocked the government for their insistence that an increase in fuel tax was an excise, not a tax. Since I'm a firm believer in equal treatment of the government of the day when they display their economic illiteracy, I thought I needed to pick up on this story from earlier in the week:

Is it a tax? Is it a levy? An additional charge for a liquefied natural gas import terminal has turned into a communications nightmare for the Government...

Asked if this was a new tax on households, the prime minister was quick to intervene.

“This isn’t a tax, it’s a levy to fund a key piece of infrastructure,” he said.

So, it's a levy, and that is different from a tax? Not according to the OED, which defines a levy as:

Levy, n.

A duty, impost, tax.

Or, if you prefer the Merriam Webster Dictionary:

1 a : the imposition or collection of an assessment

Merriam Webster then defines an assessment as (emphasis is mine):

2 : the amount assessed : an amount that a person is officially required to pay especially as a tax

A levy is a tax. It has the same effects as a tax (for example, see this post for the details) - it raises the price that consumers pay, it lowers the effective price that sellers receive (after paying the levy to the government), it delivers revenue to the government, and it creates a deadweight loss (even if there may be offsetting benefits from how the revenue is spent). Whether the government uses that revenue for a liquefied natural gas import terminal, or for any other purpose, that doesn't change the fact that the levy is a tax.
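Those effects can be sketched in a toy linear market (my numbers, purely illustrative): inverse demand P = 100 - Q, inverse supply P = 20 + Q, and a per-unit levy of t collected from sellers:

```python
def market_with_levy(t):
    """Equilibrium of a toy linear market with a per-unit levy t.
    Demand: P = 100 - Q. Supply (before the levy): P = 20 + Q."""
    q = (80 - t) / 2             # sellers need 20 + q + t = 100 - q
    p_consumers = 100 - q        # price consumers pay
    p_sellers = p_consumers - t  # price sellers keep after the levy
    revenue = t * q              # government revenue
    q_no_levy = 40.0             # quantity traded when t = 0
    deadweight_loss = 0.5 * t * (q_no_levy - q)  # welfare triangle
    return p_consumers, p_sellers, revenue, deadweight_loss

print(market_with_levy(0))   # (60.0, 60.0, 0.0, 0.0) - no levy
print(market_with_levy(10))  # (65.0, 55.0, 350.0, 25.0)
```

The levy raises the price consumers pay (65 rather than 60), lowers the net price sellers receive (55 rather than 60), delivers revenue to the government, and creates a deadweight loss - whatever name the government gives it.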

I wrote back in that 2018 post that:

...this isn't the first (and it won't be the last) government to try their hardest not to refer to taxes as taxes.

It seems I was correct in that assessment.

Friday, 13 February 2026

This week in research #113

Here's what caught my eye in research over the past week:

  • Babiarz et al. (with ungated earlier version here) show that most of China’s fertility decline occurred during the earlier Wan Xi Shao (Later, Longer, Fewer, LLF) campaign, rather than the One Child Policy
  • Clark and Nielsen (with ungated earlier version here) conduct a meta-analysis of studies on the returns to education and find that, after controlling for publication bias, the effects are smaller than expected (perhaps 0-3 percent per year of education, compared with 8.2 percent per year without correction for publication bias)
  • Bratti, Granato, and Havari (open access) demonstrate that a policy reducing the number of exam retakes per year at one Italian university significantly improved students’ first-year outcomes, resulting in lower dropout rates, increased exam pass rates, and enhanced credit accumulation (presumably because the students had to give the exam their best shot the first time around)
  • Buechele et al. (open access) find no systematic evidence indicating that the prestige of the doctoral degree-granting university systematically affects individuals' odds of being appointed to professorships in Germany (because the prestigious universities train a disproportionate number of the PhD graduates)

Finally, I spent today and yesterday at the New Zealand Economics Forum. I wasn't one of the speakers, so I could again enjoy the proceedings from the floor. Overall, I thought that this may have been the best Forum so far (it has been running for six years now). You can watch recordings of the sessions now (here for Day One from the main room, here for the Day One breakout room sessions, here for Day Two from the main room, and here for the Day Two breakout room sessions). Enjoy!

Wednesday, 11 February 2026

Did employers value an AI-related qualification in 2021?

Many universities are rapidly adapting to education in the age of generative AI by trying to develop AI skills in their students. There is an assumption that employers want graduates with AI skills across all disciplines, but is there evidence to support that? This recent discussion paper by Teo Firpo (Humboldt-Universität zu Berlin), Lukas Niemann (Tanso Technologies), and Anastasia Danilov (Humboldt-Universität zu Berlin) provides an early answer. I say it's an early answer because their data come from 2021, before the wave of generative AI innovation that became ubiquitous following the release of ChatGPT at the end of 2022. The research also focuses on AI-related qualifications, rather than the more general AI skills, but it's a start.

Firpo et al. conduct a correspondence experiment, where they:

...sent 1,185 applications to open vacancies identified on major UK online job platforms... including Indeed.co.uk, Monster.co.uk, and Reed.co.uk. We restrict applications to entry-level positions requiring at most one year of professional experience, and exclude postings that demand rare or highly specialized skills...

Each identified job posting is randomly assigned to one of two experimental conditions: a "treatment group", which receives a résumé that includes additional AI-related qualifications and a "control group", which receives an otherwise identical résumé without mentioning such qualifications.

Correspondence experiments are relatively common in the labour economics literature (see here, for example), and involve the researcher making job applications with CVs (and sometimes cover letters) that differ in known characteristics. In this case, the applications differed by whether the CV included an AI-related qualification or not. Firpo et al. then focus on differences in callback rates, and they differentiate between 'strict callbacks' (invitations to interview), and 'broad callbacks' (any positive employer response, including requests for further information). Comparing callback rates between CVs with and without AI-related qualifications, they find:

...no statistically significant difference between treatment and control groups for either outcome measure...

However, when they disaggregate their results by job function, they find that:

In both Marketing and Engineering, résumés listing AI-related qualifications receive higher callback rates compared to those in the control group. In Marketing, strict callback rates are 16.00% for AI résumés compared to 7.00% for the control group (p-value = 0.075...), while broad callback rates are 24.00% versus 12.00% (p-value = 0.043...). In Engineering, strict callback rates are 10.00% for AI résumés compared to 4.00% for the control group (p-value = 0.163...), while broad callback rates are 20.00% versus 8.00% (p-value = 0.024...).

For the other job functions (Finance, HR, IT, and Logistics) there was no statistically significant effect of AI qualifications on either measure of callback rates. Firpo et al. then estimate a regression model and show that:

...including AI-related qualifications increases the probability of receiving an interview invitation for marketing roles by approximately 9 percentage points and a broader callback by 12 percentage points. Similarly, the interaction between the treatment dummy and the Engineering job function dummy in the LPM models is positive and statistically significant, but only for broad callbacks. AI-related qualifications increase the probability of a broad callback by at least 11 percentage points...

The results from the econometric model are only weakly statistically significant, but they are fairly large in size. However, I wouldn't over-interpret them because of the multiple-comparison problem (around five percent of results would show up as statistically significant just by chance). At best, the evidence that employers valued AI-related qualifications in 2021 is pretty limited, based on this research.
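To see the size of the multiple-comparison problem, here is a back-of-the-envelope calculation (my own, treating the tests as independent): the paper tests six job functions with two callback measures each.

```python
# With 12 tests at a 5 percent significance level, the chance of at
# least one false positive under the null hypothesis is substantial.
alpha = 0.05
n_tests = 6 * 2  # six job functions x two callback measures
p_at_least_one_false_positive = 1 - (1 - alpha) ** n_tests
print(round(p_at_least_one_false_positive, 2))  # 0.46
```

So, even if AI qualifications had no effect at all, there would be close to even odds of at least one 'statistically significant' result appearing by chance.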

Firpo et al. were worried that employers might simply not have noticed the AI qualifications in the CVs, so they conducted an online survey of over 700 professionals with hiring experience and domain knowledge. That survey shows that the AI-related qualification was salient, but that it signalled greater technical skills alongside lower social skills. These conflicting signals are interesting, and suggest that employers are looking for both technical skills and social skills in entry-level applicants. Does this, alongside the earlier results for different job functions, imply that technical skills are weighted more heavily than social skills for Engineering and Marketing jobs? I could believe that for Engineering, but I have my doubts for Marketing, where interpersonal skills are likely to be important. Again though, it's probably best not to over-interpret the results.

Firpo et al. conclude that:

...our findings challenge the assumption that AI-related qualifications unambiguously enhance employability in early-career recruitment. While such skills might be valued in abstract or strategic terms, they do not automatically translate into interview opportunities, at least not in the entry-level labor market in job functions such as HR, Finance, Marketing, Engineering, IT and Logistics.

Of course, these results need to be considered in the context of their time. In 2021, AI-related skills might not have been much in demand by employers. That is unlikely to hold true now, given that generative AI use has become so widespread. It would be interesting to see what a more up-to-date correspondence experiment would find.

[HT: Marginal Revolution]

Read more:

  • ChatGPT and the labour market
  • More on ChatGPT and the labour market
  • The impact of generative AI on contact centre work
  • Some good news for human accountants in the face of generative AI
  • Good news, bad news, and students' views about the impact of ChatGPT on their labour market outcomes
  • Swiss workers are worried about the risk of automation
  • How people use ChatGPT, for work and not
  • Generative AI and entry-level employment
  • Survey evidence on the labour market impacts of generative AI
  • Tuesday, 10 February 2026

    Who on earth has been using generative AI?

    Who are the world's generative AI users? That is the question addressed in this recent article by Yan Liu and He Wang (both World Bank), published in the journal World Development (ungated earlier version here). They use website traffic data from Semrush, alongside Google Trends data, to document worldwide generative AI use up to March 2024 (so, it's a bit dated now, as this is a fast-moving area, but it does provide an interesting snapshot up to that point). In particular, Liu and Wang focus on geographical heterogeneity in generative AI use (measured as visits to generative AI websites, predominantly, or in some of their analyses, entirely ChatGPT), and they explore how that relates to country-level differences in institutions, infrastructure, and other variables.

    Some of the results are fairly banal, such as the rapid increase in website traffic to AI chatbot websites, a corresponding decline in traffic to sites such as Google, and Stack Overflow, and that the users skew younger, more educated, and male. Those demographic differences will likely become less dramatic over time as user numbers increase. However, the geographic differences are important and could be more persistent. Liu and Wang show that:

    As of March 2024, the top five economies for ChatGPT traffic are the US, India, Brazil, the Philippines, and Indonesia. The US share of ChatGPT traffic dropped from 70 % to 25 % within one month of ChatGPT’s debut. Middle-income economies now contribute over 50 % of traffic, showing disproportionately high adoption of generative AI relative to their GDP, electricity consumption, and search engine traffic. Low-income economies, however, represent less than 1 % of global ChatGPT traffic.

    So, as of 2024, most generative AI use was in middle-income countries, but remember that those are also high-population countries (like India). Generative AI users are disproportionately from high-income countries once income and internet use (proxied by search engine traffic) are accounted for. Figure 12 in the paper illustrates this nicely, showing generative AI use, measured as visits per internet user:

    Notice that the darker-coloured countries, where a higher proportion of internet users used ChatGPT, are predominantly in North America, western Europe, and Australia and New Zealand. On that measure, Liu and Wang rank New Zealand 20th (compared with Singapore first, and Australia eighth). There are a few interesting outliers like Suriname (sixth) and Panama (17th), but the vast majority of the top twenty countries are high-income countries.

    What accounts for generative AI use at the country level? Using a cross-country panel regression model, Liu and Wang find that:

    Higher income levels, a higher share of youth population, better digital infrastructure, and stronger human capital are key predictors of higher generative AI uptake. Services’ share of GDP and English fluency are strongly associated with higher chatbot usage.
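The flavour of that kind of country-level regression can be sketched with a toy example. To be clear, this is illustrative only: the data below are made up, and Liu and Wang's actual model is a much richer panel specification with many covariates, not this simple bivariate regression.

```python
# Illustrative only: a toy cross-country regression in the spirit of
# Liu and Wang's analysis. The data are invented; the paper's actual
# model is a panel specification with many more covariates.

# (hypothetical) country-level data:
# (log GDP per capita, AI visits per internet user)
data = [
    (11.0, 0.90),  # high-income
    (10.5, 0.70),
    (9.5, 0.50),   # middle-income
    (9.0, 0.45),
    (8.0, 0.20),   # low-income
    (7.5, 0.10),
]

n = len(data)
mean_x = sum(x for x, _ in data) / n
mean_y = sum(y for _, y in data) / n

# OLS slope: beta = cov(x, y) / var(x)
beta = sum((x - mean_x) * (y - mean_y) for x, y in data) / \
       sum((x - mean_x) ** 2 for x, _ in data)

# A positive slope is the toy analogue of "higher income levels
# predict higher generative AI uptake"
print(f"slope (uptake per unit of log income): {beta:.3f}")
```

Even in this toy version, the usual caveat applies: the slope captures a cross-country correlation, not a causal effect of income on uptake.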

    Now, those results simply demonstrate correlation, and are not causal. And website traffic could be biased due to use of VPNs, etc., not to mention that it doesn't account very well for traffic from China or Russia (and Liu and Wang are very upfront about that limitation). Nevertheless, it does provide a bit more information about how countries with high generative AI use differ from those with low generative AI use. Generative AI has the potential to level the playing field somewhat for lower-productivity workers and lower-income countries. However, that can only happen if lower-income countries access generative AI. And it appears that, up to March 2024 at least, they were instead falling behind. As Liu and Wang conclude, any catch-up potential from generative AI:

    ...depends on further development as well as targeted policy interventions to improve digital infrastructure, language accessibility, and foundational skills.

    To be fair, that sounds like a general prescription for development policy in any case.


    Monday, 9 February 2026

    The promise of a personalised, AI-augmented textbook, and beyond

    In the 1980s, the educational psychologist Benjamin Bloom introduced the 'two-sigma problem' - that students who were tutored one-on-one using a mastery approach performed on average two standard deviations (two sigma) better than students educated in a more 'traditional' classroom setting. That research is often taken as a benchmark for how good an educational intervention might be (relative to a traditional classroom baseline). The problem, of course, is that one-on-one tutoring is not scalable. It simply isn't feasible for every student to have their own personal tutor. Until now.
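To get a feel for how large a two-sigma effect is, we can convert it into percentile terms. This is a simplification (it assumes classroom outcomes are normally distributed), but it shows why Bloom's benchmark is so striking:

```python
# What a 'two-sigma' improvement means in percentile terms, assuming
# normally distributed classroom outcomes (a simplification).
import math

def normal_cdf(z: float) -> float:
    """Standard normal CDF, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# The average tutored student (2 SD above the classroom mean)
# outperforms this share of classroom students:
percentile = normal_cdf(2.0)
print(f"{percentile:.1%}")  # roughly 97.7%
```

In other words, under these assumptions the average tutored student would outperform nearly 98 percent of students in a traditional classroom.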

    Generative AI makes it possible for every student to have a personalised tutor, available 24/7 to assist with their learning. As I noted in yesterday's post, though, how that AI tutor is set up is crucial: it needs to ensure that students engage meaningfully in a way that promotes their own learning, rather than simply being a tool to 'cognitively offload' difficult learning tasks.

    One promising approach is to create customised generative AI tools that are specifically designed to act as tutors or coaches, rather than simple 'answer-bots'. This new working paper by the LearnLM team at Google (and a long list of co-authors) provides one example. They describe an 'AI-augmented textbook', which they call the 'Learn Your Way' experience, and which:

    ...provides the learner with a personalized and engaging learning experience, while also allowing them to choose from different modalities in order to enhance understanding.

    Basically, this initially involves taking some source material, which in their case is a textbook, but could just as easily be lecture slides, transcripts, and related materials from a class. It then personalises those materials to the interests of the students, adapting the examples and exercises to fit a context that the students find more engaging. For example, if the student is an avid football fan, they might see examples drawn from football. And if the student is into Labubu toys, they might see examples based on that.

    The working paper describes the approach, reports a pedagogical evaluation performed by experts, and finally reports on a randomised controlled trial (RCT) evaluating the impact of the approach on student learning. The experts rated the Learn Your Way experience across a range of criteria, and the results were highly positive. The only criterion where scores were notably low was for visual illustrations. That accords with my experience so far with AI tutors, which are not good at drawing economics graphs in particular (an ongoing source of some frustration!).

    The RCT involved sixty high-school students in Chicago area schools, who studied this chapter on adolescent brain development. Half of the students were assigned to Learn Your Way, and half to a standard digital PDF reader. As the LearnLM Team et al. explain:

    Participants then used the assigned tool to study the material. Learning time was set to a minimum of 20 minutes and a maximum of 40 minutes. After this time, each participant had 15 minutes to complete the Immediate Assessment via a Qualtrics link.

    They then did a further assessment three days later (a 'Retention Assessment'). In terms of the impact of Learn Your Way:

    The students who used Learn Your Way received higher scores than those who used the Digital Reader, in both the immediate (p = 0.03) and retention (p = 0.03) assessments.

    The difference in test outcomes was 77 percent vs. 68 percent in the Immediate Assessment, and 78 percent vs. 67 percent in the Retention Assessment. So, the AI-augmented textbook increased scores by about 10 percentage points, both immediately and in the short term (three days later). Of course, this was just a single study, with a relatively small sample of 60 students in one setting, but it does offer some promise for the approach.
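The back-of-the-envelope arithmetic behind that "about 10 percentage points" is straightforward. Note that this assumes the reported percentages are mean scores; the paper's significance tests were presumably run on the underlying student-level data, which we don't have here:

```python
# Back-of-the-envelope check on the reported score gaps.
# Assumes the reported percentages are mean scores per group.
immediate = (0.77, 0.68)  # Learn Your Way vs. Digital Reader
retention = (0.78, 0.67)

gap_immediate = (immediate[0] - immediate[1]) * 100  # percentage points
gap_retention = (retention[0] - retention[1]) * 100

print(f"immediate gap: {gap_immediate:.0f} pp")
print(f"retention gap: {gap_retention:.0f} pp")
```

The gaps are 9 and 11 percentage points respectively, which is where the "about 10 percentage points" summary comes from.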

    I really like this idea of dynamically adjusting content to suit students' interests, which is a topic I have published on before. Using generative AI in this way allows material to be customised for every student, creating a far more personalised approach to learning than any teacher could provide. I doubt that even one-on-one tutoring could match that level of customisation.

    This paper has gotten me thinking about the possibilities for personalised learning. Over the years, I have seen graduate students with specific interests left disappointed by what we are able to offer in terms of empirical papers. For example, I can recall students highly interested in economic history, the economics of education, and health economics in recent years. Generative AI offers the opportunity to provide a much more tailored education to students who have specific interests.

    This year, I'll be teaching a graduate paper for the first time in about a decade. My aim is to allow students to tailor that paper to their interests, by embarking on a series of conversations about research papers based on their interests. The direction that leads will be almost entirely up to the student (although with some guidance from me, where needed). Students might adopt a narrow focus on a particular research method, a particular research question, or a particular field or sub-field of economics. Assisted by a custom generative AI tool, they can read and discuss papers, try out replication packages, and/or develop their own ideas. Their only limit will be how much time they want to put in. Of course, some students will require more direction than others, but that is what our in-class discussion time will be for.

    I am excited by the prospects of this approach, and while it will be a radical change to how our graduate papers have been taught in the past, it might offer a window to the future. And best of all, I have received the blessing of my Head of School to go ahead with this as a pilot project that might be an exemplar for wider rollout across other papers. Anyway, I look forward to sharing more on that later (as I will turn it into a research project, of course!).

    The ultimate question is whether we can use generative AI in a way that moves us closer to Bloom’s two-sigma benefit of one-on-one tutoring. The trick will be designing it so that students still do the cognitive work. My hope (and, it seems, the LearnLM team’s) is that personalisation increases students' engagement with learning rather than replacing it. If it works, this approach could be both effective and scalable in a way that human one-on-one tutoring simply can’t match.

    [HT: Marginal Revolution, for the AI-augmented textbook paper]

    Sunday, 8 February 2026

    Neuroscientific insights into learning and pedagogy, especially in the age of generative AI

    In May last year, my university's Centre for Tertiary Teaching and Learning organised a seminar by Barbara Oakley of Oakland University, with the grand title 'The Science of Learning'. It was a fascinating seminar about the neuroscience of learning, and in my mind, it justified several of my teaching and learning practices, such as continuing to lecture, emphasising students' learning of basic knowledge in economics, and using retrieval practice and spaced repetition as learning tools.

    Now, I've finally read the associated working paper by Oakley and co-authors (apparently forthcoming as a book chapter), and I've been able to pull out further insights that I want to share here. The core of their argument is in the Introduction to the paper. First:

    Emerging research on learning and memory reveals that relying heavily on external aids can hinder deep understanding. Equally problematic, however, are the pedagogical approaches used in tandem with reliance on external aids—that is, constructivist, often coupled with student-centered approaches where the student is expected to discover the insights to be learned... The familiar platitude advises teachers to be a guide on the side rather than a sage on the stage, but this oversimplifies reality: explicit teaching—clear, structured explanations and thoughtfully guided practice—is often essential to make progress in difficult subjects. Sometimes the sage on the stage is invaluable.

    I have resisted the urge to move away from lectures as a pedagogical tool, although I'd like to think that my lectures are more than simply information dissemination. I actively incorporate opportunities for students to make their first attempts at integrating and applying the economic concepts and models they are learning - the first step in an explicit retrieval practice approach. Oakley et al. note the importance of both components, because:

    ...mastering culturally important academic subjects—such as reading, mathematics, or science (biologically secondary knowledge)—generally requires deliberate instruction... Our brains simply aren’t wired to effortlessly internalize this kind of secondary knowledge—in other words, formally taught academic skills and content—without deliberate practice and repeated retrieval.

    The paper goes into some detail about the neuroscience underlying this approach, but again it is summarised in the Introduction:

    At the heart of effective learning are our brain's dual memory systems: one for explicit facts and concepts we consciously recall (declarative memory), and another for skills and routines that become second nature (procedural memory). Building genuine expertise often involves moving knowledge from the declarative system to the procedural system—practicing a fact or skill until it embeds deeply in the subconscious circuits that support intuition and fluent thinking...

    Internalized networks form mental structures called schemata (the plural of “schema”), which organize knowledge and facilitate complex thinking... Schemata gradually develop through active engagement and practice, with each recall strengthening these mental frameworks. Metaphors can enrich schemata by linking unfamiliar concepts to familiar experiences... However, excessive reliance on external memory aids can prevent this process. Constantly looking things up instead of internalizing them results in shallow schemata, limiting deep understanding and cross-domain thinking.

    This last point, about the shallowness of learning when students rely on 'looking things up' instead of relying on their own memory of key facts (and concepts and models, in the case of economics), leads explicitly to worries about learning in the context of generative AI. When students rely on external aids (known as 'cognitive offloading'), learning becomes shallow, because:

    ...deep learning is a matter of training the brain as much as informing the brain. If we neglect that training by continually outsourcing, we risk shallow competence.

    Even worse, there is a feedback loop embedded in learning, which exacerbates the negative effects of cognitive offloading:

    Without internally stored knowledge, our brain's natural learning mechanisms remain largely unused. Every effective learning technique—whether retrieval practice, spaced repetition, or deliberate practice—works precisely because it engages this prediction-error system. When we outsource memory to devices rather than building internal knowledge, we're not just changing where information is stored; we're bypassing the very neural mechanisms that evolved to help us learn.

    In short, internalized knowledge creates the mental frameworks our brains need to spot mistakes quickly and learn from them effectively. These error signals do double-duty: they not only help us correct mistakes but also train our attention toward what's important in different contexts, helping build the schemata we need for quick thinking. Each prediction error, each moment of surprise, thus becomes an opportunity for cognitive growth—but only if our minds are equipped with clear expectations formed through practice and memorization...

    Learning works through making mistakes, recognising those mistakes, and adapting to reduce those mistakes in future. Ironically, this is analogous to how generative AI models are trained (through 'reinforcement learning'). When students offload learning tasks to generative AI, they don't get an opportunity to develop the underlying internalised knowledge that allows them to recognise mistakes and learn from them. Thus, it is important for significant components of student learning to happen without resorting to generative AI (or other tools that allow students to cognitively offload tasks).

    Now, in order to encourage learning, teachers must provide students with the opportunity to make, and learn from, mistakes. Oakley et al. note that:

    ...cognitive scientists refer to challenges that feel difficult in the moment but facilitate deeper, lasting understanding as “desirable difficulties... Unlike deliberate practice, which systematically targets specific skills through structured feedback, desirable difficulties leverage cognitive struggle to deepen comprehension and enhance retention...

    Learning is not supposed to be easy. It is supposed to require effort. This is a point that I have made in many discussions with students. When they find a paper relatively easy, it is likely that they aren't learning much. And tools that make learning easier can hinder, rather than help, the learning process. In this context, generative AI becomes potentially problematic for learning for some (but not all) students. Oakley et al. note that:

    Individuals with well-developed internal schemas—often those educated before AI became ubiquitous—can use these tools effectively. Their solid knowledge base allows them to evaluate AI output critically, refine prompts, integrate suggestions meaningfully, and detect inaccuracies. For these users, AI acts as a cognitive amplifier, extending their capabilities.

    In contrast, learners still building foundational knowledge face a significant risk: mistaking AI fluency for their own. Without a robust internal framework for comparison, they may readily accept plausible-sounding output without realizing what’s missing or incorrect. This bypasses the mental effort—retrieval, error detection, integration—that neuroscience shows is essential for forming lasting memory engrams and flexible schemas. The result is a false sense of understanding: the learner feels accomplished, but the underlying cognitive work hasn’t been done.

    The group that benefits from AI as a complement for studying is not just those who were educated before AI became ubiquitous, but also those who learn in an environment where generative AI is explicitly available as a complement to learning (rather than a substitute). To a large extent, it depends on how generative AI is used as a learning tool. Oakley et al. do provide some good examples (and I have linked to some in past blog posts). I'd also like to think the AI tutors I have created for my ECONS101 and ECONS102 students assist with, rather than hamper, learning (and I have some empirical evidence that seems to support this, which I have already promised to blog about in the future).

    Oakley et al. conclude that:

    Effective education should balance the use of external tools with opportunities for students to internalize key knowledge and develop rich, interconnected schemata. This balance ensures that technology enhances learning rather than creating dependence and cognitive weakness.

    Finally, they provide some evidence-based strategies for enhancing learning (bolding is mine):

    • Embrace desirable difficulty—within limits: Encourage learners to generate answers and grapple with problems before turning to help... In classroom practice, this means carefully calibrating when to provide guidance—not immediately offering solutions, but also not leaving students floundering with tasks far beyond their current capabilities...
    • Assign foundational knowledge for memorization and practice: Rather than viewing factual knowledge as rote trivia, recognize it as the glue for higher-level thinking...
    • Use procedural training to build intuition: Allocate class time for practicing skills without external aids. For instance, mental math exercises, handwriting notes, reciting important passages or proofs from memory, and so on. Such practices, once considered old-fashioned, actually cultivate the procedural fluency that frees the mind for deeper insight...
    • Intentionally integrate technology as a supplement, not a substitute: When using AI tutors or search tools, structure their use so that the student remains cognitively active...
    • Promote internal knowledge structures: Help students build robust mental frameworks by ensuring connections happen inside their brains, not just on paper... guide students to identify relationships between concepts through active questioning ("How does this principle relate to what we learned last week?") and guided reflection...
    • Educate about metacognition and the illusion of knowledge: Help students recognize that knowing where to find information is fundamentally different from truly knowing it. Information that exists "out there" doesn't automatically translate to knowledge we can access and apply when needed.

    I really like those strategies as a prescription for learning. However, I am understandably biased, because many of the things I currently do in my day-to-day teaching practice are encompassed within (or similar to) those suggested strategies. I'll work on making 'guided reflection' a little more interactive in my classes this year, as I have traditionally made the links explicit for the students, rather than inviting them to make those links for themselves. We have been getting our ECONS101 students to reflect more on learning, and we'll be revising that activity (which happens in the first tutorial) this year to embrace more of a focus on metacognition.

    Learning is something that happens (often) in the brain. It should be no surprise that neuroscience has some insights to share on learning, and what that means for pedagogical practice. Oakley et al. take aim at some of the big names in educational theory (including Bloom, Dewey, Piaget, and Vygotsky), so I expect that their work is not going to be accepted by everyone. However, I personally found a lot to vindicate my pedagogical approach, which has developed over two decades of observational and experimental practice. I also learned that there are neuroscientific foundations for many aspects of my approach. And, I learned that there are things I can do to potentially further improve student learning in my classes.

    Friday, 6 February 2026

    This week in research #112

    Here's what caught my eye in research over the past week:

    • Mati et al. find that the Russia-Ukraine war resulted in an immediate 21 percent reduction in the daily growth rate of the Euro-Ruble exchange rate, and that the steady-state effect translates to a 26 percent reduction in growth
    • Masuhara and Hosoya review the COVID-19-related performance of OECD countries as well as Singapore and Taiwan in terms of deaths, vaccination status, production, consumption, and mobility from the early part of the pandemic to the end of 2022, and conclude that Norway was the most successful in terms of balancing deaths, production, and consumption
    • Neprash, McGlave, and Nikpay (with ungated earlier version here) quantify the effects of ransomware attacks on hospital operations and patient outcomes, finding that attacks decrease hospital volume by 17-24 percent during the initial attack week, with recovery occurring within 3 weeks, and that among patients already admitted to the hospital when a ransomware attack begins, in-hospital mortality increases by 34-38 percent
    • Tsivanidis (with ungated earlier version here) studies the world’s largest Bus Rapid Transit system in Bogotá, Colombia, and finds that low-cost "feeder" bus systems that complement mass rapid transit by providing last-mile connections to terminals yield high returns, but that welfare gains would have been about 36 percent larger under a more accommodative zoning policy
    • Janssen finds that the 2023 Bud Light boycott led to a large drop in Bud Light volume (34-37 percent), partial switching into other beer, and a net decline in total ethanol purchases of roughly 5.5-7.5 percent of pre-boycott intake
    • Krishnatri and Vellakkal (with ungated earlier version here) find that alcohol prohibition in Bihar, India, led to significant increases in caloric, protein, and fat intake from healthy food sources, as well as a decline in fat intake from unhealthy food sources
    • Geruso and Spears (open access) document the worldwide fall in birth rates, and the unlikely prospects of a reversal to higher fertility in the future

    Thursday, 5 February 2026

    Americans' beliefs about trade, and why compensation matters

    Do people understand trade policy? Or rather, do they understand trade policy the way that economists understand it? Given current debates in the US and elsewhere, it would be fair to question people's (or politicians') understanding of trade policy, and to consider what it is about trade that generates negative reactions. After all, the aggregate benefits of free trade are one of the things about which economists most agree.

    Last year, Stefanie Stantcheva won the John Bates Clark Medal (which is awarded annually to the American economist under age 40 who has made the most significant contributions to the field). Stantcheva's medal-winning work included three main strands, one of which was the use of "innovative surveys and experiments to measure what people know". One of the papers from that strand of research is this 2022 NBER Working Paper (revised in 2023), which describes Americans' understanding of trade and trade policy, and, importantly, answers the question of why people support trade (or not).

    The paper reports results from three large-scale surveys in the US run between 2019 and 2023, with a total sample size of nearly 4000. The surveys also included experiments that primed respondents to think about trade from particular angles. Overall, Stantcheva is interested in teasing out the factors that affect Americans' support for trade policies. Essentially, she tests the mechanisms that are described in Boxes I-V in Figure 2 from the paper:

    Box I picks up views on whether trade lowers prices and increases variety for consumers. Box II picks up the threats from increasing trade to workers in import-competing sectors. Those two boxes together constitute self-interest as an effect on people's views on trade policy. Their views might also be affected by broader social and economic concerns, such as trade's efficiency effects (Box III), its distribution impacts (Box IV), and patriotism, partisanship, or geopolitical concerns (Box V).

    Before we turn to the specific results on the mechanisms, it is worth considering Americans' overall views on trade first. Stantcheva reports that:

    Most respondents (63%) are supportive of more free trade and decreasing trade restrictions in general... Only 36% believe that import restrictions are the best way to help U.S. workers.

    Nevertheless, there is support for more targeted trade restrictions. 40% of respondents believe the US should restrict food imports to ensure food security. 54% think the US should protect their “infant” industries. 78% support protection of key consumer products, namely food items and cars. 50% believe the US should restrict trade in key sectors, such as oil and machinery...

    And general knowledge about trade policy is not too bad, as:

    ...almost 80% of respondents know what an import tariff is, but just around half know what an import quota is. Two-thirds of respondents appear to understand the basic price effects of tariffs and export taxes, i.e., that an import tariff on imported goods will likely raise the price of that good and that an export tax will increase the price of the taxed good abroad. The final question... considers a scenario in which the US can produce a good (“cars”) at a lower cost than the foreign country. Respondents are asked whether, under some circumstances, it would still make sense to import cars from abroad. 68% of respondents agree that it could make sense. This suggests that respondents either understand the concept of comparative advantage or have in mind some model of love-for-variety or quality differential.

    So far, so good. How do Americans perceive the impacts of trade? Figure 9 Panel A reports perceptions related to the self-interest motivation (Boxes I and II from the figure above):

    From the bottom of that figure, it is clear that a majority of Americans believe that they are better off from trade, but a substantial minority (39%) believe that they are worse off. Still focusing on the self-interest motivations (Boxes I and II), Stantcheva finds that:

    In general, a respondent’s (objective) negative exposure to trade through their sector, occupation, or local labor market is significantly positively correlated with a feeling that trade has made them worse off and that it has negatively affected their job. People exposed to trade through their job also feel worse off as consumers and are less likely to believe that trade has reduced the prices of goods they buy, perhaps because they feel that their purchasing power is lower than it would otherwise be. Furthermore, college-educated respondents are significantly less likely to feel negatively impacted in their role as consumers and workers.

    Notice those results are mostly consistent with the figure above. What about consumer gains through reduced prices on imported products? Stantcheva reports that:

    ...the belief that prices decrease from trade is not significantly related to either support for trade or redistribution. Consistent with this lack of correlation, the experiment priming people to think of their benefits as consumers (precisely, the prices and variety of goods they purchase) does not move their support for trade either.

    So, in terms of self-interest, Americans' support for trade is more negative when they are negatively affected as workers, but is not more positive when they are positively affected as consumers. In my ECONS102 class, we talk about the tension between the gains from trade and loss aversion. Every trade involves gaining something, in exchange for giving something up. However, quasi-rational decision-makers are affected much more by losses than equivalent gains (what we call loss aversion). So, loss aversion might mean that many profitable trades are not undertaken, because the decision-makers prefer to keep what they have, rather than giving it up for something that may be objectively worth more. In the case of Stantcheva's survey respondents, the workers who are negatively impacted experience a loss, which would be weighed much more heavily than the gain that a consumer receives.
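The asymmetry that loss aversion creates can be made concrete with a small sketch. This uses a simplified (piecewise-linear) version of Kahneman and Tversky's value function; the loss-aversion coefficient of 2.25 is their well-known estimate, but the trade values below are made up for illustration:

```python
# A minimal sketch of loss aversion, using a simplified piecewise-linear
# value function. LOSS_AVERSION = 2.25 is Kahneman and Tversky's
# estimate; the trade values are invented for illustration.

LOSS_AVERSION = 2.25  # losses loom roughly 2.25x larger than gains

def subjective_value(x: float) -> float:
    """Perceived value of a gain (x > 0) or loss (x < 0) vs. the status quo."""
    return x if x >= 0 else LOSS_AVERSION * x

# A trade that gives up something worth 100 for something worth 150:
# objectively profitable (net gain of 50), but subjectively...
net = subjective_value(150) + subjective_value(-100)
print(net)  # -75.0: the trade 'feels' like a loss, so it is rejected
```

The same logic applies at the policy level: a worker's concentrated loss from trade is weighted much more heavily than a consumer's diffuse gain of equal (or even greater) objective size.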

    An alternative explanation is salience. Job losses are very visible and impactful on the people who lose their jobs and those around them. Consumers' gains in terms of lower prices and increased variety, on the other hand, are not really as visible - many people wouldn't even notice them, unless they were pointed out to them. So even if people weren’t loss averse, attention would still be drawn disproportionately to the negative impacts of trade, rather than the positive. Taken altogether, Stantcheva's results here are not surprising.

    What about the broader social and economic concerns, and their impact on views about trade? In terms of efficiency effects (Box III), Stantcheva reports that:

    Respondents are generally optimistic about these effects. For instance, 61% of respondents think that international trade increases competition among firms in the US, 69% that it fosters innovation, and 62% that it generates more GDP growth.

    Moreover:

    ...efficiency gains from trade are significantly associated with more support for free trade... This relation can be seen in the correlations and the experimental effects: the Efficiency treatment significantly improves support for free trade.

    And interestingly:

    Respondents who believe that trade can improve innovation, competitiveness, and GDP are more supportive of redistribution policy to help those who do not benefit from these efficiency gains.

    Turning to distributional impacts (Box IV), Stantcheva reports that:

    Overall, respondents know that trade can have adverse distributional consequences through the labor market. Just around half of all respondents believe that trade has, on balance, helped US workers. 79% of people think that trade is the reason for “unemployment in some sectors and the decline of some industries in the U.S..” More respondents (63%) believe that high-skilled workers could easily change their work sector if their jobs were destroyed by trade than that low-skilled workers could switch sectors (37%)...

    Consequently, around two-thirds of respondents think that trade is a major reason for the “rise in inequality” in the US. Notably, despite being aware of the potential adverse distributional consequences of trade, a majority (62%) of respondents believe that, in principle, trade could make everyone better off because it is possible to “compensate those who lose from it through appropriate policies.”

    It is interesting that so many people believe in the compensation principle (although I bet that few of them would know that term for it). And it turns out that belief in the compensation principle is really important, as:

    ...the strongest predictor of support for free trade is the belief that, in principle, losers can be compensated... free trade. As long as respondents believe that adverse consequences from trade on some groups can be dampened by redistributive policy, they are likely to support more free trade, even if they believe that there are adverse distributional consequences. The perceived distributional impacts of trade also substantially matter for support for compensatory redistribution. Respondents who believe that trade hurts low-income and low-skilled workers and that it fosters inequality support redistribution much more.

    Finally, in terms of patriotism, partisanship, or geopolitical concerns (Box V), Stantcheva reports that:

    ...those who worry about geopolitical ramifications from trade restrictions, i.e., retaliatory responses, are more likely to support policies to compensate losers from trade rather than support outright trade restrictions. Patriotism is significantly correlated with support for trade restrictions in many industries and to protect U.S. workers, as well as with lower support for compensatory transfers...

    Stantcheva draws a number of conclusions from her results, including:

    First, respondents perceive gains from trade as consumers to be vague and unclear but perceive potential losses as workers to be concentrated and salient. Actual and perceived exposure to trade through the labor market is significantly associated with policy views...

    Second, people’s policy views on trade do not only reflect self-interest. Respondents also care about trade’s distributional and efficiency impacts on others and the US economy...

    Third, respondents’ experience, as measured by their exposure to trade through their sector, occupation, and local labor market, shapes their policy views directly (through self-interest) and indirectly by influencing their understanding and reasoning about the broader efficiency and distributional impacts of trade.

    Overall, I take away from this paper that Americans have more correct views about trade than I suspected. Their support for trade is not determined simply by self-interest, but is more nuanced. However, negative impacts weigh far more heavily on those who are harmed by trade than positive impacts do on those who benefit. That may relate to loss aversion, and to the concentrated nature of the negative impacts compared with the more diffuse positive impacts. That asymmetry also explains why a majority nevertheless hold positive views of trade (since, on the whole, fewer people will have been negatively impacted). The most surprising aspect to me, though, was the views on the compensation principle. Those results provide a clear policy prescription: to get more people on board with trade, making compensatory policy more explicit and salient may help to ensure greater support. On the other hand, politicians who want to exploit negative views on trade might benefit from obscuring any such compensatory policies. Unfortunately, there are too many who are willing to do just that.

    [HT: Marginal Revolution, last year]

    Wednesday, 4 February 2026

    The economic impacts of the 2008 NZ-China Free Trade Agreement

    New Zealand was the first Western developed country to sign a free trade agreement with China, and it came into force in 2008. At the time, the New Zealand government estimated an increase in exports to China of between NZ$225 million and NZ$350 million (between US$180 million and US$280 million), and the Ministry of Foreign Affairs and Trade (MFAT) estimated an increase of 0.25% in GDP. How did things actually turn out?

    That is the question addressed in this 2021 article by Samuel Verevis (MFAT) and Murat Üngör (University of Otago), published in the Scottish Journal of Political Economy (ungated earlier version here). Now, the challenge with this sort of exercise is that we can observe what happened to New Zealand with the FTA in place, but we cannot observe what would have happened if there had been no FTA (the counterfactual). And that is a problem, since what we really want to know is the difference in outcome between what really happened and the counterfactual.

    Verevis and Üngör solve that problem by using the synthetic control method. Essentially, they use a weighted average of the outcomes of other countries (donor countries) that closely follows the trends in the New Zealand data before the FTA came into force in 2008, and then use the same weights to create a 'synthetic New Zealand' counterfactual for the period after 2008. The key assumption with this approach is that there isn't some other change that affected New Zealand differently from the donor countries at the same time as the FTA came into force.
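    To make the mechanics concrete, here is a minimal sketch of the synthetic control idea in Python, using entirely made-up data (this is not Verevis and Üngör's code or data). The donor weights are chosen to minimise the pre-treatment gap between the 'treated' series and the weighted donor pool, subject to being non-negative and summing to one:

    ```python
    import numpy as np
    from scipy.optimize import minimize

    # Hypothetical data: ten pre-treatment periods for three donor countries,
    # and a 'treated' country built as a noisy mix of the donors.
    rng = np.random.default_rng(0)
    donors = rng.normal(5, 1, (10, 3)).cumsum(axis=0)   # donor outcome series
    true_w = np.array([0.6, 0.3, 0.1])                   # unknown in practice
    treated_pre = donors @ true_w + rng.normal(0, 0.05, 10)

    def pre_mse(w):
        # mean squared gap between the treated unit and the weighted donors
        # over the pre-treatment period only
        return np.mean((treated_pre - donors @ w) ** 2)

    # Weights must be non-negative and sum to one (a convex combination)
    cons = {"type": "eq", "fun": lambda w: w.sum() - 1}
    res = minimize(pre_mse, x0=np.full(3, 1 / 3),
                   bounds=[(0, 1)] * 3, constraints=cons)
    weights = res.x
    # The counterfactual ('synthetic' treated unit) after the intervention is
    # then just these same weights applied to the donors' post-period outcomes.
    ```

    In the paper's setting, the post-2008 gap between actual New Zealand exports and the weighted donor series is the estimated effect of the FTA.
    
    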

    Verevis and Üngör first look at the effect on New Zealand exports to China. The results are summarised in Figure 3 from the paper:

    The black solid line is actual New Zealand exports to China (in nominal US dollar terms). The red dashed line is the counterfactual created using the synthetic control method. The vertical dotted line reminds us that the FTA came into force in 2008. Notice that, prior to 2008, the two lines follow each other closely. That is what we should expect with this method, since the synthetic control is designed to closely mimic New Zealand data. After 2008, the lines diverge dramatically, with actual New Zealand exports to China far higher than the counterfactual. Verevis and Üngör note that:

    In the post-intervention [period] between 2009 and 2015, NZ's actual exports to China were more than 120%, on average, higher than the synthetic counterparts.

    Eyeballing Figure 3, the increase in exports was in the order of US$6 billion at its peak, so the government's expectations of US$180-280 million wildly underestimated the trade impact of the FTA. What about GDP though? Verevis and Üngör's preferred results for GDP actually show a decrease, as shown in Figure 7 from the paper:

    Verevis and Üngör estimate that:

    In the post-intervention era, the 2009–2017 period, the synthetic real GDP per capita was 4%, on average, higher than the actual GDP per capita.

    However, there is good reason to doubt that there was such a negative impact of the FTA on GDP. The Global Financial Crisis (GFC) also occurred in 2008-2009, alongside this FTA coming into force. Verevis and Üngör argue that the GFC affected all countries, so is not a problem for their analysis. However, they acknowledge that the GFC didn't affect all countries equally. And when, in a robustness check, they exclude all Eurozone countries and Iceland, they find no significant impact of the FTA on New Zealand GDP per capita. Overall, I take from this that there is limited evidence in favour of a GDP impact of the FTA (in either direction). Of course, the concurrent GFC critique also applies to their earlier analysis of the impact on exports to China. When Verevis and Üngör re-run the analysis of exports while excluding Eurozone countries, the impact is smaller, but there is still a very large positive impact of the FTA.

    Ultimately, what can we take away from this study? The NZ-China Free Trade Agreement increased trade between New Zealand and China, but didn't really impact income in New Zealand (at least on average). Why might the value of exports to China increase but GDP remain unaffected? Verevis and Üngör show that exports to the rest of the world were largely unaffected, so it wasn't simple substitution from exporting to other countries to exporting to China instead. It's quite possible that the increase in exports to China was offset by an equivalent increase in imports from China, leaving net exports unchanged. Unfortunately, Verevis and Üngör don't look at imports, so we are left to guess.

    Finally, an 'upgraded' FTA between the two countries came into force in 2022. Given that many of the trade frictions had already been removed by the original agreement, the upgraded FTA likely had a smaller impact. In terms of GDP, it probably wouldn't be too much of a stretch to expect an impact as imperceptible as that of the original agreement.

    Sunday, 1 February 2026

    The changing system of regional economic development in New Zealand

    I just finished reading the edited volume Economic Development in New Zealand, edited by James Rowe and published in 2005. Edited volumes are difficult to review, particularly when the chapters have only a loose connection and lack a common thread, and that was the case with this book. Instead, I want to share one overall takeaway from reading the book, and that is how the policy environment for regional economic development has changed immensely since the 2000s. This matters because the way that we organise regional development determines who sets priorities, where capability accumulates, and whether regional growth is sustainable or merely a sequence of centrally funded projects.

    So, what has changed? We can think about how leadership and decision-making have changed, how funding and strategy-setting have changed, and how the roles of business, educational institutions, and the research sector have changed.

    In the mid-2000s, regional economic development had a lot of prominence, and it has seen a bit of a revival in recent years. However, there are some substantial differences in how that prominence manifests between the two eras. In the mid-2000s, regional economic development was led by the regions. The central government had an important role in setting the policy environment and steering the direction through funding, but regional development initiatives typically came from the regions. This is exemplified by the Regional Partnerships Programme (RPP), which involved central government funding regions to develop their own plans, build capability, and then back major initiatives coming out of those plans. Business had a strong role in partnership with government, not just as part of the RPP, but more generally. Region-wide strategy and plan development tended to rely on input from local business and industry leaders. There was also an important role for training, research and development, and innovation, and so universities, polytechnics, and Crown research institutes were all closely involved in regional development.

    Fast forward to today, and regional development has been embodied in the Provincial Growth Fund, which has a lot of different aims, one of which is to "create jobs, leading to sustainable economic growth", and more recently the Regional Strategic Partnership Fund, which had a much narrower aim to "make regional economies stronger and more resilient to improve the economic prospects, wellbeing and living standards of all New Zealanders". In both cases, it is central government that is largely the decision-maker, in addition to funding the initiatives, rather than the regions themselves. Business input is now largely channelled through consultation and deal-making, rather than input into the strategic direction of regional development. The rhetoric for business has changed to more of an emphasis on innovation and increasing productivity. That applies to the education and research sectors as well, where the role has shifted to more of a focus on core skills development and innovation, rather than being part of regional strategic plan development.

    In between the mid-2000s and today, regional development did go through a bit of a quiet patch. It is clearly back in vogue now, although the policy environment and systems have changed tremendously. What that means is that there is not much from Rowe's edited volume that translates directly to today's situation, sadly. The initiatives that the authors were writing about are long gone; even the AUT Masters degree in Economic Development that one chapter describes has long since closed down. However, the value in reading Rowe's book is that it provides a useful reminder that regional development has long been a goal of central government, and that there is more than one way to approach that goal.

    Friday, 30 January 2026

    This week in research #111

    Here's what caught my eye in research over the past week:

    • Hu and Su find that housing wealth appreciation significantly improves individual happiness in China
    • Díez-Rituerto et al. (with ungated earlier version here) study gender differences in willingness to guess in multiple-choice questions in a medical internship exam in Spain, and find that, in line with past research, women answer fewer questions than men, but that reducing the number of alternative answers reduces the difference between men and women among those who answer most of the questions
    • Chen, Fang, and Wang (with ungated earlier version here) find that holding a deanship in China increases patent applications by 15.2 percent, and that deans' misuse of power distorts resource allocation

    Thursday, 29 January 2026

    European monarchs' cognitive ability and state performance

    How important is the quality of a CEO to a company's performance over time? How important is the quality of a leader to a country's performance over time? These questions seem quite straightforward to answer, but in reality they are quite tricky. First, it is difficult to measure the 'quality' of a CEO or a leader. Second, the appointment of a CEO or a leader is not a random event - typically it is the result of a deliberative process, and may depend on the company's or country's past or expected future performance.

    What is needed is some CEOs or leaders who differ in 'quality' and who are randomly appointed to the role. This sort of experiment is, of course, not available in the real world. However, a 2025 article by Sebastian Ottinger (CERGE-EI) and Nico Voigtländer (UCLA), published in the journal Econometrica (open access), examines a setting that mimics the ideal experiment in many respects. Ottinger and Voigtländer look at 399 European monarchs from 13 states over the period 1000-1800 CE. To address the two concerns above (measurement of quality and non-random appointment), they:

    ...exploit two salient features of ruling dynasties: first, hereditary succession—the predetermined appointment of offspring of the prior ruler, independent of their ability; second, variation in ruler ability due to the widespread inbreeding of dynasties.

    Ottinger and Voigtländer measure the 'quality' of a ruling monarch using the work of Frederick Adams Woods, who:

    ...coded rulers’ cognitive capability based on reference works and state-specific historical accounts.

    Ottinger and Voigtländer measure the outcome variable, state performance, as a subjective measure from the work of Woods, as well as the change in land area during each monarch's reign, and the change in urban population during each monarch's reign. They then use a measure of the 'coefficient of inbreeding' for each ruler as an instrument for cognitive ability. This is important, because the instrumental variables (IV) approach they employ reduces the impact of any measurement error in cognitive ability, as well as dealing with the endogenous selection of rulers. However, as always with the IV approach, the key identifying assumption is that inbreeding affects the outcome (state performance) only through its effect on ruler cognitive ability (not, say, through the instability of succession). Ottinger and Voigtländer provide a detailed discussion in favour of the validity of the instrument, and support this by showing that the results hold when they instead use 'hidden inbreeding' (inbreeding that is less direct than, say, parents being first cousins or an uncle and niece) as an instrument.
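    As a rough illustration of the two-stage least squares (2SLS) logic behind this strategy, here is a simulation in Python. All of the numbers and functional forms are made up for the purpose of the sketch (this is not the paper's data, model, or code); the point is only to show how instrumenting with inbreeding purges ability of its correlation with unobserved confounders:

    ```python
    import numpy as np

    # Hypothetical simulation of the 2SLS logic; coefficients are invented.
    rng = np.random.default_rng(1)
    n = 399                                # sample size, as in the paper
    inbreeding = rng.uniform(0, 0.5, n)    # instrument: coefficient of inbreeding
    u = rng.normal(0, 1, n)                # unobserved confounder (e.g. court politics)
    ability = -6 * inbreeding + 0.5 * u + rng.normal(0, 0.7, n)
    performance = 0.8 * ability + 0.5 * u + rng.normal(0, 0.7, n)  # true effect: 0.8

    def ols(y, x):
        """Simple OLS of y on a constant and x; returns (intercept, slope)."""
        X = np.column_stack([np.ones(len(y)), x])
        return np.linalg.lstsq(X, y, rcond=None)[0]

    # First stage: predict ability from the instrument alone
    a0, a1 = ols(ability, inbreeding)
    ability_hat = a0 + a1 * inbreeding

    # Second stage: regress performance on *predicted* ability;
    # the slope recovers the true causal effect
    _, beta_iv = ols(performance, ability_hat)

    # Naive OLS is biased in this simulation, because the confounder u
    # drives both ability and performance
    _, beta_ols = ols(performance, ability)
    ```

    The exclusion restriction that the blog text mentions corresponds here to the instrument appearing only in the `ability` equation, never directly in the `performance` equation.
    
    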

    Now, in their main instrumental variables analysis, they find:

    ...a sizeable effect of (instrumented) ruler ability on all three dimensions of state performance. A one-std increase in ruler ability leads to a 0.8 std higher broad State Performance, to an expansion in territory by 16%, and to an increase in urban population by 14%.

    Ottinger and Voigtländer also explore the mechanisms explaining this effect, finding that:

    ...less inbred, capable rulers tended to improve their states’ finances, commerce, law and order, and general living conditions. They also reduced involvement in international wars, but when they did, won a larger proportion of battles, leading to an expansion of their territory into urbanized areas. This suggests that capable rulers chose conflicts “wisely,” resulting in expansions into valuable, densely populated territories.

    Finally, Ottinger and Voigtländer looked at whether a country's institutions mattered for the effect of the ruler's cognitive ability on state performance. They measure how constrained a ruler was, such as by the power of parliament, and using this measure in their analysis they find that:

    ...inbreeding and ability of unconstrained leaders had a strong effect on state borders and urban population in their reign, while the [ability] of constrained rulers (those who faced “substantial limitations on their authority”) made almost no difference.

    That result is further support that the cognitive ability of rulers mattered precisely in those situations where a ruler might be expected to have an effect - that is, when they are unconstrained by political institutions. When the ruler is constrained by parliament or other political institutions, their cognitive ability will likely have much less effect on state performance, and that is what Ottinger and Voigtländer found.

    One surprising finding from the paper appears in the supplementary materials, where Ottinger and Voigtländer report that the marginal effect of cognitive ability on state performance doesn't vary by gender. That surprises me a little given that earlier research by Dube and Harish (which Ottinger and Voigtländer cite in a footnote) found that queens were more likely to engage in wars than kings (see here). Now, this paper shows that more able rulers fight fewer wars. So, I would have expected that queens, having fought more wars, would show a different relationship between cognitive ability and state performance, but that didn't prove to be the case. Perhaps that tells us that, while queens may have fought more wars, they made better choices about which wars to fight? Or perhaps, they fought more wars but that only affected the level of wars, and not the interaction between cognitive ability and wars (or cognitive ability and state performance)?

    Regardless, overall these results tell us that the 'quality' of a leader really does matter. A higher quality ruler, in terms of cognitive ability, improves state performance. Extending from those results, we might expect that a higher quality CEO also improves company performance. Of course, CEO selection isn’t hereditary and differs in important ways, but the broader lesson that leader quality can matter a lot when leaders have discretion likely holds in that setting as well.

    [HT: Marginal Revolution, early last year]

    Read more: