Wednesday, 11 February 2026

Did employers value an AI-related qualification in 2021?

Many universities are rapidly adapting to education in the age of generative AI by trying to develop AI skills in their students. There is an assumption that employers want graduates with AI skills across all disciplines, but is there evidence to support that? This recent discussion paper by Teo Firpo (Humboldt-Universität zu Berlin), Lukas Niemann (Tanso Technologies), and Anastasia Danilov (Humboldt-Universität zu Berlin) provides an early answer. I say it's an early answer because their data come from 2021, before the wave of generative AI innovation that became ubiquitous following the release of ChatGPT at the end of 2022. The research also focuses on AI-related qualifications, rather than the more general AI skills, but it's a start.

Firpo et al. conduct a correspondence experiment, where they:

...sent 1,185 applications to open vacancies identified on major UK online job platforms... including Indeed.co.uk, Monster.co.uk, and Reed.co.uk. We restrict applications to entry-level positions requiring at most one year of professional experience, and exclude postings that demand rare or highly specialized skills...

Each identified job posting is randomly assigned to one of two experimental conditions: a "treatment group", which receives a résumé that includes additional AI-related qualifications and a "control group", which receives an otherwise identical résumé without mentioning such qualifications.

Correspondence experiments are relatively common in the labour economics literature (see here, for example), and involve the researcher making job applications with CVs (and sometimes cover letters) that differ in known characteristics. In this case, the applications differed by whether the CV included an AI-related qualification or not. Firpo et al. then focus on differences in callback rates, and they differentiate between 'strict callbacks' (invitations to interview), and 'broad callbacks' (any positive employer response, including requests for further information). Comparing callback rates between CVs with and without AI-related qualifications, they find:

...no statistically significant difference between treatment and control groups for either outcome measure...

However, when they disaggregate their results by job function, they find that:

In both Marketing and Engineering, résumés listing AI-related qualifications receive higher callback rates compared to those in the control group. In Marketing, strict callback rates are 16.00% for AI résumés compared to 7.00% for the control group (p-value = 0.075...), while broad callback rates are 24.00% versus 12.00% (p-value = 0.043...). In Engineering, strict callback rates are 10.00% for AI résumés compared to 4.00% for the control group (p-value = 0.163...), while broad callback rates are 20.00% versus 8.00% (p-value = 0.024...).
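
To get a sense of how differences in callback rates like these are typically tested, here is a minimal sketch of a two-proportion test in Python. The counts are hypothetical (I have assumed roughly 100 applications per arm in the Marketing subgroup, which is consistent with the reported percentages but is not taken from the paper), so this illustrates the calculation rather than reproducing the reported p-values.

```python
# Hedged sketch: two-proportion z-test for callback rates (hypothetical counts).
from statsmodels.stats.proportion import proportions_ztest

callbacks = [16, 7]        # strict callbacks: AI resumes vs. control (assumed counts)
applications = [100, 100]  # applications sent in each arm (assumed)

z_stat, p_value = proportions_ztest(count=callbacks, nobs=applications)
print(f"Callback rates: {callbacks[0] / applications[0]:.1%} vs {callbacks[1] / applications[1]:.1%}")
print(f"Two-proportion z-test: z = {z_stat:.2f}, p = {p_value:.3f}")
```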

For the other job functions (Finance, HR, IT, and Logistics) there was no statistically significant effect of AI qualifications on either measure of callback rates. Firpo et al. then estimate a regression model and show that:

...including AI-related qualifications increases the probability of receiving an interview invitation for marketing roles by approximately 9 percentage points and a broader callback by 12 percentage points. Similarly, the interaction between the treatment dummy and the Engineering job function dummy in the LPM models is positive and statistically significant, but only for broad callbacks. AI-related qualifications increase the probability of a broad callback by at least 11 percentage points...

The results from the econometric model are only weakly statistically significant, but they are fairly large in size. However, I wouldn't over-interpret them because of the multiple-comparison problem (around five percent of results would show up as statistically significant just by chance). At best, the evidence that employers valued AI-related qualifications in 2021 is pretty limited, based on this research.
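
To see why subgroup results like these deserve caution, here is a small simulation (illustrative only) of the multiple-comparison problem: even when there is no true effect anywhere, testing a treatment separately across many subgroups and outcomes will regularly turn up at least one 'significant' result. The choice of 12 tests below (six job functions times two callback measures) is an assumption made purely for illustration.

```python
# A small simulation of the multiple-comparison problem: if we test a 'true null'
# treatment effect separately in several subgroups, some comparisons will come up
# statistically significant purely by chance. Numbers here are illustrative only.
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(42)
n_simulations = 2000
n_subgroups = 12     # e.g. 6 job functions x 2 callback measures (assumed)
n_per_arm = 100
false_positives = 0

for _ in range(n_simulations):
    significant = 0
    for _ in range(n_subgroups):
        treated = rng.normal(0, 1, n_per_arm)   # no true effect in either arm
        control = rng.normal(0, 1, n_per_arm)
        _, p = ttest_ind(treated, control)
        if p < 0.05:
            significant += 1
    if significant > 0:
        false_positives += 1

print(f"Share of simulations with at least one 'significant' subgroup: "
      f"{false_positives / n_simulations:.0%}")
# With 12 independent tests at the 5% level, roughly 1 - 0.95**12 (about 46%) of
# simulations flag at least one spurious effect.
```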

Firpo et al. were worried that employers might not have noticed the AI qualifications in the CVs, so they conducted an online survey of over 700 professionals with hiring experience and domain knowledge. That survey instead shows that the AI-related qualification was salient, and that it signalled greater technical skills but lower social skills. These conflicting signals are interesting, and suggest that employers are looking for both technical skills and social skills in entry-level applicants. Does this, alongside the earlier results for different job functions, imply that technical skills are weighted more heavily than social skills for Engineering and Marketing jobs? I could believe that for Engineering, but for Marketing I have my doubts, because interpersonal skills are likely to be important in Marketing roles. Again though, it's probably best not to over-interpret the results.

Firpo et al. conclude that:

...our findings challenge the assumption that AI-related qualifications unambiguously enhance employability in early-career recruitment. While such skills might be valued in abstract or strategic terms, they do not automatically translate into interview opportunities, at least not in the entry-level labor market in job functions such as HR, Finance, Marketing, Engineering, IT and Logistics.

Of course, these results need to be considered in the context of their time. In 2021, AI-related skills might not have been much in demand by employers. That is unlikely to hold true now, given that generative AI use has become so widespread. It would be interesting to see what a more up-to-date correspondence experiment would find.

[HT: Marginal Revolution]

Read more:

  • ChatGPT and the labour market
  • More on ChatGPT and the labour market
  • The impact of generative AI on contact centre work
  • Some good news for human accountants in the face of generative AI
  • Good news, bad news, and students' views about the impact of ChatGPT on their labour market outcomes
  • Swiss workers are worried about the risk of automation
  • How people use ChatGPT, for work and not
  • Generative AI and entry-level employment
  • Survey evidence on the labour market impacts of generative AI

    Tuesday, 10 February 2026

    Who on earth has been using generative AI?

    Who are the world's generative AI users? That is the question addressed in this recent article by Yan Liu and He Wang (both World Bank), published in the journal World Development (ungated earlier version here). They use website traffic data from Semrush, alongside Google Trends data, to document worldwide generative AI use up to March 2024 (so, it's a bit dated now, as this is a fast-moving area, but it does provide an interesting snapshot up to that point). In particular, Liu and Wang focus on geographical heterogeneity in generative AI use (measured as visits to generative AI websites, which are predominantly, and in some of their analyses entirely, visits to ChatGPT), and they explore how that relates to country-level differences in institutions, infrastructure, and other variables.

    Some of the results are fairly banal, such as the rapid increase in traffic to AI chatbot websites, the corresponding decline in traffic to sites such as Google and Stack Overflow, and the fact that users skew younger, more educated, and male. Those demographic differences will likely become less dramatic over time as user numbers increase. However, the geographic differences are important and could be more persistent. Liu and Wang show that:

    As of March 2024, the top five economies for ChatGPT traffic are the US, India, Brazil, the Philippines, and Indonesia. The US share of ChatGPT traffic dropped from 70% to 25% within one month of ChatGPT’s debut. Middle-income economies now contribute over 50% of traffic, showing disproportionately high adoption of generative AI relative to their GDP, electricity consumption, and search engine traffic. Low-income economies, however, represent less than 1% of global ChatGPT traffic.

    So, as of 2024, most generative AI use was in middle-income countries, but remember that those are also high-population countries (like India). Measured per internet user, generative AI use is instead disproportionately concentrated in high-income countries. Figure 12 in the paper illustrates this nicely, showing generative AI use, measured as visits per internet user:

    Notice that the darker-coloured countries, where a higher proportion of internet users used ChatGPT, are predominantly in North America, western Europe, and Australia and New Zealand. On that measure, Liu and Wang rank New Zealand 20th (compared with Singapore first, and Australia eighth). There are a few interesting outliers like Suriname (sixth) and Panama (17th), but the vast majority of the top twenty countries are high-income countries.

    What accounts for generative AI use at the country level? Using a cross-country panel regression model, Liu and Wang find that:

    Higher income levels, a higher share of youth population, better digital infrastructure, and stronger human capital are key predictors of higher generative AI uptake. Services’ share of GDP and English fluency are strongly associated with higher chatbot usage.
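
    To make the structure of that exercise concrete, here is a minimal sketch (not the authors' code) of a cross-country regression of generative AI use on country characteristics. The data are simulated purely for illustration, and the variable names are my own shorthand for the kinds of predictors Liu and Wang describe.

```python
# Hedged sketch: a cross-country regression of generative AI use on country
# characteristics. All data below are simulated for illustration only.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 60  # hypothetical number of countries

df = pd.DataFrame({
    "log_gdp_pc":      rng.normal(9.5, 1.0, n),     # log GDP per capita
    "youth_share":     rng.uniform(0.15, 0.35, n),  # share of young adults in population
    "english_fluency": rng.uniform(0.0, 1.0, n),    # index of English proficiency
    "services_share":  rng.uniform(0.3, 0.8, n),    # services share of GDP
})
# Simulated outcome: chatbot visits per internet user
df["visits_per_user"] = (
    0.5 * df["log_gdp_pc"] + 2.0 * df["youth_share"]
    + 1.0 * df["english_fluency"] + rng.normal(0, 0.5, n)
)

model = smf.ols(
    "visits_per_user ~ log_gdp_pc + youth_share + english_fluency + services_share",
    data=df,
).fit()
print(model.params)
```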

    Now, those results simply demonstrate correlation, and are not causal. And website traffic could be biased due to use of VPNs, etc., not to mention that it doesn't account very well for traffic from China or Russia (and Liu and Wang are very upfront about that limitation). Nevertheless, it does provide a bit more information about how countries with high generative AI use differ from those with low generative AI use. Generative AI has the potential to level the playing field somewhat for lower-productivity workers, and lower-income countries. However, that can only happen if lower-income countries access generative AI. And it appears as if, up to March 2024 at least, they are instead falling behind. As Liu and Wang conclude, any catch-up potential from generative AI:

    ...depends on further development as well as targeted policy interventions to improve digital infrastructure, language accessibility, and foundational skills.

    To be fair, that sounds like a general prescription for development policy in any case.


    Monday, 9 February 2026

    The promise of a personalised, AI-augmented textbook, and beyond

    In the 1980s, the educational psychologist Benjamin Bloom introduced the 'two-sigma problem' - that students who were tutored one-on-one using a mastery approach performed, on average, two standard deviations (two sigma) better than students educated in a more 'traditional' classroom setting. That research is often taken as a benchmark for how good an educational intervention might be (relative to a traditional classroom baseline). The problem, of course, is that one-on-one tutoring is not scalable. It simply isn't feasible for every student to have their own personal tutor. Until now.
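
    As a quick back-of-the-envelope illustration of what a two-sigma gain means, assuming achievement is roughly normally distributed, a student starting at the median who improves by two standard deviations ends up at about the 98th percentile:

```python
# Illustration only: percentile implied by a two-standard-deviation improvement
# for a student starting at the median of a (roughly normal) distribution.
from scipy.stats import norm

improvement_in_sd = 2.0
percentile = norm.cdf(improvement_in_sd)
print(f"A median student shifted up 2 SD reaches roughly the {percentile:.1%} percentile")
# prints roughly 97.7%
```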

    Generative AI makes it possible for every student to have a personalised tutor, available 24/7 to assist with their learning. As I noted in yesterday's post though, it becomes crucial how that AI tutor is set up, as it needs to ensure that students engage meaningfully in a way that promotes their own learning, rather than simply being a tool to 'cognitively offload' difficult learning tasks.

    One promising approach is to create customised generative AI tools that are specifically designed to act as tutors or coaches, rather than simple 'answer-bots'. This new working paper by the LearnLM team at Google (and a long list of co-authors) provides one example. They describe an 'AI-augmented textbook', which they call the 'Learn Your Way' experience, and which:

    ...provides the learner with a personalized and engaging learning experience, while also allowing them to choose from different modalities in order to enhance understanding.

    Basically, this initially involves taking some source material, which in their case is a textbook, but could just as easily be lecture slides, transcripts, and related materials from a class. It then personalises those materials to the interests of the students, adapting the examples and exercises to fit a context that the students find more engaging. For example, if the student is an avid football fan, they might see examples drawn from football. And if the student is into Labubu toys, they might see examples based on that.

    The working paper describes the approach, reports a pedagogical evaluation performed by experts, and finally presents a randomised controlled trial (RCT) evaluating the impact of the approach on student learning. The experts rated the Learn Your Way experience across a range of criteria, and the results were highly positive. The only criterion where scores were notably low was for visual illustrations. That accords with my experience so far with AI tutors, which are not good at drawing economics graphs in particular (an ongoing source of some frustration!).

    The RCT involved sixty high-school students in Chicago area schools, who studied this chapter on brain development of adolescents. Half of the students were assigned to Learn Your Way, and half to a standard digital PDF reader. As the LearnLM Team et al. explain:

    Participants then used the assigned tool to study the material. Learning time was set to a minimum of 20 minutes and a maximum of 40 minutes. After this time, each participant had 15 minutes to complete the Immediate Assessment via a Qualtrics link.

    They then did a further assessment three days later (a 'Retention Assessment'). In terms of the impact of Learn Your Way:

    The students who used Learn Your Way received higher scores than those who used the Digital Reader, in both the immediate (p = 0.03) and retention (p = 0.03) assessments.

    The difference in test outcomes was 77 percent vs. 68 percent in the Immediate Assessment, and 78 percent vs. 67 percent in the Retention Assessment. So, the AI-augmented textbook increased test scores by roughly ten percentage points, both immediately and after three days. Of course, this was just a single study with a relatively small sample size of 60 students in a single setting, but it does offer some promise for the approach.
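
    For readers who want to see the mechanics of that comparison, here is a minimal sketch of a two-sample t-test. The scores are simulated with means near the reported 77 and 68 percent; the spread of scores and the 30-students-per-arm split are my own assumptions, made only for illustration.

```python
# Hedged sketch of the kind of comparison behind the RCT result. Scores are
# simulated; the SD of 15 points and n = 30 per arm are assumptions.
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(1)
learn_your_way = rng.normal(loc=77, scale=15, size=30)  # assumed SD of 15 points
digital_reader = rng.normal(loc=68, scale=15, size=30)

t_stat, p_value = ttest_ind(learn_your_way, digital_reader)
print(f"Mean difference: {learn_your_way.mean() - digital_reader.mean():.1f} points")
print(f"t = {t_stat:.2f}, p = {p_value:.3f}")
```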

    I really like this idea of dynamically adjusting content to suit students' interests, which is a topic I have published on before. However, using generative AI in this way allows material to be customised for every student, creating a far more personalised approach to learning than any teacher could offer. I doubt that even one-on-one tutoring could match the level of customisation that generative AI could offer.

    This paper has gotten me thinking about the possibilities for personalised learning. Over the years, I have seen graduate students with specific interests left disappointed by what we are able to offer in terms of empirical papers. For example, I can recall students highly interested in economic history, the economics of education, and health economics in recent years. Generative AI offers the opportunity to provide a much more tailored education to students who have specific interests.

    This year, I'll be teaching a graduate paper for the first time in about a decade. My aim is to allow students to tailor that paper to their interests, by embarking on a series of conversations about research papers based on their interests. The direction that leads will be almost entirely up to the student (although with some guidance from me, where needed). Students might adopt a narrow focus on a particular research method, a particular research question, or a particular field or sub-field of economics. Assisted by a custom generative AI tool, they can read and discuss papers, try out replication packages, and/or develop their own ideas. Their only limits will be how much time they want to put into it. Of course, some students will require more direction than others, but that is what our in-class discussion time will be for.

    I am excited by the prospects of this approach, and while it will be a radical change to how our graduate papers have been taught in the past, it might offer a window to the future. And best of all, I have received the blessing of my Head of School to go ahead with this as a pilot project that might be an exemplar for wider rollout across other papers. Anyway, I look forward to sharing more on that later (as I will turn it into a research project, of course!).

    The ultimate question is whether we can use generative AI in a way that moves us closer to Bloom’s two-sigma benefit of one-on-one tutoring. The trick will be designing it so that students still do the cognitive work. My hope (and, it seems, the LearnLM team’s) is that personalisation increases students' engagement with learning rather than replacing it. If it works, this approach could be both effective and scalable in a way that human one-on-one tutoring simply can’t match.

    [HT: Marginal Revolution, for the AI-augmented textbook paper]

    Sunday, 8 February 2026

    Neuroscientific insights into learning and pedagogy, especially in the age of generative AI

    In May last year, my university's Centre for Tertiary Teaching and Learning organised a seminar by Barbara Oakley of Oakland University, with the grand title 'The Science of Learning'. It was a fascinating seminar about the neuroscience of learning, and in my mind, it justified several of my teaching and learning practices, such as continuing to have lectures, emphasising students' learning of basic knowledge in economics, and using retrieval practice and spaced repetition as learning tools.

    Now, I've finally read the associated working paper by Oakley and co-authors (apparently forthcoming as a book chapter), and I've been able to pull out further insights that I want to share here. The core of their argument is in the Introduction to the paper. First:

    Emerging research on learning and memory reveals that relying heavily on external aids can hinder deep understanding. Equally problematic, however, are the pedagogical approaches used in tandem with reliance on external aids—that is, constructivist, often coupled with student-centered approaches where the student is expected to discover the insights to be learned... The familiar platitude advises teachers to be a guide on the side rather than a sage on the stage, but this oversimplifies reality: explicit teaching—clear, structured explanations and thoughtfully guided practice—is often essential to make progress in difficult subjects. Sometimes the sage on the stage is invaluable.

    I have resisted the urge to move away from lectures as a pedagogical tool, although I'd like to think that my lectures are more than simply information dissemination. I actively incorporate opportunities for students to have their first attempts at integrating and applying the economic concepts and models they are learning - the first step in an explicit retrieval practice approach. Oakley et al. note the importance of both components, because:

    ...mastering culturally important academic subjects—such as reading, mathematics, or science (biologically secondary knowledge)—generally requires deliberate instruction... Our brains simply aren’t wired to effortlessly internalize this kind of secondary knowledge—in other words, formally taught academic skills and content—without deliberate practice and repeated retrieval.

    The paper goes into some detail about the neuroscience underlying this approach, but again it is summarised in the Introduction:

    At the heart of effective learning are our brain's dual memory systems: one for explicit facts and concepts we consciously recall (declarative memory), and another for skills and routines that become second nature (procedural memory). Building genuine expertise often involves moving knowledge from the declarative system to the procedural system—practicing a fact or skill until it embeds deeply in the subconscious circuits that support intuition and fluent thinking...

    Internalized networks form mental structures called schemata (the plural of “schema”), which organize knowledge and facilitate complex thinking... Schemata gradually develop through active engagement and practice, with each recall strengthening these mental frameworks. Metaphors can enrich schemata by linking unfamiliar concepts to familiar experiences... However, excessive reliance on external memory aids can prevent this process. Constantly looking things up instead of internalizing them results in shallow schemata, limiting deep understanding and cross-domain thinking.

    This last point, about the shallowness of learning when students rely on 'looking things up' instead of relying on their own memory of key facts (and concepts and models, in the case of economics), leads explicitly to worries about learning in the context of generative AI. When students rely on external aids (known as 'cognitive offloading'), then learning becomes shallow, because:

    ...deep learning is a matter of training the brain as much as informing the brain. If we neglect that training by continually outsourcing, we risk shallow competence.

    Even worse, there is a feedback loop embedded in learning, which exacerbates the negative effects of cognitive offloading:

    Without internally stored knowledge, our brain's natural learning mechanisms remain largely unused. Every effective learning technique—whether retrieval practice, spaced repetition, or deliberate practice—works precisely because it engages this prediction-error system. When we outsource memory to devices rather than building internal knowledge, we're not just changing where information is stored; we're bypassing the very neural mechanisms that evolved to help us learn.

    In short, internalized knowledge creates the mental frameworks our brains need to spot mistakes quickly and learn from them effectively. These error signals do double-duty: they not only help us correct mistakes but also train our attention toward what's important in different contexts, helping build the schemata we need for quick thinking. Each prediction error, each moment of surprise, thus becomes an opportunity for cognitive growth—but only if our minds are equipped with clear expectations formed through practice and memorization...

    Learning works through making mistakes, recognising those mistakes, and adapting to reduce those mistakes in future. Ironically, this is analogous to how generative AI models are trained (through 'reinforcement learning'). When students offload learning tasks to generative AI, they don't get an opportunity to develop the underlying internalised knowledge that allows them to recognise mistakes and learn from them. Thus, it is important for significant components of student learning to happen without resorting to generative AI (or other tools that allow students to cognitively offload tasks).
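
    As a toy illustration of error-driven learning (and only loosely analogous to how AI models are actually trained), here is a simple delta-rule update in which each prediction error nudges an internal estimate toward the correct answer. The point of the sketch is that if the learner never generates a prediction of their own, because the task has been offloaded, there is no error signal and no update:

```python
# A toy illustration of error-driven learning (a simple delta/Rescorla-Wagner-style
# update): each trial, the learner's prediction moves toward the outcome in
# proportion to the prediction error. No prediction, no error signal, no learning.
learning_rate = 0.2
true_value = 1.0   # the 'correct answer' the learner is trying to internalise
prediction = 0.0   # the learner's initial internal estimate

for trial in range(1, 11):
    prediction_error = true_value - prediction      # surprise on this trial
    prediction += learning_rate * prediction_error  # update internal knowledge
    print(f"Trial {trial:2d}: prediction = {prediction:.3f}, "
          f"error = {prediction_error:.3f}")
```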

    Now, in order to encourage learning, teachers must provide students with the opportunity to make, and learn from, mistakes. Oakley et al. note that:

    ...cognitive scientists refer to challenges that feel difficult in the moment but facilitate deeper, lasting understanding as “desirable difficulties... Unlike deliberate practice, which systematically targets specific skills through structured feedback, desirable difficulties leverage cognitive struggle to deepen comprehension and enhance retention...

    Learning is not supposed to be easy. It is supposed to require effort. This is a point that I have made in many discussions with students. When they find a paper relatively easy, it is likely that they aren't learning much. And tools that make learning easier can hinder, rather than help, the learning process. In this context, generative AI becomes potentially problematic for learning for some (but not all) students. Oakley et al. note that:

    Individuals with well-developed internal schemas—often those educated before AI became ubiquitous—can use these tools effectively. Their solid knowledge base allows them to evaluate AI output critically, refine prompts, integrate suggestions meaningfully, and detect inaccuracies. For these users, AI acts as a cognitive amplifier, extending their capabilities.

    In contrast, learners still building foundational knowledge face a significant risk: mistaking AI fluency for their own. Without a robust internal framework for comparison, they may readily accept plausible-sounding output without realizing what’s missing or incorrect. This bypasses the mental effort—retrieval, error detection, integration—that neuroscience shows is essential for forming lasting memory engrams and flexible schemas. The result is a false sense of understanding: the learner feels accomplished, but the underlying cognitive work hasn’t been done.

    The group that benefits from AI as a complement for studying is not just those who were educated before AI became ubiquitous, but also those who learn in an environment where generative AI is explicitly available as a complement to learning (rather than a substitute). To a large extent, it depends on how generative AI is used as a learning tool. Oakley et al. do provide some good examples (and I have linked to some in past blog posts). I'd also like to think the AI tutors I have created for my ECONS101 and ECONS102 students assist with, rather than hamper, learning (and I have some empirical evidence that seems to support this, which I have already promised to blog about in the future).

    Oakley et al. conclude that:

    Effective education should balance the use of external tools with opportunities for students to internalize key knowledge and develop rich, interconnected schemata. This balance ensures that technology enhances learning rather than creating dependence and cognitive weakness.

    Finally, they provide some evidence-based strategies for enhancing learning (bolding is mine):

    • Embrace desirable difficulty—within limits: Encourage learners to generate answers and grapple with problems before turning to help... In classroom practice, this means carefully calibrating when to provide guidance—not immediately offering solutions, but also not leaving students floundering with tasks far beyond their current capabilities...
    • Assign foundational knowledge for memorization and practice: Rather than viewing factual knowledge as rote trivia, recognize it as the glue for higher-level thinking...
    • Use procedural training to build intuition: Allocate class time for practicing skills without external aids. For instance, mental math exercises, handwriting notes, reciting important passages or proofs from memory, and so on. Such practices, once considered old-fashioned, actually cultivate the procedural fluency that frees the mind for deeper insight...
    • Intentionally integrate technology as a supplement, not a substitute: When using AI tutors or search tools, structure their use so that the student remains cognitively active...
    • Promote internal knowledge structures: Help students build robust mental frameworks by ensuring connections happen inside their brains, not just on paper... guide students to identify relationships between concepts through active questioning ("How does this principle relate to what we learned last week?") and guided reflection...
    • Educate about metacognition and the illusion of knowledge: Help students recognize that knowing where to find information is fundamentally different from truly knowing it. Information that exists "out there" doesn't automatically translate to knowledge we can access and apply when needed.

    I really like those strategies as a prescription for learning. However, I am understandably biased, because many of the things I currently do in my day-to-day teaching practice are encompassed within (or similar to) those suggested strategies. I'll work on making 'guided reflection' a little more interactive in my classes this year, as I have traditionally made the links explicit for the students, rather than inviting them to make those links for themselves. We have been getting our ECONS101 students to reflect more on learning, and we'll be revising that activity (which happens in the first tutorial) this year to embrace more of a focus on metacognition.

    Learning is something that happens (often) in the brain. It should be no surprise that neuroscience has some insights to share on learning, and what that means for pedagogical practice. Oakley et al. take aim at some of the big names in educational theory (including Bloom, Dewey, Piaget, and Vygotsky), so I expect that their work is not going to be accepted by everyone. However, I personally found a lot to vindicate my pedagogical approach, which has developed over two decades of observational and experimental practice. I also learned that there are neuroscientific foundations for many aspects of my approach. And, I learned that there are things I can do to potentially further improve student learning in my classes.

    Friday, 6 February 2026

    This week in research #112

    Here's what caught my eye in research over the past week:

    • Mati et al. find that the Russia-Ukraine war resulted in an immediate 21 percent reduction in the daily growth rate of the Euro-Ruble exchange rate, and that the steady-state effect translates to a 26 percent reduction in growth
    • Masuhara and Hosoya review the COVID-19-related performance of OECD countries as well as Singapore and Taiwan in terms of deaths, vaccination status, production, consumption, and mobility from the early part of the pandemic to the end of 2022, and conclude that Norway was the most successful in terms of balancing deaths, production, and consumption
    • Neprash, McGlave, and Nikpay (with ungated earlier version here) quantify the effects of ransomware attacks on hospital operations and patient outcomes, finding that attacks decrease hospital volume by 17-24 percent during the initial attack week, with recovery occurring within 3 weeks, and that among patients already admitted to the hospital when a ransomware attack begins, in-hospital mortality increases by 34-38 percent
    • Tsivanidis (with ungated earlier version here) studies the world’s largest Bus Rapid Transit system in Bogotá, Colombia, and finds that low-cost "feeder" bus systems that complement mass rapid transit by providing last-mile connections to terminals yield high returns, but that welfare gains would have been about 36 percent larger under a more accommodative zoning policy
    • Janssen finds that the 2023 Bud Light boycott led to a large drop in Bud Light volume (34-37 percent), partial switching into other beer, and a net decline in total ethanol purchases of roughly 5.5-7.5 percent of pre-boycott intake
    • Krishnatri and Vellakkal (with ungated earlier version here) find that alcohol prohibition in Bihar, India, led to significant increases in caloric, protein, and fat intake from healthy food sources, as well as a decline in fat intake from unhealthy food sources
    • Geruso and Spears (open access) document the worldwide fall in birth rates, and the unlikely prospects of a reversal to higher fertility in the future

    Thursday, 5 February 2026

    Americans' beliefs about trade, and why compensation matters

    Do people understand trade policy? Or rather, do they understand trade policy the way that economists understand it? Given current debates in the US and elsewhere, it would be fair to question people's (or politicians') understanding of trade policy, and to consider what it is about trade that generates negative reactions. After all, the aggregate benefits of free trade are one of the things about which economists most agree.

    Last year, Stefanie Stantcheva won the John Bates Clark Medal (which is awarded annually to the American economist under age 40 who has made the most significant contributions to the field). Stantcheva's medal-winning work included three main strands, one of which was the use of "innovative surveys and experiments to measure what people know". One of the papers from that strand of research is this 2022 NBER Working Paper (revised in 2023), which describes Americans' understanding of trade and trade policy and importantly, it answers the question of why people support trade (or not).

    The paper reports results from three large-scale surveys in the US run between 2019 and 2023, with a total sample size of nearly 4000. The surveys also included experiments that primed respondents to think about trade from particular angles. Overall, Stantcheva is interested in teasing out the factors that affect Americans' support for trade policies. Essentially, she tests the mechanisms that are described in Boxes I-V in Figure 2 from the paper:

    Box I picks up views on whether trade lowers prices and increases variety for consumers. Box II picks up the threats from increasing trade to workers in import-competing sectors. Together, those two boxes capture the self-interest channel shaping people's views on trade policy. Their views might also be affected by broader social and economic concerns, such as trade's efficiency effects (Box III), its distributional impacts (Box IV), and patriotism, partisanship, or geopolitical concerns (Box V).

    Before we turn to the specific results on the mechanisms, it is worth considering Americans' overall views on trade first. Stantcheva reports that:

    Most respondents (63%) are supportive of more free trade and decreasing trade restrictions in general... Only 36% believe that import restrictions are the best way to help U.S. workers.

    Nevertheless, there is support for more targeted trade restrictions. 40% of respondents believe the US should restrict food imports to ensure food security. 54% think the US should protect their “infant” industries. 78% support protection of key consumer products, namely food items and cars. 50% believe the US should restrict trade in key sectors, such as oil and machinery...

    And general knowledge about trade policy is not too bad, as:

    ...almost 80% of respondents know what an import tariff is, but just around half know what an import quota is. Two-thirds of respondents appear to understand the basic price effects of tariffs and export taxes, i.e., that an import tariff on imported goods will likely raise the price of that good and that an export tax will increase the price of the taxed good abroad. The final question... considers a scenario in which the US can produce a good (“cars”) at a lower cost than the foreign country. Respondents are asked whether, under some circumstances, it would still make sense to import cars from abroad. 68% of respondents agree that it could make sense. This suggests that respondents either understand the concept of comparative advantage or have in mind some model of love-for-variety or quality differential.

    So far, so good. How do Americans perceive the impacts of trade? Figure 9 Panel A reports perceptions related to the self-interest motivation (Boxes I and II from the figure above):

    From the bottom of that figure, it is clear that a majority of Americans believe that they are better off from trade, but a substantial minority (39%) believe that they are worse off. Still focusing on the self-interest motivations (Boxes I and II), Stantcheva finds that:

    In general, a respondent’s (objective) negative exposure to trade through their sector, occupation, or local labor market is significantly positively correlated with a feeling that trade has made them worse off and that it has negatively affected their job. People exposed to trade through their job also feel worse off as consumers and are less likely to believe that trade has reduced the prices of goods they buy, perhaps because they feel that their purchasing power is lower than it would otherwise be. Furthermore, college-educated respondents are significantly less likely to feel negatively impacted in their role as consumers and workers.

    Notice those results are mostly consistent with the figure above. What about consumer gains through reduced prices on imported products? Stantcheva reports that:

    ...the belief that prices decrease from trade is not significantly related to either support for trade or redistribution. Consistent with this lack of correlation, the experiment priming people to think of their benefits as consumers (precisely, the prices and variety of goods they purchase) does not move their support for trade either.

    So, in terms of self-interest, Americans' support for trade is more negative when they are negatively affected as workers, but is not more positive when they are positively affected as consumers. In my ECONS102 class, we talk about the tension between the gains from trade and loss aversion. Every trade involves gaining something, in exchange for giving something up. However, quasi-rational decision-makers are affected much more by losses than equivalent gains (what we call loss aversion). So, loss aversion might mean that many profitable trades are not undertaken, because the decision-makers prefer to keep what they have, rather than giving it up for something that may be objectively worth more. In the case of Stantcheva's survey respondents, the workers who are negatively impacted experience a loss, which would be weighed much more heavily than the gain that a consumer receives.
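
    Loss aversion is usually formalised with the Tversky and Kahneman (1992) prospect-theory value function, in which losses are weighted about 2.25 times as heavily as equivalent gains. Here is a short sketch using their original parameter estimates (the dollar amounts are arbitrary):

```python
# Sketch of the Tversky-Kahneman (1992) prospect-theory value function, which
# formalises loss aversion: losses loom larger than equivalent gains.
# Parameters (alpha = beta = 0.88, lambda = 2.25) are their original estimates.
def value(x, alpha=0.88, beta=0.88, loss_aversion=2.25):
    """Subjective value of a gain or loss of size x (relative to a reference point)."""
    if x >= 0:
        return x ** alpha
    return -loss_aversion * ((-x) ** beta)

gain = value(100)    # subjective value of gaining $100
loss = value(-100)   # subjective value of losing $100
print(f"Value of a $100 gain:  {gain:.1f}")
print(f"Value of a $100 loss: {loss:.1f}")
print(f"Ratio (|loss| / gain): {abs(loss) / gain:.2f}")  # about 2.25
```

    That ratio of about 2.25 is one way to quantify the claim that a worker's loss from trade weighs much more heavily than a consumer's gain of equal dollar value.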

    An alternative explanation is salience. Job losses are very visible and impactful on the people who lose their jobs and those around them. Consumers' gains in terms of lower prices and increased variety, on the other hand, are not really as visible - many people wouldn't even notice them, unless they were pointed out to them. So even if people weren’t loss averse, attention would still be drawn disproportionately to the negative impacts of trade, rather than the positive. Taken altogether, Stantcheva's results here are not surprising.

    What about the broader social and economic concerns, and their impact on views about trade? In terms of efficiency effects (Box III), Stantcheva reports that:

    Respondents are generally optimistic about these effects. For instance, 61% of respondents think that international trade increases competition among firms in the US, 69% that it fosters innovation, and 62% that it generates more GDP growth.

    Moreover:

    ...efficiency gains from trade are significantly associated with more support for free trade... This relation can be seen in the correlations and the experimental effects: the Efficiency treatment significantly improves support for free trade.

    And interestingly:

    Respondents who believe that trade can improve innovation, competitiveness, and GDP are more supportive of redistribution policy to help those who do not benefit from these efficiency gains.

    Turning to distributional impacts (Box IV), Stantcheva reports that:

    Overall, respondents know that trade can have adverse distributional consequences through the labor market. Just around half of all respondents believe that trade has, on balance, helped US workers. 79% of people think that trade is the reason for “unemployment in some sectors and the decline of some industries in the U.S..” More respondents (63%) believe that high-skilled workers could easily change their work sector if their jobs were destroyed by trade than that low-skilled workers could switch sectors (37%)...

    Consequently, around two-thirds of respondents think that trade is a major reason for the “rise in inequality” in the US. Notably, despite being aware of the potential adverse distributional consequences of trade, a majority (62%) of respondents believe that, in principle, trade could make everyone better off because it is possible to “compensate those who lose from it through appropriate policies.”

    It is interesting that so many people believe in the compensation principle (although I bet that few of them would know that term for it). And it turns out that belief in the compensation principle is really important, as:

    ...the strongest predictor of support for free trade is the belief that, in principle, losers can be compensated... As long as respondents believe that adverse consequences from trade on some groups can be dampened by redistributive policy, they are likely to support more free trade, even if they believe that there are adverse distributional consequences. The perceived distributional impacts of trade also substantially matter for support for compensatory redistribution. Respondents who believe that trade hurts low-income and low-skilled workers and that it fosters inequality support redistribution much more.

    Finally, in terms of patriotism, partisanship, or geopolitical concerns (Box V), Stantcheva reports that:

    ...those who worry about geopolitical ramifications from trade restrictions, i.e., retaliatory responses, are more likely to support policies to compensate losers from trade rather than support outright trade restrictions. Patriotism is significantly correlated with support for trade restrictions in many industries and to protect U.S. workers, as well as with lower support for compensatory transfers...

    Stantcheva draws a number of conclusions from her results, including:

    First, respondents perceive gains from trade as consumers to be vague and unclear but perceive potential losses as workers to be concentrated and salient. Actual and perceived exposure to trade through the labor market is significantly associated with policy views...

    Second, people’s policy views on trade do not only reflect self-interest. Respondents also care about trade’s distributional and efficiency impacts on others and the US economy...

    Third, respondents’ experience, as measured by their exposure to trade through their sector, occupation, and local labor market, shapes their policy views directly (through self-interest) and indirectly by influencing their understanding and reasoning about the broader efficiency and distributional impacts of trade.

    Overall, I take away from this paper that Americans have more correct views about trade than I suspected. Their support for trade is not determined simply by self-interest, but is more nuanced. However, negative impacts weigh far more heavily for those who are negatively impacted than the weight attached to positive impacts for those who are positively impacted. That may relate to loss aversion, and to the more concentrated nature of negative impacts compared with more diffuse positive impacts. That asymmetry also explains why a majority have positive views of trade (since fewer people will have been negatively impacted on the whole). The most surprising aspect to me, though, was the views on the compensation principle. Those results provide a clear policy prescription. To get more people on board with trade, making compensatory policy more explicit and salient may help to ensure that there is greater support for trade. On the other hand, politicians who want to exploit the negative views on trade might benefit from obscuring any such compensatory policies. Unfortunately, there are too many who are willing to do just that.

    [HT: Marginal Revolution, last year]

    Wednesday, 4 February 2026

    The economic impacts of the 2008 NZ-China Free Trade Agreement

    New Zealand was the first Western developed country to sign a free trade agreement with China, and it came into force in 2008. At the time, the New Zealand government estimated an increase in exports to China of between NZ$225 million and NZ$350 million (between US$180 million and US$280 million), and the Ministry of Foreign Affairs and Trade (MFAT) estimated an increase of 0.25% in GDP. How did things actually turn out?

    That is the question addressed in this 2021 article by Samuel Verevis (MFAT) and Murat Üngör (University of Otago), published in the Scottish Journal of Political Economy (ungated earlier version here). Now, the challenge with this sort of exercise is that we can observe what happened to New Zealand with the FTA in place, but we cannot observe what would have happened if there had been no FTA (the counterfactual). And that is a problem, since what we really want to know is the difference in outcome between what really happened and the counterfactual.

    Verevis and Üngör solve that problem by using the synthetic control method. Essentially, they construct a weighted average of the outcomes of other countries (donor countries) that closely follows the trends in the New Zealand data before the FTA came into force in 2008, and then use the same weights to create a 'synthetic New Zealand' counterfactual for the period after 2008. The key assumption with this approach is that there isn't some other change that affected New Zealand differently from the donor countries at the same time as the FTA came into force.
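
    To make the method concrete, here is a minimal sketch (with simulated data, and not the authors' code) of the synthetic control idea: choose non-negative donor weights that sum to one and that make the weighted donor average track New Zealand's pre-2008 outcome as closely as possible, then carry those weights forward to form the post-2008 counterfactual.

```python
# Hedged sketch of the synthetic control idea, with simulated data.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
n_donors, n_pre, n_post = 10, 12, 8

donors_pre  = rng.normal(5, 1, size=(n_pre, n_donors))   # donor outcomes, pre-2008
nz_pre      = donors_pre[:, :3].mean(axis=1) + rng.normal(0, 0.1, n_pre)
donors_post = rng.normal(6, 1, size=(n_post, n_donors))  # donor outcomes, post-2008

def pre_treatment_mse(w):
    # How far the weighted donor average is from NZ's actual pre-treatment path
    return np.mean((nz_pre - donors_pre @ w) ** 2)

w0 = np.full(n_donors, 1 / n_donors)
result = minimize(
    pre_treatment_mse, w0, method="SLSQP",
    bounds=[(0, 1)] * n_donors,
    constraints={"type": "eq", "fun": lambda w: w.sum() - 1},
)
weights = result.x
synthetic_nz_post = donors_post @ weights  # the counterfactual after 2008
print("Donor weights:", np.round(weights, 2))
print("Synthetic NZ (post-2008):", np.round(synthetic_nz_post, 2))
```

    In practice the weights are usually chosen to match pre-treatment predictors as well as pre-treatment outcomes, but the core logic is as above.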

    Verevis and Üngör first look at the effect on New Zealand exports to China. The results are summarised in Figure 3 from the paper:

    The black solid line is actual New Zealand exports to China (in nominal US dollar terms). The red dashed line is the counterfactual created using the synthetic control method. The vertical dotted line reminds us that the FTA came into force in 2008. Notice that, prior to 2008, the two lines follow each other closely. That is what we should expect with this method, since the synthetic control is designed to closely mimic New Zealand data. After 2008, the lines diverge dramatically, with actual New Zealand exports to China far higher than the counterfactual. Verevis and Üngör note that:

    In the post-intervention [period] between 2009 and 2015, NZ's actual exports to China were more than 120%, on average, higher than the synthetic counterparts.

    Eyeballing Figure 3, the increase in exports was in the order of US$6 billion at its peak, so the government's expectations of US$180-280 million wildly underestimated the trade impact of the FTA. What about GDP though? Verevis and Üngör's preferred results for GDP actually show a decrease, as shown in Figure 7 from the paper:

    Verevis and Üngör estimate that:

    In the post-intervention era, the 2009–2017 period, the synthetic real GDP per capita was 4%, on average, higher than the actual GDP per capita.

    However, there is good reason to doubt that there was such a negative impact of the FTA on GDP. The Global Financial Crisis (GFC) also occurred in 2008-2009, alongside this FTA coming into force. Verevis and Üngör argue that the GFC affected all countries, so is not a problem for their analysis. However, they acknowledge that the GFC didn't affect all countries equally. And when, in a robustness check, they exclude all Eurozone countries and Iceland, they find no significant impact of the FTA on New Zealand GDP per capita. Overall, I take from this that there is limited evidence in favour of a GDP impact of the FTA (in either direction). Of course, the concurrent GFC critique also applies to their earlier analysis of the impact on exports to China. When Verevis and Üngör re-run the analysis of exports while excluding Eurozone countries, the impact is smaller, but there is still a very large positive impact of the FTA.

    Ultimately, what can we take away from this study? The NZ-China Free Trade Agreement increased trade between New Zealand and China, but didn't really impact income in New Zealand (at least on average). Why might the value of exports to China increase but GDP remain unaffected? Verevis and Üngör show that exports to the rest of the world were largely unaffected, so it wasn't simple substitution from exporting to other countries to exporting to China instead. It's quite possible that the increase in exports to China was offset by an equivalent increase in imports from China, leaving net exports unchanged. Unfortunately, Verevis and Üngör don't look at imports, so we are left to guess.

    Finally, an 'upgraded' FTA between the two countries came into force in 2022. Given that many of the trade frictions had already been removed by the original agreement, the upgraded FTA likely had a smaller impact. In terms of GDP, it probably wouldn't be too much of a stretch to think that the impact will be just as imperceptible as that of the original agreement.

    Sunday, 1 February 2026

    The changing system of regional economic development in New Zealand

    I just finished reading the edited volume Economic Development in New Zealand, edited by James Rowe and published in 2005. Edited volumes are difficult to review, particularly when the chapters have only a loose connection and lack a common thread, and that was the case with this book. Instead, I want to share one overall takeaway from reading the book, and that is how the policy environment for regional economic development has changed immensely since the 2000s. This matters because the way that we organise regional development determines who sets priorities, where capability accumulates, and whether regional growth is sustainable or merely a sequence of centrally funded projects.

    So, what has changed? We can think about how leadership and decision-making has changed, how funding and strategy-setting has changed, and how the roles of business, educational institutions, and the research sector have changed.

    In the mid-2000s, regional economic development had a lot of prominence, and it has seen a bit of a revival in recent years. However, there are some substantial differences in how that prominence manifests between the two eras. In the mid-2000s, regional economic development was led by the regions. The central government had an important role in setting the policy environment and steering the direction through funding, but regional development initiatives typically came from the regions. This is exemplified by the Regional Partnerships Programme (RPP), which involved central government funding regions to develop their own plans, build capability, and then back major initiatives coming out of those plans. Business had a strong role in partnership with government, not just as part of the RPP, but more generally. Region-wide strategy and plan development tended to rely on input from local business and industry leaders. There was also an important role for training, research and development, and innovation, and so universities, polytechnics, and Crown research institutes were all closely involved in regional development.

    Fast forward to today, and regional development has been embodied in the Provincial Growth Fund, which has a lot of different aims, one of which is to "create jobs, leading to sustainable economic growth", and more recently the Regional Strategic Partnership Fund, which had a much narrower aim to "make regional economies stronger and more resilient to improve the economic prospects, wellbeing and living standards of all New Zealanders". In both cases, it is central government that is largely the decision-maker, in addition to funding the initiatives, rather than the regions themselves. Business input is now largely channelled through consultation and deal-making, rather than input into the strategic direction of regional development. The rhetoric for business has changed to more of an emphasis on innovation and increasing productivity. That applies to the education and research sectors as well, where the role has shifted to more of a focus on core skills development and innovation, rather than being part of regional strategic plan development.

    In between the mid-2000s and today, regional development did go through a bit of a quiet patch. It is clearly back in vogue now, although the policy environment and systems have changed tremendously. What that means is that there is not much from Rowe's edited volume that translates directly to today's situation, sadly. The initiatives that the authors were writing about are long gone; even the AUT Masters degree in Economic Development that one chapter describes has long since closed down. However, the value in reading Rowe's book is that it provides a useful reminder that regional development has long been a goal of central government, and that there is more than one way to approach that goal.

    Friday, 30 January 2026

    This week in research #111

    Here's what caught my eye in research over the past week:

    • Hu and Su find that housing wealth appreciation significantly improves individual happiness in China
    • Díez-Rituerto et al. (with ungated earlier version here) study gender differences in willingness to guess in multiple-choice questions in a medical internship exam in Spain, and find that, in line with past research, women answer fewer questions than men, but that reducing the number of alternative answers reduces the difference between men and women among those who answer most of the questions
    • Chen, Fang, and Wang (with ungated earlier version here) find that holding a deanship in China increases patent applications by 15.2 percent, and that deans' misuse of power distorts resource allocation

    Thursday, 29 January 2026

    European monarchs' cognitive ability and state performance

    How important is the quality of a CEO to a company's performance over time? How important is the quality of a leader to a country's performance over time? These questions seem quite straightforward to answer, but in reality they are quite tricky. First, it is difficult to measure the 'quality' of a CEO or a leader. Second, the appointment of a CEO or a leader is not a random event - typically it is the result of a deliberative process, and may depend on the company's or country's past or expected future performance.

    What is needed is some CEOs or leaders who differ in 'quality' and who are randomly appointed to the role. This sort of experiment is, of course, not available in the real world. However, a 2025 article by Sebastian Ottinger (CERGE-EI) and Nico Voigtländer (UCLA), published in the journal Econometrica (open access), examines a setting that mimics the ideal experiment in many respects. Ottinger and Voigtländer look at 399 European monarchs from 13 states over the period 1000-1800 CE. To address the two concerns above (measurement of quality and non-random appointment), they:

    ...exploit two salient features of ruling dynasties: first, hereditary succession—the predetermined appointment of offspring of the prior ruler, independent of their ability; second, variation in ruler ability due to the widespread inbreeding of dynasties.

    Ottinger and Voigtländer measure the 'quality' of a ruling monarch using the work of Frederick Adams Woods, who:

    ...coded rulers’ cognitive capability based on reference works and state-specific historical accounts.

    Ottinger and Voigtländer measure the outcome variable, state performance, as a subjective measure from the work of Woods, as well as the change in land area during each monarch's reign, and the change in urban population during each monarch's reign. They then use a measure of the 'coefficient of inbreeding' for each ruler as an instrument for cognitive ability. This is important, because the instrumental variables (IV) approach they employ reduces the impact of any measurement error in cognitive ability, as well as dealing with the endogenous selection of rulers. However, as always with the IV approach, the key identifying assumption is that inbreeding affects the outcome (state performance) only through its effect on ruler cognitive ability (not, say, through the instability of succession). Ottinger and Voigtländer provide a detailed discussion in favour of the validity of the instrument, and support this by showing that the results hold when they instead use 'hidden inbreeding' (inbreeding that is less direct than, say, parents being first cousins or an uncle and niece) as an instrument.
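
    For readers unfamiliar with the method, here is a minimal sketch of the two-stage least squares (2SLS) logic with simulated data (this is not the authors' code, and a proper IV routine would also correct the second-stage standard errors): the coefficient of inbreeding instruments for ruler ability, and instrumented ability is then related to state performance.

```python
# Hedged sketch of the 2SLS logic: inbreeding as an instrument for ruler ability.
# All data are simulated for illustration; standard errors here are not IV-corrected.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(7)
n = 399  # number of monarchs in the paper

inbreeding = rng.uniform(0, 0.25, n)                # instrument
ability = -4.0 * inbreeding + rng.normal(0, 1, n)   # endogenous regressor
performance = 0.8 * ability + rng.normal(0, 1, n)   # outcome

# First stage: ruler ability on the coefficient of inbreeding
first_stage = sm.OLS(ability, sm.add_constant(inbreeding)).fit()
ability_hat = first_stage.fittedvalues

# Second stage: state performance on instrumented ability
second_stage = sm.OLS(performance, sm.add_constant(ability_hat)).fit()
print(f"First-stage coefficient on inbreeding: {first_stage.params[1]:.2f}")
print(f"2SLS estimate of the effect of ability: {second_stage.params[1]:.2f}")
```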

    Now, in their main instrumental variables analysis, they find:

    ...a sizeable effect of (instrumented) ruler ability on all three dimensions of state performance. A one-std increase in ruler ability leads to a 0.8 std higher broad State Performance, to an expansion in territory by 16%, and to an increase in urban population by 14%.

    Ottinger and Voigtländer also explore the mechanisms explaining this effect, finding that:

    ...less inbred, capable rulers tended to improve their states’ finances, commerce, law and order, and general living conditions. They also reduced involvement in international wars, but when they did, won a larger proportion of battles, leading to an expansion of their territory into urbanized areas. This suggests that capable rulers chose conflicts “wisely,” resulting in expansions into valuable, densely populated territories.

    Finally, Ottinger and Voigtländer looked at whether a country's institutions mattered for the effect of the ruler's cognitive ability on state performance. They measure how constrained a ruler was, such as by the power of parliament, and using this measure in their analysis they find that:

    ...inbreeding and ability of unconstrained leaders had a strong effect on state borders and urban population in their reign, while the [ability] of constrained rulers (those who faced “substantial limitations on their authority”) made almost no difference.

    That result provides further support that the cognitive ability of rulers mattered precisely in those situations where a ruler might be expected to have an effect - that is, when they were unconstrained by political institutions. When a ruler is constrained by parliament or other political institutions, their cognitive ability has much less scope to affect state performance, and that is exactly what Ottinger and Voigtländer found.

    One surprising finding from the paper appears in the supplementary materials, where Ottinger and Voigtländer report that the marginal effect of cognitive ability on state performance doesn't vary by gender. That surprises me a little given that earlier research by Dube and Harish (which Ottinger and Voigtländer cite in a footnote) found that queens were more likely to engage in wars than kings (see here). Now, this paper shows that more able rulers fight fewer wars. So, I would have expected that queens, having fought more wars, would show a different relationship between cognitive ability and state performance, but that didn't prove to be the case. Perhaps that tells us that, while queens may have fought more wars, they made better choices about which wars to fight? Or perhaps, they fought more wars but that only affected the level of wars, and not the interaction between cognitive ability and wars (or cognitive ability and state performance)?

    Regardless, overall these results tell us that the 'quality' of a leader really does matter. A higher quality ruler, in terms of cognitive ability, improves state performance. Extending from those results, we might expect that a higher quality CEO also improves company performance. Of course, CEO selection isn’t hereditary and differs in important ways, but the broader lesson that leader quality can matter a lot when leaders have discretion likely holds in that setting as well.

    [HT: Marginal Revolution, early last year]


    Monday, 26 January 2026

    Roman rule, and personality traits and subjective wellbeing in modern Germany

    History has a long tail. Events in the distant past can have surprising effects today. For instance, past research I have blogged on has shown that autocratic rule in Qing dynasty China affects social capital today (see here), the Spanish Inquisition affects GDP in Spanish municipalities (see here), and Roman roads affect the modern location and density of roads in Europe (see here). In that vein, this recent article by Martin Obschonka (University of Amsterdam) and co-authors, published in the journal Current Research in Ecological and Social Psychology (open access), looks at the effect of Roman rule on personality traits and subjective wellbeing in modern Germany. To do this, Obschonka et al. compare people on either side of the Limes Wall, noting that:

    To protect their territory with its cultural and economic advancements, the Romans built the Limes wall around 150 AD and it served as a border of the empire for more than a century. The Limes consists of three major rivers, namely the Rhine, the Danube, and the Main ("Main Limes"), as well as a physical wall ... It is well-documented that the Limes constituted a physical, economic, and cultural border between the Roman and Germanic cultures...

    By comparing people on either side of the Limes Wall, Obschonka et al. try to reveal the enduring impact of Roman rule. They expect this effect on personality traits and subjective wellbeing because:

    ...the Roman society was much wealthier and considerably more structured and organized than the “barbaric” Germanic tribes, with an effective public administration and a relatively well-elaborated legal system... When the Romans occupied parts of the territories inhabited by Germanic tribes, they imported superior scientific knowledge and a civic structure.

    To measure personality traits, Obschonka et al. turn to the German dataset from the Gosling-Potter Internet Personality Project, the largest dataset on the 'Big Five' personality traits. The German sample they use includes over 73,000 observations between 2003 and 2015, which they aggregate to regional-level averages. For subjective wellbeing (life satisfaction), they use data from the German Socioeconomic Panel between 1984 and 2016, again aggregated to regional-level averages. They also look at life expectancy. Using a simple OLS regression model, with a 'treatment variable' indicating that a region was in the Roman occupied area, Obschonka et al. find that:

    ...the populations in those regions that were occupied by the Romans nearly 2000 years ago show significantly higher levels of extraversion, agreeableness, and openness, and significantly lower levels of neuroticism (which points to more adaptive personality patterns in the former Roman regions of present-day Germany) than do the populations living in the non-occupied regions... Moreover, populations living in the formerly Roman areas today report greater satisfaction with life and health, and also have longer life expectancies...

    After including a range of control variables in their models, the effects on agreeableness and openness became statistically insignificant. However, that leaves significant effects of Roman rule on extraversion and neuroticism, as well as life satisfaction and life expectancy. The results are similar when they use a spatial regression discontinuity design (RDD) instead of OLS. The spatial RDD takes account of how far away an observation is from the Limes Wall, which separates the 'treated' and 'control' regions (and regions closer to the line provide more information about the distinctive effect of the treatment, in this case Roman rule). The method assumes that places on either side of the border are similar except for the Roman occupation. This seems plausible, so the spatial RDD results in particular make the findings more believable.
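    For intuition, here is a minimal sketch of a border (spatial) regression discontinuity on simulated data. The bandwidth, functional form, and variable names are my own illustration, not the specification in the paper:

        # Minimal sketch of a spatial regression discontinuity at a border,
        # using simulated region-level data. Illustrative only.
        import numpy as np
        import statsmodels.api as sm

        rng = np.random.default_rng(0)
        n = 500

        # Signed distance to the border (positive = formerly Roman side).
        distance = rng.uniform(-100, 100, size=n)      # km from the Limes
        roman = (distance > 0).astype(float)           # treatment indicator

        # Simulated outcome: a 0.3-unit jump at the border plus a smooth trend.
        life_satisfaction = 0.3 * roman + 0.002 * distance + rng.normal(scale=0.5, size=n)

        # Local linear regression within a 50 km bandwidth of the border, with
        # the slope in distance allowed to differ on each side.
        near = np.abs(distance) < 50
        X = sm.add_constant(np.column_stack([roman, distance, roman * distance]))
        rdd = sm.OLS(life_satisfaction[near], X[near]).fit()
        print(rdd.params[1])  # estimated jump at the border (about 0.3 here)

    The estimated 'jump' at the border is the treatment effect, and it is credible only to the extent that nothing else changes discontinuously at the Limes.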

    Obschonka et al. then turn to looking at the mechanisms that might explain the enduring effect of Roman rule. They show that:

    Density of road infrastructure built by the Romans shows a statistically significant, positive effect on life and health satisfaction, as well as on life expectancy. There is a negative, statistically significant relationship with neuroticism, a positive one with extraversion, and a non-significant one with agreeableness, conscientiousness, and openness...

    Running the models with the number of Roman markets and mines as the independent variable reveals a negative effect on neuroticism and a positive effect on extraversion. In addition, there is also a positive effect on conscientiousness (and openness). None of the effects on psychological well-being or health were statistically significant. Including Roman road density and the number of Roman markets and mines in the same model... clearly indicates that markets and mines are more strongly related to the personality traits, whereas Roman road density is more closely related to the health and well-being outcomes.

    These results should be seen as more exploratory, but Obschonka et al. interpret them as showing:

    ...support for the notion that the tangible and lasting economic infrastructure built and established by the Romans left a long-term macro-psychological legacy...

    Perhaps. I find it less plausible that Roman physical infrastructure had a lasting effect on modern personality traits and subjective wellbeing, and more likely that Roman worldviews and 'social infrastructure' (things like institutions or social norms, for example) were passed down from one generation to the next, showing up as a lasting effect on personality and wellbeing. Unfortunately, Obschonka et al. aren't able to tease out those sorts of mechanisms. Either way, it’s another reminder that borders drawn 2000 years ago can still show up in the data, even in places we might not think to look.

    [HT: Marginal Revolution, early last year]

    Sunday, 25 January 2026

    The Census Tree project

    An exciting (and new-ish) dataset offers us an unprecedented opportunity to explore research questions using historical US Census data. When I posted about what's new in regional and urban economics last year, one of the things that was raised was the linking of historical Census records over time. That was based on the work of Abramitzky et al., known as the Census Linking Project (CLP). However, in a recent article published in the journal Explorations in Economic History (open access), Kasey Buckles (University of Notre Dame) and co-authors report on an alternative Census linking dataset that has far larger coverage than the CLP. As they explain:

    In the Census Tree project, we use information provided by members of the largest genealogy research community in the world to create hundreds of millions of new links among the historical U.S. Censuses (1850–1940). The users of the platform link data sources—including decennial census records—to the profiles of deceased people as part of their own family history research. In doing so, they rely on private information like maiden names, family members’ names, and geographic moves to make links that a researcher would never be able to make using the observable information...

    The result is the publicly-available Census Tree dataset, which contains over 700 million links among the 1850–1940 censuses...

    The article describes the creation of the Census Tree dataset, which can be accessed for free online. Buckles et al. also demonstrate the use of the dataset in a particular application, comparing it with the CLP data of Abramitzky et al.:

    ...who show that the children of immigrants were more upwardly mobile on average than the children of the U.S.-born in the late 19th and early 20th centuries. We replicate this result using the Census Tree, and are able to increase the precision of estimates for each sending country. Furthermore, the Census Tree includes sufficient numbers of links to produce estimates for an additional ten countries, including countries from Central America and the Caribbean. We find that the sons of low-income immigrants from Mexico had significantly worse outcomes on average than sons of fathers from other countries, including U.S.-born Whites. We further extend [Abramitzky et al.] by analyzing the mobility of women in a historical sample, and compare these results to historical estimates for men and modern estimates for women. While the patterns for daughters and sons are broadly similar, differences in marriage patterns contribute to gender gaps in mobility in some countries.

    As I noted in this post last year, the ability to link people over long periods of time (including between generations) has opened up a wealth of new research questions. Buckles et al. offer a peek at the range of research that has already been done using the Census Tree dataset (see Appendix B in the paper for a bibliography).
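    To make concrete how a linking dataset like this gets used, here is a minimal sketch of merging two census waves through a crosswalk of person links. The identifiers and column names are hypothetical, not the Census Tree's actual file layout:

        # Minimal sketch: merge two census waves through a file of person links.
        # Identifiers and column names are hypothetical, for illustration only.
        import pandas as pd

        # A toy crosswalk of links between the 1910 and 1920 censuses.
        links = pd.DataFrame({"histid_1910": ["A1", "A2"], "histid_1920": ["B1", "B2"]})

        # Toy individual-level extracts from each census year.
        census_1910 = pd.DataFrame({"histid": ["A1", "A2"], "occscore": [22, 35]})
        census_1920 = pd.DataFrame({"histid": ["B1", "B2"], "occscore": [30, 38]})

        # Attach each 1910 record to its linked 1920 record.
        panel = (links
                 .merge(census_1910, left_on="histid_1910", right_on="histid")
                 .drop(columns="histid")
                 .merge(census_1920, left_on="histid_1920", right_on="histid",
                        suffixes=("_1910", "_1920"))
                 .drop(columns="histid"))

        # The linked panel can then be used to follow the same person (or to
        # compare fathers and sons) across censuses.
        print(panel[["occscore_1910", "occscore_1920"]])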

    Now, the coverage isn't perfect, and there is still some way to go. You can evaluate the quality of the dataset based on what Buckles et al. report in their article, but it is clearly better than previous efforts. And importantly:

    ...we plan to update the Census Tree every two-to-three years to incorporate new information added by FamilySearch users, to include new links... and to implement methodological advances in linking methods that we and others develop.

    This seems like a really important resource for researchers in economics, sociology, regional science, and other fields, and not just for those interested in economic history.

    Saturday, 24 January 2026

    The long persistence of retracted 'zombie' papers

    When a paper is retracted by a journal, that understandably tends to negatively impact perceptions of the researcher and the quality of their research (see here). However, these 'zombie' papers can maintain an undead existence for some time, continuing to be cited and used, sometimes uncritically, and sometimes even accruing further citations after retraction. That is because retractions take time, and because publishers are not good at highlighting when an article has been retracted. In terms of understanding the effect of retractions on the research system, a key question is: how long does it take for a paper to be retracted?

    That is essentially the question that this new article by Marc Joëts (University of Lille) and Valérie Mignon (University of Paris Nanterre), published in the journal Research Policy (open access), addresses. Joëts and Mignon draw on a sample of 25,480 retracted research articles over the period from 1923 to 2023 (taken from the Retraction Watch database), and look at the factors associated with the time to retraction (that is, the time between first publication and when the article is retracted). First, they find that:

    ...the average time to retraction is approximately 1045 days (nearly 3 years), but there is significant variability, with a standard deviation of 1225 days... However, some extreme cases take much longer, with the longest retraction occurring 81 years after publication.

    Joëts and Mignon use several different forms of survival model to evaluate the relationship between the characteristics of an article and the time to retraction. In this analysis, they find that:

    Papers in biomedical and life sciences are generally retracted faster than those in social sciences and humanities, and articles published by predatory publishers are withdrawn more promptly than those from reputable journals. Collaboration intensity and type of misconduct also emerge as significant predictors of retraction delays.

    The result for predatory journals seems somewhat surprising. However, Joëts and Mignon suggest that:

    ...predatory journals often publish papers with evident deficiencies that are more easily detectable by external parties, such as watchdog organizations or institutions, leading to quicker retractions when misconduct is identified. Additionally, the lack of formal editorial procedures in predatory journals may result in a less structured and faster retraction process...

    Of course, a faster time to retraction doesn't make predatory journals good. It simply makes them less bad, since they almost certainly are a large source of low-quality research that deserves retraction (Joëts and Mignon don't report the proportion of retractions that come from predatory journals).

    In terms of collaboration intensity, articles with more co-authors take longer to retract, presumably because more people are involved in the retraction process, or because disputes over who is to blame may take some time to resolve. For types of misconduct, retractions due to 'data issues' take the longest to occur, while those for 'peer review errors' and 'referencing problems' take the least time. That likely reflects that it takes some time for data analyses to be replicated and for problems to surface, whereas problems with referencing are more likely to be readily apparent from a simple reading of the article.
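    To give a sense of what a survival model of time to retraction looks like, here is a minimal sketch on simulated data, using a Cox proportional hazards model. The covariates and their coding are my own illustration (the authors estimate several different survival specifications):

        # Minimal sketch of a proportional hazards model for time to retraction,
        # fitted to simulated data. Covariates and coding are illustrative only.
        import numpy as np
        import pandas as pd
        from lifelines import CoxPHFitter

        rng = np.random.default_rng(1)
        n = 2000

        biomedical = rng.integers(0, 2, size=n)     # 1 = biomedical/life sciences
        predatory = rng.integers(0, 2, size=n)      # 1 = predatory publisher
        n_authors = rng.poisson(4, size=n) + 1      # collaboration intensity

        # Simulate days to retraction: faster for biomedical and predatory papers,
        # slower for larger author teams (matching the qualitative findings).
        hazard = np.exp(0.4 * biomedical + 0.5 * predatory - 0.05 * n_authors) / 1045
        days_to_retraction = rng.exponential(1 / hazard)

        df = pd.DataFrame({
            "days_to_retraction": days_to_retraction,
            "retracted": 1,                         # every paper in this sample is retracted
            "biomedical": biomedical,
            "predatory": predatory,
            "n_authors": n_authors,
        })

        # Positive coefficients mean a higher hazard of retraction at any point in
        # time, which is to say a shorter expected time to retraction.
        cph = CoxPHFitter()
        cph.fit(df, duration_col="days_to_retraction", event_col="retracted")
        cph.print_summary()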

    Joëts and Mignon also do a lot of modelling of different editorial policy changes and their effects on the distribution of times to retraction, but I don't think we can read too much into that part of the article, as the results are mostly driven by the assumptions on how the policies affect retractions. Nevertheless, this paper provides some insight into why zombie papers can keep shambling through the literature: retractions are slow and the time to retraction depends on discipline, publisher type, collaboration, and the kind of misconduct involved.


    Friday, 23 January 2026

    This week in research #110

    Here's what caught my eye in research over the past week (a quiet one, after a bumper week last week):

    • Khan, Önder, and Ozcan (open access) use the UK’s transition from the Research Assessment Exercise to the Research Excellence Framework in 2009 as a natural experiment, and find that performance-based funding increased female participation in collaborative research by 10.3 percentage points, and that increased female participation coincided with higher research impact, with treated papers receiving 4.79 more citations on average

    Thursday, 22 January 2026

    What Hamilton and Waikato can learn from France about the consequences of inter-municipal water supply

    Hamilton City and Waikato District are transitioning their water services (drinking water, wastewater, stormwater) to a new, jointly owned Council Controlled Organisation (CCO) called IAWAI - Flowing Waters. IAWAI will deliver water services across all of Hamilton City and Waikato District, and is a response to the central government's 'Local Water Done Well' plan "to address New Zealand’s long-standing water infrastructure challenges". What are likely to be some of the consequences of the merging of water services across Hamilton and Waikato?

    Interestingly, this recent article by Mehdi Guelmamen, Serge Garcia (both University of Lorraine), and Alexandre Mayol (University of Lille), published in the journal International Review of Law and Economics (open access), may provide us with some idea. They look at inter-municipal cooperation in the provision of drinking water in France. France provides an interesting case study because:

    With roughly 12,000 water services—90 % serving populations under 10,000—and over 70 % managed by individual municipalities acting independently, there is substantial heterogeneity in governance arrangements.

    That is similar in spirit to our situation. Although Hamilton City (population around 192,000) and Waikato District (population around 86,000) are substantially larger than the municipalities in Guelmamen et al.'s sample, Waikato District is made up of many communities with their own separate water infrastructure (Huntly, Ngāruawāhia, Raglan, Pōkeno, Te Kauwhata, and others). The aggregation of those many communities into a single water entity mimics the French context.

    Guelmamen et al. investigate the determinants of inter-municipal cooperation (IMC) in drinking water supply, as well as how IMC affects pricing of drinking water, water quality, and scarcity of water, using data from 10,000 water services operations over the period from 2008 to 2021. Their analysis involves a two-step approach, where they first look at the associations with pricing, and then at how the services are organised, conditional on prices. They find that public water services are more likely to cooperate than privatised services, but of more interest to me, they also find that:

    First, IMC does not necessarily lead to lower water prices; on the contrary, water prices are often higher under IMC, reflecting additional transaction costs and the financing of investments enabled or encouraged by cooperative arrangements... Third, while IMC generally improves network performance—as evidenced by lower loss rates—the quality improvements are more pronounced in some institutional forms (e.g., communities rather than syndicates).

    That first finding arises in spite of an expectation of economies of scale from larger water services operations. Guelmamen et al. explain this as follows:

    First, cooperation often involves additional administrative costs due to the need for inter-municipal coordination, governance structures and compliance with multi-party agreements. Second, the larger scale management facilitated by IMC may lead to increased investment in infrastructure, which, while beneficial in the long run, increases short-term costs that are passed on to consumers...

    So, even though there may be economies of scale in terms of water provision, these were more than offset by coordination and governance costs, and investment in higher quality water services. In their estimates, this showed up in a combination of three effects. First, there was a negative (and convex) relationship between network size and price (representing economies of scale, as bigger networks have lower average costs, but the cost savings from bigger networks get smaller as network size gets bigger). Second, there was a negative (but concave) relationship between the number of municipalities in the IMC and price (again representing economies of scale, but in this case the effect becomes less negative as more municipalities are included). Third, there was a positive relationship between population size and price. The combination of those three effects is that larger IMCs, particularly those that involve more municipalities, have higher, rather than lower, prices.
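    As a rough sketch (my own notation, not the authors' exact specification), the estimated pricing relationship has the flavour of:

        P = a + b1*NetworkSize + b2*NetworkSize^2 + c1*Municipalities + c2*Municipalities^2 + d*Population + controls

    where the linear and quadratic terms on network size and on the number of municipalities capture the economies of scale and how they taper off, and the positive coefficient on population captures the third effect. Whether a particular IMC ends up with higher or lower prices then depends on the relative sizes of those coefficients, which is why larger IMCs can face higher prices despite economies of scale.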

    The greater investment in higher quality water services is supported by their third finding above, which shows that IMCs have better network performance (less water is lost). IMCs also have higher quality water (measured as fewer breaches of microbiological and physico-chemical water standards).

    What does this tell us for Hamilton and Waikato? Obviously, the context is different, but many of the elements (such as combining multiple municipal water services suppliers into one, and potential economies of scale) are the same. Moreover, Waikato District already has many small water services combined into a single entity, which is not dissimilar to the situation in France. So, if we take these French results at face value, then the risk is that the price of water will go up. Hamilton and Waikato don't currently have water meters, so the unit price of water will remain zero (which in itself may be a problem, because it incentivises overuse of water). Instead, water is charged as a fixed amount in annual property rates. The higher price of water will need to be covered by a higher annual fixed charge within the rates bills in Hamilton and Waikato. On the other hand, the quality of drinking water may increase, and drinking water provision may be more sustainable due to higher investment spending. And, of course, a more sustainable provision of water services is what the central government's plan was intended to achieve.

    How will we know if the creation of IAWAI is a good thing? Early indicators will be decreases in total administration and overhead costs, increases in capital expenditure (both for new construction and for maintenance), and improvements in water quality.


    Tuesday, 20 January 2026

    Why the effects of a guaranteed income on income and employment in Texas and Illinois shouldn't surprise us

    The idea of a universal basic income (sometimes called an income guarantee) has gathered a lot of interest over recent years, particularly as fears of job losses to artificial intelligence have risen. The underlying idea is simple. Government makes a regular payment to all citizens (so it's universal) large enough to cover their basic needs (so it's a basic income). However, other than a number of pilot projects, no country has yet fully implemented a universal basic income (UBI), and many have apparently changed their minds after a pilot (see here and here). There are a couple of reasons for that. First, obviously, is the cost. A basic income of just $100 per week for all New Zealanders would cost about $26 billion per year. That would increase the government budget by about 14 percent [*]. And $100 is not a basic income, because no one is going to be able to live on such a paltry amount. Second, there are worries about the incentive effects of a universal basic income. When workers can receive money from the government for doing nothing (because it's universal), will they work less, offsetting some (if not all) of the additional income from the UBI?
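    On the first of those concerns, the cost arithmetic is simple (assuming a population of roughly 5 million):

        $100 per week x 52 weeks = $5,200 per person per year
        $5,200 x 5,000,000 people = $26 billion per year

    And scaling the payment up to something people could actually live on scales the cost up proportionally.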

    That second concern brings me to this NBER working paper by Eva Vivalt (University of Toronto) and co-authors. The paper was originally published back in 2024, and received quite a bit of coverage then (for examples from the media, see here and here), but has been revised since (and I read the September 2025 revision). Vivalt et al. evaluate the impact of two large guaranteed income programmes in north central Texas (including Dallas) and northern Illinois (including Chicago), both of which were implemented by local non-profit organisations (with the programmes funded by OpenResearch, founded by OpenAI CEO Sam Altman). These are not quite UBIs of course, because they weren't available to everyone. Nevertheless, they do help us to understand the incentive effects that could apply to a UBI. As many would hope a UBI would be (ignoring the immense fiscal cost), the programmes were quite generous (for those in the treatment group, at least), and:

    ...distributed $1,000 per month for three years to 1,000 low-income individuals randomized into the treatment group. 2,000 participants were randomly assigned to receive $50 per month as the control group.

    Vivalt et al. look at the impacts on employment and other related outcomes. There is a huge amount of detail in the paper, so I'm just going to look at some of the highlights. In terms of the overall effect, they find that:

    ...total individual income excluding the transfers fell by about $1,800 per year relative to the control group, with these effects growing over the course of the study.

    So, people receiving the UBI received less income (excluding the UBI - their income increased once you consider the UBI plus their other income). In terms of employment:

    The program caused a 3.9 percentage point reduction in the extensive margin of labor supply and a 1-2 hours/week reduction in labor hours for participants. The estimates of the effects of cash on income and labor hours represent an approximately 5-6% decline relative to the control group mean.

    People responded to receiving a UBI by working less, just as many of those who had concerns about the incentive effects of a UBI feared. However, the negative incentives also extended to others in the household:

    Interestingly, partners and other adults in the household seem to change their labor supply by about as much as participants. For every one dollar received, total household income excluding the transfers fell by around 29 cents, and total individual income fell by around 16 cents.

    So, although households received $1000 extra per month from the UBI, their income only increased by about $710 on average - $1000 minus the roughly $290 reduction in other household earnings implied by the 29-cents-per-dollar figure - because the person receiving the UBI, and other adults in the household, worked less on average. What were they doing with their extra time? Vivalt et al. use American Time Use Survey data, and find that:

    Treated participants primarily use the time gained through working less to increase leisure, also increasing time spent on driving or other transportation and finances, though the effects are modest in magnitude. We can reject even small changes in several other specific categories of time use that could be important for gauging the policy effects of an unearned cash transfer, such as time spent on childcare, exercising, searching for a job, or time spent on self improvement.

    So, people spend more time on leisure. Do they upgrade to better jobs, which is what some people claim would happen (because the UBI would give people the freedom to spend more time searching for a better job match)? Or do they invest in more education, or start their own business? It appears not, as:

    ...we find no substantive changes in any dimension of quality of employment and can rule out even small improvements, rejecting improvements in the index of more than 0.022 standard deviations and increases in wages of more than 60 cents. We find that those in the treatment group have more interest in entrepreneurial activities and are willing to take more financial risks, but the coefficient on whether a participant started a business is close to 0 and not statistically significant. Using data from the National Student Clearinghouse on post-secondary education, we see no significant impacts overall but some suggestive evidence that younger individuals may pursue more education as a result of the transfers...

    Some people have concluded that the results show that a guaranteed income or UBI is a bad policy. However, the guaranteed income did increase incomes (including transfers) overall, and therefore made people better off financially on average. Leisure time is an important component of our wellbeing, so we shouldn't necessarily consider more leisure time a bad outcome for a policy. In fact, Vivalt et al. also find that the guaranteed income increases subjective wellbeing on average (but only in the first year, after which subjective wellbeing returns to baseline).

    The results should not have surprised anyone. They are consistent with a simple model of the labour-leisure tradeoff that I cover in my ECONS101 class. The model (of the worker's decision) is outlined in the diagram below. The worker's decision is constrained by the amount of discretionary time available to them. Let's call this their time endowment, E. If they spent every hour of discretionary time on leisure, they would have E hours of leisure, but zero income. That is one end point of the worker's budget constraint, on the x-axis. The x-axis measures leisure time from left to right, but that means that it also measures work time (from right to left, because each one hour less leisure means one hour more of work). The difference between E and the number of leisure hours is the number of work hours. Next, if the worker spent every hour working, they would have zero leisure, but would have an income equal to W0*E (the wage, W0, multiplied by the whole time endowment, E). That is the other end point of the worker's budget constraint, on the y-axis. The worker's budget constraint joins up those two points, and has a slope that is equal to the wage (more correctly, it is equal to -W0, and it is negative because the budget constraint is downward sloping). The slope of the budget constraint represents the opportunity cost of leisure. Every hour the worker spends on leisure, they give up the wage of W0. Now, we represent the worker's preferences over leisure and consumption by indifference curves. The worker is trying to maximise their utility, which means that they are trying to get to the highest possible indifference curve that they can, while remaining within their budget constraint. The highest indifference curve they can reach on our diagram is I0. The worker's optimum is the bundle of leisure and consumption where their highest indifference curve meets the budget constraint. This is the bundle A, which contains leisure of L0 (and work hours equal to [E-L0]), and consumption of C0.
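    To put the same model in symbols (using the notation from the diagram), the worker's budget constraint before the UBI is:

        C = W0*(E - L)

    so consumption is zero when L = E (all discretionary time spent on leisure), consumption is W0*E when L = 0 (all time spent working), and the slope with respect to leisure is -W0, the opportunity cost of an hour of leisure. The optimal bundle A is where the worker's marginal rate of substitution between leisure and consumption is equal to the wage W0.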

    Now, consider what happens when the worker receives a UBI. This is shown in the diagram below. At each level of leisure (and work), their income (and therefore consumption) is higher. That shifts the budget constraint up vertically by the amount of the UBI. If the worker spends no time at all working, they now have consumption of U, instead of zero, and if they spend all of their time working (and have no leisure) their consumption would be W0*E+U. The worker can now reach a higher indifference curve (I1). Their new optimal bundle of leisure and consumption is B, which contains leisure of L1 (and work hours equal to [E-L1]), and consumption of C1. Notice that the worker now has both more leisure and more consumption. Because leisure has increased, that means that the number of work hours has decreased. The increase in leisure, decrease in work hours, and increase in income overall (when the UBI is included), are consistent with what Vivalt et al. found.
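    In symbols, the UBI of U shifts the budget constraint to:

        C = W0*(E - L) + U

    The end points move up to U (when L = E) and to W0*E + U (when L = 0), but the slope is still -W0 because the wage, and so the opportunity cost of leisure, is unchanged. The UBI is therefore a pure income effect: if leisure is a normal good, the worker chooses more leisure and fewer hours of work, which is exactly the pattern that Vivalt et al. found.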

    So, based on a simple model of the labour-leisure tradeoff, the results of this guaranteed income programme are not surprising. We should have expected a reduction in work, and a reduction in labour income, and that's what Vivalt et al. found. The question policymakers are left with is whether a large income transfer like this is worth it for government, if each $1000 transferred increases incomes by just $710 on average.

    [HT: Marginal Revolution, back in 2024]

    *****

    [*] Of course, if other welfare payments were scrapped in favour of a universal basic income, then the net cost would be lower. Nevertheless, the point that the cost is very high still stands.