Friday, 30 January 2026

This week in research #111

Here's what caught my eye in research over the past week:

  • Hu and Su find that housing wealth appreciation significantly improves individual happiness in China
  • Díez-Rituerto et al. (with ungated earlier version here) study gender differences in willingness to guess in multiple-choice questions in a medical internship exam in Spain, and find that, in line with past research, women answer fewer questions than men, but that reducing the number of alternative answers reduces the difference between men and women among those who answer most of the questions
  • Chen, Fang, and Wang (with ungated earlier version here) find that holding a deanship in China increases patent applications by 15.2 percent, and that deans' misuse of power distorts resource allocation

Thursday, 29 January 2026

European monarchs' cognitive ability and state performance

How important is the quality of a CEO to a company's performance over time? How important is the quality of a leader to a country's performance over time? These questions seem straightforward to answer, but in reality they are quite tricky. First, it is difficult to measure the 'quality' of a CEO or a leader. Second, the appointment of a CEO or a leader is not a random event - typically it is the result of a deliberate process, and may depend on the company's or country's past or expected future performance.

What we need are CEOs or leaders who differ in 'quality' and who are randomly appointed to the role. This sort of experiment is, of course, not available in the real world. However, a 2025 article by Sebastian Ottinger (CERGE-EI) and Nico Voigtländer (UCLA), published in the journal Econometrica (open access), examines a setting that mimics the ideal experiment in many respects. Ottinger and Voigtländer look at 399 European monarchs from 13 states over the period 1000-1800 CE. To address the two concerns above (measurement of quality and non-random appointment), they:

...exploit two salient features of ruling dynasties: first, hereditary succession—the predetermined appointment of offspring of the prior ruler, independent of their ability; second, variation in ruler ability due to the widespread inbreeding of dynasties.

Ottinger and Voigtländer measure the 'quality' of a ruling monarch using the work of Frederick Adams Woods, who:

...coded rulers’ cognitive capability based on reference works and state-specific historical accounts.

Ottinger and Voigtländer measure the outcome variable, state performance, as a subjective measure from the work of Woods, as well as the change in land area during each monarch's reign, and the change in urban population during each monarch's reign. They then use a measure of the 'coefficient of inbreeding' for each ruler as an instrument for cognitive ability. This is important, because the instrumental variables (IV) approach they employ reduces the impact of any measurement error in cognitive ability, as well as dealing with the endogenous selection of rulers. However, as always with the IV approach, the key identifying assumption is that inbreeding affects the outcome (state performance) only through its effect on ruler cognitive ability (not, say, through the instability of succession). Ottinger and Voigtländer provide a detailed discussion in favour of the validity of the instrument, and support this by showing that the results hold when they instead use 'hidden inbreeding' (inbreeding that is less direct than, say, parents being first cousins or an uncle and niece) as an instrument.
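To make the two-stage logic concrete, here is a minimal two-stage least squares (2SLS) sketch in Python. This is my own illustration on simulated data, not the authors' code, and all variable names and numbers are made up:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

# Simulated data, one row per monarch (all names and numbers are mine):
# inbreeding = coefficient of inbreeding (the instrument),
# ability = coded cognitive ability, state_perf = state performance
rng = np.random.default_rng(42)
n = 399
inbreeding = rng.uniform(0, 0.25, n)
ability = -2.0 * inbreeding + rng.normal(0, 1, n)   # instrument shifts ability
state_perf = 0.8 * ability + rng.normal(0, 1, n)    # ability shifts performance
df = pd.DataFrame({"state_perf": state_perf, "ability": ability,
                   "inbreeding": inbreeding})

# First stage: regress the endogenous variable (ability) on the instrument
first = sm.OLS(df["ability"], sm.add_constant(df["inbreeding"])).fit()
df["ability_hat"] = first.fittedvalues

# Second stage: regress the outcome on instrumented ability. (Done manually
# for clarity; the standard errors need correcting, which a package like
# linearmodels' IV2SLS does automatically.)
second = sm.OLS(df["state_perf"], sm.add_constant(df["ability_hat"])).fit()
print(second.params["ability_hat"])   # recovers the effect of ability (~0.8)
```

The first stage isolates the variation in cognitive ability that is driven by inbreeding; the second stage uses only that variation to estimate the effect on state performance, which is what sidesteps both the measurement error and the non-random selection of rulers.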

Now, in their main instrumental variables analysis, they find:

...a sizeable effect of (instrumented) ruler ability on all three dimensions of state performance. A one-std increase in ruler ability leads to a 0.8 std higher broad State Performance, to an expansion in territory by 16%, and to an increase in urban population by 14%.

Ottinger and Voigtländer also explore the mechanisms explaining this effect, finding that:

...less inbred, capable rulers tended to improve their states’ finances, commerce, law and order, and general living conditions. They also reduced involvement in international wars, but when they did, won a larger proportion of battles, leading to an expansion of their territory into urbanized areas. This suggests that capable rulers chose conflicts “wisely,” resulting in expansions into valuable, densely populated territories.

Finally, Ottinger and Voigtländer looked at whether a country's institutions mattered for the effect of the ruler's cognitive ability on state performance. They measure how constrained each ruler was (for example, by the power of parliament), and using this measure in their analysis they find that:

...inbreeding and ability of unconstrained leaders had a strong effect on state borders and urban population in their reign, while the [ability] of constrained rulers (those who faced “substantial limitations on their authority”) made almost no difference.

That result provides further support for the idea that the cognitive ability of rulers mattered precisely in those situations where a ruler might be expected to have an effect - that is, when they are unconstrained by political institutions. When the ruler is constrained by parliament or other political institutions, their cognitive ability will likely have much less effect on state performance, and that is exactly what Ottinger and Voigtländer found.

One surprising finding from the paper appears in the supplementary materials, where Ottinger and Voigtländer report that the marginal effect of cognitive ability on state performance doesn't vary by gender. That surprises me a little given that earlier research by Dube and Harish (which Ottinger and Voigtländer cite in a footnote) found that queens were more likely to engage in wars than kings (see here). Now, this paper shows that more able rulers fight fewer wars. So, I would have expected that queens, having fought more wars, would show a different relationship between cognitive ability and state performance, but that didn't prove to be the case. Perhaps that tells us that, while queens may have fought more wars, they made better choices about which wars to fight? Or perhaps, they fought more wars but that only affected the level of wars, and not the interaction between cognitive ability and wars (or cognitive ability and state performance)?

Regardless, overall these results tell us that the 'quality' of a leader really does matter. A higher quality ruler, in terms of cognitive ability, improves state performance. Extending from those results, we might expect that a higher quality CEO also improves company performance. Of course, CEO selection isn’t hereditary and differs in important ways, but the broader lesson that leader quality can matter a lot when leaders have discretion likely holds in that setting as well.

[HT: Marginal Revolution, early last year]


Monday, 26 January 2026

Roman rule, and personality traits and subjective wellbeing in modern Germany

History has a long tail. Events in the distant past can have surprising effects today. For instance, past research I have blogged on has shown that autocratic rule in Qing dynasty China affects social capital today (see here), the Spanish Inquisition affects GDP in Spanish municipalities (see here), and Roman roads affect the modern location and density of roads in Europe (see here). In that vein, this recent article by Martin Obschonka (University of Amsterdam) and co-authors, published in the journal Current Research in Ecological and Social Psychology (open access), looks at the effect of Roman rule on personality traits and subjective wellbeing in modern Germany. To do this, Obschonka et al. compare people on either side of the Limes Wall, noting that:

To protect their territory with its cultural and economic advancements, the Romans built the Limes wall around 150 AD and it served as a border of the empire for more than a century. The Limes consists of three major rivers, namely the Rhine, the Danube, and the Main ("Main Limes"), as well as a physical wall ... It is well-documented that the Limes constituted a physical, economic, and cultural border between the Roman and Germanic cultures...

By comparing people on either side of the Limes Wall, Obschonka et al. try to reveal the enduring impact of Roman rule. They expect this effect on personality traits and subjective wellbeing because:

...the Roman society was much wealthier and considerably more structured and organized than the “barbaric” Germanic tribes, with an effective public administration and a relatively well-elaborated legal system... When the Romans occupied parts of the territories inhabited by Germanic tribes, they imported superior scientific knowledge and a civic structure.

To measure personality traits, Obschonka et al. turn to the German dataset from the Gosling-Potter Internet Personality Project, the largest dataset on the 'Big Five' personality traits. The German sample they use includes over 73,000 observations between 2003 and 2015, which they aggregate to regional-level averages. For subjective wellbeing (life satisfaction), they use data from the German Socioeconomic Panel between 1984 and 2016, again aggregated to regional-level averages. They also look at life expectancy. Using a simple OLS regression model, with a 'treatment variable' indicating that a region was in the Roman occupied area, Obschonka et al. find that:

...the populations in those regions that were occupied by the Romans nearly 2000 years ago show significantly higher levels of extraversion, agreeableness, and openness, and significantly lower levels of neuroticism (which points to more adaptive personality patterns in the former Roman regions of present-day Germany) than do the populations living in the non-occupied regions... Moreover, populations living in the formerly Roman areas today report greater satisfaction with life and health, and also have longer life expectancies...

After including a range of control variables in their models, the effects on agreeableness and openness became statistically insignificant. However, that leaves significant effects of Roman rule on extraversion and neuroticism, as well as life satisfaction and life expectancy. The results are similar when they use a spatial regression discontinuity design (RDD) instead of OLS. The spatial RDD takes account of how far away an observation is from the Limes Wall, which separates the 'treated' and 'control' regions (and regions closer to the line provide more information about the distinctive effect of the treatment, in this case Roman rule). The method assumes that places on either side of the border are similar except for the Roman occupation. This seems plausible, so the spatial RDD results in particular make the overall findings more believable.
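To see how a spatial RDD works mechanically, here is a stripped-down sketch on simulated data (my own illustration, assuming a simple linear control in distance; the paper's actual specification is richer):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated regional data: signed distance to the Limes in km (positive on
# the formerly Roman side), a jump in the outcome at the border, and smooth
# variation in the outcome with distance
rng = np.random.default_rng(1)
dist = rng.uniform(-200, 200, 1000)
roman = (dist > 0).astype(int)
life_sat = 0.3 * roman + 0.001 * dist + rng.normal(0, 1, 1000)
df = pd.DataFrame({"dist": dist, "roman": roman, "life_sat": life_sat})

# Keep only regions close to the border, where comparability is most plausible
near = df[df["dist"].abs() < 100]

# Let the slope in distance differ on each side; the coefficient on 'roman'
# estimates the discontinuity at the Limes itself
rdd = smf.ols("life_sat ~ roman + dist + roman:dist", data=near).fit()
print(rdd.params["roman"])   # estimated treatment effect at the border
```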

Obschonka et al. then turn to looking at the mechanisms that might explain the enduring effect of Roman rule. They show that:

Density of road infrastructure built by the Romans shows a statistically significant, positive effect on life and health satisfaction, as well as on life expectancy. There is a negative, statistically significant relationship with neuroticism, a positive one with extraversion, and a non-significant one with agreeableness, conscientiousness, and openness...

Running the models with the number of Roman markets and mines as the independent variable reveals a negative effect on neuroticism and a positive effect on extraversion. In addition, there is also a positive effect on conscientiousness (and openness). None of the effects on psychological well-being or health were statistically significant. Including Roman road density and the number of Roman markets and mines in the same model... clearly indicates that markets and mines are more strongly related to the personality traits, whereas Roman road density is more closely related to the health and well-being outcomes.

These results should be seen as more exploratory, but Obschonka et al. interpret them as showing:

...support for the notion that the tangible and lasting economic infrastructure built and established by the Romans left a long-term macro-psychological legacy...

Perhaps. I find it less plausible that Roman physical infrastructure had a lasting effect on modern personality traits and subjective wellbeing, and more likely that Roman worldviews and 'social infrastructure' (institutions and social norms, for example) were passed down from one generation to the next, showing up as a lasting effect on personality and wellbeing. Unfortunately, Obschonka et al. aren't able to tease out those sorts of mechanisms. Either way, it’s another reminder that borders drawn 2000 years ago can still show up in the data, even in places we might not think to look.

[HT: Marginal Revolution, early last year]

Sunday, 25 January 2026

The Census Tree project

An exciting (and new-ish) dataset offers us an unprecedented opportunity to explore research questions using historical US Census data. When I posted about what's new in regional and urban economics last year, one of the things that was raised was the linking of historical Census records over time. That was based on the work of Abramitzky et al., known as the Census Linking Project (CLP). However, in a recent article published in the journal Explorations in Economic History (open access), Kasey Buckles (University of Notre Dame) and co-authors report on an alternative Census linking dataset that has far larger coverage than the CLP. As they explain:

In the Census Tree project, we use information provided by members of the largest genealogy research community in the world to create hundreds of millions of new links among the historical U.S. Censuses (1850–1940). The users of the platform link data sources—including decennial census records—to the profiles of deceased people as part of their own family history research. In doing so, they rely on private information like maiden names, family members’ names, and geographic moves to make links that a researcher would never be able to make using the observable information...

The result is the publicly-available Census Tree dataset, which contains over 700 million links among the 1850–1940 censuses...

The article describes the creation of the Census Tree dataset, which can be accessed for free online. Buckles et al. also demonstrate the use of the dataset in a particular application, comparing it with the CLP data of Abramitzky et al.:

...who show that the children of immigrants were more upwardly mobile on average than the children of the U.S.-born in the late 19th and early 20th centuries. We replicate this result using the Census Tree, and are able to increase the precision of estimates for each sending country. Furthermore, the Census Tree includes sufficient numbers of links to produce estimates for an additional ten countries, including countries from Central America and the Caribbean. We find that the sons of low-income immigrants from Mexico had significantly worse outcomes on average than sons of fathers from other countries, including U.S.-born Whites. We further extend [Abramitzky et al.] by analyzing the mobility of women in a historical sample, and compare these results to historical estimates for men and modern estimates for women. While the patterns for daughters and sons are broadly similar, differences in marriage patterns contribute to gender gaps in mobility in some countries.

As I noted in this post last year, the ability to link people over long periods of time (including between generations) has opened up a wealth of new research questions. Buckles et al. offer a peek at the range of research that has already been done using the Census Tree dataset (see Appendix B in the paper for a bibliography).

Now, the coverage isn't perfect, and there is still some way to go. You can evaluate the quality of the dataset based on what Buckles et al. report in their article, but it is clearly better than previous efforts. And importantly:

...we plan to update the Census Tree every two-to-three years to incorporate new information added by FamilySearch users, to include new links... and to implement methodological advances in linking methods that we and others develop.

This seems like a really important resource for researchers in economics, sociology, regional science, and other fields, and not just for those interested in economic history.

Saturday, 24 January 2026

The long persistence of retracted 'zombie' papers

When a paper is retracted by a journal, that understandably tends to negatively impact perceptions of the researcher and the quality of their research (see here). However, these 'zombie' papers can maintain an undead existence for some time, continuing to be cited and used, sometimes uncritically, because retractions take time and because publishers are not good at highlighting when an article has been retracted. They may even continue to accrue citations after being retracted. In terms of understanding the effect of retractions on the research system, a key question is: how long does it take for a paper to be retracted?

That is essentially the question that this new article by Marc Joëts (University of Lille) and Valérie Mignon (University of Paris Nanterre), published in the journal Research Policy (open access), addresses. Joëts and Mignon draw on a sample of 25,480 retracted research articles over the period from 1923 to 2023 (taken from the Retraction Watch database), and look at the factors associated with the time to retraction (that is, the time between first publication and when the article is retracted). First, they find that:

...the average time to retraction is approximately 1045 days (nearly 3 years), but there is significant variability, with a standard deviation of 1225 days... However, some extreme cases take much longer, with the longest retraction occurring 81 years after publication.

Joëts and Mignon use several different forms of survival model to evaluate the relationship between the characteristics of an article and the time to retraction. In this analysis, they find that:

Papers in biomedical and life sciences are generally retracted faster than those in social sciences and humanities, and articles published by predatory publishers are withdrawn more promptly than those from reputable journals. Collaboration intensity and type of misconduct also emerge as significant predictors of retraction delays.
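To make the survival-analysis machinery concrete, here is a minimal Cox proportional hazards sketch using the lifelines library, on simulated data with hypothetical covariate names (the authors estimate several richer specifications, and lifelines is my choice, not theirs):

```python
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

# Simulated data: one row per retracted article. Durations are drawn so that
# biomedical and predatory-journal papers are retracted faster, and papers
# with more co-authors more slowly, mirroring the paper's findings.
rng = np.random.default_rng(7)
n = 500
biomedical = rng.integers(0, 2, n)
predatory = rng.integers(0, 2, n)
n_authors = rng.integers(1, 15, n)
rate = np.exp(0.5 * biomedical + 0.7 * predatory - 0.05 * n_authors)
days = rng.exponential(1045 / rate)   # mean time to retraction ~1045 days
df = pd.DataFrame({"days_to_retraction": days, "event": 1,
                   "biomedical": biomedical, "predatory": predatory,
                   "n_authors": n_authors})

# Cox proportional hazards: positive coefficients mean a higher 'hazard' of
# retraction at any instant, i.e. a shorter time to retraction
cph = CoxPHFitter()
cph.fit(df, duration_col="days_to_retraction", event_col="event")
cph.print_summary()
```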

The result for predatory journals seems somewhat surprising. However, Joëts and Mignon suggest that:

...predatory journals often publish papers with evident deficiencies that are more easily detectable by external parties, such as watchdog organizations or institutions, leading to quicker retractions when misconduct is identified. Additionally, the lack of formal editorial procedures in predatory journals may result in a less structured and faster retraction process...

Of course, a faster time to retraction doesn't make predatory journals good. It simply makes them less bad, since they almost certainly are a large source of low-quality research that deserves retraction (Joëts and Mignon don't report the proportion of retractions that come from predatory journals).

In terms of collaboration intensity, articles with more co-authors take longer to retract, presumably because more people are involved in the retraction process, or because disputes over who is to blame may take some time to resolve. For types of misconduct, retractions due to 'data issues' take the longest to occur, while those for 'peer review errors' and 'referencing problems' take the least time. That likely reflects that it takes some time for data analyses to be replicated and for problems to surface, whereas problems with referencing are more likely to be readily apparent from a simple reading of the article.

Joëts and Mignon also do a lot of modelling of different editorial policy changes and their effects on the distribution of times to retraction, but I don't think we can read too much into that part of the article, as the results are mostly driven by the assumptions on how the policies affect retractions. Nevertheless, this paper provides some insight into why zombie papers can keep shambling through the literature: retractions are slow and the time to retraction depends on discipline, publisher type, collaboration, and the kind of misconduct involved.


Friday, 23 January 2026

This week in research #110

Here's what caught my eye in research over the past week (a quiet one, after a bumper week last week):

  • Khan, Önder, and Ozcan (open access) use the UK’s transition from the Research Assessment Exercise to the Research Excellence Framework in 2009 as a natural experiment, and find that performance-based funding increased female participation in collaborative research by 10.3 percentage points, and that increased female participation coincided with higher research impact, with treated papers receiving 4.79 more citations on average

Thursday, 22 January 2026

What Hamilton and Waikato can learn from France about the consequences of inter-municipal water supply

Hamilton City and Waikato District are transitioning their water services (drinking water, wastewater, stormwater) to a new, jointly owned Council Controlled Organisation (CCO) called IAWAI - Flowing Waters. IAWAI will deliver water services across all of Hamilton City and Waikato District, and is a response to the central government's 'Local Water Done Well' plan "to address New Zealand’s long-standing water infrastructure challenges". What are some of the likely consequences of merging water services across Hamilton and Waikato?

Interestingly, this recent article by Mehdi Guelmamen, Serge Garcia (both University of Lorraine), and Alexandre Mayol (University of Lille), published in the journal International Review of Law and Economics (open access), may provide us with some idea. They look at inter-municipal cooperation in the provision of drinking water in France. France provides an interesting case study because:

With roughly 12,000 water services—90 % serving populations under 10,000—and over 70 % managed by individual municipalities acting independently, there is substantial heterogeneity in governance arrangements.

That is similar in spirit to our situation. Although Hamilton City (population around 192,000) and Waikato District (population around 86,000) are substantially larger than the municipalities in Guelmamen et al.'s sample, Waikato District is made up of many communities with their own separate water infrastructure (Huntly, Ngāruawāhia, Raglan, Pōkeno, Te Kauwhata, and others). The aggregation of those many communities into a single water entity mimics the French context.

Guelmamen et al. investigate the determinants of inter-municipal cooperation (IMC) in drinking water supply, as well as how IMC affects pricing of drinking water, water quality, and scarcity of water, using data from 10,000 water services operations over the period from 2008 to 2021. Their analysis involves a two-step approach, where they first look at the associations with pricing, and then look at how the services are organised, conditional on the prices. They find that public water services are more likely to cooperate than privatised services, but of more interest to me, they also found that:

First, IMC does not necessarily lead to lower water prices; on the contrary, water prices are often higher under IMC, reflecting additional transaction costs and the financing of investments enabled or encouraged by cooperative arrangements... Third, while IMC generally improves network performance—as evidenced by lower loss rates—the quality improvements are more pronounced in some institutional forms (e.g., communities rather than syndicates).

That first finding arises in spite of an expectation of economies of scale from larger water services operations. Guelmamen et al. explain this as follows:

First, cooperation often involves additional administrative costs due to the need for inter-municipal coordination, governance structures and compliance with multi-party agreements. Second, the larger scale management facilitated by IMC may lead to increased investment in infrastructure, which, while beneficial in the long run, increases short-term costs that are passed on to consumers...

So, even though there may be economies of scale in terms of water provision, these were more than offset by coordination and governance costs, and investment in higher quality water services. In their estimates, this showed up in a combination of three effects. First, there was a negative (and convex) relationship between network size and price (representing economies of scale, as bigger networks have lower average costs, but the cost savings from bigger networks get smaller as network size gets bigger). Second, there was a negative (but concave) relationship between the number of municipalities in the IMC and price (again representing economies of scale, but in this case the effects become less negative as more municipalities are included). Third, there was a positive relationship between population size and price. The combination of those three effects is that larger IMCs, particularly those that involve more municipalities, have higher, rather than lower, prices.

The greater investment in higher quality water services is supported by their third finding above, which shows that IMCs have better network performance (less water is lost). IMCs also had higher quality water (measured as fewer breaches of microbiological and physico-chemical water standards).

What does this tell us for Hamilton and Waikato? Obviously, the context is different, but many of the elements (such as combining multiple municipal water services suppliers into one, and potential economies of scale) are the same. Moreover, Waikato District already has many small water services combined into a single entity, which is not dissimilar to the situation in France. So, if we take these French results at face value, then the risk is that the price of water will go up. Hamilton and Waikato don't currently have water meters, so the unit price of water will remain zero (which in itself may be a problem, because it incentivises overuse of water). Instead, water is charged as a fixed charge in annual property rates. The higher price of water will need to be covered by a higher annual fixed charge within the rates bills in Hamilton and Waikato. On the other hand, the quality of drinking water may increase, and drinking water provision may be more sustainable due to higher investment spending. And, of course, a more sustainable provision of water services is what the central government's plan was intended to achieve.

How will we know if the creation of IAWAI is a good thing? Early indicators will be decreases in total administration and overhead costs, increases in capital expenditure (both for new construction and for maintenance), and improvements in water quality.


Tuesday, 20 January 2026

Why the effects of a guaranteed income on income and employment in Texas and Illinois shouldn't surprise us

The idea of a universal basic income (sometimes called an income guarantee) has gathered a lot of interest over recent years, particularly as fears of job losses to artificial intelligence have risen. The underlying idea is simple. Government makes a regular payment to all citizens (so it's universal) large enough to cover their basic needs (so it's a basic income). However, other than a number of pilot projects, no country has yet fully implemented a universal basic income (UBI), and many have apparently changed their minds after a pilot (see here and here). There are a couple of reasons for that. First, obviously, is the cost. A basic income of just $100 per week for all New Zealanders would cost about $26 billion per year. That would increase the government budget by about 14 percent [*]. And $100 is not a basic income, because no one is going to be able to live on such a paltry amount. Second, there are worries about the incentive effects of a universal basic income. When workers can receive money from the government for doing nothing (because it's universal), will they work less, offsetting some (if not all) of the additional income from the UBI?
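The arithmetic behind those numbers is simple (a back-of-the-envelope sketch; the population and budget figures are round assumptions of mine):

```python
population = 5_000_000        # roughly New Zealand's population (rounded)
weekly_payment = 100
annual_cost = population * weekly_payment * 52
print(f"${annual_cost / 1e9:.0f} billion per year")   # -> $26 billion

core_crown_expenses = 190e9   # rough annual government spending (assumption)
print(f"{annual_cost / core_crown_expenses:.0%} of the budget")  # -> 14%
```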

That brings me to this NBER working paper by Eva Vivalt (University of Toronto) and co-authors. The paper was originally published back in 2024, and received quite a bit of coverage then (for examples from the media, see here and here), but has been revised since (and I read the September 2025 revision). Vivalt et al. evaluate the impact of two large guaranteed income programmes in north central Texas (including Dallas) and northern Illinois (including Chicago), both of which were implemented by local non-profit organisations (with the programmes funded by OpenResearch, founded by OpenAI CEO Sam Altman). These are not quite UBIs of course, because they weren't available to everyone. Nevertheless, they do help us to understand the incentive effects that could apply to a UBI. Like many would hope a UBI would be (ignoring the immense fiscal cost), the programmes were quite generous (for those in the treatment group, at least) and:

...distributed $1,000 per month for three years to 1,000 low-income individuals randomized into the treatment group. 2,000 participants were randomly assigned to receive $50 per month as the control group.

Vivalt et al. look at the impacts on employment and other related outcomes. There is a huge amount of detail in the paper, so I'm just going to look at some of the highlights. In terms of the overall effect, they find that:

...total individual income excluding the transfers fell by about $1,800 per year relative to the control group, with these effects growing over the course of the study.

So, people receiving the UBI received less income (excluding the UBI - their income increased once you consider the UBI plus their other income). In terms of employment:

The program caused a 3.9 percentage point reduction in the extensive margin of labor supply and a 1-2 hours/week reduction in labor hours for participants. The estimates of the effects of cash on income and labor hours represent an approximately 5-6% decline relative to the control group mean.

People responded to receiving a UBI by working less, just as many of those who had concerns about the incentive effects of a UBI feared. However, the negative incentives also extended to others in the household:

Interestingly, partners and other adults in the household seem to change their labor supply by about as much as participants. For every one dollar received, total household income excluding the transfers fell by around 29 cents, and total individual income fell by around 16 cents.

So, although households received $1000 extra per month from the UBI, their income only increased by $710 on average ($1000 minus the 29 cents lost per dollar received, or $290), because the person receiving the UBI, and other adults in the household, worked less on average. What were they doing with their extra time? Vivalt et al. use American Time Use Survey data, and find that:

Treated participants primarily use the time gained through working less to increase leisure, also increasing time spent on driving or other transportation and finances, though the effects are modest in magnitude. We can reject even small changes in several other specific categories of time use that could be important for gauging the policy effects of an unearned cash transfer, such as time spent on childcare, exercising, searching for a job, or time spent on self improvement.

So, people spend more time on leisure. Do they upgrade to better jobs, which is what some people claim would happen (because the UBI would give people the freedom to spend more time searching for a better job match)? Or do they invest in more education, or start their own business? It appears not, as:

...we find no substantive changes in any dimension of quality of employment and can rule out even small improvements, rejecting improvements in the index of more than 0.022 standard deviations and increases in wages of more than 60 cents. We find that those in the treatment group have more interest in entrepreneurial activities and are willing to take more financial risks, but the coefficient on whether a participant started a business is close to 0 and not statistically significant. Using data from the National Student Clearinghouse on post-secondary education, we see no significant impacts overall but some suggestive evidence that younger individuals may pursue more education as a result of the transfers...

Some people have concluded that the results show that a guaranteed income or UBI is a bad policy. However, the guaranteed income did increase incomes (including transfers) overall and therefore made people on average better off financially. Leisure time is an important component of our wellbeing, so we shouldn't necessarily consider more leisure time a bad outcome for a policy. In fact, Vivalt et al. also find that the guaranteed income increases subjective wellbeing on average (but only in the first year, after which subjective wellbeing returns to baseline).

The results shouldn't have surprised anyone. They are consistent with a simple model of the labour-leisure tradeoff that I cover in my ECONS101 class. The model (of the worker's decision) is outlined in the diagram below. The worker's decision is constrained by the amount of discretionary time available to them. Let's call this their time endowment, E. If they spent every hour of discretionary time on leisure, they would have E hours of leisure, but zero income. That is one end point of the worker's budget constraint, on the x-axis. The x-axis measures leisure time from left to right, but that means that it also measures work time (from right to left, because each one hour less leisure means one hour more of work). The difference between E and the number of leisure hours is the number of work hours. Next, if the worker spent every hour working, they would have zero leisure, but would have an income equal to W0*E (the wage, W0, multiplied by the whole time endowment, E). That is the other end point of the worker's budget constraint, on the y-axis.

The worker's budget constraint joins up those two points, and has a slope that is equal to the wage (more correctly, it is equal to -W0, and it is negative because the budget constraint is downward sloping). The slope of the budget constraint represents the opportunity cost of leisure. For every hour the worker spends on leisure, they give up the wage of W0.

Now, we represent the worker's preferences over leisure and consumption by indifference curves. The worker is trying to maximise their utility, which means that they are trying to get to the highest possible indifference curve that they can, while remaining within their budget constraint. The highest indifference curve they can reach on our diagram is I0. The worker's optimum is the bundle of leisure and consumption where their highest indifference curve meets the budget constraint. This is the bundle A, which contains leisure of L0 (and work hours equal to [E-L0]), and consumption of C0.

Now, consider what happens when the worker receives a UBI. This is shown in the diagram below. At each level of leisure (and work), their income (and therefore consumption) is higher. That shifts the budget constraint up vertically by the amount of the UBI. If the worker spends no time at all working, they now have consumption of U, instead of zero, and if they spend all of their time working (and have no leisure) their consumption would be W0*E+U. The worker can now reach a higher indifference curve (I1). Their new optimal bundle of leisure and consumption is B, which contains leisure of L1 (and work hours equal to [E-L1]), and consumption of C1. Notice that the worker now consumes more leisure and more consumption as well. Because leisure has increased, that means that the number of work hours has decreased. The increase in leisure, decrease in work hours, and increase in income overall (when the UBI is included), are consistent with what Vivalt et al. found.
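For those who prefer code to diagrams, here is a minimal numerical version of the model, assuming Cobb-Douglas preferences and made-up numbers. The magnitudes are illustrative only, but the direction of the changes matches the diagrams:

```python
# Worker chooses leisure L and consumption C to maximise Cobb-Douglas utility
# U = L^a * C^(1-a), subject to C = w*(E - L) + M, where E is the time
# endowment, w the wage, and M non-labour income (the UBI).
# The closed-form solution spends share a of 'full income' (w*E + M) on leisure.

def optimum(w, E, M, a=0.5):
    full_income = w * E + M
    leisure = min(a * full_income / w, E)   # can't take more leisure than E
    consumption = w * (E - leisure) + M
    return leisure, E - leisure, consumption

w, E = 15.0, 400.0   # wage $15/hour, 400 discretionary hours per month
for M, label in [(0.0, "no UBI"), (1000.0, "with $1000/month UBI")]:
    L, hours, C = optimum(w, E, M)
    print(f"{label}: leisure={L:.0f}h, work={hours:.0f}h, consumption=${C:.0f}")
# no UBI:               leisure=200h, work=200h, consumption=$3000
# with $1000/month UBI: leisure=233h, work=167h, consumption=$3500
```

Leisure and consumption both rise with the UBI, and work hours fall: exactly the move from bundle A to bundle B in the diagrams.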

So, based on a simple model of the labour-leisure tradeoff, the results of this guaranteed income programme are not surprising. We should have expected a reduction in work, and a reduction in labour income, and that's what Vivalt et al. found. The question policymakers are left with is whether a large income transfer like this is worth it for government, if each $1000 transferred increases incomes by just $710 on average.

[HT: Marginal Revolution, back in 2024]

*****

[*] Of course, if other welfare payments were scrapped in favour of a universal basic income, then the net cost would be lower. Nevertheless, the point that the cost is very high still stands.

Monday, 19 January 2026

Immigration and the wages of the native-born population

Restrictions on immigration flows are getting a lot of policy attention of late. The argument is that immigration reduces wages for the native-born population. But, is there evidence for that? As you might expect, there are literally dozens of studies that have looked into this question, and there are now several meta-analyses that combine the results across many studies (including the meta-analysis that I referred to in this 2016 post). That post referred to this 2005 article by Longhi et al., which found that:

Overall, the effect is very small. A 1 percentage point increase in the proportion of immigrants in the labour force lowers wages across the investigated studies by only 0.119%.

Longhi et al. then followed up with another article in 2010, which also found a very small effect of immigration on wages, specifically:

...a 1% point increase in the immigration to population ratio reduces wages by only 0.03%.

A new meta-analysis article by Amandine Aubry (Université de Caen Normandie) and co-authors, published in the journal Labour Economics (open access), picks up those two earlier meta-analyses, and extends the analysis up to 2023. Specifically, their analysis includes:

...88 studies published between 1985 and 2023, encompassing 2,989 reduced-form estimates of the wage effects of immigration.

Many post-2010 studies use shift-share (Bartik) instruments to estimate the causal effect of immigration on wages. These instruments predict regional immigrant inflows by interacting a region’s pre-existing settlement shares by origin with national inflows from those origins. They then use the predicted inflows as an instrument for actual inflows in an instrumental variables framework. This approach helps address the concern that immigrants may sort into destinations with stronger labour markets, which would make immigration and wages correlated for reasons other than a causal effect of immigration on wages.
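Mechanically, the shift-share prediction is just a weighted sum. A minimal sketch, with made-up shares and inflows:

```python
import numpy as np

# shares[r, o]: share of origin-o immigrants who settled in region r in a
# base year (columns sum to one across regions)
shares = np.array([[0.6, 0.1],    # region 0's historical shares, by origin
                   [0.4, 0.9]])   # region 1's historical shares, by origin
national_inflow = np.array([10_000, 5_000])   # current inflows, by origin

# Predicted inflow to each region: sum over origins of share * national inflow.
# This depends only on historical settlement patterns and national shocks,
# not on current local labour-market conditions.
predicted_inflow = shares @ national_inflow
print(predicted_inflow)   # [6500. 8500.] -> instrument for actual inflows
```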

Now, Aubry et al. are more concerned with investigating the heterogeneity in the estimated effects of immigration on wages, rather than the overall estimate. Nevertheless, I think the overall estimate is interesting and important, and for that they find:

...a 1% rise in the immigrant labour force reduces native wages by about 0.033% on average.

This overall effect is very similar to that from the second meta-analysis by Longhi et al. But it's tiny - a 1 percent larger immigrant labour force would reduce the wages of a native-born worker earning $1000 per week by about 33 cents. And, there is substantial variation around that small overall estimate, which Aubry et al. investigate in some detail. They find that:

...contextual heterogeneity explains part of the variance in the estimates. Estimates for Anglo-Saxon and developing countries are systematically larger than those for other economies, and the historical period covered by a study also affects the results, with later periods being associated with smaller effects. Third, methodological heterogeneity is key... In particular, instrumental variable estimations, which are commonly used to infer causality, yield smaller coefficients than OLS...

More recent studies tend to estimate smaller effects of immigration on wages, as do studies that employ instrumental variables (which also tend to be more recent studies). That accords with the results from the two Longhi et al. meta-analyses, where the second study found a much smaller overall effect than the first study. The shift-share instrument only became established as a method, through the work of David Card and others, in the early 2000s, so its use began diffusing from then. Given that these sorts of analyses have become the industry standard, we can generally expect future studies to find smaller effects than older studies.

The result for developing countries, where the effect of immigration on wages is more positive than in developed countries, deserves more exploration. Aubry et al.'s sample includes estimates from only a handful of developing countries (Colombia, Costa Rica, Malaysia, Peru, South Africa, and Thailand). This also suggests that more studies on the effect of immigration on wages in developing country contexts would be useful.

The overall takeaway from this meta-analysis is that immigration has a negligible effect on the wages of the native-born population on average. Unfortunately, this is one of those cases where the empirical results do not accord with 'folk economics'. Although the average effect is negligible, the wages of some subgroups may be negatively impacted by immigration in some contexts (and Aubry et al.'s results are consistent with the idea that the impacts are negative in some contexts or for some groups). The general public (and policymakers) will tend to focus on those negative impacts. Nevertheless, it should be possible in principle to address those negative impacts through policy (economists refer to this as the compensation principle), so that those who benefit from immigration (including immigrants themselves) can continue to do so.


Sunday, 18 January 2026

The impact of British austerity on mortality and life expectancy

In 2010, the British government adopted a contractionary fiscal policy (austerity) to try to reduce government debt, which had built up during the Global Financial Crisis. Education and social security (social welfare) bore the brunt of the reductions in government spending, but other areas of spending, such as health, were not immune to the cuts (although health spending did not fall, its year-to-year growth slowed substantially). However, austerity is not a free lunch. What were some of the consequences of the reduction in spending?

That is the question that this discussion paper by Yonatan Berman and Tora Hovland (both King’s College London) takes up, focusing on the impacts on mortality and life expectancy. Berman and Hovland note that the reductions in welfare and health spending did not affect all parts of the country equally. They use the differential impacts between different local authorities (or regions, in some analyses) to evaluate the impact of the austerity measures, in a difference-in-differences research design. That essentially involves comparing areas that were more impacted by austerity to those that were less impacted, between the time before and the time after austerity was introduced in 2010. Berman and Hovland measure exposure to austerity by the reduction in welfare (or health) spending per capita at the local authority level (or region). Their data covers the period from 2002 to 2019 in annual time steps. In addition to a pooled difference-in-differences analysis (which estimates one overall impact of austerity), they also conduct an event study, which estimates the impact of austerity over time. The event study analysis is the more interesting, so that's what I will focus on. The key results for reductions in welfare spending are summarised in Figure 5 in the paper:

The y-axis on the figure shows the coefficient (how much life expectancy changes for a £100 per capita per year reduction in spending, relative to the pre-austerity baseline). The red vertical line shows the point in time where austerity began (in mid-2010). Notice that there is a clear reduction in life expectancy for both males and females, starting from about 2013, and increasing over time. Berman and Hovland note that:

...after the onset of austerity measures, we observe a clear reduction in life expectancy, with a more pronounced effect among females. The results indicate that every £100 per capita per year of lost benefits led to a decrease in life expectancy of approximately 0.5–2.5 months.
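In regression form, an event study like this interacts austerity exposure with year dummies, omitting a pre-austerity reference year. Here is a sketch of the general design on simulated data (not the authors' exact specification; all names and numbers are mine):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated panel: local authorities observed 2002-2019, with 'exposure' the
# per-capita welfare cut (in £100s, fixed per authority) and life expectancy
# drifting down after 2010 in more exposed places
rng = np.random.default_rng(3)
years = np.arange(2002, 2020)
n_la = 150
df = pd.DataFrame([(la, y) for la in range(n_la) for y in years],
                  columns=["la", "year"])
exposure = rng.uniform(0, 3, n_la)
df["exposure"] = exposure[df["la"]]
post = np.clip(df["year"] - 2012, 0, None)        # effect phases in from 2013
df["life_exp"] = 80 - 0.02 * post * df["exposure"] + rng.normal(0, 0.1, len(df))

# One interaction term per year, omitting 2009 as the reference period
for t in years:
    if t != 2009:
        df[f"exp_x_{t}"] = (df["year"] == t) * df["exposure"]
rhs = " + ".join(f"exp_x_{t}" for t in years if t != 2009)

# Two-way fixed effects: authority and year dummies absorb level differences
event = smf.ols(f"life_exp ~ {rhs} + C(la) + C(year)", data=df).fit()
print(event.params.filter(like="exp_x_"))   # traces out the effect year by year
```

The exp_x_* coefficients are what Figures 5 and 6 plot: near zero before 2010, then increasingly negative from about 2013.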

The results are qualitatively similar for health spending, as shown in Figure 6 of the paper:

Again, the negative impact on life expectancy is noticeable from 2013, and increases over time. Also, notice that the effect from health spending is much larger in magnitude for each £100 per capita per year reduction in spending. This is not surprising, given that health spending has a more direct impact on health, mortality, and longevity. However, the overall impact of austerity also depends on the amount of spending that was cut, which was much larger for welfare than for health. 

Now, it would have been good for Berman and Hovland to explore a little further why the impact of austerity on life expectancy was delayed by two or three years. The delay might raise concerns about whether there were other things that changed between 2010 and 2013 that affected mortality and life expectancy differentially by exposure to austerity. Having said that, we might expect cuts to spending to take some time to filter through into worse health outcomes, and that is also consistent with the increasing magnitude of the impact over time shown in Figures 5 and 6.

Combining the two effects (of welfare spending and health spending), and conducting some back-of-the-envelope calculations, Berman and Hovland find that:

Between 2010 and 2019, austerity measures caused a three-year setback in life expectancy progress, equivalent to about 190,000 excess deaths, or 3 percent of all deaths.

The costs of austerity were quite substantial! However, were there offsetting benefits? Berman and Hovland conduct a Marginal Value of Public Funds (MVPF) analysis, which essentially weighs up the costs and benefits of austerity (in this context, it is basically a cost-benefit analysis for austerity). In this analysis, they find that (combining both welfare and health effects) the total cost (in terms of the value of life years lost) was £89.6 billion, while the savings on government spending were £38.75 billion. So, every pound of government spending saved had a cost to society of £2.31. On a cost-benefit basis, austerity was not a good deal for society.
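The headline comparison is then simple division (using the paper's figures as reported above):

```python
value_of_life_years_lost = 89.6e9   # total societal cost, £
government_savings = 38.75e9        # reduction in government spending, £
print(f"£{value_of_life_years_lost / government_savings:.2f}")
# -> £2.31 of societal cost per £1 of government spending saved
```

Moreover, the distributional impacts were important, because: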

...poorer local authorities saw smaller increases in life expectancy between 2010 and 2019, or even decreases, compared to richer local authorities (defined by average pay in 2010). These results indicate that austerity measures were not only regressive in their impact on post-tax and transfer income, but they also led to more unequal health outcomes.

If governments are looking to implement policy, ideally those policies shouldn't make society worse off. That should go without saying. Based on this paper, British austerity appears to have made British people significantly worse off, trading lower government spending for higher mortality and lower life expectancy. Berman and Hovland stop short of saying that this was bad policy, instead concluding that:

Paradoxically, this fiscal strategy appears to have contributed to an increase in mortality, potentially offsetting its financial gains. However, it is possible that without austerity, the economic recession in the early 2010s might have been more severe.

It may be the case that the recession would have been worse without austerity, but that is not a certainty. However, given the choice up front, would people living in Britain have preferred a longer recession with fewer deaths, or a shorter recession with more deaths? If austerity really did reduce the length of the recession, the implied tradeoff here is quite stark, and Berman and Hovland's analysis suggests that a longer recession may have been the preferable option.

[HT: Les Oxley]

Friday, 16 January 2026

This week in research #109

Here's what caught my eye in research over the past week (a busy one, after a few quiet weeks):

  • Wang et al. find that a 10 percent increase in housing prices is associated with an average 3.85 percent rise in the probability of smoking, an increase of 0.73 cigarettes smoked per day, and a 3.9 percent increase in the likelihood of frequent drinking in China
  • Agnew, Roger, and Roger find that cognitive reflection, fluid intelligence, and approximate numeracy, account for nearly half of the variance in financial literacy scores and help explain the observed gender gap
  • Joëts and Mignon (open access) study a sample of 25,480 retracted research articles over the period 1923 to 2023, and find that articles retracted for serious misconduct, such as data fabrication, take longer to be retracted, and subscription-based journals are more effective than open access journals in implementing timely retractions
  • Adams and Xu (open access) find that women’s representation in both STEM and Non-STEM fields is higher in more gender-equal countries and countries with greater academic freedom, and women’s representation is higher in fields with more inclusive cultures
  • Chugunova et al. (open access) survey German researchers, and find that researchers are widely using AI tools, for primary and creative tasks, but that there is a persistent gender gap in AI use
  • Ham, Wright, and Ye (open access) produce updated rankings of economics journals and document the spectacular rise of the new society journals in economics, then show that soliciting top authors connected to the editors explains their performance, rather than editor reputations, editor experience, citations from parent journals, or the number of articles published
  • Aubry et al. (open access) conduct a meta-analysis of 88 studies published between 1985 and 2023, and find that a 1% rise in the immigrant labour force reduces native wages by a statistically and economically insignificant 0.033% on average
  • Brade, Himmler, and Jäckle find that providing students with ongoing relative feedback on accumulated course credits increases the likelihood of graduating within one year of the officially scheduled study duration by 3.7 percentage points (an 8 percent increase)
  • Galván and Tenenbaum (with ungated earlier version here) find that parenthood imposes a significant penalty on scientific productivity of mothers but not on that of fathers in Uruguay, with mothers’ productivity declining on average by 17 percent following childbirth
  • Nye et al. find that there is a robust positive relationship between education and free market views in most developed and developing countries
  • Bruns et al. (open access) find that female-authored articles in economics take 9 percent longer to accept in journals, but that this gender gap narrows as female representation in an area of research deepens

Thursday, 15 January 2026

What we learn from Freelancer.com about labour market signalling in the age of generative AI

In yesterday's post, I outlined my case for why generative AI reduces the quality of signalling in education. That is, it reduces how good education (qualifications, or grades) is as a signal to employers of an applicant's ability. There is evidence to support this case, from two recent papers.

The first paper is this pre-print by Jingyi Cui, Gabriel Dias, and Justin Ye (all Yale University), which looks at the signalling benefit in cover letters. Specifically, they study:

...the introduction of a generative AI cover letter writing tool on Freelancer.com, one of the world’s largest online labor platforms. Freelancer connects international workers and employers to collaborate on short-term, skilled, and mostly remote jobs. On April 19, 2023, Freelancer introduced the “AI Bid Writer,” a tool that automatically generates cover letters tailored to employers’ job descriptions that workers can use or edit. The tool was available to a large subset of workers depending on their membership plans.

Cui et al. use eight months of data on two skill categories (PHP and Internet Marketing), which covers over five million cover letters submitted to over 100,000 job opportunities. They observe who had access to the tool, as well as who used the tool to generate a cover letter, and how much time they spent refining the AI-generated cover letter.

Cui et al. look at the impact of the availability of the generative AI tool on callback rates, using a difference-in-differences research design. This effectively involves comparing differences in callback rates between applicants with and without access to the tool, before and after the tool was made available. Cui et al. find that:

...access to the generative AI writing tool increased cover-letter tailoring by 0.16 standard deviations, while actual usage raised tailoring by 1.36 standard deviations. Applying the same design to callbacks as the outcome, we find that access to the generative AI tool increased the probability of receiving a callback by 0.43 percentage points, and usage raised it by 3.56 percentage points. The latter represents a 51% increase relative to the pre-rollout average callback rate of 7.02%.
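Stripped to its essentials, the difference-in-differences estimator behind numbers like these is a double difference (a sketch with made-up callback rates, purely to illustrate the estimator):

```python
# Mean callback rates (hypothetical numbers, not from the paper)
access_pre,   access_post   = 0.070, 0.078   # workers with access to the tool
noaccess_pre, noaccess_post = 0.070, 0.074   # workers without access

# DiD: the change for the access group, net of the change for the no-access
# group (which absorbs platform-wide trends affecting everyone)
did = (access_post - access_pre) - (noaccess_post - noaccess_pre)
print(f"{did * 100:.2f} percentage points")   # -> 0.40 percentage points
```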

All good so far. Job applicants are made significantly better off (in terms of receiving a callback) by using the tool. However:

Our second finding is that AI substitutes for, rather than complements, workers’ pre-AI cover letter tailoring skills... We find that workers who previously wrote more tailored cover letters experienced smaller gains in cover letter tailoring—indeed, the best writers... experienced 27% smaller gains than the weakest ones. By enabling less skilled writers to produce more tailored cover letters, AI narrows the gap between workers with different initial abilities.

In other words, employers are now less able to distinguish the quality of the worker by using the quality of the writing in the cover letter. The consequence of this is that:

The correlation between cover-letter tailoring and receiving a callback fell by 51% after the launch of the AI tool, and the correlation with receiving an offer fell by 79%. Instead, employers shifted toward other signals less susceptible to AI influence, such as workers’ past work experience. The correlation between callbacks and workers’ review scores—the platform’s proprietary metric summarizing past work experiences on the platform and determining the default ranking of applications—rose by 5%. These patterns suggest that as AI adoption increases, employers substitute away from easily manipulated signals like cover letters toward harder-to-fake indicators of quality.

The total numbers of interviews and job offers were unchanged during this period. Cui et al. don't directly report whether the number of callbacks changed, but if we infer that from there being no aggregate change in the number of interviews, then this is consistent with the idea that the key difference is in the distribution of who received the jobs (and callbacks). Workers with a strong alternative signal (other than a well-written cover letter) received more callbacks, meaning that workers who lack an alternative signal received fewer callbacks. That has an important distributional consequence. New workers typically lack past review scores, so as employers lean more heavily on reviews, workers who are new to Freelancer.com will be disadvantaged and will find it more difficult to get a callback. Overall, in this case, the impact of the generative AI tool on the quality of signalling is negative.

The second paper is this job market paper by Anaïs Galdin (Dartmouth College) and Jesse Silbert (Princeton), who also use data from Freelancer.com. The difference is that they carefully evaluate employers' willingness-to-pay for workers, using the bid data. They also look at customisation of the text of the whole proposal, not just the cover letter. Another difference is that Galdin and Silbert look at a different job type, coding. Their data covers 2.7 million applications to 61,000 job openings, by 212,000 job applicants. Although Galdin and Silbert's paper is far more technical than that of Cui et al., their results are somewhat similar (in terms of what they tell us about signalling):

First, we show that before the mass adoption of LLMs, employers had a significantly higher willingness to pay for workers who sent more customized proposals. Estimating a reduced-form multinomial logit model of employer demand using our measure of signal, we find that, all else equal, workers with a one standard deviation higher signal have the same increased chance of being hired as workers with a $26 lower bid... Second, we provide evidence that before the adoption of LLMs, employers valued workers’ signals because signals were predictive of workers’ effort, which in turn predicted workers’ ability to complete the posted job successfully. Third, we find, however, that after the mass adoption of LLMs, these patterns weaken significantly or disappear completely: employer willingness to pay for workers sending higher signals falls sharply, proposals written with the platform’s native AI-writing tool exhibit a negative correlation between effort and signal, and signals no longer predict successful job completion conditional on being hired.
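To make the willingness-to-pay interpretation concrete, here is a minimal sketch of how a trade-off like 'one standard deviation of signal is worth $26 of bid' falls out of a conditional logit model of employer choice. The data and parameter values below are illustrative assumptions (chosen so the implied willingness-to-pay lands near the paper's figure), not the authors' code or estimates:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import logsumexp

rng = np.random.default_rng(0)
n_jobs, n_applicants = 2000, 5

# Illustrative 'true' employer preferences: employers value the signal and
# dislike higher bids; these imply a WTP of 1.0 / 0.04 = $25 per SD of signal
true_b_signal, true_b_bid = 1.0, -0.04

signal = rng.standard_normal((n_jobs, n_applicants))   # standardised signal
bid = rng.uniform(50, 150, (n_jobs, n_applicants))     # bid in dollars
utility = (true_b_signal * signal + true_b_bid * bid
           + rng.gumbel(size=(n_jobs, n_applicants)))
chosen = utility.argmax(axis=1)                        # the hired applicant

def neg_log_lik(theta):
    # Conditional logit: P(hire j) = exp(v_j) / sum_k exp(v_k)
    v = theta[0] * signal + theta[1] * bid
    log_p = v - logsumexp(v, axis=1, keepdims=True)
    return -log_p[np.arange(n_jobs), chosen].sum()

b_signal_hat, b_bid_hat = minimize(neg_log_lik, x0=[0.0, 0.0]).x
# Should recover roughly the $25 per SD used to generate the data
print(f"WTP for a 1 SD higher signal: ${b_signal_hat / -b_bid_hat:.0f}")
```

The willingness-to-pay is just the ratio of the two estimated coefficients: how many dollars of bid an employer would trade for one standard deviation of signal.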

The quoted findings are strong evidence that, in this context at least, the introduction of the generative AI tool substantially reduces the quality of the job application signal. Galdin and Silbert then build an economic model calibrated to their empirical results, and using that model they find that:

Compared to the status quo pre-LLM equilibrium with signaling, our no-signaling counterfactual equilibrium is far less meritocratic. Workers in the bottom quintile of the ability distribution are hired 14% more often, while workers in the top quintile are hired 19% less often.

This suggests an even worse outcome than what Cui et al. find. Galdin and Silbert's results suggest that the distributional changes in who gets offered work make high-quality workers worse off, and low-quality workers better off. That is what we would expect when the quality of signalling is reduced. Galdin and Silbert go on to say that:

These effects are driven by three mechanisms. First, employers previously relied on signals to make hiring decisions, so losing access to them impinges on their ability to discern worker ability. Second, more indirectly, the significant positive correlation between a worker’s ability and cost implies that, when employers lose access to signals and workers are forced to compete more intensely on wages, the prevailing workers with lower bids tend to have lower abilities. Third, since workers’ observable characteristics are poor predictors of their ability, employers have little to no information to distinguish between high and low-ability workers.

These changes to hiring patterns lead to a 5% reduction in average wages, a 1.5% reduction in overall hiring rate per posted job, a 4% reduction in worker surplus, and a small, less than 1%, increase in employer surplus.

The overall takeaway from both papers is that generative AI reduces the quality of signals to employers. Neither paper speaks directly to the quality of education signalling, but we can infer that if generative AI degrades other signals of worker quality, then the education signal likely degrades as well. That's because proposals and cover letters on Freelancer.com play much the same signalling role as degrees and grades. In both cases, employers can't observe ability directly, so they rely on an observable, costly signal. On Freelancer.com, that is the proposal or cover letter, and in education, that is the degree or grade. Generative AI makes it much easier for almost anyone to produce a polished proposal or assessment, so the observable output becomes less tightly linked to ability, weakening the value of both kinds of signal.

Read more:

Wednesday, 14 January 2026

David Deming on generative AI and commitment to learning, and the impact of generative AI on signalling in education

When I was writing yesterday's post on generative AI and the economics major, I really wished I had read this post by David Deming on generative AI and learning, and then I could have linked the two together. Instead, I'll use this post to draw on Deming's ideas and flesh out why I think that generative AI makes signalling in education harder, and why that is a problem (in contrast with Matthew Kahn, who as noted in yesterday's post thinks that generative AI reduces problems of information asymmetry).

First, Deming writes about the tension in education between students' desire to learn and their desire to make life easier (the 'divided self'), drawing on the example of Odysseus:

A vivid illustration of our divided self comes from a famous behavioral economics paper called “Tying Odysseus to the Mast: Evidence from a Commitment Savings Product in the Philippines”. They found that customers flocked to and greatly benefited from a bank product that prevented them from accessing their own savings in the future. Just like when Odysseus tied himself to the mast of his ship so that he would not be tempted by the alluring song of the Sirens...

The Sirens offer Odysseus the promise of unlimited knowledge and wisdom without effort. He survives not by resisting his curiosity, but by restricting its scope and constraining his own ability to operate. The Sirens possess all the knowledge that Odysseus seeks, but he realizes he must earn it. There are no shortcuts. This is the perfect metaphor for learning in the age of superintelligence.

The analogy to generative AI is obvious. Generative AI is a tool that offers unlimited knowledge without effort, but using that tool means that the effort necessary for genuine learning is not expended. As Deming concludes:

Learning is hard work. And there is now lots of evidence that people will offload it if given the chance, even if it isn’t in their long-run interest. After nearly two decades of teaching, I’ve realized that my classroom is more than just a place where knowledge is transmitted. It’s also a community where we tie ourselves to the mast together to overcome the suffering of learning hard things.

How does this relate to the quality of signalling? It is worth reviewing the role of signalling in education, as I discussed in this post:

On the other hand, education provides a signal to employers about the quality of the job applicant. Signalling is necessary because there is an adverse selection problem in the labour market. Job applicants know whether they are high quality or not, but employers do not know. The 'quality' of a job applicant is private information. High-quality (intelligent, hard-working, etc.) job applicants want to reveal to employers that they are hard-working. To do this, they need a signal - a way of credibly revealing their quality to prospective employers.

In order for a signal to be effective, it must be costly (otherwise everyone, even those who are lower quality job applicants, would provide the signal), and it must be costly in a way that makes it unattractive for the lower quality job applicants to attempt (such as being more costly for them to engage in).

Qualifications (degrees, diplomas, etc.) provide an effective signal (they are costly, and more costly for lower quality applicants who may have to attempt papers multiple times in order to pass, or work much harder in order to pass). So by engaging in university-level study, students are providing a signal of their quality to future employers. The qualification signals to the employer that the student is high quality, since a low-quality applicant wouldn't have put in the hard work required to get the qualification.

What does generative AI like ChatGPT do to this signalling? When students can outsource much of the effort required to complete assessments, then not-so-good students no longer need to spend more time or effort to complete their qualification than do good students. Take-home assignments, essays, or written reports might be completed to a passing standard with little effort from the student at all. Completing a qualification is no longer costly in a way that makes it unattractive for lower quality job applicants to attempt. That means that employers would no longer be able to infer a job applicant's quality from whether they completed a qualification or not.
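The logic can be captured in a couple of lines of code. This is a minimal sketch of the separating condition described above, with all numbers purely illustrative assumptions:

```python
def separates(wage_premium, cost_high_ability, cost_low_ability):
    """A qualification separates types only if the wage premium covers the
    high-ability applicant's cost of getting it, but not the low-ability
    applicant's cost."""
    return cost_high_ability <= wage_premium < cost_low_ability

# Pre-AI: completing the qualification is much more costly for weaker students
print(separates(wage_premium=30, cost_high_ability=20, cost_low_ability=50))  # True

# Post-AI: generative AI compresses the effort cost of passing assessments,
# so both types face similar costs and the qualification no longer separates
print(separates(wage_premium=30, cost_high_ability=15, cost_low_ability=18))  # False
```

In the second case, both types find the qualification worth completing, so employers can no longer infer anything about quality from seeing it.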

A solution suggested by Deming's post is for students to find some way of committing themselves to not using generative AI in assessment. For this to solve the signalling problem, the commitment has to be credible (believable). While students could commit to avoiding generative AI and maintaining effortful learning, it is difficult to see how they could credibly reveal that they have done so; they need some way for potential employers to verify it later. This is where universities could step in. If universities can certify that particular qualifications were 'AI-resistant', such as where assessment includes substantial supervised, in-person components (for example, tests or examinations), then that would help maintain the quality of the education signal. There are other options of course, including oral examinations, group or individual presentations, or supervised practice assessments that make learning harder to fake. But anything that falls short of being AI-resistant in the eyes of employers is unlikely to work. And limiting assessment styles in order to certify effortful learning doesn't come without a trade-off: AI-resistant assessment is likely to be less accessible, less flexible, less authentic, and potentially more likely to promote anxiety in students.

Kahn suggested in his post that "AI-proctored assessments and virtual tutors suddenly make effort and mastery visible in real time". That could work. However, AI proctoring by itself is not a solution. For an assessment to retain its status as a signal of student quality, it needs to require more effort to complete well for not-so-good students than for good students. Having an AI proctor watch while a student uses a generative AI avatar to make an AI-generated presentation is not going to work. I'm sure that's not what Kahn was envisaging. Proctoring of online assessment (whether by humans or by AI) is also not as easy as it sounds. Last year I was part of a group tasked with evaluating online proctoring tools, to be rolled out for our new graduate medical school, and I was left thoroughly underwhelmed. All of the tools that we evaluated seemed to have simple workarounds that moderately tech-savvy students could easily exploit. The solution that was offered (when the demonstrators could even offer one) was to have students complete assessments on-site, which more or less defeats the purpose of online proctoring.

Anyway, the point is that generative AI reduces the signalling value of education. There are solutions where that signalling value can be retained, but that requires students to commit to effortful learning, and universities to certify that effort in a way that students who don’t expend it cannot mimic.

[HT: Marginal Revolution]

Read more:

Tuesday, 13 January 2026

Matthew Kahn on generative AI and the economics major

There doesn't appear to be much of a consensus on how to adapt higher education to generative AI. I have my own thoughts, which I have shared here several times already (see the links at the end of this post). However, I am open to the ideas of others. So, I was interested to read this new paper by Matthew Kahn (University of Southern California), where he discusses his views on the future of the economics major. Specifically:

I present an optimistic outlook on the evolution of our economics major over the coming decade, centered on the possibility of highly tailored, student-specific training that fully acknowledges the rich diversity of our students’ abilities, interests, and educational goals.

Kahn is correct in laying out the challenge that we face:

Faculty now face a steeper challenge in helping students see the value of investing sustained effort in a demanding subject like economics, especially when AI tools can produce quick answers and when attention is pulled in countless directions by social media, short-form video, gaming, and other digital platforms...

If students are not prepared for rigorous material, then the easy path for them to follow is to rely on the AI as a crutch. AI creates a moral hazard effect. In recent years, I have stopped assigning class papers because it was obvious to me that the well written papers were being written by the AI. Each economics professor faces the challenge of how to use the incentives we control to nudge students to make AI a complement (not a substitute) for their own time investment in their studies.

The challenge of making AI a complement rather than a substitute for learning has been a common theme in my writing on generative AI in education. Kahn's proposed solutions are not dissimilar from mine too. For instance, in introductory economics:

Large language models can now go much further, acting as tireless, patient coaches that deliver truly adaptive “batting practice.” The AI begins with simple exercises and progressively escalates in difficulty, adjusting in real time to the student’s performance. This is exactly the repetitive, low-stakes practice every introductory economics student needs to build intuition. Going forward, I expect that we will see a growing number of economics educators introducing specialized AI economics tools...

And that is exactly what I have done in my ECONS101 and ECONS102 classes this year. Both classes had AI tutors that were grounded in a knowledge base of the lecture material, and students could chat with the tutors, ask them questions, develop study guides, practise multiple choice questions, and probably find a dozen other uses I haven't considered. The flexibility of these AI tutors, both for myself and for the students, made them a huge contributor to students' learning this year (at least, that's what students said in their course evaluations at the end of each paper).

Unfortunately, Kahn's prescriptions for changes at higher levels of the economics major are much weaker. For instance, for intermediate microeconomics he advocates for making use of short skills videos, then:

AI will help here. Students can take the written transcripts from these video presentations and feed these to AI and ask for more examples to make it more intuitive for them. Students can explain their logic to AI and allow the AI to patiently tutor them. Students can ask the robot to generate likely exam questions for them to practice on.

That isn't much of an advance on what he advocates at the introductory level, because it is still simply content plus discussion with an AI tutor. I think there is much more potential value at the intermediate level in getting students to engage in back-and-forth exploratory discussions with generative AI, and making those discussions a small part of the assessment. That works in theory-based papers (like intermediate microeconomics) as well as econometrics. Kahn could have thought more deeply here about the possibilities. However, for intermediate macroeconomics, I really like this suggestion:

AI tools make it possible to immerse students in the real-time decisions faced by figures such as Ben Bernanke in 2008. What information was available at each moment? What nightmare scenarios kept policymakers awake? Interactive simulations can let students experience economic policymaking “on the fly,” combining partial scientific knowledge with radical uncertainty. Such exercises tend to be far more memorable and engaging than static diagrams.

Some 'scripted' AI tools, built on top of ChatGPT (as my AI tutors are), would be wonderful for simulation. The AI could be instructed to maintain certain economic relationships through the simulation, introduce particular shocks, and help the students to evaluate different monetary and fiscal policy responses. This would be a much more tailored approach than the simulation modelling that Brian Silverstone used when I studied intermediate macroeconomics some twenty years ago.
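To give a sense of the kind of relationships such a tool could be scripted to maintain, here is a minimal sketch of a toy economy with an IS curve, a Phillips curve, a one-off demand shock, and a student-supplied policy rule. The equations and parameter values are my illustrative assumptions, not a calibrated model and not anything from Kahn's paper:

```python
def simulate(policy_rule, periods=8, r_natural=2.0, pi_target=2.0):
    """Toy simulation: the student supplies a policy rule, and the 'scripted'
    relationships (IS curve, Phillips curve, a demand shock) are maintained."""
    output_gap, inflation = 0.0, pi_target
    history = []
    for t in range(periods):
        shock = -3.0 if t == 3 else 0.0   # one-off negative demand shock
        rate = policy_rule(inflation, output_gap)
        # IS curve: output falls when the real rate exceeds its natural level
        output_gap = -0.5 * (rate - inflation - r_natural) + shock
        # Phillips curve: inflation drifts with the output gap
        inflation += 0.3 * output_gap
        history.append((t, rate, output_gap, inflation))
    return history

# A Taylor-type rule that a student might propose as their policy response
def taylor(inflation, output_gap):
    return 2.0 + inflation + 0.5 * (inflation - 2.0) + 0.5 * output_gap

for t, rate, gap, pi in simulate(taylor):
    print(f"t={t}: rate={rate:.2f}, output gap={gap:.2f}, inflation={pi:.2f}")
```

In a real deployment, the AI layer would sit on top of something like this, narrating the scenario, fielding the student's questions, and explaining why inflation and output respond the way they do. Kahn also has great suggestions for field classes: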

Professors teaching field classes often assign a textbook. Such a textbook offers both the professor and the students a linear progression structure but this teaching approach can feel dated as the professor delegates the course structure to a stranger who does not have experience teaching at that specific university. Textbooks are not often updated and the material (such as specific box examples) can quickly feel dated. AI addresses this staleness challenge...

In recent months, I have experimented with loading many interesting readings to a shared Google LM Notebook website and encouraging my students to ask the AI for summaries about these writings and to ask their own questions...

This year, I'll be teaching graduate development economics, for the first time in about a decade, and Kahn has pre-empted almost exactly the approach I was intending to adopt, with students engaged in conversation with a generative AI model (I wasn't sure if I would use NotebookLM or ChatGPT for this purpose), then expanding on that conversation within class. I'm also considering the feasibility of getting students in that class to work with generative AI on a short research project - collating and analysing data to answer some particular research questions, or to replicate some specific study. The paper is in the B Trimester, so I still have time to flesh out the details.

Kahn then goes on to discuss the impacts of generative AI on research assistant and teaching assistant opportunities. I think he is a bit too pessimistic, though, since he concludes that human research assistants will only be useful for developing new (spatial) datasets. There are surely many more remaining use cases for human research assistants, and not just data collection or data cleaning. Finally, Kahn addresses information asymmetry, noting that:

For far too long, students have been choosing majors in the dark—picking “prestigious” fields without really knowing what the degree will do for them, while universities have been able to hide behind vague reputations and opaque classrooms. Parents write enormous checks with almost no idea what they’re buying, employers wonder if the diploma still means anything, and everyone quietly suspects a lot of the game is just expensive signaling.

AI changes that. Cheap, frequent, AI-proctored assessments and virtual tutors suddenly make effort and mastery visible in real time. Professors discover whether students are actually learning the material. Parents can peek at meaningful progress dashboards instead of just getting billing statements. Employers can ask for verifiable records of real skills instead of trusting a transcript that could have been gamed.

I'm not sold on AI proctoring as a solution. In fact, I worry that it will simply lead to an 'arms race' of student AI tools vs. faculty AI tools. The advent of AI avatars and agentic AI simply makes this even more likely across a wider range of assessment types. However, I do agree with Kahn that a lot of education is signalling to employers, and that generative AI is going to change the dynamics of education away from signalling. Kahn seems to think that is a good thing. I worry the opposite! Without signalling, it is difficult for good students to distinguish themselves, and that limits the value proposition of higher education. Kahn wants "verifiable records of real skills instead of... a transcript that could have been gamed". However, generative AI makes it much easier for students to game the record of real skills, rendering those records less reliable.

There isn't a consensus on the best path forward. Kahn's paper is a work in progress, and he is inviting others to share their thoughts. I have offered a few of mine in this post, and I look forward to sharing more of my explorations of generative AI in teaching as we go through this year.

[HT: Marginal Revolution]

Read more:

Sunday, 11 January 2026

Book review: The Big Con

Many of my students go into the consulting industry when they graduate. Most go to one of the 'Big Four' (PwC, EY, Deloitte, KPMG). I've only had a couple that I know of who have gone to McKinsey, and none to Boston Consulting Group or Bain (the 'Big Three'). So, I was interested to read what Mariana Mazzucato and Rosie Collington would have to say in their 2023 book The Big Con. The thesis of the book is simple, as they explain in the introduction:

This book shows why the growth in consulting contracts, the business model of big consultancies, the underlying conflicts of interest and the lack of transparency matter hugely. The consulting industry today is not merely a helping hand; its advice and actions are not purely technical and neutral, facilitating a more effective functioning of society and reducing the "transaction costs" of clients. It enables the actualisation of a particular view of the economy that has created dysfunctions in government and business around the world.

The book uses a large number of real-world stories of 'consultancy firms gone wrong', stitching them together into a narrative of how the consulting industry makes us worse off. Many of the individual stories will be familiar to those who regularly keep up with business and politics. What Mazzucato and Collington do well is track the rise of the consulting industry over time, and how it has become endemic across the public sector in particular. They use far fewer examples from the private sector, but I don't doubt that many of the issues that governments face also apply to private sector firms, just without the same societal impacts. In tracing the development of the consulting industry, Mazzucato and Collington unpack the industry incentives at play, the interconnections between consulting, business, and government, and the conflicts of interest that result. Finally, they outline the consequential impacts on state capacity. In particular:

The more governments and businesses outsource, the less they know how to do, causing organizations to become hollowed out, stuck in time and unable to evolve. With consultants involved at every turn, there is often very little "learning-by-doing." Consultancies' clients become "infantilised"... A government department that contracts out all the services it is responsible for providing may be able to reduce costs in the short term, but it will eventually cost it more due to the loss in knowledge about how to deliver those services, and thus how to adapt the collection of capabilities within its department to meet citizens' changing needs.

What is missing from the discussion of the problems is an evaluation of just how costly the loss of capability in the public service is. Governments are focused on cost savings, and there are genuine short-term cost savings. But how large are the long-term losses that result from, say, the lost ability to monitor and evaluate contracts? Such an evaluation would have given the arguments in the book more weight than the few case studies that Mazzucato and Collington use.

Moreover, while the explanation of the problem and the examples used to illustrate it are good, the solutions proposed are underdeveloped and somewhat banal. For example, while "a new vision, narrative and mission for the civil service" is a shout-out to Mazzucato's previous book Mission Economy (which I reviewed here), the book fails to provide a coherent pathway to extricate the public sector from the grip of consultants. I imagine that, faced with the need to develop a new vision, narrative, and mission, the first thing that many government departments would do is contract a consultancy to assist. Mazzucato and Collington don't offer a way of avoiding that outcome. Their second solution, investing in internal capacity and capability creation, is likely to be important. But again, it requires the public service to disentangle itself from the consulting industry, and how that can be achieved is not explained. Third, embedding learning into contract evaluations is important, but it relies on other factors that are not addressed, such as the ability of the public sector to retain talent. Finally, mandating transparency and disclosure of conflicts of interest should almost go without saying, but it is good that Mazzucato and Collington say it.

Overall, I enjoyed reading this book. It's a couple of years old now, but the examples are still highly relevant, and the consulting industry's tentacles are still firmly wrapped around the body of government in many (most?) countries. Mazzucato and Collington have highlighted the problem, and shone some light on potential solutions. What we need now is strong public sector leadership, backed by government, that is willing to rebuild capacity and capability in sensible ways. Hopefully, this book is one step on that journey (and apologies to my future students if the consulting industry becomes smaller as a result!).