Thursday, 22 January 2026

What Hamilton and Waikato can learn from France about the consequences of inter-municipal water supply

Hamilton City and Waikato District are transitioning their water services (drinking water, wastewater, stormwater) to a new, jointly owned Council Controlled Organisation (CCO) called IAWAI - Flowing Waters. IAWAI will deliver water services across all of Hamilton City and Waikato District, and is a response to the central government's 'Local Water Done Well' plan "to address New Zealand’s long-standing water infrastructure challenges". What are likely to be some of the consequences of the merging of water services across Hamilton and Waikato?

Interestingly, this recent article by Mehdi Guelmamen, Serge Garcia (both University of Lorraine), and Alexandre Mayol (University of Lille), published in the journal International Review of Law and Economics (open access), may provide us with some idea. They look at inter-municipal cooperation in the provision of drinking water in France. France provides an interesting case study because:

With roughly 12,000 water services—90 % serving populations under 10,000—and over 70 % managed by individual municipalities acting independently, there is substantial heterogeneity in governance arrangements.

That is similar in spirit to our situation. Although Hamilton City (population around 192,000) and Waikato District (population around 86,000) are substantially larger than the municipalities in Guelmamen et al.'s sample, Waikato District is made up of many communities with their own separate water infrastructure (Huntly, Ngāruawāhia, Raglan, Pōkeno, Te Kauwhata, and others). Those many communities, aggregated into a single water entity, mimic the French context.

Guelmamen et al. investigate the determinants of inter-municipal cooperation (IMC) in drinking water supply, as well as how IMC affects pricing of drinking water, water quality, and scarcity of water, using data from 10,000 water services operations over the period from 2008 to 2021. Their analysis involves a two-step approach, where they first look at the associations with pricing, and then look at how the services are organised, conditional on the prices. They find that public water services are more likely to cooperate than privatised services, but of more interest to me, they also found that:

First, IMC does not necessarily lead to lower water prices; on the contrary, water prices are often higher under IMC, reflecting additional transaction costs and the financing of investments enabled or encouraged by cooperative arrangements... Third, while IMC generally improves network performance—as evidenced by lower loss rates—the quality improvements are more pronounced in some institutional forms (e.g., communities rather than syndicates).

That first finding arises in spite of an expectation of economies of scale from larger water services operations. Guelmamen et al. explain this as follows:

First, cooperation often involves additional administrative costs due to the need for inter-municipal coordination, governance structures and compliance with multi-party agreements. Second, the larger scale management facilitated by IMC may lead to increased investment in infrastructure, which, while beneficial in the long run, increases short-term costs that are passed on to consumers...

So, even though there may be economies of scale in terms of water provision, these were more than offset by coordination and governance costs, and investment in higher quality water services. In their estimates, this showed up in a combination of three effects. First, there was a negative (and convex) relationship between network size and price (representing economies of scale, as bigger networks have lower average costs, but the cost savings from bigger networks get smaller as network size gets bigger). Second, there was a negative (but concave) relationship between the number of municipalities in the IMC and price (again representing economies of scale, but in this case the effect becomes less negative as more municipalities are included). Third, there was a positive relationship between population size and price. The combination of those three effects is that larger IMCs, particularly those that involve more municipalities, have higher, rather than lower, prices.
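To see how those three effects can net out to a higher price, here is a small numerical sketch. The functional form and the coefficients are invented for illustration (they are not Guelmamen et al.'s estimates), but they have the signs and curvature described above:

```python
# Illustrative only: hypothetical coefficients, not Guelmamen et al.'s estimates.
# Price falls with network size and with the number of member municipalities
# (economies of scale, flattening out), but rises with the population served.

def predicted_price(network_km, n_municipalities, population_000s):
    base = 2.00                                                       # hypothetical base price, euro per m3
    scale = -0.002 * network_km + 0.0000025 * network_km**2           # negative and flattening
    imc = -0.03 * n_municipalities + 0.0006 * n_municipalities**2     # negative and flattening
    population = 0.008 * population_000s                              # positive (investment, coordination)
    return base + scale + imc + population

print(predicted_price(network_km=50, n_municipalities=1, population_000s=5))      # small stand-alone service (~1.92)
print(predicted_price(network_km=400, n_municipalities=25, population_000s=150))  # large multi-municipality IMC (~2.43)
```

In this made-up example, the scale economies from a bigger network and more member municipalities are swamped by the positive population effect, so the large IMC ends up with a higher price.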

The greater investment in higher quality water services is supported by their third finding above, which shows that IMCs have better network performance (less water is lost). IMCs also had higher quality water, measured as fewer breaches of microbiological and physico-chemical water standards.

What does this tell us for Hamilton and Waikato? Obviously, the context is different, but many of the elements (such as combining multiple municipal water services suppliers into one, and potential economies of scale) are the same. Moreover, Waikato District already has many small water services combined into a single entity, which is not dissimilar to the situation in France. So, if we take these French results at face value, then the risk is that the price of water will go up. Hamilton and Waikato don't currently have water meters, so the unit price of water will remain zero (which in itself may be a problem, because it incentivises overuse of water). Instead, water is charged as a fixed charge in annual property rates. The higher price of water will need to be covered by a higher annual fixed charge within the rates bills in Hamilton and Waikato. On the other hand, the quality of drinking water may increase, and drinking water provision may be more sustainable due to higher investment spending. And, of course, a more sustainable provision of water services is what the central government's plan was intended to achieve.

How will we know if the creation of IAWAI is a good thing? Early indicators will be decreases in total administration and overhead costs, increases in capital expenditure (both for new construction and for maintenance), and improvements in water quality.


Tuesday, 20 January 2026

Why the effects of a guaranteed income on income and employment in Texas and Illinois shouldn't surprise us

The idea of a universal basic income (sometimes called an income guarantee) has gathered a lot of interest over recent years, particularly as fears of job losses to artificial intelligence have risen. The underlying idea is simple. Government makes a regular payment to all citizens (so it's universal) large enough to cover their basic needs (so it's a basic income). However, other than a number of pilot projects, no country has yet fully implemented a universal basic income (UBI), and many have apparently changed their minds after a pilot (see here and here). There are a couple of reasons for that. First, obviously, is the cost. A basic income of just $100 per week for all New Zealanders would cost about $26 billion per year. That would increase the government budget by about 14 percent [*]. And $100 is not a basic income, because no one is going to be able to live on such a paltry amount. Second, there are worries about the incentive effects of a universal basic income. When workers can receive money from the government for doing nothing (because it's universal), will they work less, offsetting some (if not all) of the additional income from the UBI?
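As a quick check on that cost claim (using round figures I have assumed, rather than official budget data):

```python
# Back-of-the-envelope check of the cost claim; the population and budget
# figures are rough assumptions, not official statistics.
population = 5_000_000          # roughly New Zealand's population
weekly_payment = 100
annual_cost = population * weekly_payment * 52
print(f"Annual cost: ${annual_cost / 1e9:.1f} billion")          # about $26 billion

government_spending = 190e9     # assumed ballpark for annual government spending
print(f"Share of budget: {annual_cost / government_spending:.0%}")  # roughly 14 percent
```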

That brings me to this NBER working paper by Eva Vivalt (University of Toronto) and co-authors. The paper was originally published back in 2024, and received quite a bit of coverage then (for examples from the media, see here and here), but has been revised since (and I read the September 2025 revision). Vivalt et al. evaluate the impact of two large guaranteed income programmes in north central Texas (including Dallas) and northern Illinois (including Chicago), both of which were implemented by local non-profit organisations (with the programmes funded by OpenResearch, founded by OpenAI CEO Sam Altman). These are not quite UBIs of course, because they weren't available to everyone. Nevertheless, they do help us to understand the incentive effects that could apply to a UBI. Like many would hope a UBI would be (ignoring the immense fiscal cost), the programmes were quite generous (for those in the treatment group, at least) and:

...distributed $1,000 per month for three years to 1,000 low-income individuals randomized into the treatment group. 2,000 participants were randomly assigned to receive $50 per month as the control group.

Vivalt et al. look at the impacts on employment and other related outcomes. There is a huge amount of detail in the paper, so I'm just going to look at some of the highlights. In terms of the overall effect, they find that:

...total individual income excluding the transfers fell by about $1,800 per year relative to the control group, with these effects growing over the course of the study.

So, people receiving the UBI earned less income excluding the UBI (although their total income still increased once you add the UBI to their other income). In terms of employment:

The program caused a 3.9 percentage point reduction in the extensive margin of labor supply and a 1-2 hours/week reduction in labor hours for participants. The estimates of the effects of cash on income and labor hours represent an approximately 5-6% decline relative to the control group mean.

People responded to receiving a UBI by working less, just as many of those who had concerns about the incentive effects of a UBI feared. However, the negative incentives also extended to others in the household:

Interestingly, partners and other adults in the household seem to change their labor supply by about as much as participants. For every one dollar received, total household income excluding the transfers fell by around 29 cents, and total individual income fell by around 16 cents.
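To make the arithmetic in that quote concrete (monthly figures, taken straight from the numbers above):

```python
# The household-level arithmetic implied by the quote above (monthly figures)
transfer = 1000                                         # guaranteed income per month
offset_per_dollar = 0.29                                # fall in household earned income per transfer dollar
earned_income_change = -offset_per_dollar * transfer    # about -$290 per month
net_gain = transfer + earned_income_change              # about +$710 per month
print(earned_income_change, net_gain)
```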

So, although households received $1000 extra per month from the UBI, their income only increased by $710 on average, because the person receiving the UBI, and other adults in the household, worked less on average. What were they doing with their extra time? Vivalt et al. use American Time Use Survey data, and find that:

Treated participants primarily use the time gained through working less to increase leisure, also increasing time spent on driving or other transportation and finances, though the effects are modest in magnitude. We can reject even small changes in several other specific categories of time use that could be important for gauging the policy effects of an unearned cash transfer, such as time spent on childcare, exercising, searching for a job, or time spent on self improvement.

So, people spend more time on leisure. Do they upgrade to better jobs, which is what some people claim would happen (because the UBI would give people the freedom to spend more time searching for a better job match)? Or do they invest in more education, or start their own business? It appears not, as:

...we find no substantive changes in any dimension of quality of employment and can rule out even small improvements, rejecting improvements in the index of more than 0.022 standard deviations and increases in wages of more than 60 cents. We find that those in the treatment group have more interest in entrepreneurial activities and are willing to take more financial risks, but the coefficient on whether a participant started a business is close to 0 and not statistically significant. Using data from the National Student Clearinghouse on post-secondary education, we see no significant impacts overall but some suggestive evidence that younger individuals may pursue more education as a result of the transfers...

Some people have concluded that the results show that a guaranteed income or UBI is a bad policy. However, the guaranteed income did increase incomes (including transfers) overall, and therefore made people better off financially on average. Leisure time is an important component of our wellbeing, so we shouldn't necessarily consider more leisure time a bad outcome for a policy. In fact, Vivalt et al. also find that the guaranteed income increases subjective wellbeing on average (but only in the first year, after which subjective wellbeing returns to baseline).

The results shouldn't have surprised anyone. They are consistent with a simple model of the labour-leisure tradeoff that I cover in my ECONS101 class. The model (of the worker's decision) is outlined in the diagram below. The worker's decision is constrained by the amount of discretionary time available to them. Let's call this their time endowment, E. If they spent every hour of discretionary time on leisure, they would have E hours of leisure, but zero income. That is one end point of the worker's budget constraint, on the x-axis. The x-axis measures leisure time from left to right, but that means that it also measures work time (from right to left, because each one hour less leisure means one hour more of work). The difference between E and the number of leisure hours is the number of work hours.

Next, if the worker spent every hour working, they would have zero leisure, but would have an income equal to W0*E (the wage, W0, multiplied by the whole time endowment, E). That is the other end point of the worker's budget constraint, on the y-axis. The worker's budget constraint joins up those two points, and has a slope that is equal to the wage (more correctly, it is equal to -W0, and it is negative because the budget constraint is downward sloping). The slope of the budget constraint represents the opportunity cost of leisure. Every hour the worker spends on leisure, they give up the wage of W0.

Now, we represent the worker's preferences over leisure and consumption by indifference curves. The worker is trying to maximise their utility, which means that they are trying to get to the highest possible indifference curve that they can, while remaining within their budget constraint. The highest indifference curve they can reach on our diagram is I0. The worker's optimum is the bundle of leisure and consumption where their highest indifference curve meets the budget constraint. This is the bundle A, which contains leisure of L0 (and work hours equal to [E-L0]), and consumption of C0.

Now, consider what happens when the worker receives a UBI. This is shown in the diagram below. At each level of leisure (and work), their income (and therefore consumption) is higher. That shifts the budget constraint up vertically by the amount of the UBI. If the worker spends no time at all working, they now have consumption of U, instead of zero, and if they spend all of their time working (and have no leisure) their consumption would be W0*E+U. The worker can now reach a higher indifference curve (I1). Their new optimal bundle of leisure and consumption is B, which contains leisure of L1 (and work hours equal to [E-L1]), and consumption of C1. Notice that the worker now has more leisure and more consumption. Because leisure has increased, the number of work hours has decreased. The increase in leisure, decrease in work hours, and increase in income overall (when the UBI is included), are consistent with what Vivalt et al. found.
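For readers who prefer numbers to diagrams, here is a minimal numerical version of the two diagrams just described. The Cobb-Douglas utility function and the specific numbers are my assumptions for illustration, not anything from Vivalt et al.; the point is just that the optimum shifts towards more leisure, fewer work hours, and more consumption when a UBI is added:

```python
# A minimal numerical sketch of the labour-leisure model described above.
# Cobb-Douglas utility U = C^alpha * L^(1-alpha) is assumed for illustration.
# E is the time endowment, w the wage, T the UBI payment.

def optimum(E=100, w=20, alpha=0.7, T=0):
    full_income = w * E + T                   # value of the time endowment plus the UBI
    leisure = (1 - alpha) * full_income / w   # spend share (1-alpha) of full income on leisure
    leisure = min(leisure, E)                 # cannot take more leisure than the endowment
    hours = E - leisure
    consumption = w * hours + T
    return leisure, hours, consumption

print(optimum(T=0))     # bundle A: leisure 30.0, work 70.0, consumption 1400
print(optimum(T=400))   # bundle B: leisure 36.0, work 64.0, consumption 1680
```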

So, based on a simple model of the labour-leisure tradeoff, the results of this guaranteed income programme are not surprising. We should have expected a reduction in work, and a reduction in labour income, and that's what Vivalt et al. found. The question policymakers are left with is whether a large income transfer like this is worth it for government, if each $1000 transferred increases incomes by just $710 on average.

[HT: Marginal Revolution, back in 2024]

*****

[*] Of course, if other welfare payments were scrapped in favour of a universal basic income, then the net cost would be lower. Nevertheless, the point that the cost is very high still stands.

Monday, 19 January 2026

Immigration and the wages of the native-born population

Restrictions on immigration flows are getting a lot of policy attention of late. The argument is that immigration reduces wages for the native-born population. But, is there evidence for that? As you might expect, there are literally dozens of studies that have looked into this question, and there are now several meta-analyses that combine the results across many studies (including the meta-analysis that I referred to in this 2016 post). That post referred to this 2005 article by Longhi et al., which found that:

Overall, the effect is very small. A 1 percentage point increase in the proportion of immigrants in the labour force lowers wages across the investigated studies by only 0.119%.

Longhi et al. then followed up with another article in 2010, which also found a very small effect of immigration on wages, specifically:

...a 1% point increase in the immigration to population ratio reduces wages by only 0.03%.

A new meta-analysis article by Amandine Aubry (Université de Caen Normandie) and co-authors, published in the journal Labour Economics (open access), picks up those two earlier meta-analyses, and extends the analysis up to 2023. Specifically, their analysis includes:

...88 studies published between 1985 and 2023, encompassing 2,989 reduced-form estimates of the wage effects of immigration.

Many post-2010 studies use shift-share (Bartik) instruments to estimate the causal effect of immigration on wages. These instruments predict regional immigrant inflows by interacting a region’s pre-existing settlement shares by origin with national inflows from those origins. They then use the predicted inflows as an instrument for actual inflows in an instrumental variables framework. This approach helps address the concern that immigrants may sort into destinations with stronger labour markets, which would make immigration and wages correlated for reasons other than a causal effect of immigration on wages.
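For the technically inclined, here is a minimal sketch of how a shift-share instrument is built. The regions, origins, settlement shares, and inflows are made-up numbers for illustration:

```python
import pandas as pd

# A minimal sketch of a shift-share (Bartik) instrument; all numbers are invented.
# 'share' is region r's base-period share of immigrants from origin o, and
# national_inflow is the current national inflow from each origin.
shares = pd.DataFrame({
    "region": ["A", "A", "B", "B"],
    "origin": ["X", "Y", "X", "Y"],
    "share":  [0.8, 0.2, 0.3, 0.7],          # base-period settlement shares by origin
})
national_inflow = {"X": 10_000, "Y": 4_000}   # current national inflows by origin

shares["predicted_inflow"] = shares["origin"].map(national_inflow) * shares["share"]
bartik = shares.groupby("region")["predicted_inflow"].sum()
print(bartik)   # predicted inflow per region, used as an instrument for actual inflows
```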

Now, Aubry et al. are more concerned with investigating the heterogeneity in the estimated effects of immigration on wages, rather than the overall estimate. Nevertheless, I think the overall estimate is interesting and important, and for that they find:

...a 1% rise in the immigrant labour force reduces native wages by about 0.033% on average.

This overall effect is very similar to that from the second meta-analysis by Longhi et al. But it's tiny - a 1 percent larger immigrant labour force would reduce the wages of a native-born worker earning $1000 per week by about 33 cents. And, there is substantial variation around that small overall estimate, which Aubry et al. investigate in some detail. They find that:

...contextual heterogeneity explains part of the variance in the estimates. Estimates for Anglo-Saxon and developing countries are systematically larger than those for other economies, and the historical period covered by a study also affects the results, with later periods being associated with smaller effects. Third, methodological heterogeneity is key... In particular, instrumental variable estimations, which are commonly used to infer causality, yield smaller coefficients than OLS...

More recent studies tend to estimate smaller effects of immigration on wages, as do studies that employ instrumental variables (which also tend to be more recent studies). That accords with the results from the two Longhi et al. meta-analyses, where the second study found a much smaller overall effect than the first study. The shift-share instrument only became established as a method by David Card and others in the early 2000s, so its use only began diffusing from then. Given that these sorts of analyses have become the industry standard now, we can generally expect future studies to find smaller effects than older studies.

The results for developing countries, where the effect of immigration on wages is more positive than in developed countries, deserve more exploration. Aubry et al.'s sample includes estimates from only a handful of developing countries (Colombia, Costa Rica, Malaysia, Peru, South Africa, and Thailand). This also suggests that more studies on the effect of immigration on wages in developing country contexts would be useful.

The overall takeaway from this meta-analysis is that immigration has a negligible overall effect on the wages of the native-born population on average. Unfortunately, this is one of those cases where the empirical results do not accord with 'folk economics'. Although the average effect is negligible, the wages of some subgroups may be negatively impacted by immigration in some contexts (and Aubry et al.'s results are consistent with the idea that the impacts are negative in some contexts or for some groups). The general public (and policy makers) will tend to focus on those negative impacts. Nevertheless, it should be possible in principle to address those negative impacts through policy (economists refer to this as the compensation principle), so that those who benefit from immigration (including immigrants themselves) can continue to do so.


Sunday, 18 January 2026

The impact of British austerity on mortality and life expectancy

In 2010, the British government adopted a contractionary fiscal policy (austerity) to try to reduce government debt, which had built up during the Global Financial Crisis. Education and social security (social welfare) bore the brunt of the reductions in government spending, but other areas of spending, such as health, were not immune to the cuts (although health spending did not fall, its year-to-year growth slowed substantially). However, austerity is not a free lunch. What were some of the consequences of the reduction in spending?

That is the question that this discussion paper by Yonatan Berman and Tora Hovland (both King’s College London) takes up, focusing on the impacts on mortality and life expectancy. Berman and Hovland note that the reductions in welfare and health spending did not affect all parts of the country equally. They use the differential impacts between different local authorities (or regions, in some analyses) to evaluate the impact of the austerity measures, in a difference-in-differences research design. That essentially involves comparing areas that were more impacted by austerity to those that were less impacted, between the time before and the time after austerity was introduced in 2010. Berman and Hovland measure exposure to austerity by the reduction in welfare (or health) spending per capita at the local authority level (or region). Their data covers the period from 2002 to 2019 in annual time steps. In addition to a pooled difference-in-differences analysis (which estimates one overall impact of austerity), they also conduct an event study, which estimates the impact of austerity over time. The event study analysis is the more interesting, so that's what I will focus on. The key results for reductions in welfare spending are summarised in Figure 5 in the paper:

The y-axis on the figure shows the coefficient (how much life expectancy changes for a £100 per capita per year reduction in spending, relative to the pre-austerity baseline). The red vertical line shows the point in time where austerity began (in mid-2010). Notice that there is a clear reduction in life expectancy for both males and females, starting from about 2013, and increasing over time. Berman and Hovland note that:

...after the onset of austerity measures, we observe a clear reduction in life expectancy, with a more pronounced effect among females. The results indicate that every £100 per capita per year of lost benefits led to a decrease in life expectancy of approximately 0.5–2.5 months.
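For readers who want to see the mechanics of the research design described above, here is a minimal sketch of the pooled difference-in-differences regression, run on fabricated data (the event-study version simply replaces the single post-austerity interaction with one interaction per year). The variable names and numbers are invented for illustration, not Berman and Hovland's:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Fabricated panel: 50 hypothetical local authorities observed from 2002 to 2019.
rng = np.random.default_rng(0)
years = range(2002, 2020)
las = range(50)
df = pd.DataFrame([(la, y) for la in las for y in years], columns=["la", "year"])
exposure = {la: rng.uniform(0, 3) for la in las}     # cut in spending, in £100s per capita
df["exposure"] = df["la"].map(exposure)
df["post"] = (df["year"] >= 2010).astype(int)

# Fabricated outcome: more-exposed areas lose life expectancy after austerity begins.
df["life_exp"] = (80 + 0.1 * (df["year"] - 2002)
                  - 0.1 * df["exposure"] * df["post"]
                  + rng.normal(0, 0.1, len(df)))

# Two-way fixed effects (local authority and year) plus the exposure-times-post term.
model = smf.ols("life_exp ~ exposure:post + C(year) + C(la)", data=df).fit(
    cov_type="cluster", cov_kwds={"groups": df["la"]})
print(model.params["exposure:post"])   # recovers roughly -0.1, the effect per £100 cut
```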

The results are qualitatively similar for health spending, as shown in Figure 6 of the paper:

Again, the negative impact on life expectancy is noticeable from 2013, and increases over time. Also, notice that the effect from health spending is much larger in magnitude for each £100 per capita per year reduction in spending. This is not surprising, given that health spending has a more direct impact on health, mortality, and longevity. However, the overall impact of austerity also depends on the amount of spending that was cut, which was much larger for welfare than for health. 

Now, it would have been good for Berman and Hovland to explore a little further why the impact of austerity on life expectancy was delayed by two or three years. The delay might raise concerns about whether there were other things that changed between 2010 and 2013 that affected mortality and life expectancy differentially by exposure to austerity. Having said that, we might expect cuts to spending to take some time to filter through into worse health outcomes, and that is also consistent with the increasing magnitude of the impact over time shown in Figures 5 and 6.

Combining the two effects (of welfare spending and health spending), and conducting some back-of-the-envelope calculations, Berman and Hovland find that:

Between 2010 and 2019, austerity measures caused a three-year setback in life expectancy progress, equivalent to about 190,000 excess deaths, or 3 percent of all deaths.

The costs of austerity were quite substantial! However, were there offsetting benefits? Berman and Hovland conduct a Marginal Value of Public Funds (MVPF) analysis, which essentially weighs up the costs and benefits of austerity (in this context, it is basically a cost-benefit analysis for austerity). In this analysis, they find that (when combining both welfare and health effects), the total costs (in terms of the value of life years lost) were £89.6 billion, while the savings on government spending were £38.75 billion. So, every pound of government spending saved had a cost to society of £2.31 (the simple arithmetic is sketched after the quote below). On a cost-benefit basis, austerity was not a good deal for society. Moreover, the distributional impacts were important, because:

...poorer local authorities saw smaller increases in life expectancy between 2010 and 2019, or even decreases, compared to richer local authorities (defined by average pay in 2010). These results indicate that austerity measures were not only regressive in their impact on post-tax and transfer income, but they also led to more unequal health outcomes.
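As an aside, the £2.31 figure above is just the ratio of the two totals:

```python
# The cost-per-pound-saved arithmetic from two paragraphs above
value_of_life_years_lost = 89.6e9      # pounds
spending_saved = 38.75e9               # pounds
print(round(value_of_life_years_lost / spending_saved, 2))   # about 2.31
```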

If governments are looking to implement policy, ideally those policies shouldn't make society worse off. That should go without saying. Based on this paper, British austerity appears to have made British people significantly worse off, trading lower government spending for higher mortality and lower life expectancy. Berman and Hovland stop short of saying that this was bad policy, instead concluding that:

Paradoxically, this fiscal strategy appears to have contributed to an increase in mortality, potentially offsetting its financial gains. However, it is possible that without austerity, the economic recession in the early 2010s might have been more severe.

It may be the case that the recession would have been worse without austerity, but that is not a certainty. However, given the choice up front, would people living in Britain have preferred a longer recession with fewer deaths, or a shorter recession with more deaths? If austerity really did reduce the length of the recession, the implied tradeoff here is quite stark, and Berman and Hovland's analysis suggests that a longer recession may have been the preferable option.

[HT: Les Oxley]

Friday, 16 January 2026

This week in research #109

Here's what caught my eye in research over the past week (a busy one, after a few quiet weeks):

  • Wang et al. find that a 10 percent increase in housing prices is associated with an average 3.85 percent rise in the probability of smoking, an increase of 0.73 cigarettes smoked per day, and a 3.9 percent increase in the likelihood of frequent drinking in China
  • Agnew, Roger, and Roger find that cognitive reflection, fluid intelligence, and approximate numeracy account for nearly half of the variance in financial literacy scores and help explain the observed gender gap
  • Joëts and Mignon (open access) study a sample of 25,480 retracted research articles over the period 1923 to 2023, and find that articles retracted for serious misconduct, such as data fabrication, take longer to be retracted, and subscription-based journals are more effective than open access journals in implementing timely retractions
  • Adams and Xu (open access) find that women’s representation in both STEM and Non-STEM fields is higher in more gender-equal countries and countries with greater academic freedom, and women’s representation is higher in fields with more inclusive cultures
  • Chugunova et al. (open access) survey German researchers, and find that researchers are widely using AI tools, for primary and creative tasks, but that there is a persistent gender gap in AI use
  • Ham, Wright, and Ye (open access) produce updated rankings of economics journals and document the spectacular rise of the new society journals in economics, then show that soliciting top authors connected to the editors explains their performance, rather than editor reputations, editor experience, citations from parent journals, or the number of articles published
  • Aubry et al. (open access) conduct a meta-analysis of 88 studies published between 1985 and 2023, and find that a 1% rise in the immigrant labour force reduces native wages by a statistically and economically insignificant 0.033% on average
  • Brade, Himmler, and Jäckle find that providing students with ongoing relative feedback on accumulated course credits increases the likelihood of graduating within one year of the officially scheduled study duration by 3.7 percentage points (an 8 percent increase)
  • Galván and Tenenbaum (with ungated earlier version here) find that parenthood imposes a significant penalty on scientific productivity of mothers but not on that of fathers in Uruguay, with mothers’ productivity declining on average by 17 percent following childbirth
  • Nye et al. find that there is a robust positive relationship between education and free market views in most developed and developing countries
  • Bruns et al. (open access) find that female-authored articles in economics take 9 percent longer to accept in journals, but that this gender gap narrows as female representation in an area of research deepens

Thursday, 15 January 2026

What we learn from Freelancer.com about labour market signalling in the age of generative AI

In yesterday's post, I outlined my case for why generative AI reduces the quality of signalling in education. That is, how good education (qualifications, or grades) is as a signal to employers of an applicant's ability. There is evidence to support this case, from two recent papers.

The first paper is this pre-print by Jingyi Cui, Gabriel Dias, and Justin Ye (all Yale University), which looks at the signalling benefit in cover letters. Specifically, they study:

...the introduction of a generative AI cover letter writing tool on Freelancer.com, one of the world’s largest online labor platforms. Freelancer connects international workers and employers to collaborate on short-term, skilled, and mostly remote jobs. On April 19, 2023, Freelancer introduced the “AI Bid Writer,” a tool that automatically generates cover letters tailored to employers’ job descriptions that workers can use or edit. The tool was available to a large subset of workers depending on their membership plans.

Cui et al. use eight months of data on two skill categories (PHP and Internet Marketing), which covers over five million cover letters submitted to over 100,000 job opportunities. They observe who had access to the tool, as well as who used the tool to generate a cover letter, and how much time they spent refining the AI-generated cover letter.

Cui et al. look at the impact of the availability of the generative AI tool on callback rates, using a difference-in-differences research design. This effectively involves comparing differences in callback rates between applicants with and without access to the tool, before and after the tool was made available. Cui et al. find that:

...access to the generative AI writing tool increased cover-letter tailoring by 0.16 standard deviations, while actual usage raised tailoring by 1.36 standard deviations. Applying the same design to callbacks as the outcome, we find that access to the generative AI tool increased the probability of receiving a callback by 0.43 percentage points, and usage raised it by 3.56 percentage points. The latter represents a 51% increase relative to the pre-rollout average callback rate of 7.02%.
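To see where a difference-in-differences estimate like that comes from, here is the 2x2 logic with made-up callback rates (these are not Cui et al.'s numbers):

```python
# The 2x2 difference-in-differences logic, with invented callback rates.
callback = {
    ("access", "before"): 0.070, ("access", "after"): 0.078,
    ("no_access", "before"): 0.071, ("no_access", "after"): 0.075,
}
change_access = callback[("access", "after")] - callback[("access", "before")]           # +0.008
change_no_access = callback[("no_access", "after")] - callback[("no_access", "before")]  # +0.004
did_estimate = change_access - change_no_access
print(f"{did_estimate:.3f}")   # 0.004, i.e. 0.4 percentage points in this made-up example
```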

All good so far. Job applicants are made significantly better off (in terms of receiving a callback) by using the tool. However:

Our second finding is that AI substitutes for, rather than complements, workers’ pre-AI cover letter tailoring skills... We find that workers who previously wrote more tailored cover letters experienced smaller gains in cover letter tailoring—indeed, the best writers... experienced 27% smaller gains than the weakest ones. By enabling less skilled writers to produce more tailored cover letters, AI narrows the gap between workers with different initial abilities.

In other words, employers are now less able to distinguish the quality of the worker by using the quality of the writing in the cover letter. The consequence of this is that:

The correlation between cover-letter tailoring and receiving a callback fell by 51% after the launch of the AI tool, and the correlation with receiving an offer fell by 79%. Instead, employers shifted toward other signals less susceptible to AI influence, such as workers’ past work experience. The correlation between callbacks and workers’ review scores—the platform’s proprietary metric summarizing past work experiences on the platform and determining the default ranking of applications—rose by 5%. These patterns suggest that as AI adoption increases, employers substitute away from easily manipulated signals like cover letters toward harder-to-fake indicators of quality.

The total numbers of interviews and job offers were unchanged during this period. Cui et al. don't directly report whether the number of callbacks changed, but if we infer that from there being no aggregate change in the number of interviews, then this is consistent with the idea that the key difference is in the distribution of who received the jobs (and callbacks). Workers with a strong alternative signal (other than a well-written cover letter) received more callbacks, meaning that workers who lack an alternative signal received fewer callbacks. That has an important distributional consequence. New workers typically lack past review scores, so as employers lean more heavily on reviews, workers who are new to Freelancer.com will be disadvantaged and will find it more difficult to get a callback. Overall, in this case, the impact of the generative AI tool on the quality of signalling is negative.

The second paper is this job market paper by Anaïs Galdin (Dartmouth College) and Jesse Silbert (Princeton), who also use data from Freelancer.com. The difference is that they carefully evaluate employers' willingness-to-pay for workers, using the bid data. They also look at customisation of the text of the whole proposal, not just the cover letter. Another difference is that Galdin and Silbert look at a different job type, coding. Their data covers 2.7 million applications to 61,000 job openings, by 212,000 job applicants. Although Galdin and Silbert's paper is far more technical than the Cui et al. paper, Galdin and Silbert's results are somewhat similar (in terms of what they tell us about signalling):

First, we show that before the mass adoption of LLMs, employers had a significantly higher willingness to pay for workers who sent more customized proposals. Estimating a reduced-form multinomial logit model of employer demand using our measure of signal, we find that, all else equal, workers with a one standard deviation higher signal have the same increased chance of being hired as workers with a $26 lower bid... Second, we provide evidence that before the adoption of LLMs, employers valued workers’ signals because signals were predictive of workers’ effort, which in turn predicted workers’ ability to complete the posted job successfully. Third, we find, however, that after the mass adoption of LLMs, these patterns weaken significantly or disappear completely: employer willingness to pay for workers sending higher signals falls sharply, proposals written with the platform’s native AI-writing tool exhibit a negative correlation between effort and signal, and signals no longer predict successful job completion conditional on being hired.

This is strong evidence that, in this context at least, the introduction of the generative AI tool substantially reduces the quality of the job application signal. Galdin and Silbert then build an economic model calibrated based on their empirical results, and using that model they find that:

Compared to the status quo pre-LLM equilibrium with signaling, our no-signaling counterfactual equilibrium is far less meritocratic. Workers in the bottom quintile of the ability distribution are hired 14% more often, while workers in the top quintile are hired 19% less often.

This suggests an even worse outcome than what Cui et al. find. Galdin and Silbert's results suggest that the distributional changes in who gets offered work make high-quality workers worse off, and low-quality workers better off. That is what we would expect when the quality of signalling is reduced. Galdin and Silbert go on to say that:

These effects are driven by three mechanisms. First, employers previously relied on signals to make hiring decisions, so losing access to them impinges on their ability to discern worker ability. Second, more indirectly, the significant positive correlation between a worker’s ability and cost implies that, when employers lose access to signals and workers are forced to compete more intensely on wages, the prevailing workers with lower bids tend to have lower abilities. Third, since workers’ observable characteristics are poor predictors of their ability, employers have little to no information to distinguish between high and low-ability workers.

These changes to hiring patterns lead to a 5% reduction in average wages, a 1.5% reduction in overall hiring rate per posted job, a 4% reduction in worker surplus, and a small, less than 1%, increase in employer surplus.

The overall takeaway from both papers is that generative AI reduces the quality of signals to employers. They don't speak directly to the quality of education signalling, but we can infer that if the quality of other signals of worker quality is reduced by generative AI, then the quality of the education signal likely is as well. That's because proposals and cover letters on Freelancer.com play much the same signalling role as degrees and grades. In both cases, employers can’t observe ability directly, so they rely on an observable, costly signal. On Freelancer.com, that is the proposal or cover letter, and for education, that is the degree or grade. Generative AI makes it much easier for almost anyone to produce a polished proposal or assessment, so the observable output becomes less tightly linked to ability, weakening the value of both kinds of signal.


Wednesday, 14 January 2026

David Deming on generative AI and commitment to learning, and the impact of generative AI on signalling in education

When I was writing yesterday's post on generative AI and the economics major, I really wished I had read this post by David Deming on generative AI and learning, and then I could have linked the two together. Instead, I'll use this post to draw on Deming's ideas and flesh out why I think that generative AI makes signalling in education harder, and why that is a problem (in contrast with Matthew Kahn, who as noted in yesterday's post thinks that generative AI reduces problems of information asymmetry).

First, Deming writes about the tension in education between students' desire to learn, and their desire to make life easier (the 'divided self'), drawing on the example of Odysseus:

A vivid illustration of our divided self comes from a famous behavioral economics paper called “Tying Odysseus to the Mast: Evidence from a Commitment Savings Product in the Philippines”. They found that customers flocked to and greatly benefited from a bank product that prevented them from accessing their own savings in the future. Just like when Odysseus tied himself to the mast of his ship so that he would not be tempted by the alluring song of the Sirens...

The Sirens offer Odysseus the promise of unlimited knowledge and wisdom without effort. He survives not by resisting his curiosity, but by restricting its scope and constraining his own ability to operate. The Sirens possess all the knowledge that Odysseus seeks, but he realizes he must earn it. There are no shortcuts. This is the perfect metaphor for learning in the age of superintelligence.

The analogy to generative AI is obvious. Generative AI is a tool that offers unlimited knowledge without effort, but using that tool means that the effort necessary for genuine learning is not expended. As Deming concludes:

Learning is hard work. And there is now lots of evidence that people will offload it if given the chance, even if it isn’t in their long-run interest. After nearly two decades of teaching, I’ve realized that my classroom is more than just a place where knowledge is transmitted. It’s also a community where we tie ourselves to the mast together to overcome the suffering of learning hard things.

How does this relate to the quality of signalling? It is worth reviewing the role of signalling in education, as I discussed in this post:

On the other hand, education provides a signal to employers about the quality of the job applicant. Signalling is necessary because there is an adverse selection problem in the labour market. Job applicants know whether they are high quality or not, but employers do not know. The 'quality' of a job applicant is private information. High-quality (intelligent, hard-working, etc.) job applicants want to reveal to employers that they are hard-working. To do this, they need a signal - a way of credibly revealing their quality to prospective employers.

In order for a signal to be effective, it must be costly (otherwise everyone, even those who are lower quality job applicants, would provide the signal), and it must be costly in a way that makes it unattractive for the lower quality job applicants to attempt (such as being more costly for them to engage in).

Qualifications (degrees, diplomas, etc.) provide an effective signal (they are costly, and more costly for lower quality applicants who may have to attempt papers multiple times in order to pass, or work much harder in order to pass). So by engaging in university-level study, students are providing a signal of their quality to future employers. The qualification signals to the employer that the student is high quality, since a low-quality applicant wouldn't have put in the hard work required to get the qualification.

What does generative AI like ChatGPT do to this signalling? When students can outsource much of the effort required to complete assessments, then not-so-good students no longer need to spend more time or effort to complete their qualification than do good students. Take-home assignments, essays, or written reports might be completed to a passing standard with little effort from the student at all. Completing a qualification is no longer costly in a way that makes it unattractive for lower quality job applicants to attempt. That means that employers would no longer be able to infer a job applicant's quality from whether they completed a qualification or not.
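One way to make that logic precise (the notation here is mine, not from the earlier post): let B be the wage premium from being identified as a high-quality applicant, and let c_H and c_L be the costs of completing the qualification for high-quality and low-quality applicants respectively. Then:

```latex
% Separating condition for the education signal (illustrative notation):
%   B   = wage premium from being identified as high quality
%   c_H = cost of completing the qualification for a high-quality applicant
%   c_L = cost of completing the qualification for a low-quality applicant
\[
  c_H \;\le\; B \;<\; c_L
\]
% High-quality applicants find the qualification worth its cost and low-quality
% applicants do not, so completing it credibly reveals quality. Generative AI
% lowers c_L towards c_H; once c_L \le B, both types complete the qualification
% and the signal no longer separates them.
```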

A solution suggested by Deming's post is for students to find some way of committing themselves to not using generative AI in assessment. For this to solve the signalling problem, the commitment has to be credible (believable), such as being verifiable by potential employers later. While students could commit themselves to not using generative AI, and maintaining effortful learning, it is difficult to see how students who do so could credibly reveal that they have done so. They require some way of ensuring that potential employers could verify that the student didn't use generative AI. This is where universities could step in. If universities can certify that particular qualifications were 'AI-resistant', such as where assessment includes substantial supervised, in-person components (for example, tests or examinations), then that would help maintain the quality of the education signal. There are other options of course, including oral examinations, group or individual presentations, or supervised practice assessments that make learning harder to fake. However, anything that falls short of being AI-resistant in the eyes of employers is unlikely to work. That said, limiting assessment styles in order to certify effortful learning doesn't come without a trade-off. AI-resistant assessment is likely to be less accessible, less flexible, less authentic, and potentially more likely to promote anxiety in students.

Kahn suggested in his post that "AI-proctored assessments and virtual tutors suddenly make effort and mastery visible in real time". That could work. However, AI proctoring by itself is not a solution. In order to retain their status as a signal of student quality, assessments need to require more effort from not-so-good students than from good students to complete well. Having an assessment where an AI proctors while a student uses a generative AI avatar to make an AI-generated presentation is not going to work. I'm sure that's not what Kahn was envisaging. Proctoring of online assessment (either by humans or by AI) is not as easy as it sounds. Last year I was part of a group tasked with evaluating online proctoring tools, to be rolled out for our new graduate medical school, and I was left thoroughly underwhelmed. All of the tools that we evaluated seemed to have simple workarounds that moderately tech-savvy students could easily employ. The solution that was offered (when the demonstrators could even offer a solution) was to have students complete assessments on-site, which more or less defeats the purpose of online proctoring.

Anyway, the point is that generative AI reduces the signalling value of education. There are solutions where that signalling value can be retained, but that requires students to commit to effortful learning, and universities to certify that effort in a way that students who don’t expend it cannot mimic.

[HT: Marginal Revolution]


Tuesday, 13 January 2026

Matthew Kahn on generative AI and the economics major

There doesn't appear to be much of a consensus on how to adapt higher education to generative AI. I have my own thoughts, which I have shared here several times already (see the links at the end of this post). However, I am open to the ideas of others. So, I was interested to read this new paper by Matthew Kahn (University of Southern California), where he discusses his views on the future of the economics major. Specifically:

I present an optimistic outlook on the evolution of our economics major over the coming decade, centered on the possibility of highly tailored, student-specific training that fully acknowledges the rich diversity of our students’ abilities, interests, and educational goals.

Kahn is correct in laying out the challenge that we face:

Faculty now face a steeper challenge in helping students see the value of investing sustained effort in a demanding subject like economics, especially when AI tools can produce quick answers and when attention is pulled in countless directions by social media, short-form video, gaming, and other digital platforms...

If students are not prepared for rigorous material, then the easy path for them to follow is to rely on the AI as a crutch. AI creates a moral hazard effect. In recent years, I have stopped assigning class papers because it was obvious to me that the well written papers were being written by the AI. Each economics professor faces the challenge of how to use the incentives we control to nudge students to make AI a complement (not a substitute) for their own time investment in their studies.

The challenge of making AI a complement rather than a substitute for learning has been a common theme in my writing on generative AI in education. Kahn's proposed solutions are not dissimilar from mine too. For instance, in introductory economics:

Large language models can now go much further, acting as tireless, patient coaches that deliver truly adaptive “batting practice.” The AI begins with simple exercises and progressively escalates in difficulty, adjusting in real time to the student’s performance. This is exactly the repetitive, low-stakes practice every introductory economics student needs to build intuition. Going forward, I expect that we will see a growing number of economics educators introducing specialized AI economics tools...

And that is exactly what I have done in my ECONS101 and ECONS102 classes this year. Both classes had AI tutors that were pre-trained with a knowledge base of the lecture material, and students could chat with the tutors, ask them questions, develop study guides, practice multiple choice questions, and probably a dozen other use cases I haven't considered. The flexibility of these AI tutors, both for myself and for the students, made them a huge contributor to students' learning this year (at least, that's what students said in their course evaluations at the end of each paper).

Unfortunately, Kahn's prescription for changes at higher levels of the economics major is much weaker. For instance, for intermediate microeconomics he advocates for making use of short skills videos, then:

AI will help here. Students can take the written transcripts from these video presentations and feed these to AI and ask for more examples to make it more intuitive for them. Students can explain their logic to AI and allow the AI to patiently tutor them. Students can ask the robot to generate likely exam questions for them to practice on.

That isn't much of an advance on what he advocates at the introductory level, because it is still simply content plus discussion with an AI tutor. I think there is much more potential value at the intermediate level in getting students to engage in more back-and-forth exploratory discussions with generative AI, and making those discussions a small part of the assessment. That works in theory-based courses (like intermediate microeconomics) and in econometrics. Kahn could have thought more deeply here about the possibilities. However, for intermediate macroeconomics, I really like this suggestion:

AI tools make it possible to immerse students in the real-time decisions faced by figures such as Ben Bernanke in 2008. What information was available at each moment? What nightmare scenarios kept policymakers awake? Interactive simulations can let students experience economic policymaking “on the fly,” combining partial scientific knowledge with radical uncertainty. Such exercises tend to be far more memorable and engaging than static diagrams.

Some 'scripted' AI tools, built on top of ChatGPT (like my AI tutors are) would be wonderful tools for simulation. The AI could be instructed to maintain certain relationships through the simulation, introduce particular shocks, and help the students to evaluate different monetary and fiscal policy responses (or, evaluate the impact of fiscal policy changes). This would be a much more tailored approach than the simulation modelling that Brian Silverstone used when I studied intermediate macroeconomics some twenty years ago. Kahn also has great suggestions for field classes:

Professors teaching field classes often assign a textbook. Such a textbook offers both the professor and the students a linear progression structure but this teaching approach can feel dated as the professor delegates the course structure to a stranger who does not have experience teaching at that specific university. Textbooks are not often updated and the material (such as specific box examples) can quickly feel dated. AI addresses this staleness challenge...

In recent months, I have experimented with loading many interesting readings to a shared Google LM Notebook website and encouraging my students to ask the AI for summaries about these writings and to ask their own questions...

This year, I'll be teaching graduate development economics, for the first time in about a decade, and Kahn has pre-empted almost exactly the approach I was intending to adopt, with students engaged in conversation with a generative AI model (I wasn't sure if I would use NotebookLM or ChatGPT for this purpose), then expanding on that conversation within class. I'm also considering the feasibility of getting students in that class to work with generative AI on a short research project - collating and analysing data to answer some particular research questions, or to replicate some specific study. The paper is in the B Trimester, so I still have time to flesh out the details.

Kahn then goes on to discuss the impacts of generative AI on research assistant and teaching assistant opportunities. I think he is a bit too pessimistic though, since he concludes that human research assistants will only be useful for developing new (spatial) datasets. I think there are many more use cases for human research assistants still, and not just for data collection or data cleaning. Finally, Kahn addresses information asymmetry, noting that:

For far too long, students have been choosing majors in the dark—picking “prestigious” fields without really knowing what the degree will do for them, while universities have been able to hide behind vague reputations and opaque classrooms. Parents write enormous checks with almost no idea what they’re buying, employers wonder if the diploma still means anything, and everyone quietly suspects a lot of the game is just expensive signaling.

AI changes that. Cheap, frequent, AI-proctored assessments and virtual tutors suddenly make effort and mastery visible in real time. Professors discover whether students are actually learning the material. Parents can peek at meaningful progress dashboards instead of just getting billing statements. Employers can ask for verifiable records of real skills instead of trusting a transcript that could have been gamed.

I'm not sold on AI proctoring as a solution. In fact, I worry that it will simply lead to an 'arms race' of student AI tools vs. faculty AI tools. The advent of AI avatars and agentic AI simply makes this even more likely across a wider range of assessment types. However, I do agree with Kahn that a lot of education is signalling to employers, and that generative AI is going to change the dynamics of education away from signalling. Kahn seems to think that is a good thing. I worry the opposite! Without signalling, it is difficult for good students to distinguish themselves, and that limits the value proposition of higher education. Kahn wants "verifiable records of real skills instead of... a transcript that could have been gamed". However, generative AI makes it much easier for students to game the record of real skills, rendering those records less reliable.

There isn't a consensus on the best path forward. Kahn's paper is a work in progress, and he is inviting others to share their thoughts. I have offered a few of mine in this post, and I look forward to sharing more of my explorations of generative AI in teaching as we go through this year.

[HT: Marginal Revolution]


Sunday, 11 January 2026

Book review: The Big Con

Many of my students go into the consulting industry when they graduate. Most go to one of the 'Big Four' (PWC, EY, Deloitte, KPMG). I've only had a couple that I know have gone to McKinsey, and none to Boston Consulting or Bain (the 'Big Three'). So, I was interested to read what Mariana Mazzucato and Rosie Collington would have to say in their 2023 book The Big Con. The thesis of the book is simple, as they explain in the introduction:

This book shows why the growth in consulting contracts, the business model of big consultancies, the underlying conflicts of interest and the lack of transparency matter hugely. The consulting industry today is not merely a helping hand; its advice and actions are not purely technical and neutral, facilitating a more effective functioning of society and reducing the "transaction costs" of clients. It enables the actualisation of a particular view of the economy that has created dysfunctions in government and business around the world.

The book uses a large number of real-world stories of 'consultancy firms gone wrong', stitching them together into a narrative of how the consulting industry makes us worse off. Many of the individual stories will be familiar to those who regularly keep up with business and politics. What Mazzucato and Collington do well is track the rise of the consulting industry over time, and how it has become endemic across the public sector in particular. They use far fewer examples from the private sector, but I don't doubt that many of the issues that governments face also apply to private sector firms; those issues just do not have the same societal impacts. In tracing the development of the consulting industry, Mazzucato and Collington unpack the industry incentives at play, the interconnections between consulting, business, and government, and the conflicts of interest that result. Finally, they outline the consequent impacts on state capacity. In particular:

The more governments and businesses outsource, the less they know how to do, causing organizations to become hollowed out, stuck in time and unable to evolve. With consultants involved at every turn, there is often very little "learning-by-doing." Consultancies' clients become "infantilised"... A government department that contracts out all the services it is responsible for providing may be able to reduce costs in the short term, but it will eventually cost it more due to the loss in knowledge about how to deliver those services, and thus how to adapt the collection of capabilities within its department to meet citizens' changing needs.

What is missing from the discussion of problems is an evaluation of just how costly the loss of capability in the public service is. Governments are focused on cost savings, and outsourcing does deliver savings in the short term. But how large are the long-term losses that result from, for example, losing the ability to monitor and evaluate contracts? Such an evaluation would have given the arguments in the book more weight than the few case studies that Mazzucato and Collington rely on.

Moreover, while the explanation of the problem and the examples used to illustrate it are good, the solutions proposed are underdeveloped and somewhat banal. For example, while "a new vision, narrative and mission for the civil service" is a shout-out to Mazzucato's previous book Mission Economy (which I reviewed here), the book fails to provide a coherent pathway to extricate the public sector from the grip of consultants. I imagine that, faced with the need to develop a new vision, narrative, and mission, the first thing that many government departments would do is to contract a consultancy to assist with that need. Mazzucato and Collington don't offer a way of avoiding that outcome. Their second solution, of investing in internal capacity and capability creation, is likely to be important. But again, it requires the public service to disentangle itself from the consulting industry, and the ways that can be achieved are not explained. Third, embedding learning into contract evaluations is important, but it relies on other factors that are not addressed, such as the ability of the public sector to retain talent. Finally, mandating transparency and disclosure of conflicts of interest should almost go without saying, but it is good that Mazzucato and Collington say it.

Overall, I enjoyed reading this book. It's a couple of years old now, but the examples are still highly relevant, and the consulting industry's tentacles are still firmly wrapped around the body of government in many (most?) countries. Mazzucato and Collington have highlighted the problem, and shone some light on potential solutions. What we need now is strong public sector leadership, backed by government, that is willing to rebuild capacity and capability in sensible ways. Hopefully, this book is one step on that journey (and apologies to my future students if the consulting industry becomes smaller as a result!).

Friday, 9 January 2026

This week in research #108

Here's what caught my eye in research over the past week:

  • Biehl, Neto, and Gomez (with ungated earlier version here) find that, in Florida, the opening of a Dollar General store drives some nearby firms out of business but provides positive spillovers in terms of revenues and employment for the firms that survive
  • Butler, Butler, and Singleton (open access) find that referees add substantially more time in the second half than the first half of games at the FIFA World Cup and UEFA European Championship, and that referees allow more stoppage time when the score is close in the second half, but only at the World Cup
  • Moyo and Gwatidzo (open access) find, using the synthetic control method, that although hosting the FIFA World Cup in 2010 had no positive effects on GDP in South Africa, it significantly increased tourism inflows
  • Botha and de New (open access) find that the apparent decrease in measured financial literacy in Australia between 2016 and 2020 was entirely a consequence of moving from in-person to telephone interviews
  • Sokolov and Libman (open access) conduct an online “beauty contest” experiment with a sample of academic economists in Russia, and find that economists who rely on theories assuming common knowledge of rationality do not expect more rational behaviour from their colleagues (so, even economists who believe in rationality don't believe that other economists are rational)

Thursday, 8 January 2026

'First in family' as a measure of disadvantage in higher education

In higher education policy circles, it is an article of faith that students who are the first in their family to attend university (usually in the sense that neither their parents nor any older siblings have studied at university) are at higher risk of being unsuccessful in university education. The rationale links to Bourdieu's concept of social capital - being able to tap into who they know (their family) and, importantly, what their family knows (about university study) matters for students. Family members with past university experience can help with university-specific knowledge - things like how to choose majors, manage workload, seek extensions, interpret feedback, and navigate various university systems and processes. This all makes the challenges of studying at university a little easier.

So, I was surprised to learn from this 2020 article by Anna Adamecz-Völgyi, Morag Henderson, and Nikki Shure (all University College London), published in the journal Economics of Education Review (ungated earlier version here), that there is actually limited empirical evidence supporting first-in-family as an indicator of disadvantage. It is that empirical gap that Adamecz-Völgyi et al. attempt to fill, but the interesting thing about this paper is not so much that they find support for first-in-family as a measure of disadvantage, but the mechanism through which it works.

Adamecz-Völgyi et al. use data on 7707 students from the Next Steps study (formerly the Longitudinal Study of Young People in England, LSYPE), which followed a cohort of young people born in 1989/1990. The 'age 25' wave of that study captures most of the cohort after they have completed university education. Adamecz-Völgyi et al. look at various measures of disadvantage, and how well they predict students participating in, or graduating from, higher education. Aside from first-in-family, their battery of disadvantage measures (which they refer to as Widening Participation (WP) measures) includes whether the student had special education needs at high school, whether they were eligible for free school meals, whether their parents were of low social class (based on occupation), whether their family was among the 20 percent most deprived (based on a measure of deprivation), whether they had care responsibilities while at high school, whether they were of non-white ethnicity, whether they have a disability, whether they lived in a single-parent household, whether they had ever been in care, and whether they lived in an area of high socioeconomic deprivation.

Adamecz-Völgyi et al. use a few different methods to establish whether first-in-family (which they refer to as 'potential FiF', because they only have data on parental education, and not the education of older siblings) is a good predictor of disadvantage (in terms of participating in, or completing, higher education), including: (1) comparing the predictive power of each variable in separate models (using the 'Area under the Receiver Operating Characteristic' curve (AUC)); (2) looking at whether adding first-in-family to a model that already includes a parsimonious set of other measures of disadvantage improves predictions; and (3) using a 'random forest' model to rank the predictors in terms of importance. The AUC measures how well a model discriminates between the two values of a binary variable (in this case, whether a student enrols/does not enrol in university, or does/does not complete university) - a value of 0.5 is no better than chance, and a value of 1 is perfect discrimination. The random forest model ranks the predictors by importance by building many decision trees, each fitted to a random subsample of the data using random subsets of the predictors, and measuring how much each variable contributes to predictive accuracy (a small illustrative sketch of both tools follows the quote below). In their analyses, Adamecz-Völgyi et al. find that:

When we compare potential FiF to other WP indicators, it emerges as the most important measure until we condition on prior attainment and all measures end up similarly predictive. We provide evidence that the effects of family background manifest in educational attainment at an early age and pre-university educational attainment is the most important channel of the effect of parental education on HE participation and graduation.
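To make those two tools a little more concrete, here is a minimal sketch in Python. This is entirely made-up data, not the Next Steps data, and not the authors' code; the variable names and the data-generating process are my own assumptions, chosen only to mimic the mechanism the paper describes (first-in-family and free school meals dragging down prior attainment, which in turn drives participation).

# Illustrative sketch only: compare the AUC of two hypothetical 'disadvantage'
# indicators fitted in separate models, then rank predictors with a random forest.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(42)
n = 5000

# Hypothetical indicators: first-in-family (fif), free school meals (fsm),
# and prior attainment, with fif and fsm lowering prior attainment
fif = rng.binomial(1, 0.4, n)
fsm = rng.binomial(1, 0.2, n)
prior = rng.normal(0, 1, n) - 0.5 * fif - 0.3 * fsm

# University participation depends mainly on prior attainment
p = 1 / (1 + np.exp(-(0.5 + 1.2 * prior - 0.3 * fif - 0.2 * fsm)))
participates = rng.binomial(1, p)

# (1) AUC of each disadvantage indicator in a separate model
for name, x in [("first-in-family", fif), ("free school meals", fsm)]:
    model = LogisticRegression().fit(x.reshape(-1, 1), participates)
    auc = roc_auc_score(participates, model.predict_proba(x.reshape(-1, 1))[:, 1])
    print(f"AUC using only {name}: {auc:.3f}")

# (3) Random forest ranking of predictors by importance
X = np.column_stack([fif, fsm, prior])
rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, participates)
for name, imp in zip(["first-in-family", "free school meals", "prior attainment"],
                     rf.feature_importances_):
    print(f"Random forest importance of {name}: {imp:.3f}")

In data generated this way, prior attainment will tend to dominate the importance ranking, which (very loosely) echoes the paper's finding that first-in-family works mainly through pre-university attainment.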

So, this research supports the common belief that first-in-family is a good measure of disadvantage. Moreover, it shows that first-in-family picks up some dimension of disadvantage that other common measures do not. However, the mechanism through which first-in-family affects higher education participation and success is almost entirely through the students' success in pre-university education. Students who are first-in-family at university tend to have worse performance in high school, and that largely accounts for their lower performance in university. Adamecz-Völgyi et al. conclude that:

...being potential FiF (and having social and economic disadvantages in general) matters all along the production function of a child's human capital from early childhood to university. Thus, the educational achievement measures that a university can use are contaminated by this pre-existing disadvantage carried along since early childhood (or probably, since birth). They do not reflect the child's true capacity, but rather the interaction of their innate abilities and family circumstances. Thus, WP measures that simply favour the disadvantaged student out of two students having the same level of pre-university attainment are not enough to widen participation: on average, those from disadvantaged backgrounds are not going to have the same pre-university educational attainment levels than those from advantaged backgrounds. The attainment gap must be addressed explicitly by CA [Contextual Admissions] measures.

Instead, I see two ways that universities may respond to these results. Conditional on pre-university educational attainment, first-in-family students do not have worse higher education outcomes than other similarly-prepared students. The reason first-in-family students do less well on average is that they tend to enter university with lower prior educational attainment, and it’s that pre-university gap that accounts for most of the observed difference. On one hand, as Adamecz-Völgyi et al. argue, first-in-family is an indicator of disadvantage, and from a social justice perspective universities should try to mitigate sources of disadvantage whenever they are apparent. On the other hand, these results could be read as suggesting that universities shouldn't worry about first-in-family students, because they perform as well as otherwise similarly-prepared students. The problem is the lack of pre-university educational attainment, and that needs to be addressed in pre-university education, not at university. University-level support may help at the margin, but it risks being an ambulance at the bottom of the cliff. Moreover, two otherwise similar students in terms of pre-university educational attainment could be treated very differently under targeted support policies (such as Contextual Admissions) when one is first-in-family and the other is not, raising issues of fairness.

I'm not going to take a stand on which of those two perspectives (social justice or fairness) is more important. They both have merit. If you accept first-in-family as a measure of disadvantage, the actionable question is whether universities can cost-effectively close preparedness gaps after entry, or whether they should rely on advocating for changes in pre-university education. At least, this research can provide us with confidence that first-in-family is indeed a suitable measure of disadvantage in higher education.

Wednesday, 7 January 2026

Lessons from Joshua Gans on AI for economics research

I've been increasingly using generative AI (specifically, ChatGPT) to assist with research. I've been quite cautious though, worrying a lot about the quality of AI output, although my worries eased substantially once ChatGPT started linking to its sources. However, I know many other researchers are using generative AI far more extensively and directly in their research than I am. My approach continues to be to use ChatGPT as an enthusiastic, but not fully polished, research assistant. Given my experience so far, I was interested to read the reflections of Joshua Gans on his year of using generative AI for economics research. His approach was:

I had lots of ideas for papers that I hadn’t developed, so I decided to spend the year working my way down the list. I would also add new ideas as they came to me. My proposed workflow was all about speed. Get papers done and out the door as quickly as possible, where a paper would only be released if I decided I was “satisfied” with the output. So it cut any peer reviews or discussions out during the process of generating research quickly, but I would send those papers to journals for validation. If I produced a paper that I didn’t think could be published (or shouldn’t be), then I would discard it. There were many such papers.

Like Gans, I have a lot of research ideas, and not enough time to pursue all of them. Many of my ideas would go nowhere, even if I did have time to pursue them. But for some others, I have later read research papers that have done something I had thought of earlier but not had time to do myself. There are therefore a lot of missed opportunities, because it isn't possible to perfectly identify the good ideas in advance - you need to try ideas out before you can tell which are promising and which are dead ends. Being able to try out more ideas seems like a good thing.

The opportunity cost of spending time pursuing one research idea is not pursuing other ideas. If generative AI allows us to pursue research ideas in less time, then it lowers the opportunity cost of pursuing those ideas. However, as Gans notes:

When you lower the cost of doing something, you do more of it. Normally, the decision whether to continue or abandon a project gives rise to some introspection (or rationalisation) of whether continuing is worthwhile relative to the costs. When the going gets tough, you drop ideas that don’t look as great.

The issue with an AI-first approach is that its benefit, reducing the toughness of going, is also its weak point; you don’t face those decision points of continuing/abandoning as often. That means that you are more likely to end up completing a project. But this lack of decision points means that you end up pursuing more lower-quality ideas to fruition than you would otherwise.

In Gans's experience, AI increases research output, but it also weakens the stopping rule that would otherwise kill bad ideas early, by decreasing the marginal cost of continuing the research. When the marginal cost of continuing is lower, we spend more time on each bad idea before discarding it. And if we spend too long on bad ideas, the opportunity cost of pursuing an idea increases - time spent working on bad ideas is time not spent pursuing ideas that turn out to be better. As a result, the average quality of our research may decrease. That risk needs more careful consideration.
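To see why a weaker stopping rule can drag average quality down, here is a stylised toy simulation of my own (not from Gans's write-up, and with entirely arbitrary numbers): each project's quality is revealed noisily over several stages, and the researcher abandons a project at any stage where the signal doesn't justify the cost of continuing. Lowering the per-stage cost means more projects get finished, but the average quality of finished projects falls.

# Toy model: lower continuation costs mean more projects are completed,
# but the completed projects are of lower average quality.
import numpy as np

rng = np.random.default_rng(1)

def completed_projects(stage_cost, n_projects=10_000, n_stages=5):
    """Return the true qualities of projects that survive every stage."""
    quality = rng.uniform(0, 1, n_projects)  # true quality, unknown to the researcher
    alive = np.ones(n_projects, dtype=bool)
    for stage in range(1, n_stages + 1):
        # A noisy signal of quality that gets sharper at each stage
        signal = quality + rng.normal(0, 1 / stage, n_projects)
        # Keep going only if the signal clears the cost of another stage
        alive &= signal > stage_cost
    return quality[alive]

for cost in [0.6, 0.3, 0.1]:
    q = completed_projects(cost)
    print(f"stage cost {cost:.1f}: {len(q)} completed, mean quality {q.mean():.2f}")

The exact numbers are meaningless, but the direction of the effect is the point: as the cost of continuing falls, the completion rate rises and the mean quality of completed projects falls.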

Although he doesn't note the potential increase in opportunity cost, Gans's conclusion seems to point in that direction:

My point is that the experiment — can we do research at high speed without much human input — was a failure. And it wasn’t just a failure because LLMs aren’t yet good enough. I think that even if LLMs improve greatly, the human taste or judgment in research is still incredibly important, and I saw nothing over the course of the year to suggest that LLMs were able to encroach on that advantage. They could be of great help and certainly make research a ton more fun, but there is something in the judgment that comes from research experience, the judgment of my peers and the importance of letting research gestate that seems more immutable to me than ever.

Generative AI has the potential to increase the quality, and the quantity, of research. Gans seems to have seen it in his work, and I've seen it already in my own work too. In fact, my experience so far has been that careful use of generative AI (for example, checking for literature gaps, or exploring econometric methods and robustness checks) has reduced the time wasted on research that would have gone nowhere. However, it is possible to use too much generative AI in research, just as it is possible to use too little. There is a middle ground, and Gans seems to be finding it from one direction (starting from over-using generative AI), and maybe I am finding it from the other (starting from under-using generative AI). The important thing seems to be ensuring that a human is kept in the loop (as Ethan Mollick noted in his book Co-Intelligence, which I reviewed here). Specifically, we can use generative AI for its strengths (in testing our initial ideas, mapping the literature, exploring alternative methods, or stress-testing assumptions). And we can keep the human in the loop by pausing the research more often at key points to consider where it has gotten to and to check it against our intuition, as well as by continuing to seek peer review of the draft end-product.

So, Gans might be holding back on the generative AI this year, but I'll be further expanding my use. Starting with something related to this post: writing up some research on using AI tutors in teaching first-year economics, which is research I presented in a brown-bag seminar at Waikato last month (and I will have more on that in a future post).

[HT: Marginal Revolution]

Tuesday, 6 January 2026

Try this: The Opportunity Atlas

It's hard to believe that, in over twelve years of blogging, I have never blogged about any of Raj Chetty's research. That's not because I haven't read it. If anything, it's because it is so detailed that it defies a short blog take. For example, we read two related papers published in the journal Nature (here and here, both open access) in the Waikato Economics Discussion Group back in 2022. Ordinarily, I would follow up with a blog post, but they are so in-depth that I couldn't find the time to summarise them effectively [*]. Three years later, they are sitting in a virtual pile of read-but-not-yet-blogged-about papers [**].

Anyway, Chetty and co-authors have suddenly made it much easier for me to summarise their extensive research on social mobility in the US. That's because you can now see the data in action for yourself, at The Opportunity Atlas. This very cool online tool allows you to explore social mobility across the US. Social mobility is effectively about how much (or how little) a child's socioeconomic position in adulthood depends on the socioeconomic position of the household they grew up in - the less it depends on their parents' position, the more mobile society is.

On The Opportunity Atlas, you can choose from a range of outcomes in adulthood, and see the mean outcome in each place for children who grew up in households at different percentiles of parental income (1st, 25th, 50th, 75th, or 100th). You can also look separately by gender and by race (Black, White, Hispanic, Asian, Native American). The interface is quite intuitive to use. For example, here's the basic map of expected (mean) income at age 35, for children who grew up in households at the 25th percentile of parental income:

The red areas, such as the South, have lower social mobility, because children who grew up there in households at the 25th percentile have lower incomes as adults. In contrast, the blue areas (in the north and west) have higher social mobility, because children who grew up there in households at the same 25th percentile have higher incomes as adults.

The tool is very flexible. It's very easy to switch to looking at other outcome variables, and for other percentiles of parental income, as well as zooming in on particular areas. For example, here's the teenage birth rate for Black women who grew up in households at the lowest (1st) percentile of parental income in Los Angeles:

The greyed-out Census tracts are those where there are too few Black women who grew up in the lowest income households for the data to be reported. However, the map shows a band of high teenage birth rates among women who grew up in the lowest income households, stretching from South Central to Compton.

Importantly, the underlying data can be downloaded from the Opportunity Insights website. The cool thing about the data underlying the Atlas is that it is based on the census tract where the child grew up, not the census tract where they live as an adult. That means that the Atlas is showing you the adult outcomes for children who grew up in a particular area, not the adult outcomes of adults who live there today. That is explained in this new article by Chetty (Harvard University) and co-authors, published in the journal American Economic Review (ungated earlier version here).

That article outlines the methods underlying the dataset. In short:

...we use de-identified data from the 2000 and 2010 decennial censuses linked to data from federal income tax returns and the 2005–2015 American Community Surveys to obtain information on children’s outcomes in adulthood and their parents’ characteristics. We focus in our baseline analysis on children in the 1978–1983 birth cohorts who were born in the United States or are authorized immigrants who came to the United States in childhood...

We construct tract-level estimates of children’s incomes in adulthood and other outcomes, such as incarceration rates and teenage birth rates by race, gender, and parents’ household income level—the three dimensions on which we find children’s outcomes vary the most. We assign children to locations in proportion to the amount of their childhood they spent growing up in each census tract. In each tract-by-gender-by-race cell, we estimate the conditional expectation of children’s outcomes given their parents’ household income using a univariate regression whose functional form is chosen based on estimates at the national level to capture potential nonlinearities.
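As a rough illustration of the kind of estimation described in that passage, here is a minimal Python sketch. It uses made-up data and hypothetical tract names, a simple linear fit rather than the nationally-calibrated functional form the authors describe, and it assigns each child to a single tract rather than in proportion to the time spent there. Within each tract, it regresses children's adult income rank on their parents' income rank and predicts the expected outcome for children with parents at the 25th percentile - the sort of statistic shown on the Atlas maps.

# Illustrative sketch only: tract-level conditional expectations of children's
# adult income rank, given their parents' income rank, using hypothetical data.
import numpy as np
import pandas as pd

rng = np.random.default_rng(7)

# Hypothetical child-level records: tract where the child grew up, parents'
# income rank (0-100), and the child's own income rank in adulthood
df = pd.DataFrame({
    "tract": rng.choice(["tract_A", "tract_B", "tract_C"], size=3000),
    "parent_rank": rng.uniform(0, 100, 3000),
})
tract_slope = {"tract_A": 0.25, "tract_B": 0.40, "tract_C": 0.55}
df["child_rank"] = (20 + df["tract"].map(tract_slope) * df["parent_rank"]
                    + rng.normal(0, 10, len(df)))

def predicted_at_p25(group):
    # Univariate OLS of child rank on parent rank within the tract,
    # evaluated at the 25th percentile of parental income
    slope, intercept = np.polyfit(group["parent_rank"], group["child_rank"], 1)
    return intercept + slope * 25

print(df.groupby("tract")[["parent_rank", "child_rank"]].apply(predicted_at_p25))

The output is one number per (hypothetical) tract: the expected adult income rank of children whose parents were at the 25th percentile, which is the kind of tract-level statistic the Atlas maps display.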

Chetty et al. then go on to show why it matters that we look at social mobility based on the place where children grew up, rather than contemporary poverty rates or adult outcomes, and finally give some short use cases for the dataset. I won't go into detail on those (you should read the paper), but one of the things that Chetty et al. do show is that because the effects change slowly over time, looking at outcomes today for children who grew up in a particular census tract in the 1980s still provides meaningful information that can be used for targeting social programmes today.

It's important to note that the Opportunity Atlas by itself doesn't show us causal estimates of neighbourhood effects on adult outcomes. However, Chetty et al. establish how much of the variation is causal using a couple of different methods: (1) using data from the Moving to Opportunity experiment; and (2) a quasi-experiment that looks at how the effects differ depending on how many years of childhood a child was 'exposed' to a particular Census tract. Both methods imply that roughly 62% of the observed variation across census tracts reflects causal neighbourhood exposure effects, not just higher-opportunity families sorting into better places.

In the conclusion, Chetty et al. highlight a number of applications where the Opportunity Atlas data has already been used:

For researchers, the Opportunity Atlas data provide a new tool to study the determinants of economic opportunity. For example, recent studies have used the Opportunity Atlas data to analyze the effects of lead exposure, pollution, neighborhood redlining, and the Great Migration on children’s long-term outcomes (Manduca and Sampson 2019; Colmer, Voorheis, and Williams 2019; Park and Quercia 2020; Aaronson, Hartley, and Mazumder 2021; Derenoncourt 2022). Other studies use the Atlas statistics as inputs into models of residential sorting (Aliprantis, Carroll, and Young 2024; Davis, Gregory, and Hartley 2019) and to understand perceptions of inequality (Ludwig and Kraus 2019). The ongoing American Voices Project (https://americanvoicesproject.org/) is interviewing families in neighborhoods with particularly low or high levels of upward mobility to uncover new mechanisms from a qualitative lens.

I can see a number of use cases for this as well. For instance, there is probably a lot of value in using the Opportunity Atlas data alongside the data on racial diversity and segregation from the Mixed Metro project (which also offers data down to the Census tract level). Also related is this from a footnote in the Chetty et al. paper:

Understanding how neighborhood effects change with the composition of the neighborhood is an important question that warrants further work...

This also makes me think (again) that we need more detailed work on social mobility in New Zealand, building on the work of my colleagues Niyi Alimi and Dave Maré (see here). One of the amazing things about Chetty's research is that it is now looking at the neighbourhood (Census tract) level, and that sort of spatial disaggregation offers a lot of opportunity for detailed follow-up research and policy action. And with StatsNZ's Integrated Data Infrastructure, we have the basic framework necessary to do this sort of work in New Zealand as well. We could use that to build our own Opportunity Atlas for New Zealand.

*****

[*] So, in lieu of a separate blog post, here's the short summary of those two papers. In the first paper, Chetty et al. use billions of Facebook friendship links to measure local social capital, especially "economic connectedness" (cross-class friendships). They find that places with higher economic connectedness have much higher upward social mobility. In the second paper, the same group of authors show that cross-class friendship gaps come from both who people are exposed to (in their schools, neighbourhoods, or other groups) and "friending bias" (less cross-class befriending even when exposed).

[**] In case you're wondering, there are currently 45 papers in that virtual pile, and it seems to be growing. I'm reading research faster than I'm blogging about it. I might have to start blogging about multiple papers in a single post to keep from falling further behind!