Monday, 30 June 2025

Strawberry and cream sandwiches are the new jaffa cakes

CityAM reported earlier this week:

M&S has jumped on the bandwagon of the viral Japanese strawberry sando by launching its take, taking the internet by storm, but the sweet treat has raised the old familiar question about VAT.

The British retailer’s strawberry and cream (half) sandwich, wrapped up in its meal-deal packaging, is all over social media as shoppers race to taste the sweet sandwiches. Most are intrigued to know if the bread is the standard meal-deal bread or if it is sweet, as the Japanese typically use milk bread in their original version.

However, as the hype builds up, accountants and lawyers have been lighting up LinkedIn, questioning whether this dessert sandwich may be classified as confectionery.

If a food product is deemed a confectionery, it will be liable for 20 per cent VAT, compared to zero-rated, which most sandwiches typically fall under.

The mystery surrounding VAT has long persisted, following the famous ‘Is it a biscuit or a cake’ debate from the Jaffa Cake legal battle with HMRC.

The question of whether a particular food item is of one type or another, because different types attract different levels of tax, is exactly the sort of debate that New Zealand avoids through our very simple GST regime. With only a couple of exceptions, which are fairly well delineated (like residential housing rents, and financial services), every good or service that is traded domestically attracts the exact same rate of GST. There is no quibbling about whether a strawberry and cream sandwich is confectionery, or whether a jaffa cake is a biscuit or a cake.

It is this simplicity that would be lost if advocates for removing GST from 'healthy foods', or all unprocessed food, or any other set of favoured goods, got their way. We'd then be left arguing over whether cut pineapple is a fresh food or a processed food, or whether fresh and frozen pizza should be treated differently, or whether the set of ingredients on the pizza made a difference, and so on. You laugh, but witness previous arguments in the UK about whether a flapjack is a muesli bar or a cake, or in the US about whether a Snuggie is clothing or a blanket.

Thankfully, for now, the government doesn't appear to have an appetite for messing with our simple GST.

[HT: Marginal Revolution]


Saturday, 28 June 2025

Greg Mankiw on Modern Monetary Theory

Modern Monetary Theory (MMT) had a real moment in the spotlight in the late 2010s, with political support in the US from Presidential hopefuls Bernie Sanders and Alexandria Ocasio-Cortez. However, mainstream economists mostly didn't understand it, or ridiculed it, or both. I mostly ignored the detail of it, only picking up what I knew about it from the mainstream media. Some economists took it a little more seriously. At least seriously enough to look into it in more detail.

One example is this 2020 article by Gregory Mankiw (Harvard University), published in the AEA Papers and Proceedings (ungated earlier version here). Mankiw carefully explored a new textbook:

...simply titled Macroeconomics, written by three MMT proponents: William Mitchell and Martin Watts (both of the University of Newcastle, Australia) and L. Randall Wray (Bard College).

Mankiw then compares the MMT textbook treatment of macroeconomic issues with a more traditional approach (such as that found in Mankiw's own textbooks). Obviously, Mankiw has good reason to challenge other textbook treatments as they compete with his offering. So, we should be careful not to over-interpret Mankiw's views. Mankiw concludes that:

In the end, my study of MMT led me to find some common ground with its proponents without drawing all the radical inferences they do. I agree that the government can always print money to pay its bills. But that fact does not free the government from its intertemporal budget constraint. I agree that the economy normally operates with excess capacity, in the sense that the economy’s output often falls short of its optimum. But that conclusion does not mean that policymakers only rarely need to worry about inflationary pressures. I agree that, in a world of pervasive market power, government price setting might improve private price setting as a matter of economic theory. But that deduction does not imply that actual governments in actual economies can increase welfare by inserting themselves extensively in the price-setting process.

Put simply, MMT contains some kernels of truth, but its most novel policy prescriptions do not follow cogently from its premises.

Based on those conclusions, it is unlikely that MMT is going to have much impact on the economics mainstream. However, it is worth keeping Mankiw's views handy, because MMT is something that caught the public's (and politicians') attention, and even though it doesn't have the profile now that it did five or more years ago, it will likely remain a factor in public debate for some time.

Friday, 27 June 2025

This week in research #81

Here's what caught my eye in research over the past week:

  • Farnell et al. (open access) use data from Major League Baseball pitchers to study task switching, finding that task switching between pitching and batting can improve subsequent pitching performance, with fastball velocity increasing by up to 0.225 mph on average after reaching base
  • Reddy critiques the work of 2024 Nobel Prize winners Acemoglu, Johnson, and Robinson, arguing that their property-rights-based approach is excessively narrow, and that other factors, including the privileged relationship between settlers and their countries of origin, can explain the divergence between settler colonies and other countries

Thursday, 26 June 2025

The challenges of farmer succession

This week Christchurch hosted the 2025 Primary Industries Summit. As this article in the New Zealand Herald noted, alongside the summit Rabobank released a new white paper on farmer succession, titled "Changing of the Guard". The white paper outlines the challenges that the farming community in New Zealand faces in preparing to transfer farms to the next generation of farmers, as well as highlighting the experiences of some farming families that have been relatively successful at managing succession. In his overview of the white paper, Rabobank CEO Todd Charteris wrote that:

Succession can be a highly emotive process and is becoming increasingly complex. The stakes are increasing as the value of farming assets continues to grow amid challenges around maintaining profitability in the face of geopolitical, regulatory and climatic hurdles...

New Rabobank data (February 2025) shows that only one-third (33%) of farmers have a formal succession plan. However, over the next 10 years, more than half of all New Zealand farm and orchard owners – around 17,320 farmers – will hit retirement age.

The scale of the challenge is clearly large. I made a research contribution to the white paper, along with my colleagues Frank Scrimgeour, Gemma Piercy-Cameron, and PhD student Kalpani Vidanagamage. Our role was to collate and summarise demographic, economic, and land use data from various sources, and to conduct some focus groups with young farmers to understand their experiences and aspirations related to farm succession. You can see our contributions to the white paper in the various quotes from young farmer focus groups early in the report, and in the data reported on pages 13-18.

There was far more detail in the data we had available than could fit into the white paper. Rabobank have held over some of the data, possibly to be used next year. However, we intend to write up our findings for a more academic audience in due course (and I'll blog on it in more detail at that stage).

To some extent, this project was a 'back to the future' moment for me. In 2010, I published an article (co-authored with Pat Barrett, Bill Cochrane, and Kellie McNeill) in the International Journal of Environmental, Cultural, Economic, and Social Sustainability (it is gated, but contact me for a pre-print version if you are interested) that covered very similar ground. What we found in 2010 is not dissimilar to what we found in 2025 - rural population decline, ageing rural and farmer populations, and succession challenges were leading to the aggregation of farm holdings and the corporatisation of farms.

As the saying goes, the more things change, the more they stay the same.

Wednesday, 25 June 2025

Generative AI is mastering 'metrics

The capabilities of generative AI continue to grow. In the latest example, some enterprising economists have developed an agentic AI that can complete tasks using econometrics (the economist's statistical toolset) - 'mastering 'metrics', as Angrist and Pischke would say (see my review of their excellent econometrics text). The agentic AI approach is outlined in this new working paper by Qiang Chen (Shandong University) and co-authors. As they explain:

We propose and implement a zero-shot learning framework, called Econometrics AI Agent, that enables AI agents to acquire domain knowledge without costly LLM fine-tuning. The framework’s core component is an econometrics “tool library” implementing popular econometric methods, including IV-2SLS, DID, and RDD.

...we augment each econometric tool with detailed “prompts”—comprehensive method descriptions that specify inputs, hyperparameters, and outputs. These prompts are provided alongside corresponding Python implementations, creating a standardized interface between the econometric methods and the AI agent. This design allows the LLM to leverage both its general econometric knowledge and the specifically crafted prompts and tools, enabling it to conduct complex econometric analyses through multi-round interactions with users. The resulting framework empowers Econometrics AI Agent to independently handle applied econometric tasks, delivering comprehensive results that include parameter estimation, inference, and analytical discussions.
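To make the 'tool library' idea concrete, here is a minimal sketch of what a single entry might look like: a Python implementation paired with a prompt-style description that the agent can read when deciding which tool to call. The function name, docstring wording, and use of statsmodels are my own illustration, not Chen et al.'s actual code:

import pandas as pd
import statsmodels.formula.api as smf

def run_ols(data: pd.DataFrame, formula: str, robust: bool = True):
    """Estimate a linear model by ordinary least squares.

    Inputs: data (a tidy DataFrame), formula (a patsy formula such as
    'y ~ x1 + x2'), and robust (use HC1 heteroskedasticity-robust
    standard errors if True). Output: a fitted results object with
    coefficient estimates, standard errors, and p-values.
    """
    model = smf.ols(formula, data=data)
    return model.fit(cov_type="HC1") if robust else model.fit()

# The 'prompt' the agent sees is the method description paired with the tool.
TOOL_LIBRARY = {
    "ols": {"function": run_ols, "description": run_ols.__doc__},
}

The agent would then select a tool from the library based on the user's request and the tool descriptions, through the kind of multi-round interaction described in the quote above.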

The Econometrics AI Agent that Chen et al. created is available here. That site also includes detailed installation instructions, and a helpful demonstration video. Coming back to the paper, Chen et al. show the capabilities of the model by testing it on several real-world problems:

We evaluate the Econometrics AI Agent through two sets of inquiries. The first comprises 18 exercises from the coursework assignments of a doctoral-level course titled “Applied Econometrics” at the University of Hong Kong, with Python-generated standard solutions. These exercises cover OLS & PanelOLS regression, propensity score matching, IV-2SLS regression, Difference-in-Differences (DID) analysis, and Regression Discontinuity Design. The second set consists of test datasets from randomly selected seminal articles in reputable journals, primarily accompanied by Stata-based replication packages.

They compare the performance of their agent, in terms of creating code that works correctly and in terms of the resulting estimated coefficient of interest, with three alternatives:

...(i) direct LLM generation in Python code, (ii) direct LLM generation in Stata code, and (iii) baseline general-purpose AI agents without specialized econometric tools and domain knowledge.

So, this is an approach to replication, which is important (for example, see here) - a point that I will return to later. Overall, in comparison to LLMs and general-purpose AI agents, the Econometrics AI Agent performs much better. In terms of the econometrics coursework assignments, Chen et al. find that:

The Econometrics AI Agent demonstrates superior performance with a 95% directional replication rate and average coefficient value errors below 3%. In contrast, both GPT-generated Python and Stata control groups show incorrect directions in over half of test cases. While the general AI Agent achieves a 78% directional replication rate, its coefficient values frequently deviate significantly from true values.

The rate of 'perfect replication' (which Chen et al. defined as the errors in the coefficient, standard error, and p-value all within 1% of the 'true' value) was 51.85 percent for the Econometrics AI Agent, but less than 30 percent for the other models. Turning to the published paper replications, Chen et al. find that the rate of 'perfect replication' was just 27.41 percent for the Econometrics AI Agent, but that was still far higher than the other models, which all had rates under 18 percent. In relation to those results, Chen et al. note that:

...the Econometrics AI Agent does show room for improvement. For example, its performance declines for complex econometric methods like DID and RDD compared to simpler approaches such as OLS and IV-2SLS. Similarly, results slightly deteriorate when moving from straightforward coursework problems to more sophisticated paper replication tasks. However, these limitations can be addressed through the AI agent’s domain knowledge architecture—specifically by developing customized tools and enhancing prompt instructions to better support complex algorithms and detailed requirements.

Indeed, it is the modular nature of the agent's architecture that may be its key advantage, allowing modules relevant to each econometric task to be added or updated over time. On this point, Chen et al. note that:

Unlike the costly and often infeasible process of fine-tuning an LLM to keep pace with rapid academic advances in developing new techniques, our agent can be updated simply by adding new tool functions and descriptions to the prompt library. This modularity allows the agent’s knowledge base to expand alongside the field’s developments, making the integration of recently published procedures as straightforward as adding new modules.

So, we can expect that the agent's capabilities, and its accuracy, will only improve over time. We may not be very far away from a time when empirical economists spend far less of their time writing esoteric econometric code in order to extract meaningful results. What will we do with our free time? Maybe we'll be able to turn our attention to a greater variety of research questions.

There is a further positive aspect to these results. The replication crisis is real in many disciplines, including economics. Having AI agents that can automate the steps required to generate econometric results will decrease the time cost of completing replications. That means that we may expect more paper replications in the future (at least, more of the type of replication that Miguel and Christensen call a 'verification'). This move will certainly be a positive, leading to improvements in the quality of research in the future.

[HT: Marginal Revolution]


Monday, 23 June 2025

Swiss workers are worried about the risk of automation

How worried are you about the risk of automation? Automation and generative AI might be coming for your job. Despite some good news, people seem more likely to take account of pessimistic views than optimistic views about their future job prospects (see here), contributing to their worries.

Given the perceived risk of automation, would workers be willing to forego some of their income in exchange for greater job security? That is the question answered in this new article by Maria Cattaneo (Swiss Coordination Centre for Research in Education), Christian Gschwendt, and Stefan Wolter (both University of Bern), published in the Journal of Economic Behavior and Organization (open access). They conducted a discrete choice experiment (DCE), where research participants were asked to repeatedly choose between different job options for a hypothetical future child. Each option had different characteristics, two of which were the income and the risk of automation, with the latter specified as:

...“the probability that the job could be completely replaced by digital technologies such as robots or artificial intelligence within 10 years.”

The advantage of a DCE is that it enables Cattaneo et al. to estimate the willingness to pay for a lower risk of automation - a measure of how much income workers would be willing to forego for greater job security. Based on their representative survey of nearly 6000 Swiss adults, they find that:

...respondents would be indifferent between choosing a job with a given automation risk and a job with a 10 percentage points higher risk if the increase in yearly wages is CHF 15,333... Given that the median yearly income of an employee was around CHF 88,000 in 2022... this would mean that people are willing to pay around 17 % of the median yearly income for a 10 percentage point reduction in the automation risk...
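The 17 per cent figure is simply the ratio of the quoted wage premium to the median income. A quick back-of-envelope check in Python, using the figures from the quote above:

wage_premium_chf = 15_333    # compensation for a 10-percentage-point higher automation risk
median_income_chf = 88_000   # approximate median yearly income of a Swiss employee in 2022
print(f"{wage_premium_chf / median_income_chf:.1%}")  # about 17.4% of the median yearly income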

That is a substantial willingness to pay (WTP) for lower risk from automation. In comparison:

The absolute and relative level of this value is also reflected, for example, in the fact that the WTP is more than ten times higher than respondents would be prepared to pay for a top management position compared to a low hierarchical position.

For some additional context, the baseline automation risk for 'Hospitality, Retail and Other Services Managers' is 10 percent, for 'Drivers and Mobile Plant Operators' is 40 percent, and for 'Business and Administration Professionals' is 100 percent. Sadly, Cattaneo et al. don't evaluate the willingness to pay based on different occupations - that might be a useful exercise for follow-up research, as the willingness to pay is unlikely to be linear in baseline risk. Indeed, they report that:

...the WTP to reduce job automation risk by 10 percentage points is 1.6 times higher when moving within the higher automation risk range than within the lower range. This suggests that individuals perceive greater benefits from mitigating higher baseline automation risks or, conversely, value risk reductions less at lower baseline risks.

It makes sense that those in jobs that are most at risk of automation would be willing to pay more to reduce that risk than those in jobs that are at lower risk. Looking at other characteristics of the sample, Cattaneo et al. find that:

Male respondents show a mean WTP that is CHF 686 lower than that of female respondents, which implies that a 10-percentage-point reduction in the automation risk is valued higher by women than men. Concerning age, people aged 50 and older need CHF 2102 more to counteract an increase in automation risk compared to people younger than 35.

In terms of the latter, Cattaneo et al. suggest that:

The reason behind this may be that older people have more labor market experience and have already lived through one or more technological and employment changes in labor market conditions, potentially including periods of unemployment. As a result, they may be better positioned to understand or predict changes in labor market dynamics, while younger respondents, who are more familiar with new technologies, feel more secure in coping with these changes.

I wonder whether older research participants might have children of their own, making questions about 'hypothetical' children more salient for them, because they are thinking about their real child, rather than a hypothetical child. That would likely make them more willing to pay to reduce automation risk.

Finally, Cattaneo et al. find that:

...respondents with tertiary education have a lower WTP for lower exposure to automation than those with a secondary or below secondary degree.

That result is surprising, given that future automation is more likely to affect better-educated workers than previous waves of automation did. Perhaps the more educated workers are more likely to have used generative AI themselves, and see the potential benefits alongside the risks? Indeed, Cattaneo et al. suggest that more educated workers:

...may be more optimistic that advancing automation will generate new tasks and opportunities for which they feel well-prepared, given their skills and education, contributing to their lower WTP to reduce automation risk. Consistent with this, our survey data shows that the adoption of generative AI—a technology that automates a wide range of new work tasks—increases with education level, possibly indicating that higher-educated respondents are already finding ways to leverage this emerging technology to their advantage...

Overall, it is clear that there is a high degree of worry about automation risk among Swiss workers. Periods of change and uncertainty are particularly worrying, and we are certainly in such a period. Automation will lead to considerable changes in job tasks, while there remains a high degree of uncertainty about the contribution of generative AI in particular, and automation more generally, to future labour market outcomes. To reduce our worries, we need to keep in mind that past periods of substantial technological change and uncertainty in the labour market have actually led eventually to better outcomes overall.

Sunday, 22 June 2025

What the persuasiveness of economics experts and signalling tell us about the gender gap in economics

Are the opinions of expert economists persuasive? The answer to that question will no doubt depend on who you ask. And at least some would answer 'yes, but I wish they weren't'. Does the gender of the economist matter for how persuasive their opinion is? That is a much more difficult question to answer. However, this recent article by Hans Sievertsen and Sarah Smith (both University of Bristol), published in the Journal of Economic Behavior and Organization (open access, but see this paywalled FT article as well), provides us with a starting point. Sievertsen and Smith:

...use an information provision experiment... to test whether the opinion about a topical policy issue expressed by a senior female economist is more, or less, persuasive than the same opinion expressed by a senior male economist. We run the same experiment twice – the first time, members of the public are shown credentials of expertise, the second time they are not.

Specifically, Sievertsen and Smith survey people in the US, asking their opinion on a range of issues (rated on a five-point scale from 'strongly disagree' to 'strongly agree'). Alongside a statement of the issue, each research participant was provided with the opinion of an economist, drawn from the US Economic Experts Panel run out of the University of Chicago. By comparing the results from the survey with those of an earlier survey of people who didn't get to see an economist's opinion, the persuasiveness of the economist whose opinion is provided (on each issue) can be evaluated.

The ten issues are based on the following statements:

1. Use of artificial intelligence over the next ten years will lead to a substantial increase in the growth rates of real per capita income in the US and Western Europe over the subsequent two decades.

2. There needs to be more government regulation around Twitter’s content moderation and personal data protection.

3. It would serve the US economy well to make it unlawful for companies with revenues over $1 billion to offer goods or services for sale at an excessive price during an exceptional market shock. (Price Gouging)

4. Efforts to achieve the goal of reaching net-zero emissions of greenhouse gases by 2050 will be a major drag on global economic growth.

5. Given the centrality of semiconductors to the manufacturing of many products, securing reliable supplies should be a key strategic objective of national policy.

6. A significant factor behind today’s higher US inflation is dominant corporations in uncompetitive markets taking advantage of their market power to raise prices. (Greedflation)

7. Financial regulators in the US and Europe lack the tools and authority to deter runs on banks by uninsured depositors.

8. When economic policy-makers are unable to commit credibly in advance to a specific decision rule, they will often follow a poor policy trajectory.

9. A windfall tax on the profits of large oil companies, with the revenue rebated to households, would provide an efficient means to protect the average US household.

10. A ban on advertising junk foods (those that are high in sugar, salt, and fat) would be an effective policy to reduce child obesity.

The economists' actual views on each of the statements is not important for the research question (although the article does tell you, and you can probably guess for some of them what the average opinion of the economists is). Sievertsen and Smith first evaluate how persuasive economists are in general, finding that:

On average, a one-point change on the Likert scale in expert opinion is associated with a 0.17 point change in public opinion... Expert opinions have no effect on public opinions about Greedflation, while there are stronger effects for Price Gouging, Financial Regulation and Economic Policy. There is some support for the argument... that persuaders are more effective when receivers are less certain: The degree of persuasiveness is weaker on issues where baseline public opinion is more certain... The degree of persuasiveness is also stronger on issues where there is less distance between sub-panel expert opinion and baseline public opinion... this suggests that experts may be perceived as less credible when their views are further out of line with those of the general public.

So, the opinions of economic experts are convincing (somewhat). The last point is interesting though - people are most convinced when the economist's views are similar to those of the general public. This suggests some confirmation bias - people believe the experts more when the experts agree with them! It is also interesting who is persuaded most:

The degree of persuasiveness is greater for men [p = 0.002] and for non-whites [p = 0.000]. It is also greater for those with a degree [p = 0.000] and for those with higher self-reported economics knowledge [p = 0.000]. Those who identify as Republicans are also more persuaded by economists’ opinions than Democrats/Independents [p = 0.030].

Those who are more educated, and who claim to have more economics knowledge, are more persuaded by expert economists. More educated people likely give more credence to the views of other educated people, while those who claim to know more economics are more likely to modify their views to fit with those of economics experts.

What about gender differences in persuasion? Sievertsen and Smith switch the analysis to evaluating whether the opinion of the research participants exactly matches that of the expert whose opinion they are provided, and find that:

...members of the public are 1.1 percentage points more likely to match with the opinion of a female expert than with the same opinion expressed by a male economist.

Sievertsen and Smith find similar results when evaluating the distance between the opinion of the research participant and the expert (the distance is smaller for female experts). The effect (1.1 percentage points, against a match rate for male experts of 33.5 percent) seems quite small, but Sievertsen and Smith note that some matches would happen purely by chance, and after accounting for that:

...the degree of persuasiveness of female expert opinions is around 20 per cent higher [than male experts]...

Why are female economics experts so persuasive? Here's where things get interesting. Sievertsen and Smith run their survey a second time. The first time around, research participants were told the name and institutional affiliation of the expert economist (as well as shown their profile photo). In the second survey, research participants were only told the name of the expert economist (and shown the photo). In that second survey:

The overall effect of female expert on the probability of matching opinion drops from 0.011 in the main experiment to 0.0002 in the follow-up.

The extra persuasiveness of female experts (over and above male experts) disappears! Sievertsen and Smith conclude that:

...in the first experiment, credentials provided an information signal that favoured senior female experts. Removing that signal in the follow-up experiment eliminates the gender difference.

It is worth explaining that last result in a bit more detail, because what it really shows is that the general public recognises the gender bias in top economics institutions.

The quality of a purported economics expert is private information. It is known to the expert themselves, but not known to the public. This is a case of asymmetric information. The expert is the informed party, and the general public is the uninformed party. Since the general public cannot tell high-quality and low-quality experts apart, they might assume that all experts are low quality. How can a high-quality expert instead convince the general public of their high quality? They must credibly reveal their quality to the public - this is called signalling.

For a signal to be effective, it must meet two conditions: (1) it must be costly; and (2) it must be costly in such a way that those with low quality attributes would not want to attempt the signal. Getting a tenured position is a signal of high quality for experts. It is costly to get such a position, and it is costly in a way that low-quality experts wouldn't want to attempt it (because they would be unsuccessful in getting tenure anyway). So, a position at a top university is a signal of quality.

Now, why would this signal be even more effective for female economists? If getting a tenured position at a top institution is even more costly for female economists than for male economists, then the quality of the signal is higher for women than for men. And therefore, the general public would find the signal even more credible for female economists than for male economists. The gender gap in economics is pervasive (see this post, and the posts linked at the bottom of that post). What is interesting is that this gender gap is so well established that the general public is acting on it!

Sievertsen and Smith finish by pointing out a bit of a paradox:

...if senior female economists have greater credibility in the eyes of the public, then why are they less confident in giving their opinion. This remains an open question.

Indeed. Hopefully, at the margin, this research will convince senior female economists to use their persuasiveness for the good of all (where 'good' is defined as persuading more people to believe in economists' opinions).

Friday, 20 June 2025

This week in research #80

Here's what caught my eye in research over the past week:

  • Wright, Martin, and Krieg (with ungated version here) find that rising minimum wages are associated with reduced summer employment among college students in Washington State
  • Goldin asks why some countries in Europe and Asia with moderate fertility levels in the 1980s have become the 'lowest low' fertility countries today, whereas countries where fertility decreased earlier have not, and finds that in the 'lowest low' fertility countries, rapid economic change led to both generational and gendered conflicts that resulted in a rapid decrease in fertility

Wednesday, 18 June 2025

Good news, bad news, and students' views about the impact of ChatGPT on their labour market outcomes

Will ChatGPT have a positive or negative impact on labour market outcomes? On the positive side, ChatGPT and other large language models are seen by many as a 'force multiplier', increasing productivity. And more productive workers are generally paid more. On the negative side, ChatGPT and other large language models can complete many routine tasks that human workers currently do. Jobs that are predominantly made up of routine tasks are likely to be replaced, or heavily changed, by these models.

Does reading the latest (positive or negative) news story affect your views about whether the labour market impacts will be positive or negative overall? That is essentially the question addressed in this new article by Samir Huseynov (Auburn University), published in the Journal of Economic Psychology (ungated earlier version here). Huseynov focuses on the views of students, because:

Today’s students face the possibility that AI may partially or entirely overtake their anticipated jobs upon graduation. This challenging scenario could affect expected salaries, pushing students to drop out or switch to ‘‘safer’’ majors to secure future earnings... However, it is also possible that AI technologies could potentially enhance future workers’ productivity and earning potential... The discourse on AI, both optimistic and pessimistic, could shape students’ educational choices, leading to career-defining decisions. 

Huseynov used a survey experiment, where student research participants were first asked about their beliefs about future labour market outcomes for themselves (and for the median student studying the same major). They then read either an optimistic ('GoodNews') story about the potential impacts of ChatGPT and other AI tools, or a pessimistic ('BadNews') story, and were then asked again about those beliefs. Huseynov then tests whether the good news or bad news affects students' beliefs. Based on a sample of 716 US students, Huseynov finds that:

...exposure to both optimistic and pessimistic AI ChatGPT discussions leads students to revise down their beliefs about ranking in the top 50% of the post-graduation earning distribution. The BadNews treatment, however, induces a more significant revision than the GoodNews condition. Interestingly, neither condition influences students’ reported expected earnings...

In terms of the mean value, students rated themselves as 10 percent less likely to be in the top half of the earnings distribution after receiving the BadNews treatment, and 4 percent less likely after receiving the GoodNews treatment. The effects were smaller when asked about the median student - 8 percent less likely in the BadNews treatment, but 3 percent more likely in the GoodNews treatment. However, it is worth noting that both GoodNews treatment effects were statistically insignificant. Interestingly, Huseynov also found that the effect of the BadNews treatment was particularly large for female students and for students in non-STEM majors.

It is interesting that reading a single story about the potential impacts of ChatGPT and other AI tools would have an effect on students' beliefs. It would be more interesting to know whether the results were related to students' prior experience with generative AI. Huseynov reports that:

Nearly half of our subjects have never used ChatGPT or only used it a few times.

I thought that was surprising. However, I guess the students in the sample could have used other generative AI tools that they weren't asked about. We do perhaps get some sense of whether prior familiarity with generative AI affects perceptions, because the effects for students in STEM majors (who I guess are more likely to have encountered or used generative AI before) are not statistically significant.

If students' only understanding of the impacts of generative AI on their labour market prospects comes from doomscrolling through pessimistic perspectives online, it should be no surprise that students will become concerned about their future prospects. This reinforces that giving students experience in using generative AI in an intentional way will be helpful, and not just for their learning. It will also help students to understand their future potential and how they can work alongside generative AI. It isn't all bad news for current students, and they need to recognise that.

Monday, 16 June 2025

Some good news for human accountants in the face of generative AI

A lot of occupations made up of routine tasks (and a lot made up of tasks that are less routine) are looking likely to be greatly impacted by generative AI. One occupation that I thought might be severely impacted is accountancy. However, things might not be so dire for accountants (or accounting students).

This new working paper by Jung Ho Choi (Stanford University) and Chloe Xie (MIT) looks at the impact and integration of generative AI in the accounting profession, using data from a panel of 277 professional accountants. Nine of those accountants work for a firm that Choi and Xie are partnered with, while the others are almost all users of the partner firm's software. The data from accountants at the partner firm allow Choi and Xie to look in detail at task-level impacts of generative AI. The broader sample provides survey-based data on AI adoption, work patterns, and attitudes to generative AI. The surveys were conducted in November 2024 and March 2025, so the results are almost up to the minute.

Choi and Xie find that generative AI contributes to a significant productivity improvement for accountants:

On average, AI-using accountants support 55% more clients per week compared to non-users, enabling them to broaden their client service scope. These accountants also log more billable hours, indicating that AI helps convert previously non-productive time into client-facing work. Importantly, we find evidence of task reallocation facilitated by AI. Routine tasks like data entry and transaction coding consume a smaller share of time for AI adopters – an 8.5 percentage point reduction in time spent on data entry for extensive AI users, equivalent to approximately 3.5 hours per week, assuming a standard 40-hour work week, freed from manual entry. Accountants appear to re-allocate this saved time to higher-value activities: we observe corresponding increases in time spent on client communication and quality assurance tasks for those using AI. In other words, AI is augmenting accountants’ capacity by taking over low-level tasks and allowing them to focus more on advisory and analytical work.

One interesting aspect is that the experienced accountants used generative AI in different ways to the less-experienced accountants. Choi and Xie found that:

...more experienced accountants tend to utilize Generative AI tools more for simple tasks, such as transaction characterization, but less for complex tasks such as accounting for accruals. A 1 standard deviation increase in accounting experience... is associated with a 6 percent increase in the utilization of AI in transaction categorization and a 3 percent decrease in its use for accruals.

So, the use of generative AI improved productivity and shifted the tasks that accountants engaged in. A valid concern, also raised during their surveys, is whether the productivity improvements come at a cost of lower quality work. After all, generative AI has been prone to hallucinations, and that would lead to inaccuracy and error in the accounting context. Fortunately, that doesn't appear to be the case. Choi and Xie report that they find:

...improvements in financial reporting quality associated with AI usage. Firms where accountants deploy Generative AI show significantly more detailed and timely accounting records. In our sample, AI adoption is linked to a 12% increase in general ledger granularity, as measured by the number of unique accounts used to categorize transactions. This finer granularity suggests that AI-enabled accountants can capture transactions in more specific accounts, enhancing the informational richness of financial reports. Moreover, AI usage correlates with faster reporting cycles: on average, accountants using AI close the books 7.5 days sooner at month-end than those who do not use AI.

That last point may not sound like much, but it is actually very dramatic, because:

On average, for each month, accountants require about 7.6 days (in addition to two weeks) to close the books for the previous month... accountants that use Generative AI on average close the books 7.5 days faster, effectively closing their books almost immediately after month-end.

When you consider the two weeks plus the 7.6 days, that means 21.6 days to close the books on average, but with generative AI this decreases to 14.1 days, a 35 percent decrease. Choi and Xie also report on a pilot incentivised field experiment, where they find that:

...randomly assigned AI-assisted participants have higher accuracy in terms of their categorization of accounts. Participants who received AI assistance completed their tasks faster and more accurately compared to those without AI assistance.

Given that this was a pilot experiment, it is likely that there are more results to come in the future. So, the improved productivity and quality are shown in survey-based results, in task-based data, and in a field experimental setting.

Given the productivity and quality improvement with generative AI, does this mean that there will be less demand for accountants in the future? Not necessarily. It is more likely that the role of accountants will change, with fewer routine data entry tasks, and more client-facing work. Not every accountant will agree, but I think this is a win, since the client-facing work is the more interesting stuff!

[HT: Marginal Revolution]

Sunday, 15 June 2025

Book review: The Little Book of Economics (Greg Ip)

I've reviewed a lot of popular economics books on this blog. I read them because they give me ideas that I can use in my teaching. Popular economics books vary a lot in quality, but those that are among the best tend to give me lots of teaching material and ideas. One exception is likely to be The Little Book of Economics, by Greg Ip.

I say that this book is an exception, not because the book is not good, but because it is very much focused on macroeconomics, which I don't teach! Given that macroeconomics is what most non-economists have in mind when they think about economics, that is not necessarily a bad thing in a popular economics book. Ip explains that:

I wrote The Little Book of Economics to provide non-economists with a practical, plain-language guide to the concepts they encounter in their daily lives, whether as students, business managers, or concerned citizens, from growth, unemployment, and inflation to deficits, globalization, and the Federal Reserve.

I think Ip mostly succeeds in this, and a lot of people would gain a lot from reading this book. Ip provides clear-eyed explanations of everything from business cycles, to monetary policy, fiscal policy, trade, and exchange rates. Having said that, the book is very US-centric, and a lot of the specific policy and regulatory detail is not applicable to other countries. The book is also getting quite dated now - the 'revised and updated' edition that I read was published in 2013.

Nevertheless, the core ideas of macroeconomics haven't changed, and that is where this book does its best work. And Ip makes it quite relatable as well. Consider this passage on tracking the status of the economy:

On a coast-to-coast flight you can relax with a drink and watch your progress on the video monitor in front of you, up to the minute you descend into your destination city. Wouldn't it be nice if we could do the same with the economy: Flip on a screen and know instantly where the economy is, how fast it's growing, and whether a recession lies ahead.

Unfortunately, when you clamber into the economy's cockpit you discover erratic and imprecise instruments, a filthy windshield, and outdated, faded maps.

I really enjoyed reading this book. Although it is now out of print, it doesn't seem difficult to find online. If you're interested in learning a bit more about macroeconomics, without all of the theory and models getting in the way, this book is a good place to start. And at 255 pages (which are short - it is a little book of economics, after all), it is a quick and easy read.

Saturday, 14 June 2025

Penalty shootouts and first-mover advantage

I enjoyed watching the UEFA Nations League final on Monday. Spain and Portugal put on a good show, and the scores were tied at 2-2 at the end of extra time. The game went to a penalty shootout. Portugal had the first penalty shot, and eventually ended up winning the shootout 5-3, after Portuguese goalkeeper Diogo Costa saved a weak shot by Alvaro Morata.

Would the result have been different if Spain had taken the first penalty shot? There certainly is conventional wisdom that says that going first in a penalty shootout conveys an advantage (a first-mover advantage in game theory terminology). The argument is that, because the team going second is often trying to come from behind, that team faces more pressure than the team going first.

However, the evidence in favour of that conventional wisdom has been challenged, most recently and most thoroughly in this new article by David Pipke (Kiel Institute for the World Economy), published in the Journal of Economic Psychology (open access). Pipke looks at the outcomes of 7116 penalty shootouts from 1970 to 2024, across top leagues and international competitions. He then tests whether the outcome deviates from a random outcome (in which the team kicking first wins 50 percent of the time). He finds that:

In soccer, the first-kicking team wins 50.2 % of the time (p =0.785) across 7,116 matches in the Flashscore data.

So, there is no statistical evidence for a first-mover advantage in penalty shootouts in football (soccer). Pipke then turns to ice hockey, which also features shootouts but where the probability of a successful shot in a shootout is much lower. Using data from 4407 shootouts in North American ice hockey leagues over the period from 2010 to 2024, Pipke finds that:

In ice hockey, the first team wins 48.9 % of shootouts (p =0.148)...

It's closer to statistical significance, but not quite. There is no evidence for a first-mover advantage in ice hockey shootouts either. Pipke then notes that his statistical tests can:

...reject the hypothesis that the first-mover’s winning probability deviates by more than 1.6 percentage points in soccer... and 2.9 percentage points in hockey from a 50:50 split, at a 1 % significance level.
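Those p-values come from testing the observed win share against a 50:50 split. Here is a minimal sketch of that kind of two-sided binomial test in Python, assuming the first mover's win count is roughly 50.2 percent of the 7,116 soccer shootouts (the exact count isn't reported in the excerpt, so the p-value will only approximate the reported 0.785):

from scipy.stats import binomtest

n_shootouts = 7_116
first_mover_wins = round(0.502 * n_shootouts)  # approximately 3,572 wins

result = binomtest(first_mover_wins, n=n_shootouts, p=0.5, alternative="two-sided")
print(f"win share = {first_mover_wins / n_shootouts:.3f}, p-value = {result.pvalue:.3f}")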

Pipke then looks at some subsets of the football data, and finds that:

In 342 women’s soccer competitions, the first-moving team wins 172 times (50.3 %, p = 0.957). In youth soccer shootouts, the first-kicking team prevails in 130 out of 277 cases (46.9 %, p = 0.336).

Second, between 2017 and 2019, an alternative format, where teams alternate in an A,B,B,A pattern, was tested in various competitions to address concerns about an inherent advantage of kicking first. In 44 shootouts following this sequence, the first-kicking team won 56.8 % of the time (25 shootouts), with no statistically significant deviation from a 50:50 split (p = 0.451).

So, overall, there is no evidence of a first-mover advantage in a penalty shootout (in football or ice hockey). The result may have been different if Spain had gone first in the UEFA Nations League final penalty shootout, but going first wouldn't have given Spain a statistical advantage.


Friday, 13 June 2025

This week in research #79

Here's what caught my eye in research over the past week:

  • Berg et al. (with ungated earlier version here) find that Buy-Now-Pay-Later increases sales by 20%, driven by customers with low creditworthiness and products where market power is larger
  • Vatsa and Pino find a positive association between petrol and food price inflation and inflation perceptions in New Zealand
  • Peukert and Windisch (open access) provide a synthesis of the literature on the law and economics of copyright in the digital age, paying special attention to online copyright enforcement, changes in the supply of works due to digital technology, and the importance of creative re-use and new licensing and business models
  • Gechert et al. (open access) systematically review a wide range of influential meta-analyses in economics and compare them to 'conventional wisdom', finding that the effect sizes decrease by 45 to 60 percent on average in the meta-analyses, compared with the 'conventional wisdom'
  • Geddes and Holz (with ungated earlier version here) find that, under rent control, vacancy decontrol provisions that allow rent re-sets between tenants increase the number of evictions and wrongful eviction claims, using data from San Francisco
  • Also on rent control, Stacy et al. (open access) find that more restrictive rent control reforms are associated with a 10% reduction in the total number of rental units in a city, and that while reforms lead to an increase in the availability of units affordable to extremely low-income households by about 52%, this is offset by a decline in units affordable to higher-income households of about 46%
  • Cattaneo, Gschwendt, and Wolter (open access) find, using a discrete choice experiment, that Swiss workers are willing to accept a salary reduction of almost 20% of the Swiss median annual gross wage to reduce their automation risk by 10 percentage points or, conversely, demand a 20% risk premium to accept an equivalent increase in automation risk
  • Selva, Deng, and Zhang (with ungated earlier version here) track all facemasks sold on Amazon from September 2019 to September 2020, and find that the average user rating of a facemask dropped significantly following the first consumer review or question and answer stating it was made in China, but not for other countries
  • Datta and Tzur-Ilan (with ungated earlier version here) document a slight increase in women’s representation in the Federal Reserve system over the past 20 years, although noting that there is still a persistent gender gap in research output
  • Teutloff et al. (open access) look at data on freelance jobs and find that labour demand increased after the launch of ChatGPT, but only in skill clusters that were complementary to or unaffected by the AI tool, while demand for substitutable skills, such as writing and translation, decreased by 20-50% relative to the counterfactual trend, with the sharpest decline for short-term (1-3 week) jobs
  • Adamson (open access) finds that the spatial covariance of natural vegetation endowments amongst potential trading partners is important for explaining the development of silver coin money, battles, and city-state formation in ancient Greek city-states
  • Migliore, Rossi-Lamastra, and Tagliaro (open access) find that the decision to prioritise work on-site at university over working from home positively influences scientific productivity, using data from Italian academics
  • Goehring looks at historical data on sex work in Britain, and finds that the 1861 'cotton shock' recession led to 12 more establishments per 100,000 people in exposed counties, an increase of approximately 20%

Thursday, 12 June 2025

What is the most valuable superpower?

How much would you be willing to pay to have superhero powers? Obviously, the answer depends on the type of superhero powers, so let me be more specific. How much would you be willing to pay to be able to fly? Or have mind control? Or teleport? Or to have superhuman strength? These are the questions that this recent article by Julian Hwang (West Virginia University) and Dongso Lee (Korea Rural Economic Institute), published in the Journal of Cultural Economics (ungated version here) attempts to answer.

Hwang and Lee conducted a discrete choice experiment, which involved asking research participants to choose a superpower. However, each alternative came with a 'price' measured in terms of a shorter life expectancy. So, Hwang and Lee note that the resulting estimate of willingness-to-pay is really a 'willingness-to-sacrifice', since the cost is expected years of life foregone.

Their sample is made up of 51 undergraduates at the University of Florida. Each research participant was presented with ten choice tasks, each of which looked something like this (from Figure 1 in the paper):

Hwang and Lee then use a mixed logit model to estimate the willingness to sacrifice (WTS) for each of the four superpowers, for two experimental groups. The treatment group was asked to swear that they would give truthful answers to each question, while the control group was not. Hwang and Lee find that, for the treatment group:

...the mean WTS for mind-control, flight, teleportation, and supernatural physical strength is 3.2 years, 2.07 years, 5.04 years, and 2.95 years, respectively. For the control group, the mean WTS is 2.04 years, 2.95 years, 4.01 years, and 3.9 years, respectively.

Hwang and Lee then use the value of a statistical life-year to estimate the willingness-to-pay for each superpower, finding that:

The mean WTP for mind-control, flight, teleportation, and supernatural physical strength is $332,579, $215,137, $523,812, and $306,596, respectively.
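The conversion from WTS to WTP is just a multiplication by the value of a statistical life-year (VSLY). Dividing the reported WTP figures by the corresponding WTS figures implies a VSLY of roughly US$104,000, which I have plugged in below as an assumption (it is inferred from the quoted numbers, not taken directly from the paper):

VSLY_USD = 523_812 / 5.04  # implied value of a statistical life-year, roughly $104,000

wts_years = {"mind-control": 3.20, "flight": 2.07, "teleportation": 5.04, "strength": 2.95}

for power, years in wts_years.items():
    print(f"{power}: WTP = ${years * VSLY_USD:,.0f}")  # reproduces the reported WTP figures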

So it appears that, of the four superpowers that Hwang and Lee asked about, teleportation is viewed as the most valuable. However, to a large extent, the results depend on how each superpower is described to the research participants. For teleportation, research participants were told:

You can transport a person or object from one point to another without traveling the physical space between them

You can also transport yourself

You can visit any places you want without spending money or time

The other superpowers were somewhat more limited. Mind-control was limited to controlling a single person, for five minutes at a time. Flight was limited to 100 miles per day. Super-strength was Captain America strength (the ability to press 800 pounds), not Superman-level strength. In comparison, the teleportation power does seem fairly unconstrained, so it's little wonder that it was valued the highest.

This study could definitely be built on, in at least two ways. First, if a study focused on a single superpower (flight, for example), it should be possible to recover the willingness-to-sacrifice for different aspects of the superpower - duration, maximum height, maximum flight speed, whether the superhero needs to remain awake in order to fly, and so on. Second, it would be interesting to know if there is a difference in the willingness-to-sacrifice between comic book fans (or superhero fans more generally) and other people.

These sorts of follow-up questions might even make a good project for a suitably interested (and motivated) Honours or Masters student. And before you think that the subject matter is not important, it is really the ability to apply the tools of non-market valuation (and discrete choice modelling) that is the important aspect of those sorts of projects. As well as just being a fun research question to think about!

[HT: Marginal Revolution, last year]

Monday, 9 June 2025

Where the prime-age population goes, so goes the economy

Globally, and especially in developed countries and in China and some other developing countries, the population is ageing rapidly. That population ageing is making a lot of people nervous because of its implications for the economy. In the future, a larger proportion of the population who are retired older people will need to be supported by a smaller proportion of the population in the labour force. However, such an 'old age support ratio' conception of the economic problem of population ageing only paints a partial picture. 

We can decompose GDP per capita as follows (as shown in this post):

[Y/P] = [Y/L] * [L/WA] * [WA/P]

where Y is output, P is population, L is the labour force, and WA is the working age population. This identity simply says that GDP per capita (or output per person, Y/P) is made up of labour productivity (or output per unit labour, Y/L), labour force participation (L/WA), and the share of the working age population in the total population (WA/P). As the population ages, WA/P decreases, and that should contribute to lower GDP per capita - that is, lower economic growth (unless, as noted in this post, it is offset by increasing productivity).
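To see how a falling working-age share flows through this identity, here is a small worked example with made-up numbers, holding productivity and labour force participation fixed:

productivity = 100_000     # Y/L: output per worker (dollars), held fixed
participation = 0.70       # L/WA: labour force participation rate, held fixed

for wa_share in (0.65, 0.60, 0.55):   # WA/P: working-age share of the population, falling
    gdp_per_capita = productivity * participation * wa_share
    print(f"WA/P = {wa_share:.2f} -> GDP per capita = ${gdp_per_capita:,.0f}")

In this example, each five-percentage-point fall in the working-age share lowers GDP per capita by $3,500, unless productivity or participation rises to offset it.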

Both of those conceptions of the economic problem of population ageing are related to relative decline - the older population as a share of the total population increasing (what is referred to as structural ageing). The problem gets much worse if the size of the working age population declines in absolute terms. That is the problem that this 2024 working paper by Charles Kenny and George Yang (both Center for Global Development) looks at (with a less technical summary here). Specifically, Kenny and Yang investigate the economic implications of a declining prime age population (those aged 15 to 65 years), focusing on:

10 year bond yields, consumer price indices, total and female labor force participation, GDP, government expenditures, government revenue, and stock returns.

Using data from the UN World Population Prospects, Kenny and Yang categorise countries into those where prime-age population growth (PAPG) is positive and those where PAPG is negative, and then compare the two groups. First, though, the share of countries with positive and negative PAPG is instructive in terms of demonstrating population ageing. Here is Figure 1 from the paper, which shows the number of countries with positive (pale blue) and negative (red) PAPG:

Prior to 1995, few countries experienced negative PAPG, but by 2060 more than half of countries will experience a declining prime age population. Does the degree of PAPG matter? Kenny and Yang show that it does, finding in a two-way fixed effects regression model that:

...higher PAPG is correlated with lower government expenditure, greater revenue, higher 10 year yields, and greater stock index returns, but suggests an insignificant effect on growth and labor force participation.

Interestingly, Kenny and Yang also find that higher PAPG is associated with higher inflation. Turning all of that around, lower PAPG is associated with higher government spending, lower government revenue, lower bond yields, lower stock returns, and lower inflation. That is consistent with governments having to spend more on pensions and health care, receiving lower tax revenues from a smaller labour force, and lower investment returns as portfolios are shifted to less risky options (bonds, rather than shares). 
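For readers unfamiliar with the method, a two-way fixed effects model includes both country and year fixed effects, so that the estimated PAPG effect is identified from within-country variation net of global shocks. A minimal sketch of that kind of specification (using the linearmodels package, with hypothetical variable names and a hypothetical panel dataset, not Kenny and Yang's actual data or code) might look like this:

```python
import pandas as pd
from linearmodels.panel import PanelOLS

# Hypothetical panel: one row per country-year, with prime-age population
# growth (papg) and an outcome such as government expenditure (% of GDP).
df = pd.read_csv("panel.csv")                      # columns: country, year, papg, govt_exp
df = df.set_index(["country", "year"])             # entity and time index for PanelOLS

# Two-way fixed effects: country effects absorb time-invariant country traits,
# year effects absorb global shocks common to all countries in a given year.
model = PanelOLS.from_formula(
    "govt_exp ~ papg + EntityEffects + TimeEffects", data=df
)
results = model.fit(cov_type="clustered", cluster_entity=True)
print(results.summary)
```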

Kenny and Yang also find a significant discontinuity between countries with positive and countries with negative PAPG. Even controlling for a linear effect of the level of PAPG, negative PAPG is associated with lower economic growth, higher government spending, higher inflation, lower 10-year bond yields, as well as lower female and total labour force participation rates.

Kenny and Yang conclude by noting that there are few effective strategies for mitigating the impact of declining prime-age population growth. One suggestion they make is migration. However, as I have noted before, migration cannot be a solution to population ageing. Kenny and Yang dismiss the idea of technological change, such as robots and AI, but it seems that it might be the only way to preserve high living standards (at least, as measured by GDP per capita), by rapidly increasing productivity.

Read more:

Sunday, 8 June 2025

The gradual rise of high-level gaming may have shifted alcohol and gaming from complements to substitutes

When I was growing up, my friends and I spent an awful lot of time gaming. In those days, that initially meant roleplaying games like Dungeons and Dragons or MERP, or tabletop war games like Renegade Legion: Centurion or BattleTech. By the time we got to university, we still gamed, but increasingly on computers, playing hot seat games like Warlords II or Robosport. Regardless of the game though, alcohol was a key accompaniment. If gaming had been less costly (in terms of opportunity costs), we would have done more gaming, and more drinking. Gaming and drinking were clearly complements.

Not any more, it seems. According to this article in the Financial Times last month (paywalled):

Gaming, video streaming and social media have had a far bigger impact on alcohol consumption than Gen Z concerns over its effect on health, according to the head of one of the world’s largest brewers.

Atsushi Katsuki, chief executive of Japan’s Asahi, said “there’s no doubt” the rise of digital entertainment platforms had hit demand for his sector’s products far more than abstinence driven by concerns over the harmful impact of drinking.

“Alcohol used to occupy a much bigger share of people’s entertainment and joy,” he told the Financial Times. “In the past 10 years, the number of entertaining things has grown including gaming, so I believe alcohol’s share of fun, enjoyment and happiness has decreased.”

If drinking is something that consumers do instead of other entertainment options, rather than alongside other entertainment options, then alcohol has become a substitute, rather than a complement, for entertainment like gaming. That's what Katsuki appears to believe.
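In demand terms, the distinction comes down to the sign of the cross-price elasticity: if gaming becoming cheaper (or more attractive) reduces alcohol consumption, the two goods are substitutes; if it increases alcohol consumption, they are complements. A toy calculation (entirely made-up numbers) illustrates the sign convention:

```python
def cross_price_elasticity(q_alcohol_before, q_alcohol_after, p_gaming_before, p_gaming_after):
    """Percentage change in alcohol demand divided by percentage change in the price of gaming."""
    pct_change_q = (q_alcohol_after - q_alcohol_before) / q_alcohol_before
    pct_change_p = (p_gaming_after - p_gaming_before) / p_gaming_before
    return pct_change_q / pct_change_p

# Made-up numbers: gaming gets 20% cheaper, alcohol consumption falls by 5%
print(cross_price_elasticity(100, 95, 50, 40))   # positive (+0.25) => substitutes

# Made-up numbers: gaming gets 20% cheaper, alcohol consumption rises by 5%
print(cross_price_elasticity(100, 105, 50, 40))  # negative (-0.25) => complements
```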

This change might be linked to changes in the way that people game, as much as changes in the way that people drink. This 2019 article (open access) found that low-level gaming is positively associated with problem drinking. So, for people engaging in low-level gaming, alcohol and gaming may be complements (and that was probably the case for my friends and me - gaming was primarily a social activity). That research also found that high-level gaming is negatively associated with problem drinking. So, for people engaging in high-level gaming, alcohol and gaming may be substitutes (high-level gamers don't drink and game).

So perhaps over time, as gamers have gradually become more serious about their gaming, more gamers fit into the high-level category than the low-level category. And observationally, more people are gaming than before. Taking those together, the overall population-level association between alcohol and gaming may have gradually switched from complement (people drinking and gaming together) to substitute (people drinking, or gaming, but not both).

And now that young people are viewing gaming and alcohol as substitutes, it appears that alcohol is losing out. It's no wonder that Asahi and other alcohol producers are worried.

Saturday, 7 June 2025

Book review: Noise

One of my research areas is population economics. As part of that stream of my research, I generate projections of future population. Those projections are used by many local councils in the Waikato region for their long-term planning. One of the things that I learned very early on was the sheer uncertainty associated with forecasting (indeed, Jacques Poot and I quantified this uncertainty in an article published in the Journal of Population Research in 2011). So, it is all very well to have some model that forecasts in an unbiased way (so that, on average, the forecast is correct). But you also need to take account of how much uncertainty there is in the forecast.

That is essentially the first lesson to be drawn from the 2021 book Noise, by Nobel Prize winner Daniel Kahneman, Olivier Sibony, and Cass Sunstein. They define bias as a systematic deviation from the target, while noise is random scatter. Both bias and noise are components of error in human judgment, but Kahneman et al. argue that while bias has attracted much attention, the role of noise is under-recognised. I tend to agree. When I talk to decision-makers or policy people about population projections, they want to know how 'accurate' the projections are. By 'accurate', they are talking about the bias in the projections. They are usually not at all interested in hearing about the uncertainty in the projections (how noisy they are).

This book is about drawing attention to noise. The first two parts of the book describe the difference between bias and noise, and look at how to measure them. Kahneman et al. make use of a clever analogy - looking at the cluster of bullet holes in a target at a shooting range. How close on average those holes are to the centre of the target is a measure of bias. How spread out the holes are provides a measure of noise. They also point out that you can know how noisy decisions are without knowing anything about how biased they are. If you turn the target over, you can see the bullet holes, but not the target. So, you can still see the noise, even if the bias cannot be seen (I'll return to this point later).
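The shooting-range analogy maps directly onto simple statistics: bias is the distance between the average shot and the bullseye, while noise is the scatter of the shots around their own average. A small sketch (with simulated shots, purely for illustration) shows why noise can be measured without knowing where the centre of the target is:

```python
import numpy as np

rng = np.random.default_rng(42)

# Simulate 20 shots from a shooter with a systematic offset (bias) and random scatter (noise)
true_centre = np.array([0.0, 0.0])
offset = np.array([2.0, -1.0])                     # systematic deviation from the bullseye
shots = true_centre + offset + rng.normal(scale=1.5, size=(20, 2))

mean_shot = shots.mean(axis=0)

# Bias: how far the average shot is from the (known) centre of the target
bias = np.linalg.norm(mean_shot - true_centre)

# Noise: how spread out the shots are around their own average -
# this needs no knowledge of where the target centre actually is
noise = np.sqrt(((shots - mean_shot) ** 2).sum(axis=1).mean())

print(f"bias  = {bias:.2f}")
print(f"noise = {noise:.2f}")
```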

The third part of the book looks at a particular type of human judgment: predictive judgments. Kahneman et al. then discuss the causes of noise, drawing heavily on psychology. That's because the main source of noise in human decision-making is the human part. Even faced with the same alternatives and the same information on which to base a decision, we may not choose the same alternative. Finally, the book looks at ways of reducing noise, and concludes by exploring some of the counterarguments and providing reasoned rebuttals.

This is a very thorough treatment of an important topic. And it was interesting to read this book after having just finished Gerd Gigerenzer's book Gut Feelings (which I reviewed here). There were some stark contrasts between the perspectives in the two books. For example, Kahneman et al. write that:

When they listen to their gut, decision makers hear the internal signal and feel the emotional reward it brings. This internal signal that a good judgment has been reached is the voice of confidence, of "knowing without knowing why." But an objective assessment of the evidence's true predictive power will rarely justify that level of confidence.

Shots fired! Gigerenzer's book was literally about how good decisions made using gut feelings generally were. While Gigerenzer argues that unconscious decision-making is often (but not always) successful, Kahneman et al. prefer that decision-makers make very conscious, slow decisions, appropriately weighing up the evidence. This almost mechanical approach to decision-making could be exemplified by algorithmic decisions. However, while Kahneman et al. acknowledge some clear benefits of an algorithmic approach, they also note that:

...an algorithm... could turn out to be as biased as human beings are. Indeed, in this regard, algorithms could be worse: since they eliminate noise, they could be more reliably biased than human judges.

Noise is clearly something that Kahneman et al. want to eliminate from decision-making, and they make a strong case for it. They conclude that:

The best amount of scatter is zero, even when the judgments are clearly biased.

On this point, I'm not sure that I fully agree. While it is undeniably good to have less bias in decision-making, the case is less clear for less noise, holding the amount of bias constant. Let's go back to the bullet holes in the target, viewed from behind (so that the target itself cannot be seen). The bias in the shooting cannot be seen, but the noise can. And the amount of noise gives some indication of how confident a decision-maker can be in what they are seeing. Seeing a tightly clustered set of bullet holes might increase the decision-maker's confidence in the 'true' location of the centre of the target, whereas a more spread out set of bullet holes would give the decision-maker more pause. When there is unobserved (or unobservable) bias, it might be preferable to have more noise. There is a famous quote, often incorrectly attributed to John Maynard Keynes but actually from the late 19th to early 20th century English philosopher Carveth Read, that "it is better to be vaguely right than exactly wrong". I think that applies here.

None of that is to say that reducing noise is a bad thing. And Kahneman et al. have identified a problem that is generally under-recognised. The book is probably longer than it needs to be to get its point across (I found this with Kahneman's earlier book Thinking, Fast and Slow, which I read before I started blogging, so there is no review of it here). However, if you enjoyed that book, you will no doubt enjoy this one too. And there is a lot to learn from this book in spite of its length. People who make decisions (that is, everyone) should at least be aware of noise, and this book provides one way of raising that awareness.

Friday, 6 June 2025

This week in research #78

Here's what caught my eye in research over the past week:

  • The Nobel lectures by Daron Acemoglu, Simon Johnson, and James Robinson
  • Rattsø and Sheard (open access) find positive overall effects of airports on population and employment growth, using data from the expansion of Norway's airport system between 1950 and 2019
  • Ederer, Goldsmith-Pinkham, and Jensen (with ungated earlier version here) analyse content from the Economics Job Market Rumors (EJMR) website from 2012 to 2023, and document a gradual increase in posts linking to Twitter, as well as rating references to different Twitter accounts according to how negative, misogynistic, or toxic they are
  • Ginther, Kahn, and Milakhina (with ungated earlier version here, or here) find that women are significantly disadvantaged in promotion to associate and full professor in economics departments
  • Buzard et al. (with ungated earlier version here) find that women report significantly lower interest in both taking economics classes and majoring or minoring in economics relative to their male peers, based on a survey of early-stage undergraduate students at two US universities

Tuesday, 3 June 2025

How prevalent is large language model use in the write-up of economics research?

Back in January, I poked fun at a paper on students' acceptance of ChatGPT that had parts that were clearly written by generative AI. And I recently read a working paper that was great (and I'll blog on it sometime soon), up until the Conclusion section, which was clearly written by generative AI. But how common is this? Academics worry about how often students are using AI to write assignments or essays, but how often are we doing so?

That is essentially the question addressed in this new article by Maryam Feyzollahi and Nima Rafizadeh (both University of Massachusetts Amherst), published in the journal Economics Letters (sorry, I don't see an ungated version online). Feyzollahi and Rafizadeh investigate the top 25 economics journals over the period from 2001 to 2024, and basically look for word choices that are characteristic of large language models (LLMs). As they explain:

We construct two equally-sized word sets for our analysis: treatment words that are characteristically associated with LLM-assisted writing, and control words that represent traditional academic writing patterns... The treatment words are selected based on two criteria. First, we analyze a large corpus of confirmed LLM-generated academic text to identify words that appear with systematically higher frequency compared to human writing. Second, we cross-reference our selections with existing literature on language model patterns... to validate our choices... The control words are selected based on two criteria. First, these words represent established economic and econometric concepts that have maintained consistent usage patterns in academic writing over our sample period. Second, they are semantically unrelated to our treatment words, ensuring that any potential changes in treatment word frequencies do not spillover to or correlate with control word usage through meaning associations.

I know that you're wondering about the word list, and it is provided in Table 2 from the paper:

That list seems more nuanced (I swear that ChatGPT did not write this sentence!) than the word choices that have previously been highlighted as signals of LLM use, like "rich tapestry", "realm", or "mosaic". However, some old favourites like "delve" and "foster" do appear in the list, so clearly LLMs haven't completely evolved to avoid their characteristic phrases.

Feyzollahi and Rafizadeh compare the relative frequency of the treatment and control words. Their results are quite well illustrated in Figure 1 (a) from the paper, which shows how the use of the words "intricate" (a treatment word) and "coefficient" (a control word) has changed over time:

Maybe starting from 2023, research became more intricate, or there were more intricacies in the findings of research? Or more likely, LLMs suddenly started to play an increasing role in the write-up of research. Generalising from that comparison of just two words, Feyzollahi and Rafizadeh use a simple regression model and find:

...compelling evidence of increasing LLM adoption over time. When considering both post-treatment years... the analysis documents a significant increase of 4.76 percentage points in the frequency of LLM-associated terms, with the effect maintaining remarkable stability across all specifications.

And then when comparing 2023 and 2024, Feyzollahi and Rafizadeh find:

 ...an accelerating pattern of LLM adoption. The initial impact in 2023... shows an increase of 2.85 percentage points, while the effect more than doubles to 6.67 percentage points in 2024...

So, LLM use is small, but growing quickly in academic economics. And there are many reasons to believe that these results understate the true use of LLMs in the write-up of research in economics. It takes some time for research to get published, so there will likely be far more papers in the 'publication pipeline' that have used LLMs. Authors can re-write text that was drafted by an LLM in order to mask the LLM's contribution. LLMs may be getting better at writing in an 'academic style' that avoids the use of phrases that signal the use of an LLM (no more delving!).
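As a rough illustration of the kind of comparison involved (not Feyzollahi and Rafizadeh's actual word lists or regression specification), one could count how often treatment and control words appear in a corpus of article texts each year and track their relative frequencies:

```python
import re
from collections import Counter

# Hypothetical (abbreviated) word lists - the paper's Table 2 has the real ones
treatment_words = {"intricate", "delve", "foster", "nuanced"}
control_words = {"coefficient", "regression", "elasticity", "equilibrium"}

def relative_frequency(text: str, word_set: set) -> float:
    """Occurrences of words from the set, per 1,000 tokens of text."""
    tokens = re.findall(r"[a-z]+", text.lower())
    if not tokens:
        return 0.0
    counts = Counter(tokens)
    return 1000 * sum(counts[w] for w in word_set) / len(tokens)

# articles_by_year: hypothetical dict mapping year -> list of article full texts
articles_by_year = {2022: ["..."], 2023: ["..."], 2024: ["..."]}

for year, texts in articles_by_year.items():
    corpus = " ".join(texts)
    print(year,
          f"treatment: {relative_frequency(corpus, treatment_words):.2f}",
          f"control: {relative_frequency(corpus, control_words):.2f}")
```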

Overall though, it is clear that LLMs are increasingly being used to write up research for publication. A relevant question to ask is: does LLM use reduce the quality of the underlying research? Personally, when I read a paper where an LLM has clearly been used in the writing, I chuckle to myself, but I haven't as yet had cause to disbelieve the underlying results of the research. However, my reaction doesn't necessarily reflect the views of academics in general. Regardless, when an LLM is used, that use should be transparently disclosed by the authors (indeed, John List reported results of a quick survey of his followers on LinkedIn recently, where only 14 percent of them suggested that the use of Claude in a research paper should not have been disclosed).

We should not be surprised that researchers are using LLMs. LLMs can increase our productivity by helping us to write up our research more quickly. In that sense, we face similar incentives to students, who are trying to complete their assignments and essays more quickly. Both students and researchers, though, should at the very least acknowledge their use of these tools.

Read more:

Monday, 2 June 2025

Some better news for AI tutors as a substitute for human tutors

Saturday's post made the case that we aren't there yet for AI tutors as a substitute for human tutors. However, there have been some success stories (see this post, for example). And in another example, this new working paper by Martín De Simone (World Bank) and co-authors shows that AI tutors can be very effective (and cost-effective) at teaching English in Nigeria. Specifically, the study:

...analyzes the effects of an after-school program in which students interacted with a large language model twice per week to improve their English skills, following the national curriculum. The intervention was implemented in Benin City, Nigeria, using Copilot, an LLM powered by the GPT-4 model at the time of implementation... The program was implemented over a six-week period between June and July 2024, targeting first-year senior secondary school students, who are typically 15 years old...

In the first session, teachers familiarized students with Microsoft Copilot, emphasizing both its educational benefits and potential risks, such as over-reliance on the model and the possibility of hallucinations and biased outputs. The goal was to foster responsible usage, encouraging students to complement their learning with the AI tool while retaining critical thinking skills.

Each subsequent session focused on a topic from the first-year English language curriculum, aligned with the material that students covered during their regular classes. The sessions began with a teacher-provided prompt, followed by free interaction between the student pairs and the AI tool...

The lesson guides and their prompts were carefully crafted to position the LLM as a tutor, focusing on facilitating learning rather than simply providing direct answers. 

The programme was run in nine schools, over a six-week period. The 'teacher-provided prompt' ensures that the students remain on-task, and is similar to the 'AI tutor as a substitute for a human tutor' that I discussed in Saturday's post. Unlike the research I discussed in that post, De Simone et al. are not interested in the fidelity of the AI model in keeping to the questions and answers it was provided. Instead, they look directly at student learning (which is what really matters).

Each student in the nine schools was invited to participate, and those who agreed were randomised to receive the AI tutor, or not. This randomised controlled trial (RCT) should give us high confidence in the results. Even though there was selection into the sample, the randomisation happened after the selection, so the results hold at least within the group of students who were willing to participate (which was 52 percent of eligible students). The results were dramatic:

First, we show that students selected to participate in the program score 0.31 standard deviation higher in the final assessment that was delivered at the end of the intervention. We find strong statistically-significant intent-to-treat (ITT) effects on all sections of that assessment: English skills (which included the majority of questions, 0.24 σ), digital skills (0.14 σ), AI skills (0.31 σ) and an Item Response Theory (IRT) composite score of each student’s exam (0.26 σ). We also show that the intervention yielded strong positive results on the regular English curricular exam of the third term.

So not only did the students perform better at the end of the RCT, but that improvement also carried through to better performance on a more general exam at the end of the term. It wasn't all good news though, as the results may increase learning inequality:

Treatment effects were positive and statistically significant across all levels of baseline performance, but stronger among students with better prior performance. Similarly, treatment effects were positive and statistically significant over the entire distribution of a proxy for socioeconomic status, but stronger among students with a higher one.

On the other hand:

...treatment effects were stronger among female students, compensating for a deficit in their baseline performance.

Still, finding strong positive effects for all groups of students is an important result, and the reduction in gender differences in English capability is important in this context. De Simone et al. then undertake a cost-benefit analysis, finding that:

...the program was highly cost-effective. The six-week pilot generated learning gains that take between 1.5 and 2 years in a business-as-usual scenario. The program achieved 3.2 equivalent years of schooling (EYOS) per $100 invested, surpassing many comparable interventions... When benchmarked against evidence from both low- and middle-income countries, the pilot program ranks among the most cost-effective solutions for addressing learning crises.

It is hard to argue against strong RCT evidence that is so positive in its impact. However, it is important to remember that context matters. This study was conducted with secondary school students, learning English, in Nigeria. The results are unlikely to generalise to all other learning contexts.
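For readers wondering what an intent-to-treat estimate like the 0.31 standard deviation effect involves mechanically, here is a minimal sketch with simulated data (not the De Simone et al. data): the outcome is standardised against the control group, and regressed on assignment to the programme, regardless of whether assigned students actually attended:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 1000

# Simulated data: random assignment to the AI tutor programme, imperfect attendance,
# and test scores that improve only for those who actually attend
assigned = rng.integers(0, 2, size=n)                       # intent-to-treat indicator
attended = assigned * (rng.random(n) < 0.8)                 # 80% of assigned students attend
score = 50 + 5 * attended + rng.normal(scale=10, size=n)    # raw test score

# Standardise using the control group's mean and standard deviation
control = score[assigned == 0]
z_score = (score - control.mean()) / control.std()

# ITT effect: regress the standardised score on assignment (not attendance)
X = sm.add_constant(assigned)
itt = sm.OLS(z_score, X).fit()
print(itt.params[1])   # ITT effect in standard deviations (diluted by non-attendance)
```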

And on the subject of context, this study also made me think about the disciplinary contexts of the studies that have shown positive effects of AI tutors as substitutes for human tutors, compared with those that have shown negative (or null) effects. The studies that have shown positive effects have tended to be in physics, computer science, or (now) English language study. In contrast, the studies that have not had such positive findings have been in law or social sciences.

Generalising wildly, there is a distinction between the two groups of subject areas. Perhaps an AI tutor works well as a substitute for a human tutor when the subject area consists primarily of problems that are well-defined and answers can be specified in an objective way (like a maths problem, or learning a language), but not when the problems are more open-ended and answers are more subjective? That would make intuitive sense.

AI tutors that have been prompted to follow a tutorial or workshop script (with questions and answers specified in advance) are given boundaries. That will work best when the AI tutor and student don't stray too far off-script. Staying within the boundaries will be relatively easier if the question and the answer are well defined and objective.

However, in a tutorial or workshop where the answers are more subjective, the teachers who create the system prompt may find it more difficult to anticipate all of the directions in which the conversation between the student and AI tutor may go. The script may not be able to cover all possibilities, and even if the script is quite detailed, it may be more difficult for the AI tutor (and the student) to stay close to the script. In my experience, the longer the system prompt, the more likely it is that ChatGPT ignores part of the prompt. So, when the question and answer are more subjective, there may be more scope for the AI tutor to introduce irrelevant material, hallucinate, or steer students wrong. That might explain the results from Saturday's post.

If my speculations above are correct, then that has interesting implications for economics, where the questions that we ask students can easily encompass both the objective and subjective (and economics is not alone in that). Clearly, there is more research to be done here (including my own, but more on that in a future post). Understanding whether AI tutors will work best as a substitute or complement for human tutors (and if the answer is context-dependent, then the best contexts for each) is important for the future of education.

[HT: Ethan Mollick, via Marginal Revolution]

Read more: