Friday, 31 October 2025

This week in research #99

Here's what caught my eye in research over the past week (yet another very quiet week, it seems):

  • Miller et al. (with an ungated earlier version here) use a conjoint survey experiment to examine hiring preferences for lobbyists, finding that organised interests prefer lobbyists with policy-specific expertise and the connections needed to gain access to decision-makers, but little evidence that connections are more valuable than expertise
  • Li et al. find that an in-sample shift in de-seasoned weather from the coolest to the hottest semester reduces semester-long undergraduate student performance by 1.5 percent in Singapore, using data from 2005 to 2019

Thursday, 30 October 2025

How people use ChatGPT, for work and not

The economic impacts of AI (referencing my previous post) are driven by who uses AI tools, and how they use them. We got a sense of this from this working paper by Handa et al., which used data from Claude. However, by far the most widely used generative AI tool is ChatGPT, so I read this new NBER working paper by Aaron Chatterji (Duke University) and co-authors with a lot of interest (see also this blog post by David Deming, one of the co-authors). Many of the co-authors are with OpenAI, which gave them privileged access to user data from ChatGPT. Having said that though, the authors are very clear and very detailed in pointing out the steps they took to ensure data privacy was maintained (and in this matter, this paper is a model for others to follow).

Their main data is made up of:

...a random selection of messages sent to ChatGPT on consumer plans (Free, Plus, Pro) between May 2024 and June 2025. Messages from the user to chatbot are classified automatically using a number of different taxonomies: whether the message is used for paid work, the topic of conversation, and the type of interaction (asking, doing, or expressing), and the O*NET task the user is performing.

Using this dataset (and some related datasets, including one that matches users to demographic details, while maintaining confidentiality), Chatterji et al. document a lot of important descriptive facts about ChatGPT users (on consumer plans), as well as trends over time.

First, they document the by-now well-known exponential growth of ChatGPT use over time, summarised in Figure 3 from the paper:

Not only has ChatGPT use grown over time, but it has also grown within every cohort of users over time (where cohorts are defined by how long ago users first started using ChatGPT). The early adopters still use ChatGPT the most, but every subsequent cohort has increased use over time. Looking at Figure 5 in the paper, it looks to me like there is a noticeable spike in about March-April 2025, when the o3 model was released:

Turning to the use of ChatGPT for work, Chatterji et al. report that:

...both types of queries grew rapidly between June 2024 and June 2025, however non-work-related messages grew faster: 53% of messages were not related to work in June 2024, which climbed to 73% by June 2025.

Interestingly, later cohorts of users have a greater share of non-work messages than earlier cohorts. However, the differences between cohorts are relatively small, and a majority of ChatGPT messages are non-work-related for every cohort. Nevertheless, there is a lot of ChatGPT-related work going on!

What sort of work? Chatterji et al. next look at the topics of conversations with ChatGPT, finding that for work-related messages:

About 40% of all work-related messages in July 2025 are Writing, by far the most common Conversation Topic. Practical Guidance is the second most common use case at 24%. Technical Help has declined from 18% of all work-related messages in July 2024 to just over 10% in July 2025.

'Writing' includes things like editing or summarising text, or translating. 'Practical Guidance' includes things like how-to advice, and tutoring or teaching. 'Technical Help' includes things like calculations, programming, or data analysis. Including non-work-related conversations:

The three most common Conversation Topics are Practical Guidance, Seeking Information, and Writing, collectively accounting for about 77% of all ChatGPT conversations.

'Seeking Information' is basically using ChatGPT as a replacement for web search (a use case that I have become particularly fond of ever since ChatGPT started routinely providing web links in its responses). Of interest to educators should be this:

Education is a major use case for ChatGPT. 10.2% of all user messages and 36% of Practical Guidance messages are requests for Tutoring or Teaching.

Of course, that won't count the ChatGPT conversations that relate to the completion of assessments, which are more likely to fall into the 'Writing' or 'Technical Help' categories.

Chatterji et al. then look at user intent, based on a categorisation of messages into 'asking' (seeking information from ChatGPT), 'doing' (asking ChatGPT to complete a task), and 'expressing' (anything else). For work-related messages, they find that:

Doing constitutes nearly 56% of work-related queries, compared to 35% for Asking and 9% for Expressing.

That contrasts with what they see when looking at all messages, where 49% of messages were 'asking' and 40% were 'doing'. Interestingly, 'doing' messages are declining as a share over time, while 'expressing' messages are increasing. I would have thought that 'asking' messages would have increased, but there is only slight evidence for that (obviously I am extrapolating from my own experience!).

The work activities results are quite detailed, so I won't discuss them in detail here. However, Chatterji et al. provide the following summary:

We find that about 81% of work-related messages are associated with two broad work activities: 1) obtaining, documenting, and interpreting information; and 2) making decisions, giving advice, solving problems, and thinking creatively.

Turning to the demographics of ChatGPT users and their use of ChatGPT, Chatterji et al. report that:

...a significant share (around 80%) of the weekly active users (WAU) in the first few months after ChatGPT was released were by users with typically masculine first names. However, in the first half of 2025, we see the share of active users with typically feminine and typically masculine names reach near-parity. By June 2025 we observe active users are more likely to have typically feminine names. This suggests that gender gaps in ChatGPT usage have closed substantially over time.

We also study differences in usage topics. Users with typically female first names are relatively more likely to send messages related to Writing and Practical Guidance. By contrast, users with typically male first names are more likely to use ChatGPT for Technical Help, Seeking Out Information, and Multimedia (e.g., modifying or creating images).

Looking at differences by age group:

Among those who self-report their age, around 46% of the messages in our dataset are accounted for by users 18-25.

A higher share of messages are work-related for older users. Work-related messages comprised approximately 23% of messages for users under age 26, with this share increasing with age.

Perhaps younger users are more likely to disclose their age? Having said that, I don't think anyone would be surprised by those results. Nor would they be surprised by the results by level of education:

Educated users are much more likely to use ChatGPT for work. 37% of messages are work-related for users with less than a bachelor’s degree, compared to 46% for users with exactly a bachelor’s degree and 48% for those with some graduate education. Those differences are cut roughly in half after adjusting for other characteristics, but they are still statistically significant at the less than 1 percent level. Educated users are more likely to send work-related messages.

The results by occupation are more interesting. Chatterji et al. report that:

...the unadjusted work shares are 57% for computer-related occupations; 50% for management and business; 48% for engineering and science; 44% for other professional occupations; and only 40% for all non-professional occupations. Regression adjustment moves these figures around slightly, but the gaps by occupation remain highly statistically significant. Users in highly-paid professional occupations are more likely to send work-related messages.

The 'regression adjustment' refers to using the results from a multiple regression model that controls for age, gender, education, and some other variables. Looking at user intent and conversation topics by occupation, Chatterji et al. find that:

...users in highly paid professional occupations are more likely to use ChatGPT for Asking rather than Doing... This is especially true in scientific and technical occupations. 47% of the work-related messages sent by users employed in computer-related occupations are Asking messages, compared to only 32% for non-professional occupations. These differences shrink somewhat with regression adjustment, but remain highly statistically significant...

Writing is especially common for users employed in management and business occupations, accounting for 52% of all work-related messages. Writing is also relatively common in non-professional and other professional occupations like education and health care, accounting for 50% and 49% of work-related messages respectively. Technical Help constitutes 37% of all work-related messages for users employed in computer-related occupations, compared to 16% in engineering and science and only about 8% for all other categories.

Chatterji et al. note that:

Across all occupations, ChatGPT usage is broadly focused on seeking information and assistance with decision-making.

People use ChatGPT for what it is best suited for. So, what does this all mean for the economic impact of AI (and specifically, the economic impact of ChatGPT)? Chatterji et al. conclude that:

...our findings suggest that ChatGPT has a broad-based impact on the global economy. The fact that non-work usage is increasing faster suggests that the welfare gains from generative AI usage could be substantial... Within work usage, we find that users currently appear to derive value from using ChatGPT as an advisor or research assistant, not just a technology that performs job tasks directly. Still, ChatGPT likely improves worker output by providing decision support, which is especially important in knowledge-intensive jobs where productivity is increasing in the quality of decision-making.

None of that answers the stream of questions posed by Kevin Bryan (which I outlined in my previous post). Nevertheless, it is important to recognise both how widely used ChatGPT is (in case you've been living under a rock for the last three years), and how it is used, particularly by workers in their daily tasks. This research provides us with broad answers, which can now be supplemented with more detailed analyses of particular industries and occupations.

[HT: Marginal Revolution]

Tuesday, 28 October 2025

Kevin Bryan on the economic impacts of AI

I love the 1987 quote from Robert Solow that "You can see the computer age everywhere but in the productivity statistics" (from a New York Times column that you can read here). Arguably, you can recycle that quote to refer to generative AI now. Where are the economic impacts of AI? What should we be looking for? Are we looking in the wrong places?

Those are some of the questions that Kevin Bryan (University of Toronto) tries to address in this new paper, which reviews seven books from the last twelve years that cover some aspect of the economic impacts of AI. Why books, and not research articles? Bryan notes that:

...books play a unique role. Research articles construct a literature. Books summarize it; they situate research articles in a broader context; they draw out implications; they take stands. Not all books need to do all of this, but books are an important vector by which the aggregated knowledge of research journals reaches the public and non-subject-matter experts.

The seven books included in the review are as follows:

“The Second Machine Age” (Brynjolfsson and McAfee, 2014) offers an early argument that changes in computation and digitization were leading to an Industrial Revolution-sized economic shift; “Prediction Machines” and “Power and Prediction” (Agrawal et al. (2018) and Agrawal et al. (2022)) provide a particularly compelling framework for the basic economic feature of AI, its role in reducing the cost of prediction; “The Data Economy: Tools and Applications” (Baley and Veldkamp, 2025) covers the economic theory of data, an important input into that prediction; “The Skill Code” (Beane, 2024) and “Co-Intelligence” (Mollick, 2024) examine practical implementation challenges for AI, via sociology and management research, that are frequently misunderstood by industry practitioners; and “Situational Awareness” (Aschenbrenner, 2024), a book-length treatise self-published online for speed reasons, offers a view from Silicon Valley about the most transformative possibilities of AI.

Of interest to me, I have only read one of the books (Co-Intelligence, which I reviewed here back in July). I'm not going to summarise Bryan's reviews of the books, since it is really difficult to do so without repeating a lot of what he says in the review. If you're interested, you should read the paper. However, I do want to pick up one bit from the reviews, on Aschenbrenner's book. Of the reviewed authors, Aschenbrenner is the most bullish on the impacts of AI, and Bryan notes that:

...it is simply a fact that the view of the future expressed in “Situational Awareness” is closer to the modal view of AI researchers and the folks running the most prominent AI labs than less bold analyses of AI, even those that treat AI as a potential general purpose technology with substantial economic importance. If economists are to play a public role in the debate of AI, it is essential to at least understand the economic model in the heads of many of the people we are trying to communicate with.

Having read those seven books, Bryan is able to tease out a number of open questions on the economic impacts of AI. This might be the most interesting part, because it suggests where economists might have influence on the policy and practical conversations around AI and where economists might best help people, particularly policymakers but also businesspeople, to understand the implications and impacts of AI. Bryan presents the following questions:

How should monetary policy respond to simultaneous deflationary pressure from productivity gains and potential unemployment from labor demand shifts? What are the implications for interest rate policy when technological change accelerates dramatically? How is public debt affected? If AI adoption reduces employment while requiring large public investments in education, social protection, and infrastructure, how will governments finance these expenditures? What are the optimal tax policies for an economy where capital captures an increasing share of income?...

AI development exhibits strong network effects and scale economies that could create winner-take-all dynamics among countries. How should trade policies respond when AI capability determines comparative advantage? What are the implications for international capital flows when AI investment becomes central to national competitiveness? Is “sovereign AI” necessary? How should it be taxed, considering the inequality discussion? How can we, as with climate change, coordinate internationally on AI safety concerns? Game theorists have a role to play here.

If AI development requires trillion-dollar investments concentrated among a few companies, how will this affect financial system stability? How should banking regulations adapt as AI systems conduct increasing shares of financial transactions? What are the systemic risks when AI companies become too big to fail?

What should we measure? What early warning signs indicate negative effects or growth takeoff?...

How should science and innovation be structured? Are patents more or less important? How does information flow across firms? To what extent should we allow trade secrets in an AI-driven intelligence explosion? What should agencies like the NIH or NSF be doing differently? What should universities be teaching differently?

That's a lot of open questions. I don't have the answers to them, but hopefully there are already teams of economists working on answering them.

Obviously, the very last question hits especially close to home for me. However, Bryan presents a narrow view of the issues in university education in that single question. The open question extends beyond what universities should be teaching differently. We should also be considering how universities should teach differently, and how universities should assess differently. This requires us to understand and anticipate changes in the labour market for graduates, the changing role of internships and work-integrated learning, and whether an apprenticeship model of workplace learning still makes sense in an era where generative AI can replace much of the low-level work that new graduates previously did. We also need to consider how pervasive generative AI changes how students approach their learning, and whether generative AI democratises learning in a way that makes the mass-market model of university education obsolete (but where elite university education may still persist). Anyway, I'll be writing a little bit more on those topics in future posts.

[HT: Marginal Revolution]

Sunday, 26 October 2025

Milanovic's model of migration flows and migrants' rights, expanded

In my review of Branko Milanovic's book Capitalism, Alone yesterday, I noted that there were a couple of missed opportunities for the book to go deeper on certain topics. One of those was a simple model of migration flows and migrants' rights, presented in Chapter 4 of the book. Milanovic motivates this model with a discussion of the native-born population's view on migrants, based on the following proposition (emphasis is from the book):

...The native population is more likely to accept migrants the less likely the migrants are to permanently remain in the country and use all the benefits of citizenship.
This proposition introduces a negative relationship between (i) willingness to accept migrants and (ii) extension of migrants' rights...

Milanovic then goes on to illustrate this relationship with a simple diagram, noting that:

...it seems reasonable to believe that there is a kind of demand curve for migrants, where the demand is less when the cost of migrants, in terms of the rights and sharing of the citizenship premium they can claim, is greater.

This 'demand curve' relationship that Milanovic describes is shown in the diagram below (by the line D). Milanovic distinguishes between two cases, represented by two points on the demand curve in the diagram: (1) high on the curve (at point A), where migrants have extensive rights, but the native-born population would desire very few migrants; and (2) low on the curve (at point B), where migrants have few rights, and the native-born population are willing to accept more migrants.

However, here is where Milanovic misses an opportunity. Yes, there may be such a demand curve for migrant flows. However, there is also a corresponding supply curve, constructed from the decisions of the potential migrants themselves. Ceteris paribus (holding all else constant), migrants would desire to go to destinations where they would have greater citizenship rights. In other words, the supply of migrants in this model is upward sloping, as shown in the diagram below (by the line S).

Now consider Milanovic's two cases. The first case, where migrants are offered extensive rights, is illustrated in the diagram below. Consider migrant rights of R1. The native-born population desires very few migrants (MD1), but the number of migrants who want to migrate to such an attractive destination is high (MS1). There will be conflict. There is an excess supply of migrants (the difference between MS1 and MD1). The native-born population feels like they are being overwhelmed by migrants who are taking advantage of the rights of citizenship that they have not 'earned'. The native-born population agitates, and the government relents, eventually offering fewer citizenship rights to migrants. This continues until rights reach R0. This is the equilibrium amount of citizenship rights. The equilibrium migration flow is M0.

Now consider Milanovic's second case, where migrants are offered few rights, which is illustrated in the diagram below. Consider migrant rights of R2. The native-born population is willing to accept many migrants (MD2), but the number of migrants who want to migrate to such an unattractive destination is low (MS2). There will be few migrants, and the economy may suffer as a result. There is an excess demand for migrants (the difference between MD2 and MS2). The government wants to attract more migrants, so they begin to offer migrants more rights. This continues until rights reach R0. This is the equilibrium amount of citizenship rights. The equilibrium migration flow is again M0.

Having established equilibrium migrant rights and migration flows, we can now use the model in much the same way as the standard model of demand and supply. Consider some exercises in comparative statics (the movement from one equilibrium to another). If there is an exogenous increase in the supply of migrants, such that more migrants are willing to migrate at each and every level of rights, then the supply curve shifts to the right. The equilibrium level of migrant rights will fall. If populist rhetoric reduces the willingness of the native-born population to accept migrants, then the demand curve shifts to the left. The equilibrium level of migrant rights will fall. And so on.
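The comparative statics above can be checked directly in a minimal linear version of the model. This is just a sketch with made-up, illustrative coefficients (none of these numbers come from Milanovic), but it makes the equilibrium logic concrete:

```python
# Minimal sketch of Milanovic's migration model with a supply side added.
# All coefficients are illustrative, not estimates.

def equilibrium(a, b, c, d):
    """Demand: MD = a - b*R; Supply: MS = c + d*R, where R is the level
    of migrants' rights. Setting MD = MS gives equilibrium rights R0 and
    the equilibrium migration flow M0."""
    r0 = (a - c) / (b + d)
    m0 = a - b * r0
    return r0, m0

# Baseline equilibrium
r0, m0 = equilibrium(a=100, b=2, c=10, d=1)

# Exogenous increase in migrant supply (supply curve shifts right: c rises)
r1, m1 = equilibrium(a=100, b=2, c=25, d=1)

# Populist rhetoric reduces native demand for migrants (demand shifts left: a falls)
r2, m2 = equilibrium(a=85, b=2, c=10, d=1)

print(r0, m0)  # baseline equilibrium rights and flow
assert r1 < r0 and m1 > m0  # more willing migrants: lower rights, larger flow
assert r2 < r0 and m2 < m0  # weaker native demand: lower rights, smaller flow
```

Both shifts push equilibrium rights down, exactly as in the comparative statics exercises above, but they move the equilibrium migration flow in opposite directions.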

Obviously, the model is not a perfect description of the relationship between migrant rights and migration flows. However, Milanovic could easily have built up this model (as I have done above) and used it more extensively to explore the relationship here. By ignoring the role that migrants' choices play (the supply curve in the model above), Milanovic suggests that only the demand curve matters. That is, that only the choices of the native-born population will affect migration. Clearly, that is an incomplete description. The level of citizenship rights that migrants receive will depend on government actions, and the forces (upward or downward) that impact those actions depend on both the native-born population and the migrants.

Even this post has, I think, barely scratched the surface of the utility of a model like this, to understand the politics of migration flows and migrants' rights. I'm sure that there is much more that can be done with this.

Saturday, 25 October 2025

Book review: Capitalism, Alone

I've long been a fan of Branko Milanovic's careful and detailed work on global inequality. I've written several posts based on his work (most recently my review of his 2016 book Global Inequality, in 2023). So, I was interested to see his take on capitalism, as expressed in his 2019 book, Capitalism, Alone.

The title references the fact that, after the fall of communism in the early 1990s, capitalism remained the only game in town. To support this claim though, Milanovic makes the case that China, Russia, Vietnam, and other countries with similar political systems are really capitalist. Milanovic draws on the political philosophy of Max Weber in defining 'political capitalism' as the state-led authoritarian capitalism exemplified by China, distinguishing it from the Western tradition of 'liberal meritocratic capitalism' exemplified by the US and western Europe.

The book starts by drawing the distinction between political capitalism and liberal meritocratic capitalism, illustrating the development of both with data on incomes and inequality, as you would expect given Milanovic's pedigree. I really enjoyed this section, especially where Milanovic outlines how both a liberal view of history and a Marxist view of history fail to explain key events. The liberal view expects capitalism to converge on liberal-democratic norms and peaceful growth, but that view fails to adequately explain World War I. On the other hand, the Marxist view that communism would replace capitalism was contradicted by the reversion of former communist countries in the Russian sphere to political capitalism. As Milanovic concludes:

We thus reach the conclusion that two of the most important events in the global history of the twentieth century, World War I and the fall of communism, cannot both be consistently explained within the liberal or Marxist paradigms. The liberal paradigm has problems with 1914, the Marxist paradigm with 1989.

The first few chapters are backward looking, and set a solid foundation. Milanovic then turns his attention to the interaction between capitalism and globalisation, as a segue into thinking about the future of capitalism. I found these latter chapters of the book to be somewhat uneven. Some parts are well thought out and interesting, such as the discussion on the free movement of factors of production, particularly migration (although I think more could have been made of a particular model that Milanovic uses, and I may follow up on that in a future post).

Other parts of the last two chapters seem to be a collection of anecdotes, musings, and speculation, lacking much in the way of theoretical or empirical grounding. This part of the book does provide a broad synthesis, but really lacks the depth of the earlier chapters, or of Milanovic's earlier writings. As just one example, in the section on migration and the welfare state, Milanovic discusses the political consequences of close links between the welfare state and citizenship. There is a related literature building on the work of Elinor Ostrom on the challenges of common governance when a group is made up of heterogeneous sub-groups (migrants and the native born) that Milanovic could have drawn on to give more depth to this section. Again, a missed opportunity.

Overall, I did enjoy reading the book. It is thought-provoking and even the latter sections were a nice read, despite their relative shallowness. However, reading the concluding chapter gives one a feeling that all is not well with capitalism, now and in the future. Milanovic tries to present a vision of 'the people's capitalism', but what I took away from that discussion was just how far things would need to move to make that vision a reality. And sadly, since the writing of this book we have gotten no closer to Milanovic's ideal, and in many respects we are farther from it than we have been in a long time. Capitalism may now be alone, but its victory is no assurance of a future worth celebrating.

Friday, 24 October 2025

This week in research #98

Here's what caught my eye in research over the past week (another very quiet week, it seems):

  • Wilkinson discusses New Zealand's Regulatory Standards Bill, concluding that the ultimate test of the Bill will be whether it can meaningfully improve regulatory quality while maintaining political sustainability
  • Mitrut et al. investigate how ethnic self-identification varies with education among the Roma minority in Romania, finding that Roma identification strongly declines with education, from 80% for those with no education to 40% for postsecondary graduates

Wednesday, 22 October 2025

If your writing is being evaluated by generative AI, you should be getting generative AI to do your writing

Back in 2023, I wrote about the impact that ChatGPT would have on online dating. I think I've seriously undersold the idea of large language models talking on our behalf to other large language models. The broader point is illustrated well in this new working paper by Jiannan Xu (University of Maryland), Gujie Li (National University of Singapore) and Jane Yi Jiang (Ohio State University). They look at the idea of 'AI self-preference' and its impact on hiring practices, defining AI self-preference as:

...the inclination of a model to favor content it generated itself over that written by humans or produced by alternative models...

So, if ChatGPT prefers resumes written by ChatGPT rather than those written by humans, that would be AI self-preference. Given the context of hiring, Xu et al.:

...examine whether LLMs, when deployed as evaluators, systematically favor resumes they generated themselves over otherwise equivalent resumes written by humans or produced by alternative models. To test this, we construct a largescale resume correspondence experiment using a real-world dataset of 2,245 human-written resumes, sourced from a professional resume-building platform prior to the widespread adoption of generative AI. For each resume, we generate multiple counterfactual versions using a range of state-of-the-art LLMs, including GPT-4o, GPT-4o-mini, GPT-4-turbo, LLaMA 3.3-70B, Mistral-7B, Qwen 2.5-72B, and Deepseek-V3. Having content quality controlled, we assess whether these LLMs exhibit systematic bias in favor of their own outputs when acting as evaluators.

There is a lot of depth in the paper, and I encourage you to read it. However, I just want to focus on their headline results, which come from getting each model to choose between a resume where it wrote the executive summary itself and a resume where the executive summary was written by a human (or, in other comparisons, written by another AI model). The only part of the resume that was AI-generated in each case was the executive summary. To be clear, Xu et al. didn't get each model to compare the exact same resume (with different executive summaries), but two different resumes, one with an AI-generated summary and the other written by a human. Anyway, these comparisons allow Xu et al. to determine whether each AI model prefers its own output over others. And the results are striking:

...most LLMs exhibit strong self-preferencing behavior. Notably, larger or more aligned models—such as GPT-4-turbo, GPT-4o, GPT-4o-mini, DeepSeek-V3, Qwen 2.5-72B, and LLaMA-3.3-70B—demonstrate an overwhelming preference for their own outputs, with self-selection rates exceeding 96%. These high rates translate into substantial statistical parity self-preference biases exceeding 92%. In contrast, smaller or less aligned models—such as Mistral-7B, LLaMA-3.2-3B, and LLaMA 3.2-1B—display substantially lower self-preferencing bias.

Only the smallest model, LLaMA 3.2-1B, showed a preference for human-generated resumes. All other models preferred their own. When given the choice between a resume where they wrote the executive summary and one where a human wrote the executive summary, the resume with the AI-generated summary gets selected over 90 percent of the time. Xu et al. go on to show that this descriptive comparison continues to hold even after controlling for the quality of the resume, as well as linguistic quality and textual similarity. In those comparisons:

Larger systems—such as GPT-4o, GPT-4-turbo, DeepSeek-V3, Qwen-2.5-72B, and LLaMA 3.3-70B—exhibit particularly strong bias, exceeding 68% even after controlling for content quality and reaching over 80% for GPT-4o, Qwen-2.5-72B, and LLaMA 3.3-70B.

So, the self-preference compared to humans isn't because the models write higher-quality summaries. They really do just prefer things that they wrote themselves. Which shouldn't be surprising - like any human, they write in a style that they prefer. And so, when evaluating, they also choose the style that they prefer. However, turning to models' self-preference for their own writing in comparison with the writing of other AI models, the results are more mixed. Some models show a self-preference while others do not.
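One way to make the quoted figures concrete: if an evaluator picks its own resume in a share p of head-to-head comparisons, then a parity-style bias measure of 2p − 1 (the gap over an unbiased 50/50 evaluator) lines up with the numbers in the quote, where a 96% self-selection rate corresponds to a 92% bias. A toy sketch under that assumption (this is my reading of the arithmetic, not the paper's exact estimator, and the data below is made up):

```python
# Toy sketch: self-selection rate and a statistical-parity-style bias
# from pairwise resume choices. The formula 2p - 1 is an assumption that
# is consistent with the quoted numbers (96% rate -> 92% bias), not
# necessarily the paper's exact estimator.

def self_preference(choices):
    """choices: list of 'own' / 'human' picks made by the evaluator model
    across head-to-head resume comparisons."""
    p = choices.count("own") / len(choices)
    bias = 2 * p - 1  # gap over an unbiased 50/50 evaluator
    return p, bias

# A hypothetical model that picks its own resume 96 times out of 100
picks = ["own"] * 96 + ["human"] * 4
p, bias = self_preference(picks)
print(p, bias)  # 0.96 and roughly 0.92
```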

Does any of this matter? Xu et al. show that their results have practical significance by running a simulation, which shows that:

...candidates using the same LLM as the evaluator are about 15–68% more likely to be shortlisted than equally qualified applicants submitting human-written resumes. The disadvantage is most severe in business-related fields such as accounting, sales, and finance, and less pronounced in areas like agriculture, arts, and automotive.

Finally, Xu et al. show that the self-preference can be mitigated using two strategies:

The first strategy uses system prompting to explicitly instruct models to ignore the origin of resumes and focus only on substantive content. The second strategy employs a majority voting ensemble, combining the evaluator model with smaller models that exhibit weaker self-recognition, thereby diluting the bias of any single LLM. Across all tested LLMs, these interventions reduce LLM-vs-Human self-preference by more than 60%...
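The ensemble idea is just majority voting over verdicts. Here is a minimal sketch (the panel composition and the votes are hypothetical, purely for illustration):

```python
from collections import Counter

def majority_vote(choices):
    """Return the most common choice (ties broken by first occurrence)."""
    return Counter(choices).most_common(1)[0][0]

# Hypothetical panel: one large (self-preferring) evaluator plus two
# smaller models with weaker self-recognition.
votes = ["A",  # large evaluator picks its own output, resume A
         "B",  # small model 1
         "B"]  # small model 2
print(majority_vote(votes))  # B - the ensemble dilutes the evaluator's bias
```

The large model's biased vote is simply outweighed, which is why adding low-self-recognition models to the panel reduces the measured self-preference.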

It is important to note that these mitigation strategies can only be applied by the evaluator, not by the person whose resume is being evaluated. Xu et al. are silent on what the evaluated person should do to avoid being disadvantaged by AI self-preference.

However, despite Xu et al.'s silence on this, the implications of their results are pretty clear for job applicants. Get an AI model to write your resume executive summary! If you write the summary yourself, then your resume will be disadvantaged relative to candidates who used generative AI. If you use generative AI, but a different model than the model that is doing the evaluating, your resume will still be advantaged relative to human-written resumes. But if you happen to use the same model as the model doing the evaluating, then you maximise your advantage. That suggests a good strategy may be to try and find out which generative AI the evaluators will be using. Do they use Google Workspace? Get Gemini to write your resume. Do they use Office 365? ChatGPT might be a better option.

This extends much further than job applications and resumes. Any writing that is likely to be evaluated by generative AI will be advantaged if it is also written by generative AI. Is your promotion application going to be vetted by generative AI? Get generative AI to write your promotion application for you. Is your award application going to be shortlisted using generative AI? Get generative AI to write your award application for you. Is your research paper going to be reviewed by generative AI? Get generative AI to write your research paper for you. Is your essay or dissertation going to be graded by generative AI? Get generative AI to write your essay or dissertation for you. [*]

And that brings us full circle. If your dating profile is going to be evaluated by your potential match's generative AI, get generative AI to write your dating profile. And if your responses to conversations in the dating app are being evaluated by generative AI? You guessed it. Generative AI should be writing your dating app conversations for you.

[HT: Marginal Revolution]

*****

[*] Note for my research students: Your dissertation will not be graded by generative AI. So, getting generative AI to write your dissertation for you is not a winning strategy.

Tuesday, 21 October 2025

Will large language models become gambling addicts?

Concerns about algorithmic bias predate the development of large language models. Cathy O'Neil wrote an entire book, Weapons of Math Destruction (which I reviewed here back in 2017), that outlines the problems. Many of the issues that O'Neil raised have become the worries that many commentators express about large language models. In particular, there is concern that, because large language models are trained on a corpus of human writing, they have internalised human biases. The worst aspects of that are trained out of the models, but it is clear that training is imperfect, and some biases remain.

In a new working paper, Seungpil Lee, Donghyeon Shin, Yunjeong Lee, and Sundong Kim (all Gwangju Institute of Science and Technology) ask the question: can large language models develop gambling addiction? The question isn't as crazy as it first seems. If large language models have internalised human biases, then perhaps they have internalised human heuristics and behaviours as well, including humans' less-than-rational approach to gambling. Lee et al. focus on particular aspects of gambling behaviour:

From a behavioral perspective, the core features of gambling addiction are loss chasing and win chasing. Loss chasing refers to continuing to gamble to recover losses from gambling, and is one of the DSM-5 diagnostic criteria for gambling disorder... Win chasing is explained by the House Money Effect, where winnings from gambling are perceived not as one’s own money but as free money, leading to riskier betting...

They also note that:

Representative examples of gambling-related cognitive distortions include the following. First, misunderstandings about probability, including gambler’s fallacy (the belief that “it’s my turn to win” after a losing streak) and hot hand fallacy (the belief that a winning streak will continue)...

Lee et al. analyse gambling behaviour by getting a selection of large language models to repeatedly play a slot machine game, which had a 30 percent chance of paying out three times the amount staked. That game has a negative expected value: on average, over many plays of the game, the gambler loses 10 percent of the amount staked [*]. As they explain:

...this study applied a slot machine task with a negative expected value (−10%) to four different LLMs: GPT-4o-mini (OpenAI, 2024b), GPT-4.1-mini (OpenAI, 2024a), Gemini-2.5-Flash (Google, 2024), and Claude-3.5-Haiku (Anthropic, 2024). A 2 × 32 factorial design was employed to manipulate two factors: Betting Style (fixed $10 vs. variable $5–$100) and Prompt Composition (32 variations). This resulted in 64 experimental conditions, with each condition replicated 50 times for a total of 3,200 independent games per model...

The experimental procedure began with an initial capital of $100, with the slot machine set to a 30% win rate and a three times payout. The LLM was presented with a choice to either bet or quit; in rounds subsequent to the first game, information about the current balance and recent game history was also provided.
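The structure of the task is simple to reproduce. Here is a minimal sketch, where the `decide` function is a stand-in for the LLM's bet-or-quit choice (the `loss_chaser` policy below is a hypothetical example for illustration, not anything from the paper):

```python
import random

def play_session(decide, capital=100.0, bet=10.0, win_rate=0.3, payout=3.0):
    """Simulate one fixed-bet slot machine session, as described in the
    paper: 30% win rate, three-times payout, $100 starting capital."""
    history = []  # True for a win, False for a loss
    while capital >= bet and decide(capital, history):
        capital -= bet
        won = random.random() < win_rate
        if won:
            capital += bet * payout  # three-times payout on a win
        history.append(won)
    return capital, history

# A hypothetical loss-chasing policy: keep betting until back above the
# starting capital (or bankrupt).
def loss_chaser(balance, history):
    return balance < 100.0 if history else True

random.seed(42)
final_balance, history = play_session(loss_chaser)
print(final_balance, len(history))
```

Each round is independent, so in expectation every $10 bet returns only $9; any bet-sizing or continuation rule that responds to streaks is responding to pure noise.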

Lee et al. then look at the behaviour of each model, in particular whether each model engaged in win chasing (by increasing the size of bets, or being more likely to continue playing, if the model was on a winning streak), or loss chasing (the same, but for losing streaks). They create more complicated indices, but I think the simpler approach is more intuitive and pretty clear, as summarised in Figure 5 in the paper:

Across all the models on average (which is what the figure shows), there is substantial continuation (in the right panel). The models tend to want to play again, regardless of whether they are winning or losing, or on a streak. In the left panel, there is strong evidence of win chasing (the green bars). The models increase their bet when they have a win streak. Even worse, the longer a win streak continues, the more likely it is that models will increase the size of their bet. There is evidence of loss chasing as well (the red bars), but the effect isn't accelerating in the way that it is for win chasing. Large language models' gambling behaviour exhibits at least some of the features that human gambling behaviour does.

Interestingly, the more complex the prompt, the more the models exhibited these behaviours. Lee et al. then go on to show that these effects are distinguishable in terms of 'distinct neural patterns' within the model. I won't pretend to understand the intricacies of the computer science there, but essentially they establish that it is the underlying model itself, rather than just the prompt, that drives the gambling behaviours. Lee et al. conclude that:

These findings reveal that AI systems have developed human-like addiction mechanisms at the neural level, not merely mimicking surface behaviors.

I guess the takeaway is that the more generative AI models are trained on a corpus of human knowledge and writing, the more like humans they become. Perhaps White Zombie said it best [**]:

[HT: Marginal Revolution]

*****

[*] To calculate the expected value of this gamble, we take each outcome, multiply it by the probability that it occurs, and add them up. In this case, there is a 30 percent probability of receiving three times the stake, and a 70 percent probability of receiving nothing. The expected value is E(X) = 0.3 x 3 + 0.7 x 0 = 0.9. So, for every one dollar staked, the expected payout is 90 cents - the gambler on average receives less back than what they bet. This is characteristic of most real-world gambles, such as Lotto, casino roulette, and so on.
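The footnote arithmetic can be checked in a few lines of Python (the Monte Carlo part is just a sanity check of the closed-form calculation):

```python
import random

# Expected value per dollar staked: 30% chance of a 3x payout, else nothing.
outcomes = [(0.3, 3.0), (0.7, 0.0)]
ev = sum(p * x for p, x in outcomes)
print(round(ev, 2))  # 0.9 - i.e. a 10 percent expected loss

# Sanity check by simulating many plays of the game.
random.seed(0)
payouts = [3.0 if random.random() < 0.3 else 0.0 for _ in range(100_000)]
print(round(sum(payouts) / len(payouts), 2))  # close to 0.9
```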

[**] Fittingly, this song was inspired by the movie Blade Runner. The song's title, "More human than human" was the motto of the Tyrell Corporation in the movie. The song samples audio from the movie, and directly quotes it in a few places in the lyrics.

Monday, 20 October 2025

The impacts of the Norwegian middle school mobile phone ban

New Zealand introduced a mobile phone ban in schools in April 2024. The reason the government gave for the ban was that it would remove a source of distractions, and that would improve student achievement and wellbeing. Eighteen months on, it would be fair to say that views on the ban vary widely.

It's a little early to evaluate whether the New Zealand ban is working in terms of student achievement and wellbeing, but we do have evidence from other countries. One example is this discussion paper by Sara Abrahamsson (Norwegian Institute of Public Health), which investigates the impact of a ban on mobile phones in middle schools (grades 8 to 10) in Norway. There was no blanket policy on mobile phones across all of Norway, so Abrahamsson takes advantage of variation in mobile phone policies across schools and over time. Importantly, the study investigates the impacts on students' mental health, bullying, and educational outcomes. This involved using Norwegian Registry data on education (which includes middle school grades) and healthcare (which includes visits to GPs and psychologists), along with data on mobile phone policies collected through a survey of schools. The student sample covers students who finished grade 10 between 2010 and 2018, across 477 schools. The event study design essentially involves comparing students at schools with a mobile phone ban with students at schools without a mobile phone ban.

First, in terms of mental health, Abrahamsson finds that:

...banning smartphones reduces the number of consultations for psychological symptoms and diseases at specialist care, by about 2–3 visits during middle school years when exposed for full-time in middle school. Relative to pretreatment this is a significant decline by almost 60% in the number of visits. In addition, girls have fewer consultations with their GP due to issues related to psychological symptoms and diseases – a decline by 0.22 visits – or 29% decline relative to the pretreatment mean. However, I find no effect on students’ likelihood (extensive margin) of being diagnosed or treated by specialists or GPs for a psychological symptom and diseases.

Notice that the positive impacts are concentrated amongst girls. There are no statistically significant effects on boys' mental health outcomes. Turning to bullying, Abrahamsson only has school-level (not student-level) data. However, she finds that:

...banning smartphones lowers the incidence of bullying for both girls and boys when they are exposed from the start of their middle school years to a ban.

All positive so far. Does student achievement also improve? Abrahamsson finds that:

...post-ban, girls exposed to a smartphone ban from the start of middle school make gains in GPA, average grades set by teachers, and externally graded mathematics exams. Post-ban girls gain 0.08 standard deviations in GPA, and 0.09 standard deviations in teacher-awarded grades and have 0.22 standard deviations higher mathematics test scores compared to girls not exposed to a ban... Additionally, girls are 4-7 percentage points more likely to attend an academic high school track after experiencing a ban. This effect amounts to an 8–14% point increase in the probability of attending an academic high school track relative to the pre-ban years.

Again, there are no statistically significant effects for boys. All of this seems to fit with many people's priors, that mobile phones are a distraction for students, and particularly for female students. Digging a little deeper into the heterogeneity of effects, Abrahamsson finds that:

...health care take-up for psychological symptoms and diseases, GPA, teacher-awarded grades, and the probability of attending an academic high school track is larger for girls from low socioeconomic backgrounds.

In fact, the effects for girls from high socioeconomic backgrounds are statistically insignificant, so all of the significant overall effects are being driven by girls from low socioeconomic backgrounds. Abrahamsson doesn't offer a good explanation for these heterogeneous effects. Perhaps girls from high socioeconomic backgrounds face greater parental limits on mobile phone use, have different behavioural norms at school and/or at home, have more extracurricular activities that distract them from their distracting mobile phones, or have more effective personal coping strategies. Clearly, there is room for more follow-up work on why the benefits of the mobile phone ban are concentrated within the subgroup of girls from low socioeconomic backgrounds.

Abrahamsson concludes that:

...banning smartphones from the classroom is an inexpensive tool with sizable effects on student’s mental health and educational outcomes.

The conclusion is sound based on the results. Despite the different context, there are some important takeaways for New Zealand. No doubt researchers are already evaluating the mobile phone ban and its impacts. However, we need to know more than just whether it worked. We need to know whether it worked in different ways for different groups of students (by gender, and by socioeconomic background), and importantly we need to know why.

[HT: Marginal Revolution, last year]

Saturday, 18 October 2025

Book review: The Rise of the Western World

It is somewhat fitting that, in a week where Joel Mokyr won the Nobel Prize in economics, I was just finishing up reading a book on economic history. It wasn't one of Mokyr's books though (although they are on my to-be-read list now). It was The Rise of the Western World, by Mokyr's fellow Nobel laureate Douglass North, co-authored with Robert Thomas.

A lot of economic history focuses on the Industrial Revolution. North and Thomas focus their attention earlier, on the period between the High Middle Ages and the Enlightenment, between 900 CE and 1700 CE. This period is of interest because it was when western Europe emerged from a feudal society into more modern political states, and during which property rights over land increasingly developed. These changes formed the antecedents to the Industrial Revolution that was to come. North and Thomas summarise this in the introduction to the book:

Economic growth occurs if output grows faster than population. Given the described assumptions about the way people behave, economic growth will occur if property rights make it worthwhile to undertake socially productive activity. The creating, specifying and enacting of such property rights are costly, in a degree affected by the state of technology and organization. As the potential grows for private gains to exceed transaction costs, efforts will be made to establish such property rights. Governments take over the protection and enforcement of property rights because they can do so at a lower cost than private volunteer groups. However, the fiscal needs of government may induce the protection of certain property rights which hinder rather than promote growth; therefore we have no guarantee that productive institutional arrangements will emerge.

My ECONS102 students will probably recognise important elements in there from their lectures, including property rights, institutions, and transaction costs. Indeed, I learned a lot from reading this book that will help to better articulate and link those points, as well as bringing in not only this work, but also Elinor Ostrom's work on governance of the commons.

The majority of the book works through the period chronologically, developing the ideas and presenting supporting data where needed. It finishes by considering differences in institutions and economic growth performance in the 'early modern period' between England, France, Spain, and the Netherlands. The writing can be a little dry, but it is neither heavily data-driven nor overly theoretical. However, economic theory definitely underpins the key ideas in the text. Consider this bit on growth in the Middle Ages:

...we suggest that a growing population was the exogenous variable that basically accounts for the growth and development of Western Europe during the high Middle Ages. An expanding population in a local area would eventually encounter diminishing returns to further increases in the size of the labor force. Part of the increased labor force would as a consequence migrate to take up virgin land in the wilderness, thus extending the frontiers of settlement. However, the density of habitation would still be greater in the older areas than on the frontier, and this differential, resulting in a variation of land-to-labor ratios between areas, when coupled with regional differences in natural resource endowments, would lead to different types of production. Such variances would allow profitable exchanges of products between regions. We submit, therefore, that the development and expansion of a market economy during the Middle Ages was a direct response to the opportunity to gain from the specialization and trade made feasible by population growth.

North and Thomas don't pause to explain the economic terms in their exposition. So, while this book is readable to a non-economist, it seems to me that only those with a bit of economic training will truly appreciate the ideas presented in the book. Indeed, that is almost certainly the audience that North and Thomas had in mind. For those with a little bit of economic knowledge, and an interest in economic history, there is a lot to be gained from reading this book. I recommend it, especially to anyone looking to see the story up to the point where the economic historians specialising in the Industrial Revolution take over.

Friday, 17 October 2025

This week in research #97

Here's what caught my eye in research over the past week (which was relatively quiet, it seems):

  • Hua and Humphreys find that Baseball Hall of Fame election causes players to live about two years longer than nominated players who are never elected to the Hall of Fame, which they interpret as evidence that higher socioeconomic status causes improvements in health
  • Whelan (open access) develops a theoretical model that shows that the presence of bettors with inside information reduces odds but does not necessarily exacerbate favourite-longshot bias in sports betting

Tuesday, 14 October 2025

Nobel Prize for Joel Mokyr, Philippe Aghion, and Peter Howitt

Joseph Schumpeter never won the Sveriges Riksbank Prize in Economic Sciences in Memory of Alfred Nobel (aka the Nobel Prize in Economics), because he passed away in 1950. However, his key ideas such as innovation-driven growth and creative destruction clearly planted the seeds that led to the work of this year's Nobel Prize winners, Joel Mokyr (Northwestern University), Philippe Aghion (Collège de France and INSEAD), and Peter Howitt (Brown University). Mokyr received his prize "for having identified the prerequisites for sustained growth through technological progress", while Aghion and Howitt received the prize "for the theory of sustained growth through creative destruction".

I have to admit that I have never read any of Aghion and Howitt's work, and although I am aware of Mokyr's work and have a keen interest in economic history (although it's not an area of active research for me), I haven't read in detail any of Mokyr's work either. The Nobel Prize Committee's citation noted:

Joel Mokyr used historical sources as one means to uncover the causes of sustained growth becoming the new normal. He demonstrated that if innovations are to succeed one another in a self-generating process, we not only need to know that something works, but we also need to have scientific explanations for why. The latter was often lacking prior to the industrial revolution, which made it difficult to build upon new discoveries and inventions. He also emphasised the importance of society being open to new ideas and allowing change.

Philippe Aghion and Peter Howitt also studied the mechanisms behind sustained growth. In an article from 1992, they constructed a mathematical model for what is called creative destruction: when a new and better product enters the market, the companies selling the older products lose out. The innovation represents something new and is thus creative. However, it is also destructive, as the company whose technology becomes passé is outcompeted.

In different ways, the laureates show how creative destruction creates conflicts that must be managed in a constructive manner. Otherwise, innovation will be blocked by established companies and interest groups that risk being put at a disadvantage.

John Hawkins also has an excellent article in The Conversation today that summarises their work. He also notes the lack of women, with this being the second consecutive prize awarded to a group of three men, and only three women in total having ever won the prize. Hawkins suggests that Rachel Griffith could have easily replaced Mokyr as a laureate. However, that does a disservice to Mokyr, whose stellar work, particularly on the importance of knowledge in explaining the Industrial Revolution and economic growth more generally, had been increasingly tipped for a prize in recent years.

Despite any gripes about who missed out this year, Mokyr, Aghion, and Howitt are worthy winners. You can read more about their work in the popular science summary on the Nobel website. Tyler Cowen also has some nice (and short) comments here.

Sunday, 12 October 2025

The supply side of media bias

In my ECONS102 class, we cover economic explanations for media bias. Drawing on past research from Matthew Gentzkow and others, we demonstrate that there are demand-side explanations (media bias arises because of a bias in the preferences of the news-consuming public) and supply-side explanations (media bias arises because media firms segment the market, and focus on different segments). I even shared my teaching approach in this article published in the Journal of Economic Education in 2023 (ungated earlier version here).

I'm always on the lookout for more research in this space, which can add to our understanding (and potentially flesh out more details for my teaching of media bias). One example is this 2024 working paper by Tin Cheuk Leung and Koleman Strumpf (both Wake Forest University). They focus on the supply side of the media market, and their method:

...evaluates the factors which shape article duration on a newspaper’s digital homepage. Homepage articles have a privileged position, as it is the starting point for most readers which in turn drives views and shares of the featured articles. One of the most important editorial decisions is selecting how long each article remains at this prized location. After adjusting for demand-side effects using reader engagement data, we use prolonged homepage presence of ideologically slanted articles as indicative of supply-side biases.

Leung and Strumpf focus on articles in the New York Times (a more liberal media outlet) and the Wall Street Journal (a more conservative media outlet), using data from August 2021 to May 2023 for the NYT, and from October 2022 to May 2023 for the WSJ. They also look at data from Twitter (about 22 million tweets for the NYT sample, and about 2 million tweets for the WSJ sample). First, they use textual analysis to determine the sentiment and political leaning of each article from each of the media outlets. From this, they find:

...a liberal bias in NYT articles, with an average pro-Democrat score of 0.6 (where 1 represents full pro-Democrat alignment and 0 indicates complete pro-Republican alignment), while WSJ articles exhibited more neutrality, with an average score of 0.5.

Next, Leung and Strumpf look at an article's presence on the homepage of each media outlet, and how that relates to tweet count (as a proxy measure of the article's popularity). Specifically, they evaluate:

...the causal impact of homepage presence on tweet count, using time-of-day homepage updating patterns as an instrumental variable... There are distinctive hours of the day in which there are large purges of homepage content, likely associated with editor shift changes. These hours occur throughout the day and so are unrelated to article newsworthiness though they do shape which articles are on the homepage. After instrumenting we find that tweet counts for articles increased significantly when featured on the homepage (35% increase for NYT articles and 162% for WSJ articles).

Unsurprisingly then, articles that appear on the homepage are more popular, receiving far more attention from the public. Next, Leung and Strumpf apply survival models to investigate the factors associated with how long an article remains on the media outlet's homepage. First, unsurprisingly, popularity matters:

A key finding from our analysis is the significant impact of demand-side factors. There is a strong negative relationship between tweet counts at t−1 (a proxy for article popularity) and the hazard rate (the likelihood of an article being removed from the homepage). For the NYT, a one standard deviation increase in log tweets (1.37) leads to 2.5% decrease in the hazard rate. This effect is even more pronounced for WSJ, where a one standard deviation in log tweets (1.09) leads to a 10.0% decrease in the hazard rate. This indicates that more popular articles, as measured by tweet counts, tend to remain on the homepage longer, underscoring the role of reader demand in determining homepage duration.

The 'hazard rate' is the rate at which an article is removed from the homepage. So, a lower hazard rate means that the article remains on the homepage for longer. The demand side (popularity) matters, but the supply side matters as well:

Supply-side factors are particularly influential for the NYT. A one-standard-deviation increase in an article’s pro-Democrat score, indicating its political slant, is associated with a 2% decrease in the hazard rate, which translates into approximately an additional 30 minutes of homepage exposure. This relationship holds across different news tones, suggesting that a wide range of NYT’s editorial decisions on homepage duration are influenced by the article’s political alignment....

For the WSJ, the influence of supply-side factors is generally negligible, both overall and for each news tone. The pro-Democrat scores of articles do not significantly affect their duration on the homepage, contrasting with the NYT, where political slant plays a noticeable role.

So, the NYT's approach to its homepage is affected by the supply side, but the WSJ's approach is not. Leung and Strumpf suggest that:

This suggests that the WSJ might prioritize profit-maximizing objectives over ideological considerations (recall demand-side effects are also larger for the WSJ).

That would be consistent with the models I teach in my ECONS102 class. A profit-maximising media firm should base its decision about where to locate its content on the liberal-conservative scale on the relative demand from the news-consuming public, and not at all on the editorial preferences of the owners or editors. Leung and Strumpf are essentially suggesting that this is true of the WSJ, but not the NYT.

Interestingly though, sentiment matters as well:

The sentiment of an article, captured through abstract sentiment scores, is notably linked to its duration on the homepage. For the NYT, a one standard deviation increase in the abstract sentiment scores is associated with a 2% decrease in the hazard rate, indicating that more positively toned articles are likely to remain on the homepage longer. The effect is even more pronounced for the WSJ, where a similar increase in sentiment scores corresponds to a 4% decrease in the hazard rate.

Whatever happened to 'if it bleeds, it leads'? Since more positively framed news stays on the homepage for longer, that old mantra may be on the way out. Overall, it seems that both demand and supply sides affect which articles receive prominence on the homepage of the NYT, and that for the WSJ it is mostly the demand side. In the latest version of their paper [*], Leung and Strumpf conclude by noting the importance of understanding the supply side of media bias, in an era when local newspapers are closing, increasing the risk of media bias.

[HT: Marginal Revolution, last year]

*****

[*] In the January 2024 version of the paper, which is the one I read some time back, Leung and Strumpf had clearly used ChatGPT (or some other LLM) to write the conclusion, with hilarious results, including a 'rich tapestry', a 'symbiotic relationship', an 'intricate balance', a 'multifaceted realm', 'delving', and a 'deep dive'. Thankfully, the latest version appears to have resolved this issue.

Saturday, 11 October 2025

Exploiting the quasi-rationality of online gamers buying loot boxes

If you've played online games at any time in the last few years, even the humblest mobile phone game, you've probably been confronted with 'loot boxes' - a virtual container in a game that, when opened, gives you a randomised set of rewards. These rewards might be cosmetic, they might be power-ups, or they might be in-game currency. Often, these loot boxes can be purchased with real money. There is increasing concern that loot boxes are a form of gambling, as the Belgian Government concluded back in 2018.

What drives gamers' willingness-to-pay for loot boxes, and are game developers able to use the design of loot boxes to induce gamers to overspend? Those are essentially the questions addressed in this recent article by Simon Cordes (University of Bonn), Markus Dertwinkel-Kalt (University of Münster), and Tobias Werner (Max Planck Institute for Human Development), published in the Journal of Economic Behavior and Organization (open access). Cordes et al. identify two key features of loot boxes. First, the odds of getting top rewards are often censored. They explain that:

As a specific example, consider the football simulation FIFA Ultimate Team, where gamers build a team of players that vary in strength. Gamers can buy packs that offer lotteries over players. The odds, however, are provided, if at all, only for a coarse set of intervals, bunching together players of very different strengths... At the extreme, the worst player in an interval is around 1000 times less valuable than the best player.

Second, gamers receive highly selective feedback on the rewards that other gamers have received. Cordes et al. note that:

In the mobile game Raid: Shadow Legends... for example, gamers receive a notification whenever another player wins a rare reward... This feature leads to a constant but selected stream of signals about rewards from loot boxes. As only rare rewards are reported, this provides them with a biased sample of the reward distribution.

Cordes et al. use an experimental design to tease out the extent to which censoring and selective feedback are associated with willingness-to-pay for loot boxes. As they explain:

Subjects repeatedly state their willingness-to-pay (WTP) for different monetary lotteries with three potential prizes, one of which is zero. In a Control condition, we transparently describe the odds of the lotteries and do not provide additional information to the subjects. We assume that this Control condition identifies a subject’s true WTP, and define overspending relative to this benchmark. We implement three treatments that capture the features of loot boxes discussed above. In Censored, subjects only learn the total probability of winning a non-zero prize, but not the exact probability of winning the highest prize. In Sample, we provide subjects with the full prize distribution and a selected sample thereof; that is, they observe the five highest outcomes in a sample of 400 draws. Finally, Joint combines both: subjects observe the censored prize distribution and a selected sample thereof. This last treatment resembles the current design of loot boxes most closely. Notably, our experimental design eliminates all features of loot boxes that may provide utility beyond winning a reward, such as a nice design or visual effects. Instead, we isolate the features of loot boxes that almost certainly do not affect a gamer’s material utility and can thus be interpreted as inducing mistakes.
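The mechanics of the Sample treatment are easy to simulate. The sketch below uses a made-up three-prize lottery (the prize values and odds are illustrative, not the ones Cordes et al. used) to show how reporting only the five highest outcomes of 400 draws wildly overstates how often the top prize actually occurs:

```python
import random

random.seed(1)

# Hypothetical three-prize lottery, mirroring the structure (but not the
# numbers) of the lotteries in the experiment
prizes = [0, 5, 100]          # zero, low, and high prize
probs = [0.70, 0.28, 0.02]    # true odds; the high prize is rare

draws = random.choices(prizes, weights=probs, k=400)

# Selective feedback: only the five highest outcomes are reported
selected = sorted(draws, reverse=True)[:5]

true_share = probs[2]
sample_share = draws.count(100) / len(draws)
selected_share = selected.count(100) / len(selected)

print(f"true P(high prize):          {true_share:.3f}")
print(f"share in all 400 draws:      {sample_share:.3f}")
print(f"share among the 5 reported:  {selected_share:.3f}")
```

Because only the best outcomes are shown, the reported sample is dominated by high prizes even though they make up only a sliver of the true distribution - exactly the biased stream of signals the authors describe.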

Looking at their results, based on a sample of 617 respondents in the UK recruited through Prolific, Cordes et al. find that:

...both features substantially increase the willingness-to-pay for lotteries. Censoring the odds of a lottery increases a subject’s willingness-to-pay compared to a baseline treatment. Also, simply providing subjects with a selective sample of the reward distribution increases their willingness-to-pay. Combining censored odds with a selected sample increases the willingness-to-pay by 100%.

This explains why game developers use these tools. Selective feedback increased willingness-to-pay by 43 percent, censoring increased willingness-to-pay by 45 percent, and the combination of censoring and selective feedback doubled gamers' willingness-to-pay for loot boxes. All three experimental effects were statistically significant. Why does this happen? Cordes et al. examine the mechanisms underlying the difference in willingness-to-pay. Specifically, they asked research participants before each lottery how often they would win the top prize in 100 draws. When Cordes et al. then look at the results controlling for beliefs:

...the average WTPs in Censored, Sample, and Joint do not differ significantly from that in Control anymore. It demonstrates that censored odds and selected feedback increase the subjects’ WTPs by inflating their beliefs of winning a high prize.

In other words, the difference in willingness-to-pay is entirely driven by a difference in the belief of winning a top prize. Censoring and selective feedback manipulate gamers' beliefs about the probability of winning, causing them to overestimate their chances and therefore making them willing to pay more for the loot boxes. That suggests a simple solution as well: if gamers are provided with more transparent odds of winning, then that might reduce overspending. However, Cordes et al. conduct a 'robustness' experiment where they provide gamers with more information, and find that:

The additional information significantly decreases the average (conditional) belief of winning the high prize compared to Joint (𝑝 < 0.01). While the WTP in Info is also slightly below the one in Joint, the effect is not significant at the 10% level (𝑝 = 0.69). Importantly, both the belief and the WTP are significantly higher in Info compared to Control...

In other words, the extra information does appear to change the beliefs about winning the top prize, but doesn't affect the willingness-to-pay for the loot box. To me, it seems that the research participants might have intuited what the experiment was about and demonstrated that by changing their stated beliefs. However, their 'true' beliefs, as captured by their willingness-to-pay, were unaffected.

Cordes et al. conclude that:

Our results support a case for regulating loot-box design, but it is not apparent what regulation would be effective. Current plans for regulation in Germany include labels for games with loot boxes... However, our results show that the design of loot boxes, rather than the random rewards they provide, encourages players to overspend. Hence, this regulation may not be effective in reducing overspending. While it should be easy to enforce a transparent display of odds, it is not clear that gamers will use this information when making their purchase decisions. Our robustness experiment, for instance, suggests that additional information may not affect WTPs. Moreover, even when learning the full probability distribution over many prizes, gamers might not act on it because it is simply too much information to be considered. Instead, regulators must find ways of communicating the odds of loot boxes in an easily understandable way...

I'm not convinced. Providing information would be a great way of addressing the problem if people were fully rational. People are not fully rational. They don't incorporate all available information when making a decision. At best, as Nobel Prize winner Herbert Simon observed, people are boundedly rational. The probability of winning the top prize needs to be made more salient for gamers, so that they will focus on it. Unfortunately, 'quasi-rational' decision-makers are also affected by positivity bias - they tend to be overly optimistic about things to do with themselves, including their probability of winning a top prize, even when confronted with the true probability. So, even if the probability is made more salient, perhaps gamers wouldn't take it into account anyway. It's all very challenging, and perhaps the best way for governments to protect gamers from overspending is to limit the number of loot boxes that can be purchased, or some other regulation.

Of course, this research was just one study. Perhaps future research will identify some silver bullet solution to gamers' quasi-rationality when it comes to loot boxes.

Friday, 10 October 2025

This week in research #96

Here's what caught my eye in research over the past week:

  • Andersson et al. (open access) analyse data from an experiment where they randomly assigned a male or female name to the instructions given by the online teachers in an introductory economics course in Sweden, and find that there is no bias against the female mentor in student evaluations of helpfulness, knowledge, or response time (an unusual result, given that student evaluations of teaching are well-known to exhibit gender bias)
  • Cuevas et al. (open access) use data on the frequency of 45,397 Facebook interests to study how the difference in revealed preferences between men and women changes with a country’s degree of gender equality, finding that, for preference dimensions that are systematically biased toward the same gender across the globe, differences between men and women are larger in more gender-equal countries, while for preference dimensions with a gender bias that varies across countries, the opposite holds
  • Gucciardi and Ruberti (open access) confirm that there is a home advantage in ATP tennis using data from 2000 to 2022, with individualistic players exhibiting a stronger home advantage, unlike collectivistic players

Wednesday, 8 October 2025

Governments need to be careful to avoid tax traps with high effective marginal tax rates

In yesterday's post, I referred to New Zealand's tax and transfer system, and its impact on inequality. One aspect I didn't refer to was the incentive effects of the system. These incentive effects are bound up in the effective marginal tax rate (EMTR), which is the proportion of the next dollar of income a taxpayer earns that would be lost to taxation, decreases in rebates or subsidies, and decreases in government transfers (such as benefits, allowances, pensions, etc.) or other entitlements. A high EMTR creates a disincentive to earn more.
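To see how an EMTR is computed, consider a stylised version of a benefit cliff: a flat 40 percent marginal tax rate, plus a £15,000 childcare entitlement that vanishes entirely once income passes £100,000. All of these numbers are hypothetical, chosen only to illustrate the calculation, not to reproduce the actual UK schedule:

```python
def net_income(gross: float) -> float:
    """Stylised net income: flat tax plus a benefit that is lost at a cliff."""
    tax = 0.40 * gross                             # hypothetical flat marginal tax rate
    childcare = 15_000 if gross <= 100_000 else 0  # benefit lost entirely above the threshold
    return gross - tax + childcare

def emtr(gross: float, step: float = 1.0) -> float:
    """Share of the next `step` of gross income lost to tax and withdrawn benefits."""
    return 1 - (net_income(gross + step) - net_income(gross)) / step

print(emtr(90_000))   # below the cliff: just the 40% tax rate
print(emtr(100_000))  # crossing the cliff: an EMTR of thousands of percent
```

Below the threshold the EMTR is simply the tax rate; at the threshold, the next £1 of gross income triggers the loss of the entire £15,000 entitlement, so the EMTR explodes far above 100 percent.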

This is beautifully illustrated by this Financial Times article from March this year (paywalled):

How could a £1 pay rise leave you tens of thousands of pounds worse off? The answer is the childcare cliff edge in the UK tax system, which will get considerably steeper for higher-earning families from September.

The government’s expansion of free childcare provision in England this autumn means that working families with children aged under three will be able to claim 30 hours of government-funded childcare a week on top of the tax-free childcare scheme. Valuable benefits, but the bulk of this entitlement is lost if one parent’s adjusted net income is more than £100,000 per year.

In other words, earning more than £100,000 per year leads to a high EMTR. This is shown in the following figure from the article:

Notice how income after childcare expenses decreases markedly at £100,000 per year, and doesn't get back to the same level until income rises to nearly £150,000 per year. This creates a lot of negative incentives. The article gives several examples, one of which is:

Rob* works in tech. Since his daughter was born five years ago, he has turned down two promotions that would have taken his pay over £100,000 as he could not negotiate a high enough pay rise to compensate for the loss of childcare hours. Eventually, he quit his job and became a contractor. “This is riskier, but my earnings have jumped to the point where it is worth it,” he says. “My wife and I have decided to have no more children to maintain the quality of life we have with the one.”

The high EMTR causes people to turn down promotions and pay rises, to work less, to change jobs, to make riskier decisions, and to avoid having more children. And all of that from a single example of one taxpayer.

The dumb thing is that this is not a new problem. The article notes that this threshold has been in place since 2017! That's more than long enough for the government to notice the negative incentive effects. The reason it has come to media attention now is that the threshold hasn't been changed in some time, and more and more families are being affected.

Governments need to be very cautious in setting up the tax and transfer system. While the system does generally reduce inequality, as yesterday's post showed for New Zealand, it can nevertheless create unintended consequences. Governments typically want people to work more and receive less assistance from the government. However, high EMTRs can create traps that keep people working less. The UK's childcare tax trap is unfortunately not unique.


Tuesday, 7 October 2025

The impact of taxes and transfers on inequality in New Zealand

This week, my ECONS102 class covered inequality, and social security. Which is timely, because I have been meaning to blog about this Treasury Analytical Note from 2024, by Tod Wright and Hien Nguyen, for some time. Wright and Nguyen look at the distributional impact of taxes, transfers, and government spending (on healthcare and education).

Importantly, they distinguish between three conceptions of household income: (1) market income, which includes taxable income (including wages, income from self-employment and from investments) and non-taxable income (such as gifts and inheritances) [*]; (2) disposable income, which adjusts market income by subtracting direct taxes (such as income tax) and adding in transfers from government (such as income support payments); and (3) final income, which adjusts disposable income by subtracting indirect taxes (such as GST and excise taxes), and adding estimates of the government spending on health and education services that the household receives in kind. Looking at the difference in the income distribution (and measures of inequality) between market income, disposable income, and final income, gives a sense of how redistributive the tax and transfer system is.
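The three income concepts stack up as simple adjustments to market income. A toy example (all figures hypothetical, not from the note) makes the chain explicit:

```python
# Toy illustration of the note's three income concepts (hypothetical figures)
market = 60_000           # wages, self-employment and investment income, gifts

direct_tax = 12_000       # e.g. income tax
transfers = 2_000         # e.g. income support payments
disposable = market - direct_tax + transfers

indirect_tax = 5_000      # e.g. GST and excise taxes
in_kind = 7_000           # imputed government spending on health and education
final = disposable - indirect_tax + in_kind

print(disposable, final)  # 50000 52000
```

For this hypothetical household, direct taxes exceed transfers (so disposable income is below market income), but in-kind spending exceeds indirect taxes (so final income is above disposable income).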

I'm not going to get deep into the weeds on the methods. However, it is worth noting that the analysis is for the 2018/19 tax year, and makes use of Treasury's TAWA (Tax and Welfare Analysis) model, supplemented by data on indirect taxes paid by households from the Household Expenditure Survey (HES). The TAWA model is constructed from administrative data from Stats NZ's Integrated Data Infrastructure. For health and education spending:

We estimate education spending received by children and students based on their reported enrolment in educational institutions in HES. Health spending amounts are distributed over all individuals in HES in proportions determined by the Ministry of Health’s (MoH) Person-Based Funding Formula (PBFF) model... which assigns expected healthcare costs to a person based on their demographic characteristics.

The resulting income distributions are summarised in Figure 2 in the note, which shows the average income (under each of the three conceptions of income) for each income decile:

Notice that, for households in the bottom deciles, market income is low, disposable income is higher, and final income is highest. This reflects that they receive net transfers from the government (they receive more in transfers than they pay in direct taxes), so that disposable income is higher than market income. They also receive more in in-kind government spending (on health and education) than they pay in indirect taxes, so that final income is higher than disposable income. For high-income households, the pattern for market vs. disposable income is reversed (they pay more in direct taxes than they receive in transfers). However, only for the very top decile (the highest income households) does the payment of indirect taxes exceed the benefits received from in-kind transfers, so that final income is less than disposable income.

Overall, the distributions in Figure 2 show that taxes and transfers reduce inequality - there is less inequality in disposable income than market income, and less inequality in final income than disposable income. The effects of the different components of the tax and transfer system in reducing inequality is demonstrated in Figure 9 in the paper:

The coloured parts of the columns show the components that add to income (transfers, or income support, in orange; and in-kind benefits, in blue) and subtract from income (direct taxes, in grey; and indirect taxes, in yellow). The black point estimates in the centre of each column show the combined effect on income within that income decile. Income support declines by income decile, as you would expect, while in-kind benefits are fairly consistent. Direct and indirect taxes both grow with income. Overall, the bottom five deciles (making up half of all households) receive more in transfers and in-kind benefits than they pay in taxes, while the top four deciles (and especially the top decile) pay more in taxes than they receive in transfers and in-kind benefits.

Finally, Wright and Nguyen show the effect on inequality, measured by the Gini Index, where:

...including income support benefits in the calculation results in the lowering of the Gini coefficient from its value of 45.6 ± 1.5 for market incomes to 35.8 ± 1.6 for gross incomes. The inclusion of direct taxes to form disposable incomes further reduces the Gini coefficient to 33.1 ± 1.5. The equalising effects of these contributions are partially offset by the inclusion of indirect taxes, which lead to a post-tax income Gini coefficient of 34.9 ± 1.6 – ie, whereas direct taxes reduce income inequality as quantified by the Gini coefficient, indirect taxes increase it. However, the inclusion of in-kind benefits in the final household income calculation has a significant redistributive impact, resulting in a drop in the Gini coefficient to 28.1 ± 1.4.

As you would expect given the data from the figures, the tax and transfer system substantially reduces measured inequality. That is exactly what it is expected to do, in a country with progressive income tax, a social safety net, and universal access to healthcare and education. There is far more detail in the analytical note, so if you are interested in how taxes and transfers affect the income distribution (or how they affect the distribution for retired and non-retired households separately), I encourage you to dig into it further.
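For readers unfamiliar with how the Gini coefficient is actually calculated, a minimal implementation (with toy numbers, not the TAWA figures) looks like this:

```python
def gini(incomes):
    """Gini coefficient: 0 = perfect equality, 1 = maximal inequality."""
    xs = sorted(incomes)
    n, total = len(xs), sum(xs)
    # Sorted-rank identity: G = sum_i (2i - n + 1) * x_(i) / (n * sum(x)),
    # with xs sorted ascending and i zero-indexed
    return sum((2 * i - n + 1) * x for i, x in enumerate(xs)) / (n * total)

# Toy data: taxes and transfers compress the distribution, lowering the Gini
market = [10, 20, 30, 40, 100]
disposable = [20, 26, 32, 38, 84]   # after stylised taxes and transfers
print(round(gini(market), 3), round(gini(disposable), 3))  # 0.4 0.28
```

The redistribution in the toy data lowers the Gini from 0.4 to 0.28 - the same qualitative pattern (on a much smaller scale) as the move from market income to final income in the note.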

[HT: Inside Government, and Offsetting Behaviour, both last year]

*****

[*] One major caveat for this analysis is that the non-taxable income excludes capital gains. It also excludes imputed rent on owner-occupied dwellings, which should be included to better capture the distributional effects of home ownership.