Friday, 4 April 2025

This week in research #69

Here's what caught my eye in research over the past week:

  • Altindag, Cole, and Filiz (with ungated earlier version here) find that students' academic performance is better when their race matches their teacher's, but only for students who are younger than their teacher, not for students of a similar age or older than their teacher (role models clearly matter)
  • Calamunci and Lonsky (open access) find that, between 1960 and 1993, an Interstate highway opening in a county led to an 8% rise in total index crime, driven by property crime (burglary, larceny, and motor vehicle theft)
  • Achard et al. (open access) find that individuals living close to newly installed refugee facilities in the Netherlands developed a more positive attitude towards ethnic minorities and became less supportive of anti-immigration parties compared to individuals living farther away

Thursday, 3 April 2025

Mobile phone providers and the repeated switching costs game

This week, my ECONS101 class covered pricing and business strategy, and one aspect of that is switching costs and customer lock-in. Switching costs are the costs of switching from one good or service to another (or from one provider to another). Customer lock-in occurs when customers find it difficult (costly) to change once they have started purchasing a particular good or service. The main cause of customer lock-in is, unsurprisingly, high switching costs.

As one example, consider this article from the New Zealand Herald last month:

A new Commerce Commission study has found the switching process between telecommunications providers is not working as well as it should for consumers...

The study found 50% of mobile switchers and 45% of broadband switchers ran into at least one issue when switching.

The experience was so bad that 29% of mobile switchers and 27% of broadband switchers said they wouldn’t want to switch again in future...

The commission’s latest consumer satisfaction report found that 31% of mobile consumers and 29% of broadband consumers have not switched because it requires ‘too much effort to change providers’...

Gilbertson said a lack of comprehensive protocols between the “gaining” service provider and the “losing” service provider was a central issue with the current switching process.

This led to a number of problems, including double billing, unexpected charges, and delays.

The difficulty of changing from one mobile phone provider to another is a form of switching cost. It's not a monetary cost, but the time, effort, and frustration experienced by consumers wanting to switch makes the process of switching costly. And because the process is costly, mobile phone consumers are locked into their current provider.

It is clear why a mobile phone provider would want to make it difficult (costly) for its consumers to switch away from it and use some other provider. However, why don't mobile phone providers try to make it easier to switch to using their service instead? Maybe they could have staff whose role is to help consumers to navigate the process of switching to their service. That would allow the mobile phone provider to attract consumers and capture a greater market share. The answer is provided by considering a little bit of game theory.

Consider the game below, with two mobile phone providers (A and B), each with two strategies ('Easy' to switch to, and 'Hard' to switch to). The payoffs are made-up numbers that might represent profits to the two providers.

To find the Nash equilibrium in this game, we use the 'best response method'. To do this, we track, for each strategy of each player, the best response of the other player. Where both players are selecting a best response, each is doing the best they can, given the choice of the other player (this is the definition of Nash equilibrium). In this game, the best responses are:

  1. If Provider B chooses to make switching easy, Provider A's best response is to make switching easy (since 3 is a better payoff than 2) [we track the best responses with ticks, and not-best-responses with crosses; Note: I'm also tracking which payoffs I am comparing with numbers corresponding to the numbers in this list];
  2. If Provider B chooses to make switching hard, Provider A's best response is to make switching easy (since 8 is a better payoff than 6);
  3. If Provider A chooses to make switching easy, Provider B's best response is to make switching easy (since 3 is a better payoff than 2); and
  4. If Provider A chooses to make switching hard, Provider B's best response is to make switching easy (since 8 is a better payoff than 6).

Note that Provider A's best response is always to choose to make switching easy. This is their dominant strategy. Likewise, Provider B's best response is always to make switching easy, which makes it their dominant strategy as well. The single Nash equilibrium occurs where both players are playing a best response (where there are two ticks), which is where both providers make switching easy.
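The best-response logic above is mechanical enough to check in code. Here is a minimal sketch, using the made-up payoff numbers from the game above, that enumerates every strategy pair and keeps those where neither provider can gain by deviating:

```python
# Best-response check for the 2x2 switching-cost game.
# Payoffs (A, B) are the made-up numbers from the post:
# first key element = Provider A's strategy, second = Provider B's.
payoffs = {
    ('Easy', 'Easy'): (3, 3),
    ('Easy', 'Hard'): (8, 2),
    ('Hard', 'Easy'): (2, 8),
    ('Hard', 'Hard'): (6, 6),
}
strategies = ['Easy', 'Hard']

def best_responses():
    equilibria = []
    for a in strategies:
        for b in strategies:
            # a is A's best response if no deviation pays A more
            a_best = all(payoffs[(a, b)][0] >= payoffs[(alt, b)][0]
                         for alt in strategies)
            # b is B's best response if no deviation pays B more
            b_best = all(payoffs[(a, b)][1] >= payoffs[(a, alt)][1]
                         for alt in strategies)
            if a_best and b_best:
                equilibria.append((a, b))
    return equilibria

print(best_responses())  # [('Easy', 'Easy')]
```

The only cell where both providers are playing a best response is (Easy, Easy), matching the two-ticks outcome in the payoff table.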

So, that seems to suggest that the mobile phone providers should be making switching to them easier. However, notice that both providers would be unambiguously better off if they both chose to make switching hard (each would receive a payoff of 6, instead of 3). When both choose to make switching easy, both providers are made worse off. This is a prisoners' dilemma game (it's a dilemma because, when both players act in their own best interests, both are made worse off).

That's not the end of this story though, because the simple example above assumes that this is a non-repeated game. A non-repeated game is played once only, after which the two players go their separate ways, never to interact again. Most games in the real world are not like that - they are repeated games. In a repeated game, the outcome may differ from the equilibrium of the non-repeated game, because the players can learn to work together to obtain the best outcome.

So, given that this is a repeated game (because the providers are constantly deciding whether or not to make switching easier), both providers will realise that they are better off making switching harder, and receiving a higher payoff as a result. And unsurprisingly, that is what happens. It doesn't require an explicit agreement between the players - the agreement is 'tacit' (it is understood by the providers without needing to be made explicit). Each provider just needs to trust that the other providers will make switching hard, even though there is an incentive for each provider to 'cheat' on this outcome. Any instance of cheating (by making switching easier) would be immediately known by the other providers, and the agreement would break down, making them all worse off. So, there is an incentive for all providers to keep switching hard for consumers. Even a new entrant into the market, which might initially make it easy for consumers to switch in order to capture market share, would soon realise that it is better off making switching more difficult (2degrees was a new entrant in this market as recently as 2009).
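The repeated-game logic can be made concrete with a standard grim-trigger calculation, sketched below using the payoffs from the game above (the discount factor is something I've introduced for illustration): colluding on 'Hard' earns 6 every period, while cheating earns 8 once and then 3 forever after the tacit agreement breaks down.

```python
# Sketch: when is tacit collusion on 'Hard' sustainable under a
# grim-trigger strategy? Payoffs come from the game in the post;
# the discount factor delta is illustrative.
def collusion_sustainable(delta, coop=6, cheat=8, punish=3):
    """True if the discounted payoff from colluding beats cheating once
    and then reverting to the one-shot equilibrium payoff forever."""
    collude_value = coop / (1 - delta)
    cheat_value = cheat + delta * punish / (1 - delta)
    return collude_value >= cheat_value

# The algebra gives a critical discount factor of (8-6)/(8-3) = 0.4
print(collusion_sustainable(0.5))  # True: patient providers sustain 'Hard'
print(collusion_sustainable(0.3))  # False: impatient providers cheat
```

So long as the providers put enough weight on future profits, the threat of the agreement breaking down is sufficient to sustain 'Hard' without any explicit communication.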

The Commerce Commission is correct that the difficulty of switching mobile phone providers (the switching cost) keeps consumers with their current provider (customer lock-in). The result is that the mobile phone providers can profit from increasing prices for their locked-in consumers. The only solution would be to find some way to force a breakdown of the tacit arrangement. Then the market would settle at the equilibrium where all providers make it easy to switch to them. This may be an instance where some regulation is necessary.

Tuesday, 1 April 2025

The emerging debate on Oprea's paper on complexity and Prospect Theory

Late last year, an article in the American Economic Review by Ryan Oprea caught my attention (and I blogged about it here). It purported to show that the key experimental results underlying Prospect Theory may in part be driven by the complexity of the experiments that are used to test them. These were extraordinary results. And when you publish a paper with extraordinary results, that could potentially overturn a large literature on a particular theory, then those results are going to attract substantial scrutiny. And indeed, that is what has happened with Oprea's paper.

The team at DataColada, best known for exposing the data fakery of Dan Ariely and Francesca Gino (and the resulting lawsuit, which was dismissed), have a new working paper, authored by Daniel Banki (ESADE Business School) and co-authors, looking at Oprea's results (see also the blog post on DataColada by Uri Simonsohn, one of the co-authors). To be clear before I discuss Banki et al.'s critique: they don't accuse Oprea of any misconduct. They mostly present an alternative view of the data and results that appears to contradict key conclusions in Oprea's paper. Oprea has also provided a response to some of their critique.

I'm not going to summarise Oprea's original paper in detail, as you can read my comments on it here. However, the key result in the paper is that when presented with risky choices, research participants' behaviour was consistent with Prospect Theory, and when presented with choices that involved no risk at all but were complex in a similar way to the risky choices ('deterministic mirrors'), research participants' behaviour was also consistent with Prospect Theory. This suggests that a large part of the observed results that underlie Prospect Theory may arise because of the complexity of the choice tasks that research participants are presented with.

Banki et al. look at a number of 'comprehension questions' that Oprea presented research participants with, and note that:

...75% of participants made an error on at least one of the comprehension questions, such as erroneously indicating that the riskless mirror had risk.

Once the data from those research participants are excluded, Banki et al. show that behaviour differs between lotteries and mirrors among the research participants who 'passed' the comprehension checks (by getting all four comprehension questions correct on their first try). This is captured in Figure 2 from Banki et al.'s paper:

The two panels on the left of Figure 2 show the results for the full sample; notice that the lotteries (top panel) and mirrors (bottom panel) look similar. In contrast, when the sample is restricted to those who 'passed' the comprehension checks, the results for lotteries and mirrors look very different. That is what we would expect if research participants are not 'fooled' by the complexity of the task.

Banki et al. provide a compelling reason why the results for the research participants who failed the comprehension checks look the same for lotteries and mirrors: regression to the mean. As Simonsohn explains in the DataColada blog post, this arises because of the way that a multiple-price list works:

When the dependent variable is how much people value prospects, regression to the mean creates spurious evidence in line with prospect theory. When people answer randomly for 10% chance of $25, they overvalue it, because the “right” valuation is $2.50, and the scale mostly contains values that are higher than that. When people answer randomly for 90% chance of $25, they undervalue it, because the “right” valuation is $22.50 and the scale mostly contains values that are lower than that. Thus, random or careless responding will produce the same pattern predicted by prospect theory.
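Simonsohn's mechanism is easy to simulate. The sketch below uses an illustrative price list (not the actual experimental scale): purely random responses land near the middle of the scale, so they appear to overvalue the 10% prospect and undervalue the 90% one.

```python
import random

# Illustrative price list (not the actual experimental scale): certainty
# equivalents from $0.50 to $25.00 in $0.50 steps.
price_list = [0.5 * i for i in range(1, 51)]

random.seed(1)
# Careless responders pick a valuation at random from the list.
careless = [random.choice(price_list) for _ in range(10_000)]
mean_careless = sum(careless) / len(careless)

# Risk-neutral benchmarks for the two prospects:
ev_10 = 0.10 * 25   # $2.50 for a 10% chance of $25
ev_90 = 0.90 * 25   # $22.50 for a 90% chance of $25

# Random responding lands near the middle of the scale (~$12.75), so it
# overvalues the 10% prospect and undervalues the 90% one -- the same
# pattern Prospect Theory predicts.
print(mean_careless > ev_10)   # True
print(mean_careless < ev_90)   # True
```

No preferences at all are needed to generate the prospect-theoretic pattern here; the bounded scale does all the work.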

Oprea responds to both of these points, noting that:

...a range of imperfectly rational behaviors including noisy valuations, anchoring-and-adjustment heuristics, compromise heuristics and pull-to-the-center heuristics will all tend to produce prospect-theoretic patterns of behavior simply because of the nature of valuation. BSWW offer this possibility as an alternative to the Oprea (2024)’s account of his data, but in fact these are examples of exactly the types of cognitive shortcuts Oprea (2024) was designed to study.

In other words, Banki et al.'s results don't refute Oprea's results, but are very much in line with Oprea's. One thing that Oprea does take issue with is Banki et al.'s use of medians as the preferred measure of central tendency. Oprea uses the mean, and when reanalysing the data with the same exclusions as Banki et al., Oprea shows that the mean results look similar to the original paper. So, Banki et al.'s results are not simply driven by excluding the research participants who failed the comprehension checks, but also by switching from using the mean to using the median.

On that point, I'm inclined to agree with Banki et al. The median is often used in experimental economics, because it is less influenced by outliers. And if you look at Oprea's data, there are a lot of large outliers, which become quite influential observations when the mean is used as the summary statistic. However, the outliers are likely to be the observations you want to have the smallest effect on your results, not the largest effect.
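A toy example (made-up valuations, not Oprea's data) shows why the choice of summary statistic matters here: a couple of large outliers move the mean substantially while barely moving the median.

```python
from statistics import mean, median

# Made-up valuations, not Oprea's data: five typical responses,
# then the same five plus two large outliers.
valuations = [2.0, 2.5, 3.0, 3.5, 4.0]
with_outliers = valuations + [24.0, 25.0]

print(mean(valuations), median(valuations))        # 3.0 3.0
print(mean(with_outliers), median(with_outliers))  # ~9.14 3.5
```

Two outliers triple the mean but shift the median by only half a dollar, which is why the median is often preferred when responses are this skewed.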

Oprea also critiques Banki et al.'s interpretation of the comprehension questions. Oprea rightly notes that:

...it is important to emphasize that these training questions weren’t designed to measure beliefs (e.g., payoff confusion), and because of this they are poorly suited to the task BSWW repurpose it for, ex post. Indeed, evidence from the patterns of mistakes made in these questions suggests that overall training errors largely serve as a measure of the cognitive effort (an important ingredient in Oprea (2024)’s account) subjects apply to answering these questions, and that BSWW therefore substantially overestimate the level of payoff confusion with which subjects entered the experiment.

In other words, the 'comprehension questions' are not comprehension questions at all, but they are really 'training questions' that were used to train the research participants to understand the choice tasks that they would be presented with. And so, using those training questions overall as a measure of understanding misses the point, and seriously underestimates the amount of understanding of the task that research participants had by the time they had completed the training questions.

Oprea's response is good on this point. However, if the training questions had really done a good job of training the research participants, then all participants should have had a similar level of understanding by the end of the training questions, and there should be no detectable differences in behaviour between those with more, and those with fewer, 'failed' training questions. That wasn't the case - the behaviour of the research participants who made errors in training was much more likely to be the same for lotteries and mirrors than was the behaviour of research participants who made no errors. To clear this up, it would have been interesting to have research participants also complete 'comprehension questions' at the end of the experimental session, to see if they still understood the tasks they were being asked to complete. At that point, those failing the comprehension questions could be dropped from the dataset.

One point of Banki et al.'s critique that Oprea hasn't engaged with (yet, although he promises to do so in a future, more complete response) is their finding that a larger than 'usual' proportion of the research participants fail 'first-order stochastic dominance' (FOSD). A failure of FOSD in this context means that a research participant valued a lottery (or mirror) lower than a similar prospect that was strictly worse. For example, valuing a 90% chance of receiving $25 less than a 10% chance of receiving $25 is a failure of FOSD. Banki et al. show that:

We begin by examining G10 and G90. Violating FOSD here involves valuing the 10% prospect strictly more than the 90% one. Across all participants (N = 583), 14.8% violated FOSD for mirrors, and 13.9% for lotteries. These rates are quite high given that the prospects differ in expected value by a factor of nine.

Those failure rates are much higher than for other similar research studies. Banki et al. note an overall rate of 20.8 percent in the Oprea results, compared with an average of 3.4 percent across eight other highly cited studies. It will be interesting to see how Oprea responds to that point in the future.

This is an interesting debate so far. Oprea does a good job of summing up where this debate should probably go next:

Ultimately, however, these questions and ambiguities can only be fully resolved by further research. While BSWW’s critique has not convinced me that the interpretation offered in Oprea (2024) is mistaken, I am eager to see new experiments that deepen, alter, or even overturn this interpretation. First, concerns that the Oprea (2024)’s results are a consequence of the design being too confusing to yield insight can only really be resolved one way or another by followup experiments that vary his procedures, instructions and other design choices in such a way as to satisfy us that the Oprea (2024) results are (or are not) overfit to that design.

Indeed, more follow-up research is needed. Prospect Theory hasn't been overturned, yet (and as I noted in my earlier post, it is consistent with a lot of real-world behaviour). However, now we know that it may be vulnerable, and Oprea's paper provides a starting point for testing more thoroughly how much of the experimental results arise from complexity.

[HT: Riccardo Scarpa]

Monday, 31 March 2025

Pricing like Ferrari

This week my ECONS101 class is covering pricing strategy. Essentially, this topic is about a lot of situations (supported by real-world examples) where firms may choose not to price at the single profit-maximising price. Most of the time, deviations from the profit-maximising price involve the firm setting a lower price than the profit-maximising price. The firm might set a lower price in order to generate goodwill and a long-term relationship with consumers, or to sell a greater quantity so that it can move down the learning curve faster (achieving lower costs sooner), or to keep competitors out of the market (what economists refer to as limit pricing). The common thread in all of those situations is that, by setting a lower price now, the firm earns more profits in the long run. It is less clear why firms would want to set a price higher than the profit-maximising price now. Unless they face consumers like this:

Perhaps some consumers are simply willing to buy a good because it has a higher price. That is the basis of conspicuous consumption (which I have written about before here). However, I want to take this in a different direction, because firms can set a high price without needing to rely on conspicuous consumption, even when it seems like a possible explanation for what the firm is doing. Consider this example from the Wall Street Journal last month (ungated version here):

With a list price of $3.7 million, Ferrari’s new “hypercar” was revealed to the public in October with a twist: It wasn’t available for sale.

All 799 units of the low-slung, high-haunched F80 model—the most expensive production vehicle in Ferrari’s history—had been promised to top customers like Luc Poirier.

The Montreal real estate entrepreneur already owns 42 Ferraris. He said he felt “lucky” to be allowed to buy yet another.

“To be chosen by Ferrari for one of their hypercars is a true milestone for any collector,” he said.

Money isn’t enough to buy a top-of-the-range Ferrari. You need to be in a long-term relationship with the company.

By leveraging the rabid fandom of its customers through a business model based on uber-scarcity, the storied Italian company is enjoying a new golden age.

When goods are scarcer, the marginal consumer is willing to pay more for them. This is the 'Law of Demand' working in reverse. If the firm restricts the quantity it sells, then it moves up the demand curve and can sell at a higher price. However, by definition, setting a higher price than the profit-maximising price decreases profits. And, there doesn't seem to be a mechanism where over-pricing their cars gives Ferrari a long-term increase in profits. So, let's consider what they are actually doing.

Consider the market for regular, run-of-the-mill Ferraris. Because there are lots of substitutes for a regular, run-of-the-mill Ferrari, the demand for Ferraris is relatively elastic (shown by the flat demand curve D1). When Ferrari prices its cars, it sets the price so that it will sell the quantity where marginal revenue is exactly equal to marginal cost. That is the quantity Q*, and the price P1. The mark-up for Ferrari is the difference between P1 and marginal cost (MC). 

Ferrari could try setting the price higher than P1, but as noted above, this would decrease the quantity sold below the profit-maximising quantity Q*, and by definition this would decrease Ferrari's profits. So, how could Ferrari increase its profits from selling run-of-the-mill Ferraris? One way is to make demand less elastic (making the demand curve steeper). If the demand curve was steeper, like D0, then the profit maximising price would be P0 rather than P1, and the mark-up on run-of-the-mill Ferraris would be much higher. Selling run-of-the-mill Ferraris would be much more profitable.
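The link between elasticity and mark-up can be made concrete with the standard pricing rule for a firm with market power, (P - MC)/P = 1/|e|. The marginal cost and elasticity numbers below are purely illustrative, not Ferrari's.

```python
# Standard mark-up rule for a firm with market power:
# (P - MC) / P = 1 / |e|, which rearranges to P = MC * e / (e - 1)
# for |e| > 1. Cost and elasticity numbers are illustrative only.
def profit_max_price(mc, elasticity):
    e = abs(elasticity)
    if e <= 1:
        raise ValueError("price is unbounded when demand is inelastic")
    return mc * e / (e - 1)

mc = 200_000  # illustrative marginal cost of a run-of-the-mill car
print(profit_max_price(mc, -4))  # ~266,667: elastic demand, slim mark-up
print(profit_max_price(mc, -2))  # 400,000: less elastic demand, fat mark-up
```

Halving the elasticity (in absolute terms) here raises the profit-maximising mark-up from a third of marginal cost to the whole of marginal cost, which is exactly why a steeper demand curve is so valuable to Ferrari.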

How can a firm make demand for its good less elastic? One of the factors that affects the price elasticity of demand is the number of close substitutes. If the firm can decrease the number of close substitutes, or make its good less substitutable by other goods, then demand will be less elastic.

This is what Ferrari is doing by selling its most premium cars only to consumers "in a long-term relationship with the company". If you really want a Ferrari hypercar (or whatever the latest release Ferrari is), then you need to be buying run-of-the-mill Ferraris. That makes other luxury cars less close substitutes for a run-of-the-mill Ferrari, making demand for run-of-the-mill Ferraris less elastic, and allowing Ferrari to set a higher price for run-of-the-mill Ferraris. Since Ferrari sells a lot more run-of-the-mill Ferraris than hypercars, this is likely to be much more profitable for Ferrari overall:

Anyone with a few hundred thousand dollars to spare can buy a regular Ferrari as long as they are willing to wait a couple of years. While the standard models aren’t subject to strictly limited runs, the company still lives by Enzo Ferrari’s scarcity dictum: “Ferrari will always deliver one car less than the market demands.”

Limited-edition Ferraris are even scarcer, and you can’t just walk into your local showroom and buy one. These range from special versions of regular models to the design-oriented “Icona” and, most exclusively, once-in-a-decade hypercars like LaFerrari and the F80.

Such models help keep orders flowing for the company’s entire product range even though they account for a fraction of deliveries—just 7% last year. Collectors had on average bought 10 new Ferraris before qualifying to buy LaFerrari or an Icona, which means icon in Italian, according to Hagerty.

Maybe buying a premium Ferrari is conspicuous consumption, and maybe Ferrari is taking advantage of that. However, it is also using its premium Ferraris to increase the price and profitability of a run-of-the-mill Ferrari.

[HT: Marginal Revolution]

Sunday, 30 March 2025

Another study of MasterChef that doesn't tell us much because of survivorship bias

Data from sports and games can tell us a lot about decision-making and behaviour. That's because the framework within which decisions are made, and behaviour takes place, is well defined by the rules of the sport or game. That's why I really like to read studies in sports economics, and often post about them here. I also like to read studies that use data from game shows, where the framework is clearly defined.

While those sorts of studies can tell us a lot, they still need to be executed well, and unfortunately, that isn't always the case. Consider this post, where I outlined a clear problem of survivorship bias in the analysis of a paper using data from MasterChef. Sadly, that paper is not alone: one of its authors, Alberto Chong (Georgia State University), has made a similar mistake in a follow-up paper, again using the same dataset from MasterChef.

This new paper looks at the relationship between exposure to anger and performance. As Chong explains:

Being exposed to anger in others may provide a burst of energy and increase focus and determination, which maybe translated into increased performance. However, the opposite may also be true. Exposure to anger in others may cloud judgment, impair decision-making, and may end up decreasing performance. In short, understanding whether the link between these two variables is positive or negative is an empirical question.

And if you've ever watched MasterChef (the US version), you will know that anger is a key feature of the series. For example:

So, Chong looks at whether exposure to the angry reactions of the judges affects contestants' performance overall, including their final placement, as well as the number of challenges they placed in the top three, their probability of placing in the top three, and their probability of winning. The dataset covers all seasons of MasterChef from 2010 to 2020. Exposure to anger is measured as "the number of times that any of the contestants have been exposed to anger by any of the judges". Chong finds that:

...people who are exposed to anger appear to react positively to anger by improving their final placement in the competition likely as a result of increased focus and determination. In particular, we find that it is associated with contestants improving around 1.5 placement positions or higher in the final standings. We also find that the probability of winning the competition increases by around 2.2 percent.

However, there is a problem, and that problem is survivorship bias. Contestants who remain in the show for longer have more opportunities to be exposed to anger from the judges. So, even if angry reactions were assigned to contestants completely at random, those who survive for more episodes would both attract more angry reactions and achieve a higher placing overall. This mechanistic relationship alone generates a correlation between final placement and exposure to anger. The analysis needs to condition the exposure to anger on the number of opportunities for the judges to be angry. So, rather than the number of times a contestant is exposed to anger, the key explanatory variable should be the proportion of episodes in which the contestant is exposed to anger.
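The mechanistic relationship is easy to demonstrate with simulated data, much as in my earlier post. In the sketch below (all numbers are made up), anger is assigned completely at random each episode, so it has no causal effect at all, yet total exposure still correlates strongly with final placement.

```python
import random

# Minimal sketch of the survivorship-bias problem (all numbers made up):
# anger is random each episode, so it has NO causal effect on placement.
random.seed(42)

def simulate_season(n_contestants=20, p_anger=0.3):
    """One season: placement 1 = winner; one contestant leaves per episode."""
    results = []
    for placement in range(1, n_contestants + 1):
        # The winner survives every episode; last place survives only one.
        episodes_survived = n_contestants - placement + 1
        exposures = sum(random.random() < p_anger
                        for _ in range(episodes_survived))
        results.append((placement, exposures))
    return results

data = [row for _ in range(500) for row in simulate_season()]
placements = [p for p, _ in data]
exposures = [e for _, e in data]

def corr(xs, ys):
    """Pearson correlation, no external libraries needed."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

# Strongly negative: better (lower-numbered) placements go with more
# exposures, purely because those contestants survived more episodes.
print(corr(placements, exposures))
```

Even with zero causal effect, the raw exposure count 'predicts' placement, which is exactly the spurious pattern that conditioning on opportunities (using the proportion of episodes with exposure) would remove.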

Now, in my previous post on analysis of this dataset I demonstrated using some randomly generated data why survivorship bias was a problem. I'm not going to do that this time, because the issue is substantively the same (even if the specific numbers will be different). However, as I noted then, this study is crying out for a replication along with the other one, and together they would make a great project for a motivated Honours or Masters student. Then these studies might live up to the ideal of telling us something about decision-making and behaviour.

[HT: Marginal Revolution]

Friday, 28 March 2025

This week in research #68

Here's what caught my eye in research over the past week (which, it seems, was a very quiet week!):

  • Baker et al. (ungated preprint) provide an excellent overview of difference-in-difference research designs (which I have referred to many times on this blog), including all of the issues that researchers need to be aware of when using this research design

Thursday, 27 March 2025

First results on the non-binary gender earnings gap in New Zealand

The gender gap in pay between men and women is well known. Much less is known about the gender gap (if any) between cisgender men and women, and gender diverse people. However, this new article by Christopher Carpenter (Vanderbilt University) and co-authors, published in the journal Economics Letters (open access, with non-technical summary here), gives us a starting point using novel data from New Zealand. Carpenter et al. make use of Stats NZ's Integrated Data Infrastructure (IDI), which links various administrative datasets. Specifically:

We use NZ Department of Internal Affairs (DIA) birth records to identify birth record sex, which only allow two options: ‘male’ or ‘female’. Next, we link DIA birth records with the NZTA Driver License Register and restrict our sample to individuals who had their driver license registration/renewal in 2021 or after when the NZTA driver license application allowed identification of men, women, and gender diverse people... We compare driver license gender with birth record sex to identify cisgender people (those whose birth record sex matches their driver license recorded gender), transgender people (those whose birth record sex does not match their driver license register gender and whose driver license register gender is either male or female), and gender diverse people (those whose driver license register gender indicates gender diverse).

This is a really smart approach to identifying gender diverse and transgender individuals in the administrative data. It will tend towards false negatives, because not everyone has a driver's licence, and not every gender diverse or transgender person will change their gender on the driver's licence. However, Carpenter et al. are up front about the measurement error that this creates.
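The classification rule that Carpenter et al. describe can be sketched as a simple function (the field names and values here are mine for illustration, not the actual IDI variables):

```python
# Sketch of the classification rule described in Carpenter et al.;
# the variable names and value labels are illustrative, not the IDI's.
def classify(birth_record_sex, licence_gender):
    """birth_record_sex: 'male' or 'female' (birth records allow only two).
    licence_gender: 'male', 'female', or 'gender diverse'."""
    if licence_gender == 'gender diverse':
        return 'gender diverse'
    if licence_gender == birth_record_sex:
        return 'cisgender'
    return 'transgender'

print(classify('female', 'female'))        # cisgender
print(classify('female', 'male'))          # transgender
print(classify('male', 'gender diverse'))  # gender diverse
```

The false-negative problem noted above shows up clearly in this framing: anyone who never updates their licence gender is classified as cisgender by construction.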

Carpenter et al. then look at demographic and other characteristics and at labour market outcomes by gender, comparing transgender and gender diverse people with cisgender people, focusing on whether each person is NEET (not in employment, education, or training), and their taxable income reported to Inland Revenue. In terms of demographics, they find that:

...relative to cisgender women, transgender men are younger, less likely to be of European descent, less likely to be married or in a civil union, less likely to have children, more likely to live in Auckland or Wellington, less likely to have a tertiary qualification, and more likely to have a mental health prescription... gender diverse individuals whose birth record sex is female... are younger, less likely to be married, less likely to have had any children, more likely to have a mental health prescription, and more likely to be NEET than both transgender men and cisgender women. Regarding education, gender diverse individuals whose birth record sex is female are more likely than transgender men but less likely than cisgender women to have a tertiary qualification...

Relative to cisgender men, transgender women are younger, less likely to be of European descent, less likely to be married or in a civil union, less likely to have children, more likely to live in Auckland or Wellington, and more likely to have a mental health prescription than cisgender men... gender diverse individuals whose birth record sex is male... are younger, more likely to have tertiary education, and more likely to have a mental health prescription than both transgender women and cisgender men. Gender diverse individuals whose birth record sex is male are much more similar to transgender women than they are to cisgender men with respect to marital status, presence of children, and residence in Auckland or Wellington.

Interesting stuff, although superseded by data coming out of the 2023 Census, which for the first time collected comprehensive data on gender and sexual identity (more on that in a moment). Turning to labour market outcomes, Carpenter et al. find:

...strong evidence that gender minorities in New Zealand are much more likely to be NEET than otherwise similar cisgender people. We estimate that transgender women, gender diverse individuals whose birth record sex is male, and gender diverse individuals whose birth record sex is female are 10–12 percentage points more likely to be NEET than similarly situated cisgender men...

Turning to earnings... we again find that gender minorities earn significantly less than cisgender men with similar observable characteristics. Here, however, the differences for cisgender women – which indicate precise earnings gaps of about 33 % – are similar in magnitude to those estimated for transgender women and transgender men. In contrast, gender diverse individuals whose birth record sex is male and gender diverse individuals whose birth record sex is female both experience significantly larger earnings gaps compared to both cisgender men and cisgender women.

Those earnings gaps for gender diverse people are both over 50 percent. I don't think anyone will be particularly surprised by these results. It has long been suspected that gender diverse people face an earnings penalty, but there has been a lack of data to support this. However, the novel approach by Carpenter et al. has helped to fill in that particular research gap. The next step though, surely, must be to take advantage of the 2023 Census data, which gives much more detail on gender and sexual identity, with what is likely to be far less measurement error. According to the public Stats NZ data, there were over 17,000 people who were 'another gender' (other than male or female) in the 2023 Census (you can find this by browsing Aotearoa Data Explorer for 2023 Census data on gender). However, to disaggregate between different non-binary genders from there requires access to the data in the IDI. It will be interesting to see when the first analyses of that data come out. I'm very sure someone will be looking at it already.

Wednesday, 26 March 2025

GPT-4 tells us how literary characters would play the dictator game

Have you ever wondered what it would be like to interact with your favourite literary characters? What interesting conversation might we have with Elizabeth Bennet or Clarissa Dalloway? Or, who would win if we played a game of Monopoly with Ebenezer Scrooge or Jay Gatsby? Large language models like ChatGPT can provide us with a partial answer to those questions, because they can be prompted to take on any persona. And because of the wealth of information available in their training data, LLMs are likely to be very convincing at cosplaying famous literary characters.

So, I was really interested to read this new article by Gabriel Abrams (Sidwell Friends High School), published in the journal Digital Scholarship in the Humanities (ungated earlier version here). Abrams asked GPT-4 to play the role of a large number of famous literary characters when playing the 'dictator game'. To review, in the dictator game the player is given an amount of money, and can choose how much of that money to keep for themselves, and how much to give to another player. Essentially, the dictator game provides an estimate of fairness and altruism.
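The game's logic is simple enough to sketch in a few lines of code (the endowment, giving amounts, and the threshold for a 'selfish' decision here are all hypothetical, for illustration only; Abrams' paper classifies decisions its own way):

```python
# A minimal sketch of the dictator game: the 'dictator' is endowed with
# an amount of money, and chooses how much to give to a second player.
# All values below are hypothetical, purely for illustration.

def classify_decision(endowment, amount_given, selfish_threshold=0.0):
    """Label a decision 'selfish' if the dictator gives away no more
    than the threshold share of the endowment."""
    share_given = amount_given / endowment
    return "selfish" if share_given <= selfish_threshold else "generous"

# Three hypothetical 'characters', each endowed with $100
decisions = {"Character A": 0, "Character B": 20, "Character C": 50}

for name, given in decisions.items():
    print(f"{name} gives ${given}: {classify_decision(100, given)}")
```

Keeping the entire endowment is unambiguously selfish; in this toy classification, giving away anything above the threshold counts as a generous (altruistic) decision.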

Abrams first asked GPT-4 to identify the 25 most well-known fictional characters in each century from the 17th Century to the 21st Century. Then, for each character, Abrams asked GPT-4 to play the dictator game, as well as to identify the particular personality traits that would affect the character's decision in the game. Abrams then took each personality trait and asked GPT-4 to assign it a valence (positive, neutral, or negative). Finally, Abrams summarised the results by Century, finding that:

There is a general and largely monotonic decrease in selfish behavior over centuries for literary characters. Fifty per cent of the decisions of characters from the 17th century are selfish compared to just 19 per cent from the 21st century...

Humans are more selfish than the AI characters with 51 per cent of humans making selfish decisions compared to 28 per cent of the characters...

So, over time literary characters have become less selfish, but overall the characters are more selfish than real humans. An interesting question, which can't be answered with this data, is whether the change in selfishness also reflects a decrease in selfishness in the population generally (because the selfishness of humans was measured in the 21st Century only). Interestingly, looking at personality traits:

Modeled characters’ personality traits generally have a strong positive valence. The weighted average valence across the 262 personality traits was a surprisingly high +0.47...

I associate many literary figures with their negative traits, and less so with positive traits. Maybe that's just me. Or maybe, the traits that GPT-4 thought were most relevant to the choice in the dictator game tended to be more positive traits. Given that the dictator game is really about altruism and fairness, that might explain it. Over time, there hasn't been a clear trend in valence:

The 21st century had the highest valence at +0.74... The least positive centuries were the 17th and 19th with +0.28 and +0.29, respectively...

Abrams then turned to the specific personality traits, identifying the traits that were more common (overweighted) or less common (underweighted) in each century, compared with overall. This is summarised in Table 6 from the paper:

There are some interesting changes there, with empathetic shifting from being the most underweighted trait to being the most overweighted trait, while manipulative shifts in the opposite direction (from most overweighted to third-most underweighted). Interesting, and not necessarily what I would have expected. Abrams concludes that:

The Shakespearean characters of the 17th century make markedly more selfish decisions than those of Dickens, Dostoevsky, Hemingway and Joyce, who in turn are more selfish than those of Ishiguro and Ferrante in the 21st century.

Historical literary characters have a surprisingly strong net positive valence. It is possible that there is some selection bias. For instance, scholars or audiences may make classics of books with mainly attractive characters.

That makes sense. One thing that I found missing in the paper was a character-level assessment. It would have been interesting to see the results for favourite (and least favourite) characters individually, and see how they compare with what we might have expected. That could have been added to supplementary materials for the paper, and would have been an interesting read.

Nevertheless, this paper was an interesting exploration of just some of what LLMs can be used for in research. As I've noted before, LLMs have essentially killed off online data collection using tools like mTurk, because the mTurkers may simply use LLMs to respond to the survey or experiment. Researchers can now cut out the middleman, and use LLMs directly to cosplay as research participants based on any collection of characteristics (age, gender, ethnicity, location, etc.). The big question now is, when LLMs are used in this way, is some of the real underlying variation in human responses lost (because LLMs will tend to give a 'median' response for the group they are cosplaying)? The answer to that question will become clear as researchers continue on this path.

[HT: Marginal Revolution, back in 2023]

Monday, 24 March 2025

New Zealand's supermarket sector needs a hero

In yesterday's post, I discussed market power and competition, noting that when there is a lack of competition, firms have more market power, and that means higher mark-ups and higher prices for consumers. An example of a market where there appears to be a high degree of market power is the supermarket sector in New Zealand.

It wasn't always this way. When I was young, I remember there being a large number of different supermarket brands. In Tauranga in the mid-1990s, along Cameron Road between the CBD and Greerton there was Price Chopper (which was previously 3 Guys), Pak'nSave, Big Fresh, Foodtown, New World, and Countdown (and there may be others that I've forgotten, as well as several smaller superettes).

One of the main ways that a market ends up highly concentrated is to start with a market that has some degree of competition, but then some of the firms merge (or take each other over), leaving fewer firms and less competition. In the context of supermarkets in New Zealand, this process is outlined in this article in The Conversation by Lisa Asher, Catherine Sutton-Brady (both University of Sydney), and Drew Franklin (University of Auckland):

The current state of New Zealand’s supermarket sector – dominated by Woolworths (formerly Countdown), Foodstuffs North Island and Foodstuffs South Island – is a result of successive mergers and acquisitions along two tracks.

The first was Progressive Enterprises’ (owner of Foodtown, Countdown and 3 Guys banners) purchase of Woolworths New Zealand (which also owned Big Fresh and Price Chopper) in 2001.

Progressive Enterprises was sold to Woolworths Australia, its current owner, in 2005. In less than 25 years, six brands owned by multiple companies were whittled down to a single brand, Woolworths.

The second was the concentration of the “Foodstuffs cooperatives” network. This network once included four regional cooperatives and multiple banners including Mark'n Pak and Cut Price, as well as New World, PAK’nSave and Four Square.

The decision of the four legally separate cooperatives to include “Foodstuffs” in their company name blurred the lines between them. The companies looked similar but remained legally separate.

As a result of mergers, these four separate companies have now become Foodstuffs North Island – franchise limited share company, operating according to “cooperative principles” and Foodstuffs South Island, a legal cooperative.

And so now we find ourselves in a situation with three large supermarket firms, two of which (Foodstuffs North Island and Foodstuffs South Island) are effectively two arms of the same firm, and certainly aren't competing with each other because they operate only on their 'own' islands. With such a lack of competition, many people, including Asher et al., are clamouring for change.

Increasing competition in the supermarket sector could take one of two forms. Asher et al. argue that inviting an international competitor into the market will take too long, citing the example of Aldi in Australia, which "took 20 years to reach scale as a third major player in that country". Their preference is 'forced divestiture', breaking up the existing supermarkets into smaller competing firms. Essentially, this would be something of a return to the situation prior to some of the mergers that have characterised the past 30 years of the supermarket sector in New Zealand, but would require a drastic legislative intervention from government.

However, before the government imposes such a dramatic change on this market, it really needs some solid analysis of the impacts of the change. Large supermarket firms benefit from economies of scale in purchasing, logistics and distribution, as well as back-office functions (like payroll, marketing, and finance). If smaller supermarket firms face higher costs because they can't take advantage of the economies of scale available to larger supermarket firms, then breaking up the supermarket chains into smaller chains could lead to even higher prices for consumers. On the other hand, smaller supermarket chains have less bargaining power with suppliers, which might mean that the supermarket suppliers receive better prices (but again, that means higher prices for consumers). Without some careful economic modelling, which has not been done to date, we can't make a clear-eyed assessment of the likely net change in consumer and producer welfare.

And we should be cautious. If forced divestiture starts to gain some political traction, you can bet that the supermarket chains will release some economic analysis that supports a position that breaking them up will be worse for consumers. And consumer advocates might even produce their own analyses, showing the opposite. What is needed is a truly independent assessment. And before you raise it, I doubt that we would get that sort of independent assessment from the Commerce Commission. They know that their bills are paid by the government of the day, and they may respond accordingly.

Asher et al. ask us to "stop waiting for a foreign hero". What we need is an economist hero, with an independent analysis of the supermarket sector in hand.

Sunday, 23 March 2025

Market power, competition, and the collapses of Bonza and Rex

Last week, my ECONS101 class covered market power and competition (as part of a larger topic introducing some of the principles of firm behaviour). This coming week, we'll be covering elasticity, which is closely related and builds on the key ideas of firm behaviour.

Market power is the ability of a seller (or sometimes a buyer) to influence market prices. The greater the amount of market power the seller (or buyer) has, the more they can raise their price above marginal cost. That is, sellers with greater market power will have a higher mark-up (which is the difference between price and marginal cost).

How do firms get market power? There are several ways, but the greatest contributor to market power is the extent of competition in the market. When firms face a lot of competition in their market, they will compete vigorously on price, and so their mark-up will be lower. When firms face less competition in their market, they don't have to compete on price to the same degree, and so their mark-up will be higher.

Another way of seeing this is to consider the price elasticity of demand. When there are many substitutes for a good, the demand for that good will be more elastic. If the seller raises their price, many of their consumers will buy (one of the many) substitutes instead (because those substitutes are now relatively cheaper). So, firms selling a good that has many substitutes (a good that has more elastic demand) will have a lower mark-up. And if a firm's good has many substitutes, that means a lot of competition.

On the other hand, when there are few substitutes for a good, the demand for that good will be less elastic. If the seller raises their price, few of their consumers will buy substitutes instead (because there are few substitutes available). So, firms selling a good that has few substitutes (a good that has less elastic demand) will have a higher mark-up. And if a firm's good has few substitutes, that means less competition.
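This inverse relationship between elasticity and the mark-up can be made concrete with the standard Lerner index result from microeconomics: for a profit-maximising firm with market power, (P − MC)/P = 1/|price elasticity of demand|. This is a textbook formula rather than anything from the articles discussed here, and the numbers below are made up for illustration:

```python
def markup_share(elasticity):
    """Lerner index: for a profit-maximising firm with market power,
    (P - MC) / P = 1 / |price elasticity of demand| (for |elasticity| > 1)."""
    return 1 / abs(elasticity)

def profit_max_price(marginal_cost, elasticity):
    """Implied profit-maximising price: P = MC / (1 - 1/|elasticity|)."""
    return marginal_cost / (1 - markup_share(elasticity))

# Many substitutes -> more elastic demand -> lower mark-up
print(markup_share(-4))            # 0.25

# Few substitutes -> less elastic demand -> higher mark-up
print(markup_share(-2))            # 0.5

# With a marginal cost of $75, the implied prices differ sharply:
print(profit_max_price(75, -4))    # 100.0
print(profit_max_price(75, -2))    # 150.0
```

Halving the elasticity (from −4 to −2) doubles the mark-up share of the price, which is exactly the mechanism described above: fewer substitutes, less elastic demand, higher mark-up.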

Taking all of this together, when a market loses one or more of its competitors, reducing competition, we should expect to see an increase in prices. A clear example of this is what happened when the Australian regional airlines Bonza and Rex closed down last year. As Doug Drury (Central Queensland University) wrote in The Conversation last November:

In 2024 alone, we’ve seen the high-profile collapse of both Bonza and Rex, airlines that once ignited hopes for much greater competition in the sector. Now, we’re beginning to see the predictable effects of their exit.

According to a quarterly report released on Tuesday by the Australian Competition and Consumer Commission (ACCC), domestic airfares on major city routes increased by 13.3% to September after Rex Airlines halted its capital city services at the end of July.

The collapse of the two low-cost domestic airlines in Australia reduced competition on domestic routes. Unsurprisingly, the lower competition means more market power for the remaining airlines, Qantas and Jetstar. And that greater market power has translated into higher prices for domestic airfares in Australia.

The importance of competition for prices cannot be overstated. As one other example, a lack of competition has been implicated in the perceived high prices in New Zealand supermarkets (a point I will come back to in my next post, but this is a topic I have written about before here). The Commerce Commission is understandably concerned whenever there is a lack of competition, and whenever competition will be substantially reduced by the merger of two or more firms (for example, see here and here about the recently rejected Foodstuffs supermarket merger). When there is a lack of competition, sellers have more market power, demand will be less elastic, and for both of those reasons we can expect prices to be higher.

Friday, 21 March 2025

This week in research #67

Here's what caught my eye in research over the past week:

  • Schreyer and Singleton (open access) find that Cristiano Ronaldo increased stadium attendance in the Saudi league, by an additional 20% of the seats in his home team's stadium when he played, 15% in the stadiums he visited, and by 3% where he did not even play
  • Harrison and Glaser find that laws that allow breweries to bypass distributors lead to higher brewery output and employment, and that this is primarily driven by a greater market entry of breweries
  • Ankel‐Peters, Fiala, and Neubauer (open access) review the impact of replications published as comments in the American Economic Review between 2010 and 2020, and find that the comments are barely cited, and they do not affect the original paper's citations, even when the replication diagnoses substantive problems with the original paper (does this show the level of revealed preference for replications?)

Wednesday, 19 March 2025

Book review: The Economist's View of the World

In 1973, the Western Economic Journal published a now-famous article (ungated here) by Axel Leijonhufvud (UCLA), entitled "Life Among the Econ". The article was an ethnographic study of the Econ tribe, and included such gems as:

Almost all of the travellers' reports that we have comment on the Econ as a "quarrelsome race" who "talk ill of their fellow behind his back," and so forth. Social cohesion is apparently maintained chiefly through shared distrust of outsiders.

And:

The young Econ, or "grad," is not admitted to adulthood until he has made a "modl" exhibiting a degree of workmanship acceptable to the elders of the "dept" in which he serves his apprenticeship. 

Obviously, Leijonhufvud was an economist, not an anthropologist, and his article was hilarious satire. However, there is something to be said for taking an outside view of the discipline, and such a view might really help non-economists to understand economists. This is where most popular economics books go wrong - they are written by economists.

That is not the case for The Economist's View of the World, written by political scientist Steven E. Rhoads (apparently, not to be confused with the current New York senator Steven Rhoads - thanks Google). Rhoads is professor emeritus at the University of Virginia, where he taught economics to students in public administration. So, not only does he know economics well, but he also brings an outsider view (of sorts) to the subject. The book was first published in 1985, but I read the revised and updated "35th Anniversary Edition", which was published in 2021, and includes reference to more recent developments such as the rise of behavioural economics, the increasing salience of income inequality, and current policy debates.

The book starts with a non-technical introduction to three important concepts in economics: (1) opportunity cost; (2) marginalism; and (3) incentives. Those topics also tend to appear at the start of introductory economics textbooks. However, Rhoads doesn't get bogged down in the details of theories and models, and instead focuses on applications and illustrations. For example, on the issue of incentives:

To be sure, policy makers should be careful not to just implement the first incentive that comes to mind. To do so means to risk the fate of the poor little town of Abruzzi, Italy. The city was plagued by vipers, and the city fathers determined to solve the problem by offering a reward for any viper killed. Alas, the supply of vipers increased. Townspeople had started breeding them in their basements...

This is another example of the 'cobra effect' (which I have written about here and here), interestingly also involving snakes. After these initial chapters on the basics of the economic way of thinking, the book then pivots to a more policy focus, with a strong emphasis on outlining economists' perspectives on various aspects of public policy. Like the initial chapters, this section presents the view of an outsider explaining economics to other outsiders, and is mostly successful at doing so. Consider this bit on profit-taking by middlemen:

Some readers will be skeptical: what about the unfair way that companies scoop profits from the people who actually produce the products? Look at farmers; they get a fraction of the profits from their hard work...

Economists offer reasonable defenses for all of these much-maligned groups... Middlemen are a further development of the wealth-increasing division of labor. If they do not provide services worth their costs, retailers that do not use them will put out of business not just the middlemen but also the retailers they supply.

The policy aspects of this middle section of the book are accompanied by a big dose of public choice theory, which is not something that I am accustomed to seeing in popular economics books. It is no doubt why this book is very popular among libertarian economists. However, as the policy and public choice aspects of the book increase, it loses some of the dispassionate outlining of economists' perspectives. Later sections increasingly come across more as an attempt to convince the reader of those perspectives. This is where popular economics books lose their audience, and I'm sad to say I think Rhoads also falls into this trap.

My other gripe about the book is that it is very US-centric. The book is not so much presenting the economist's view of the world, but rather the economist's view of the US. The rest of the world barely rates a mention. While Rhoads is clearly tailoring the book to a US audience, it does tend to present economists as favouring more libertarian and small-government outcomes to an extent that is not necessarily apparent among economists around the rest of the world.

The final section of the book is a gentle critique of economics, with a particular focus on measurement of wellbeing, and on political deliberation. Unlike other books that present critiques (such as the books I reviewed here and here), Rhoads does not rely on strawman arguments that bear little resemblance to economics as it is actually practiced. For that reason, his critiques can and should be taken much more seriously. This section could easily have been expanded (perhaps to an entire book), but nevertheless it seemed like a sensible way to finish.

I enjoyed reading this book, but the overly US-centric approach turned me off, and I wouldn't recommend it to non-US readers. For those readers outside the US, despite their lack of outsider perspective, popular economics books by Tim Harford (for example, see here) or Diane Coyle (for example, see here) would be much better.

Monday, 17 March 2025

The potential contribution of generative AI to journal peer review

As the Managing Editor of a journal (the Australasian Journal of Regional Studies), I have been watching the artificial intelligence space with interest. One thing that AI could easily be used for is peer review. So far, I haven't seen any evidence that reviewers for my journal have been using AI to complete their reviews, but I know that it is becoming increasingly common (and was something that I did observe as a member of the Marsden Social Sciences panel, before it was disbanded). 

What could be so bad about AI completing peer review of research? The truth is that we don't know the answer to that question. As editors and researchers, we may have concerns about whether AI would do a quality job, and whether it would be biased for or against certain types of research and certain types of researchers. But so far, there has been little empirical evidence either for or against these concerns.

That's why I was really interested to read two papers that contribute to our understanding of AI in peer review. The first is this 2024 article by Weixin Liang (Stanford University) and co-authors, published in the journal NEJM AI (ungated earlier version here). Their dataset of human feedback was based on over 8700 reviews of over 3000 accepted papers from Nature family journals (which had published their peer review reports), and over 6500 reviews of 1700 papers from the International Conference on Learning Representations (ICLR), a large machine learning conference (and where the authors had access to review reports for accepted as well as rejected papers). They quantitatively compared the human feedback with feedback generated by GPT-4.

Liang et al. found that, for the Nature journal dataset:

More than half (57.55%) of the comments raised by GPT-4 were raised by at least one human reviewer... This suggests a considerable overlap between LLM feedback and human feedback, indicating potential accuracy and usefulness of the system. When comparing LLM feedback with comments from each individual reviewer, approximately one third (30.85%) of GPT-4 raised comments overlapped with comments from an individual reviewer... The degree of overlap between two human reviewers was similar (28.58%), after controlling for the number of comments...

For the ICLR dataset, the results were similar, but the nature of the data allowed for more nuance:

Specifically, papers accepted with oral presentations (representing the top 5% of accepted papers) have an average overlap of 30.63% between LLM feedback and human feedback comments. The average overlap increases to 32.12% for papers accepted with a spotlight presentation (the top 25% of accepted papers), while rejected papers bear the highest average overlap at 47.09%. A similar trend was observed in the overlap between two human reviewers: 23.54% for papers accepted with oral presentations (top 5% accepted papers), 24.52% for papers accepted with spotlight presentations (top 25% accepted papers), and 43.80% for rejected papers.
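The overlap metric itself is straightforward to sketch. Liang et al. matched comments semantically using an LLM pipeline; the exact string matching below is a simplifying stand-in, and all of the comments are made up for illustration:

```python
# Rough sketch of a pairwise comment-overlap share, in the spirit of
# Liang et al.'s comparison. They matched comments semantically; exact
# string matching here is a simplifying stand-in.

def overlap_share(comments_a, comments_b):
    """Share of A's distinct comments that also appear among B's comments."""
    set_a, set_b = set(comments_a), set(comments_b)
    if not set_a:
        return 0.0
    return len(set_a & set_b) / len(set_a)

llm_comments = ["sample size is small", "discuss limitations", "clarify methods"]
human_comments = ["discuss limitations", "improve the figures", "clarify methods"]

print(overlap_share(llm_comments, human_comments))  # 2 of 3 comments overlap
```

The same function can be applied LLM-to-human or human-to-human, which is how the paper's comparison of the two kinds of overlap can be read.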

So, GPT-4 was very good at identifying the worst papers (those that should be rejected), and had a similar extent of overlap in comments with a human reviewer as another human reviewer would. Turning to the types of comments, Liang et al. find that:

LLM comments on the implications of research 7.27 times more frequently than humans do. Conversely, LLM is 10.69 times less likely to comment on novelty than humans are... This variation highlights the potential advantages that a human-AI collaboration could provide. Rather than having LLM fully automate the scientific feedback process, humans can raise important points that LLM may overlook. Similarly, LLM could supplement human feedback by providing more comprehensive comments.

The takeaway message here is that GPT-4 is not really a substitute for a human reviewer, but is a useful complement to human reviewing. Finally, Liang et al. conducted a survey of 308 researchers across 110 US universities, who could upload some research and receive AI feedback. As Liang et al. explain:

Participants were surveyed about the extent to which they found the LLM feedback helpful in improving their work or understanding of a subject. The majority responded positively, with over 50.3% considering the feedback to be helpful, and 7.1% considering it to be very helpful... When compared with human feedback, while 17.5% of participants considered it to be inferior to human feedback, 41.9% considered it to be less helpful than many, but more helpful than some human feedback. Additionally, 20.1% considered it to be about the same level of helpfulness as human feedback, and 20.4% considered it to be even more helpful than human feedback...

In line with the helpfulness of the system, 50.5% of survey participants further expressed their willingness to reuse the system...

And interestingly:

Another participant wrote, “After writing a paper or a review, GPT could help me gain another perspective to re-check the paper.”

I hadn't really considered running my research papers through generative AI to see if it could provide feedback. However, now that I've heard about it, it is completely obvious that I should do so. And so should other researchers. It's a low-cost form of internal feedback. Indeed, Liang et al. conclude that:

...LLM feedback should be primarily used by researchers to identify areas of improvements in their manuscripts prior to official submission.

The second paper is this new working paper by Pat Pataranutaporn (MIT), Nattavudh Powdthavee (Nanyang Technological University), and Pattie Maes (MIT). They undertook an experimental evaluation of AI peer review of economics research articles, in order to determine the ability of AI to distinguish the quality of research, and whether it would be biased by non-quality characteristics of the papers it reviewed.

To do this, Pataranutaporn et al.:

...randomly selected three papers each from Econometrica, Journal of Political Economy, and Quarterly Journal of Economics (“high-ranked journals” based on RePEc ranking) and three each from European Economic Review, Economica, and Oxford Bulletin of Economics and Statistics (“medium-ranked journals”). Additionally, we randomly selected three papers from each of the three lower-ranked journals not included in the RePEc ranking—Asian Economic and Financial Review, Journal of Applied Economics and Business, and Business and Economics Journal (“low-ranked journals”). To complete the dataset, we included three papers generated by GPT-o1 (“fake AI papers”), designed to match the standards of papers published in top-five economics journals.

They then:

...systematically varied each submission across three key dimensions: authors’ affiliation, prominence, and gender. For affiliation, each submission was attributed to authors affiliated with: i) top-ranked economics departments in the US and UK, including Harvard University, Massachusetts Institute of Technology (MIT), London School of Economics (LSE), and Warwick University, ii) leading universities outside the US and Europe, including Nanyang Technological University (NTU) in Singapore, University of Tokyo in Japan, University of Malaya in Malaysia, Chulalongkorn University in Thailand, and University of Cape Town in South Africa... and iii) no information about the authors’ affiliation, i.e., blind condition.

To introduce variation in academic reputation, we replaced the original authors of the base articles with a new set of authors categorized into the following groups: (i) prominent economists—the top 10 male and female economists from the RePEc top 25% list; (ii) lower-ranked economists—individuals ranked near the bottom of the RePEc top 25% list; (iii) non-academic individuals—randomly generated names with no professional affiliation; and (iv) anonymous authorship—papers where author names were omitted. For non-anonymous authorship, we further varied each submission by gender, ensuring an equal split (50% male, 50% female). Combining these variations resulted in 9,030 unique papers, each with distinct author characteristics...

Pataranutaporn et al. then asked GPT-4o-mini to evaluate each of the 9,030 unique papers across a number of dimensions, including whether it would be accepted or rejected at a top-five journal, the reviewer recommendation, the predicted number of citations, and whether the paper would attract research funding, result in a research award, strengthen an application for tenure, and be part of a research agenda worthy of a Nobel Prize in economics. They found that:

...LLM is highly effective at distinguishing between submissions published in low-, medium-, and high-quality journals. This result highlights the LLM’s potential to reduce editorial workload and expedite the initial screening process significantly. However, it struggles to differentiate high-quality papers from AI-generated submissions crafted to resemble “top five” journal standards. We also find compelling evidence of a modest but consistent premium—approximately 2–3%—associated with papers authored by prominent individuals, male economists, or those affiliated with elite institutions compared to blind submissions. While these effects might seem small, they may still influence marginal publication decisions, especially when journals face binding constraints on publication slots.

So, on the one hand, the AI tool does a good job of identifying high-quality submissions. On the other hand, it can't tell them apart from high-quality AI-generated submissions. And it has a small but statistically significant bias towards male authors. Both of these latter points are worrying, but again they suggest that a combination of human and AI reviewers might be a suitable path forward.

Pataranutaporn et al.'s paper is focused on solving a "peer review crisis". It has become increasingly difficult to find peer reviewers who are willing to spend the time to generate a high-quality review that will in turn help to improve the quality of published research. Generative AI could help to alleviate this, but we're clearly not entirely there yet. There is still an important role for humans in the peer review process, at least for now.

[HT: Marginal Revolution, for the Pataranutaporn et al. paper]

Sunday, 16 March 2025

Why tit-for-tat tariffs may not work against Trump

Last week, my ECONS101 class covered game theory. At the end of the final lecture, after we had been covering repeated games and tit-for-tat strategies, a really perceptive student asked me about Trump's tariffs. A lot of the rhetoric about tariffs has been posed in terms of tit-for-tat (see here and here, for example). The student's question got me thinking though, about why a tit-for-tat strategy may not work in this case.

Before we get that far though, we need to think about the tariff game, as outlined in the payoff table below. There are two players: USA and 'Other Country'. Each player has two strategies: high tariffs, or low tariffs (which includes no tariffs). The payoffs are expressed as "+" for good outcomes (and "++" is particularly good), and "-" for bad outcomes (and "--" is particularly bad), while zero is a neutral payoff.

To find the Nash equilibrium in this game, we use the 'best response method'. To do this, we track: for each player, for each strategy, what is the best response of the other player. Where both players are selecting a best response, they are doing the best they can, given the choice of the other player (this is the definition of Nash equilibrium). In this game, the best responses are:

  1. If the other country chooses high tariffs, USA's best response is to choose high tariffs (since "0" is better than "--" as a payoff) [we track the best responses with ticks, and not-best-responses with crosses; Note: I'm also tracking which payoffs I am comparing with numbers corresponding to the numbers in this list];
  2. If the other country chooses low tariffs, USA's best response is to choose high tariffs (since "++" is better than "+" as a payoff);
  3. If USA chooses high tariffs, the other country's best response is to choose high tariffs (since "0" is better than "--" as a payoff); and
  4. If USA chooses low tariffs, the other country's best response is to choose high tariffs (since "++" is better than "+" as a payoff).
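The best-response method above can be sketched in a few lines of Python. This is purely illustrative: the symbolic payoffs are mapped to assumed numeric stand-ins ("++" = 2, "+" = 1, "0" = 0, "--" = -2), and the variable names are my own.

```python
# Best responses in the tariff game, with assumed numeric stand-ins
# for the symbolic payoffs: "++" = 2, "+" = 1, "0" = 0, "--" = -2.
payoffs = {
    # (usa, other): (usa_payoff, other_payoff)
    ("high", "high"): (0, 0),
    ("high", "low"):  (2, -2),
    ("low",  "high"): (-2, 2),
    ("low",  "low"):  (1, 1),
}
strategies = ["high", "low"]

# For each strategy the rival might pick, find this player's best reply.
usa_br = {other: max(strategies, key=lambda s: payoffs[(s, other)][0])
          for other in strategies}
other_br = {usa: max(strategies, key=lambda s: payoffs[(usa, s)][1])
            for usa in strategies}

# A Nash equilibrium is a cell where both players are playing a best response.
nash = [(u, o) for u in strategies for o in strategies
        if usa_br[o] == u and other_br[u] == o]
print(usa_br)   # USA's best reply is 'high' either way: a dominant strategy
print(nash)     # [('high', 'high')]
```

The only Nash equilibrium the check finds is high tariffs on both sides, matching the best responses traced out above.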

Notice that USA chooses high tariffs no matter what the other country does, so high tariffs is a dominant strategy for the USA. Similarly, since the other country chooses high tariffs no matter what the USA does, high tariffs is a dominant strategy for the other country. Both countries will choose to play their dominant strategy (because it is always better than the other strategy, no matter what the other country chooses to do). The outcome where both countries choose high tariffs is the Nash equilibrium in this game (it is also a dominant strategy equilibrium, because both countries have a dominant strategy).

However, I'm sure that you can clearly see that both countries would be better off with low tariffs (since the payoff for each country would be "+", instead of "0"). This game is an example of the prisoner's dilemma (it's a dilemma because, when both countries act in their own best interests, both are made worse off).

However, it is important to remember that this game is a repeated game. It is played more than once, with the same players, and the same strategy choices. When a game is repeated, then the outcome may differ from the equilibrium of the non-repeated game, because the players can learn to work together to obtain the best outcome.

In a repeated prisoner's dilemma game like this, each player can encourage the other to cooperate by using the tit-for-tat strategy. That strategy, identified by Robert Axelrod in the 1980s, works by initially cooperating (low tariffs), and then in each play of the game after the first, the player does whatever the other player did last time. So, if the USA chooses high tariffs, then the other country should punish them and choose high tariffs in the next play of the game. And if the USA chooses low tariffs, then the other country should reward them and choose low tariffs in the next play of the game. The tit-for-tat strategy works because it encourages the other player to cooperate. And that is what many people have been expecting of Trump. If you punish the USA for high tariffs by setting your own high tariffs, eventually they will realise their error and start cooperating with low tariffs again.

But there is a problem. The whole edifice of the tit-for-tat strategy assumes that Trump knows that he is playing the game outlined above, where there is an unambiguously better outcome for both countries, if they both choose low tariffs. By his past statements, it is absolutely clear that Trump thinks that trade is a zero-sum game (for example, see here or here).

So, what does the game look like if you believe that trade is zero-sum? Instead of there being gains from the low-tariff/low-tariff outcome, the payoffs become zero, as shown in the payoff table below.

Solving this game for the Nash equilibrium, the best responses are:

  1. If the other country chooses high tariffs, USA's best response is to choose high tariffs (since "0" is better than "--" as a payoff) [we track the best responses with ticks, and not-best-responses with crosses; Note: I'm also tracking which payoffs I am comparing with numbers corresponding to the numbers in this list];
  2. If the other country chooses low tariffs, USA's best response is to choose high tariffs (since "++" is better than "0" as a payoff);
  3. If USA chooses high tariffs, the other country's best response is to choose high tariffs (since "0" is better than "--" as a payoff); and
  4. If USA chooses low tariffs, the other country's best response is to choose high tariffs (since "++" is better than "0" as a payoff).

Notice that the game itself doesn't change. Imposing high tariffs is still a dominant strategy for both countries, and the outcome where both countries choose high tariffs is the only Nash equilibrium (and is also a dominant strategy equilibrium).

However, there is an important difference in this game when it is played as a repeated game. There is no incentive for players to cooperate. That's because cooperating results in a payoff of "0", just the same as not cooperating. And because of this, the tit-for-tat strategy would be pointless.

Which seems to be what other countries are finding, when dealing with Trump. Other countries may think they are playing the first game, where a tit-for-tat strategy may get Trump to reconsider. But if Trump thinks they are playing the second (zero-sum) game (and it seems that he does), then the tit-for-tat strategy is simply not going to work.

[HT: Sarah from my ECONS101 class]

Friday, 14 March 2025

This week in research #66

Here's what caught my eye in research over the past week:

  • Clements (open access) provides notes on getting a job, for PhD graduates in economics and finance
  • Armstrong et al. (open access) find that legalisation of cannabis in Canada has no statistically significant effect on alcohol sales overall, but beer sales decreased significantly and was offset by a significant increase in sales of other alcoholic beverages
  • Yechiam and Zeif provide another meta-analysis of loss aversion, concluding that findings of strong loss aversion are replicated when losses are smaller than gains and when gains and losses are presented in an ordered fashion, but for studies with symmetric gains and losses and no ordering of items, the loss aversion parameter is not statistically significant
  • Capponi and Frenken (open access) investigate the careers of 473 scientists at Dutch universities during the period 1815–1943, and find that 'inbreeding' (having a PhD supervised by a professor who holds a PhD from the same university and within the same discipline) generally enhances academic performance, but only in the early lifecycle stages of a new intellectual movement
  • Brooks et al. (open access) find that research produced by female finance academics is published in lower-rated journals and garners fewer citations, and that female-authored work in finance is ‘penalised’ more for its interdisciplinarity than similar research authored by men
  • El Tinay and Schor find that the top five economics journals have cumulatively only published 25 unique research articles on the topic of climate change from 1975 to 2023, and that they have failed to engage with the role and consequences of domestic and global inequality in the dynamics of climate change
  • Nielsen et al. estimate that Danes would be willing to pay €9.70 million to host a Formula One Grand Prix in Copenhagen, Denmark (not anywhere near what it costs, so don't expect a Danish Grand Prix again anytime soon)
  • Pitts and Evans examine the impact of name, image, and likeness (NIL) rights on college football recruiting, and find that the average of the top-ten NIL valuations for a university’s football players was correlated with the perceived quality of the players recruited (so good players are attracted to colleges that have higher valued players already)

Thursday, 13 March 2025

Hawks, doves, Israel and Iran

In The Conversation last October, Andrew Thomas (Deakin University) discussed the recent (at that time) military flare-up between Iran and Israel, likening it to a 'game of chicken':

Israel’s strike on military targets in Iran over the weekend is becoming a more routine occurrence in the decades-long rivalry between the two states...

There is a reason why direct military strikes between nations are rare, even between sworn enemies. When attacking another state, it is difficult to know exactly how they will respond, though a retaliatory strike is almost often expected.

This is because defence forces are not just used for fighting and winning wars – they are also vital to deterring them. When a fighting force is attacked, it’s important for it to strike back to maintain the perception it can deter future attacks and make a display of its capabilities. This is what is happening right now between Israel and Iran – neither side wants to appear weak.

If this is the case, where does the escalation end? De-escalation is essentially a game of chicken – one side has to be content with not responding to an attack to take the temperature down.

My ECONS101 class has been covering game theory this week, including the chicken game. In the traditional game of chicken there are two rivals in cars, one at each end of the same street. They drive towards each other at top speed, and whichever rival swerves away first loses the game. So, each rival can choose to speed ahead or swerve away, and each would prefer to speed ahead and win the game. However, the problem is that if both simply keep speeding ahead, it will end in a disastrous crash.

I also recently read the book Hidden Games, by Moshe Hoffman and Erez Yoeli (which I reviewed here). Hoffman and Yoeli have an interesting section in the book on the hawk-dove game, which essentially the chicken game but with a slightly different motivating context. In the hawk-dove game, two rivals are competing over some resource. Each rival can choose to be aggressive or submissive, and whichever rival is more aggressive will win the resource. Each rival would prefer to be aggressive and win the resource. However, if both are aggressive, it ends in a massively disastrous battle.

Coming back to the case of Iran and Israel, this is clearly an example of the hawk-dove game (or the chicken game, if you prefer). This game is laid out in the payoff table below, where the strategies for Israel and Iran are to be aggressive, or submissive. The payoffs are expressed as "+" for good outcomes, and "-" for bad outcomes (and "--" is particularly bad), while zero is a neutral payoff.

To find the Nash equilibrium in this game, we use the 'best response method'. To do this, we track: for each player, for each strategy, what is the best response of the other player. Where both players are selecting a best response, they are doing the best they can, given the choice of the other player (this is the definition of Nash equilibrium). In this game, the best responses are:

  1. If Iran chooses to be aggressive, Israel's best response is to be submissive (since "-" is better than "--" as a payoff - in other words, taking a bit of punishment is better than a massively disastrous war) [we track the best responses with ticks, and not-best-responses with crosses; Note: I'm also tracking which payoffs I am comparing with numbers corresponding to the numbers in this list];
  2. If Iran chooses to be submissive, Israel's best response is to be aggressive (since "+" is better than "0" as a payoff);
  3. If Israel chooses to be aggressive, Iran's best response is to be submissive (since "-" is better than "--" as a payoff); and
  4. If Israel chooses to be submissive, Iran's best response is to be aggressive (since "+" is better than "0" as a payoff).

In this scenario, there are no dominant strategies. Neither country has a strategy that is always better for them, no matter what the other country chooses to do. However, there are two Nash equilibriums (outcomes where both players are playing their best response), which occur when one country is aggressive, and the other is submissive.

The thing about the Iran-Israel hawk-dove game is that it isn't really a simultaneous game, as shown in the table above. It is a sequential game. Each player chooses whether to be aggressive or submissive, knowing what the other player chose to do previously. That sequential game is shown below. [*]

We can solve a sequential game using 'backward induction', which is essentially the same as the best response method, except we make sure we start with the last player, and work our way backwards through the game to work out what the first player should do. The resulting equilibrium that we find will be a 'subgame perfect Nash equilibrium'. In this case:

  1. If Israel chooses to be aggressive, Iran's best response is to be submissive (since "-" is better than "--" as a payoff) - now, since Iran would never choose to be aggressive when Israel has already been aggressive, Israel knows that the outcome will be that Iran is submissive;
  2. If Israel chooses to be submissive, Iran's best response is to be aggressive (since "+" is better than "0" as a payoff) - now, since Iran would never choose to be submissive when Israel has already been submissive, Israel knows that the outcome will be that Iran is aggressive;
  3. Israel will choose to be aggressive (since "+" is better than "-" as a payoff).

The subgame perfect Nash equilibrium is that Israel is aggressive, and Iran is submissive. As it turns out, that's sort of what happened. After an initial flurry of missile attacks, Iran stopped escalating the conflict.

One last thing to note is that this is a repeated game. Israel and Iran will find themselves in conflict often. The games as outlined above suggest that whichever country is the initial aggressor will end up getting their way, because the country moving second in the sequential game will be better off being submissive than retaliating. However, in repeated games, the outcome can often deviate from the equilibrium for strategic reasons.

Israel clearly doesn't want Iran to be aggressive, even though aggression would be good for Iran, if they moved first in this game. So, Israel wants to convince Iran not to make an aggressive first move. The only way that Israel can do that is to convince Iran that Iran would be worse off by making an aggressive first move. Israel needs to convince Iran that Israel will always retaliate with aggression. That would deter Iran from being aggressive as a first move. How does Israel achieve this? By developing a reputation for aggressively retaliating against any aggression. And indeed, that is what Israel has done (one need only look at Gaza or Lebanon for confirmation of this).

Israel's aggressive response to attacks by Iran, Gaza, and Lebanon is part of a strategic plan to deter future aggression against Israel. Many of us may not like it, but it's strategically rational. Whether it has a lasting effect remains to be seen.

*****

[*] I'm showing the game as having Israel move first. However, if you read Thomas's article, you'll see that 'who started it' is actually contested. I'm not taking a stand on that here, and in fact the game looks identical if Iran moves first.