Wednesday, 9 April 2025

Book review: The Economists' Hour

Once upon a time, economists were backroom advisers, crunching numbers and developing theories, but rarely in the limelight and certainly not the central actors in political decision-making. However, as Binyamin Appelbaum outlines in his 2019 book The Economists' Hour, that all changed in the late 1960s. The title of the book references the period from 1969 to 2008, a time of unprecedented policy change (in the US and in other countries), when economists had the ear of the key governmental decision-makers. As Appelbaum notes in the introduction to the book:

This book is a biography of the revolution. Some leading figures are relatively well-known, like Milton Friedman, who had a greater influence on American life than any other economist of his era, and Arthur Laffer, who sketched a curve on a cocktail napkin in 1974 that helped to make tax cuts a staple of Republican economic policy. Others may be less familiar, like Walter Oi, a blind economist who dictated to his wife and assistants some of the calculations that persuaded Nixon to end military conscription; Alfred Kahn, who deregulated air travel and rejoiced in the cramped and crowded cabins on commercial flights as the proof of his success; and Thomas Schelling, a game theorist who persuaded the Kennedy administration to install a hotline to the Kremlin - and who figured out a way to put a dollar value on human life.

That paragraph neatly sums up the book. Each chapter is devoted to one particular aspect of policy that changed as a result of the influence of economists. Before reading the book, I had no idea of the important role that economists played in ending military conscription in favour of volunteer armed forces. I was, however, well aware of economists' role in the deregulation of airlines, interstate trucking, and financial markets in the US, and in the development of monetary policy and the independence of central banks. Some parts are particularly surprising, such as the relatively late impact of economists on antitrust regulation (only from the 1960s). However, as in other areas covered in the book, economists drove a radical change in policy in that space:

The rise of economics transformed the role of antitrust law in American life. During the second half of the twentieth century, economists gradually persuaded the federal judiciary - and, to a lesser extent, the Justice Department - to set aside the original goals of antitrust law and to substitute the single objective of providing goods and services to consumers at the lowest possible prices.

Appelbaum describes in some detail the contributions of the key players in each case, including economists as well as political decision-makers and their other advisors. Some figures, such as Friedman and various US presidents, make many appearances, and similar ideas often come up across multiple chapters. This repetition might turn some readers off. However, it is difficult to see how the book might have been structured in any other way, because the thread of each case would easily be lost if all the material were presented chronologically.

The book is incredibly well researched, with nearly 90 pages of footnotes. As is sometimes the case in books like this, particularly for readers who are familiar with the general story, the footnotes present details that are of more interest than the text itself. For example, consider this footnote on Milton Friedman, and real and nominal interest rates:

This is another example of a battle Friedman won so completely that his victory is largely forgotten. He insisted during the 1950s and the 1960s that there was a significant difference between real and nominal rates. Conventional economists disagreed... Today the distinction between real and nominal rates is universally understood to be significant.

Indeed, we teach the difference between real and nominal interest rates (and the relationship between them known as the 'Fisher equation'), but Friedman's battle to have this recognised is largely forgotten.

I really enjoyed that Appelbaum didn't limit the book to the US case. Economists had important roles in reshaping the economies of Chile and Taiwan, and in deregulating markets across the developed world. Appelbaum writes a lot about deregulation in Iceland. If there is one missing element to the book, it would be the relative lack of attention paid to economists' roles in the transitional economies of former Communist countries such as Poland, Hungary, or the Soviet Union. However, New Zealand does make an appearance a couple of times, including this bit:

In December 1989, New Zealand passed a law making price stability the sole responsibility of its central bank, sweeping away a 1964 law that, characteristically for its time, had instructed the central bank to pursue a laundry list of goals including economic growth, employment, social welfare, and trade promotion. The man picked to lead New Zealand's experiment was an economist named Don Brash, who ran one of the nation's largest banks and then one of its largest trade groups, the Kiwifruit Authority...

Appelbaum is careful not to provide an overly rosy view of the role of economists, and the impacts of these changes. Indeed, in the introduction he warns that:

This book is also a reckoning of the consequences...

Markets make it easier for people to get what they want when they want different things, a virtue that is particularly important in pluralistic societies which value diversity and freedom of choice. And economists have used markets to provide elegant solutions to salient problems, like closing a hole in the ozone layer and increasing the supply of kidneys available for transplant.

But the market revolution went too far. In the United States and in other developed nations, it has come at the expense of economic equality, of the health of liberal democracy, and of future generations.

And almost as quickly as it began, perhaps, the economists' hour was over:

The Economists' Hour did not survive the Great Recession. Perhaps it ended at 3:00 p.m. on Monday, October 13, 2008, when the chief executives of America's nine largest banks were escorted into a gilded room at the Treasury. The government had tried to support the banks by purchasing bonds in the open market, but the market had collapsed, so the government decided to save the financial system by taking ownership stakes in the largest financial firms.

Or perhaps it was one of a dozen other moments during the financial crisis; it doesn't really matter which. In the depths of the Great Recession, only the most foolhardy purists continued to insist that markets should be left to their own devices...

However, it would be fair to note that economists continue to have a strong influence on policy, in other countries if not in the US (as the current furore over tariffs attests).

I really enjoyed this book, and if you have an interest in understanding how economics (and economists) came to have such an important influence on policy, I am sure that you will enjoy it too. Highly recommended!

Tuesday, 8 April 2025

Supply curves slope upwards... Nigerian cocoa edition

The New Zealand Herald reported last month:

Booming cocoa prices are stirring interest in turning Nigeria into a bigger player in the sector, with hopes of challenging top producers Ivory Coast and Ghana, where crops have been ravaged by climate change and disease.

Nigeria has struggled to diversify its oil-dependent economy but investors have taken another look at cocoa beans after global prices soared to a record US$12,000 ($21,000) per tonne in December.

“The farmers have never had it so good,” Patrick Adebola, executive director at the Cocoa Research Institute of Nigeria, told AFP.

More than a dozen local firms have expressed interest in investing in or expanding their production this year, while the British Government’s development finance arm recently poured US$40.5 million into Nigerian agribusiness company Johnvents.

When the price of a good increases, sellers become willing and able to supply more of the good. In general, sellers want to increase their profits. When the price of a good increases, it becomes more profitable to sell it, and so sellers want to sell more of it. [*] This intuition is embedded in the supply curve, as shown in the diagram below. When the price of cocoa is P0, sellers want to sell Q0 tonnes of cocoa. But when the price increases to P1, sellers want to sell Q1 tonnes of cocoa.
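To make that intuition concrete, here is a minimal sketch in Python, using a made-up linear supply function for Nigerian cocoa (the numbers are purely illustrative, not estimates):

```python
# A made-up linear supply function for Nigerian cocoa: quantity supplied
# (in thousands of tonnes) increases with the world price (US$ per tonne).
def quantity_supplied(price, intercept=50, slope=0.03):
    """Illustrative upward-sloping supply curve: Qs = intercept + slope * P."""
    return intercept + slope * price

p0, p1 = 4000, 12000  # a 'normal' price, and the record price from December
q0, q1 = quantity_supplied(p0), quantity_supplied(p1)

print(f"At P0 = ${p0:,}/tonne, quantity supplied is {q0:.0f} thousand tonnes")
print(f"At P1 = ${p1:,}/tonne, quantity supplied is {q1:.0f} thousand tonnes")
# The higher price is a movement up along the supply curve, from Q0 to Q1.
```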

What might have caused the increase in the global price of cocoa? The New Zealand Herald article explains that:

Ivory Coast is by far the world’s top grower, producing more than two million tonnes of cocoa beans in 2023, followed by Ghana at 650,000 tonnes.

But the two countries had poor harvests last year as crops were hit by bad weather and disease, causing a supply shortage that sent global prices to all-time highs.

I'll refrain from drawing the global market for cocoa, but suffice it to say that the high global price of cocoa is attracting Nigerian farmers to produce more, illustrating that the supply curve for Nigerian cocoa is upward sloping.

*****

[*] There are at least two other explanations for why the supply curve is upward sloping, and both relate to opportunity costs. First, as sellers produce more of the good, the factors of production (raw materials, labour, capital, etc.) become more scarce and so become more expensive. Also, less suitable (and so more costly) inputs begin to be used to produce the good. So, the opportunity costs of production increase, and as sellers produce more, the minimum price they are willing to accept increases, because their marginal cost is increasing. Second, when the price is low the opportunity cost of not selling is low, but as the price rises the opportunity cost of not selling rises, encouraging sellers to offer more for sale. In other words, as the price increases, sellers do less of not selling (yes, that is a double negative, and it was intentional). As the price increases, sellers want to sell more.

Sunday, 6 April 2025

Do economists act like the self-interested decision-makers from our models, and if so, why?

Economic models typically assume that decision-makers are self-interested, trying to maximise their own 'economic rent'. Does exposure to these models, and the assumption of self-interest, lead people who have studied economics to make more self-interested decisions? Or, are people who make more self-interested decisions more likely to study economics (perhaps because it accords with their already-established world view)?

These are questions that many studies have tried to grapple with (and which I have written about before, most recently in this 2023 post). What is needed is a good systematic review of the literature. We don't have that, but this 2019 article by Simon Hellmich (Bielefeld University), published in the journal The American Economist (sorry, I don't see an ungated version online), provides a review of the literature (up to 2019, of course).

Hellmich prefers the term "people trained in economics" rather than "economists", noting that much of the literature focuses on undergraduate students who have only taken one or a few courses in economics, and can hardly be considered "economists". Hellmich reviews the empirical literature from both lab experiments and field experiments, although it is worth noting that most of the literature makes use of lab experiments. He draws three broad conclusions from the literature:

• People trained in economics behave more in accordance with the standard paradigms of their discipline in situations that are typically described in economic categories. They tend to prioritize their self-interest in games... but this is at least in part an outcome of their expectations about other peoples’ behavior and social interaction can strengthen their cooperativeness.

• Most of the experiments reviewed here involve economic decisions (i.e., involve the allocation of money); in most of the less obviously economic decisions, people trained in economics do not seem to be much less concerned with other people’s welfare and no more likely than other people to expect opportunism from other individuals. All in all... there is not much unambiguous support for the view that training in economics affects the fundamental preferences of people by making them more “selfish” or opportunistic.

• Most empirical evidence seems to be consistent with the self-selection assumption and more than half of the relevant studies—some of them providing high-quality evidence— seem to suggest that there are training effects... Probably both forces play a role.

In other words, the review doesn't really tell us much more than we already knew. People trained in economics behave in a more self-interested way, and part (or perhaps most) of the reason for that is the types of people who choose to train in economics. What Hellmich adds to this research question, though, is a concern about the way that previous research has tried to identify the effects, and in particular, the way that the research is framed (from the perspective of the research participants). He notes that:

...most of the experiments reviewed here lack sufficient consideration of the fact that human subjects in experiments do not mechanistically and passively respond to selected stimuli consciously created and controlled by the experimenter, and in so doing reflect their fundamental preferences. Instead, human subjects tend to interpret cues given to them—perhaps unconsciously— by the experimenter or the environment and what they might know about the theories underlying the experiment... In social dilemmas that involve decisions that are clearly identifiable as being of an economic nature (e.g., because they involve the allocation of money), people compete more than if this trait is less clear... In market-like contexts, there is broad acceptance of self-interest. It may even constitute the social norm to follow...

In other words, perhaps people trained in economics act differently in these experiments because the lab environment, and the wording of the decisions, induces them to apply their economics skills. This would explain why, in the field experiments conducted in more naturalistic settings, the behaviour of people trained in economics differs much less from other people than it does in the lab experiments. Hellmich is essentially arguing for more investigation of real-world decisions, and how they differ between people trained in economics and people who are not. That seems like a sensible suggestion.

However, the overwhelming result from Hellmich's review is that people trained in economics are "different" in meaningful ways (including higher levels of self-interest), and that difference should be recognised. He concludes that:

...as provisional steps, we should perhaps try to make students more aware of the fact that most economists understand key elements of neoclassical theory—like the homo economicus—as an instrument to explain macrophenomena rather than as a normative model of micro-behavior and how other elements of the “culture” of the discipline might make their judgments deviate from that of other groups.

In other words, our students (and other people) need to understand that self-interested behaviour is an assumption that we make in economic models, and not an ideal to strive for.


Saturday, 5 April 2025

Qantas tries to execute a break-out of Air New Zealand's locked-in customers

As I noted in this post last week, customer lock-in occurs when consumers find it difficult (costly) to change once they have started purchasing a particular good or service. Having locked-in consumers is quite profitable for firms. They can raise their prices without fear of losing those consumers, or they can leverage their locked-in status to sell them other things.

Of course, if another firm wants to compete with a firm that has locked in its consumers, the competing firm may need to find some way of breaking those consumers out of being locked in. That usually involves trying to lower the switching costs that are keeping the consumers locked in. We saw an example of this late last year, when Qantas made a bid to lure away Air New Zealand's frequent flyers, as reported in the New Zealand Herald in November:

Qantas is targeting Air New Zealand’s upper-tier Airpoints members as it looks to grow its loyalty programme here beyond one million members.

As part of an aggressive push into New Zealand, Qantas will fast-track Gold members of other airline loyalty programmes into its scheme.

Those who hold Gold or higher equivalent status with other “select airlines” can fast-track to Qantas Gold by earning 100 status credits in 90 days on flights with Qantas, Jetstar and partner airlines.

Gold status is usually obtained by earning 700 status credits in a membership year.

In addition, participating members will get access to the airline’s network of Qantas Club lounges and extra checked baggage during the 90-day fast-track offer...

Qantas is also targeting a wider range of New Zealanders to ensure they take advantage of points they already have.

Qantas Frequent Flyer will remove the $60 join fee on its website later this month.

Loyalty schemes, like frequent flyer programmes, lock consumers in because if they switch to a different programme, they lose the benefits that their current programme provides, and their frequent flyer points or airmiles will eventually expire (those are the switching costs). Qantas is trying to reduce those switching costs by fast-tracking Air New Zealand Gold Airpoints members to Qantas Gold, meaning that consumers who switch wouldn't lose their frequent flyer benefits (or wouldn't lose them for long). The switching costs aren't eliminated, because their Air New Zealand frequent flyer points will eventually expire, but they are substantially reduced. The lower cost of switching would probably attract at least some Air New Zealand frequent flyers to make the switch. As the article notes:

Qantas made a similar offer to Air NZ Gold members in 2020 which [Qantas Loyalty chief executive Andrew] Glance said had been successful.

Taking advantage of switching costs and customer lock-in is an important way for firms to increase their profitability. It isn't surprising that firms have discovered countermeasures to restrict their competitors' ability to lock in customers. What might be more surprising is that Air New Zealand didn't appear to retaliate by offering a similar deal for Qantas frequent flyers!

Friday, 4 April 2025

This week in research #69

Here's what caught my eye in research over the past week:

  • Altindag, Cole, and Filiz (with ungated earlier version here) find that students' academic performance is better when their race matches their teacher's, but that this is only true for students who are younger than their teacher, and not for students who are a similar age to, or older than, their teacher (role models clearly matter)
  • Calamunci and Lonsky (open access) find that, between 1960 and 1993, an Interstate highway opening in a county led to an 8% rise in total index crime, driven by property crime (burglary, larceny, and motor vehicle theft)
  • Achard et al. (open access) find that individuals living close to newly installed refugee facilities in the Netherlands developed a more positive attitude towards ethnic minorities and became less supportive of anti-immigration parties compared to individuals living farther away

Thursday, 3 April 2025

Mobile phone providers and the repeated switching costs game

This week, my ECONS101 class covered pricing and business strategy, and one aspect of that is switching costs and customer lock-in. Switching costs are the costs of switching from one good or service to another (or from one provider to another). Customer lock-in occurs when customers find it difficult (costly) to change once they have started purchasing a particular good or service. The main cause of customer lock-in is, unsurprisingly, high switching costs.

As one example, consider this article from the New Zealand Herald last month:

A new Commerce Commission study has found the switching process between telecommunications providers is not working as well as it should for consumers...

The study found 50% of mobile switchers and 45% of broadband switchers ran into at least one issue when switching.

The experience was so bad that 29% of mobile switchers and 27% of broadband switchers said they wouldn’t want to switch again in future...

The commission’s latest consumer satisfaction report found that 31% of mobile consumers and 29% of broadband consumers have not switched because it requires ‘too much effort to change providers’...

Gilbertson said a lack of comprehensive protocols between the “gaining” service provider and the “losing” service provider was a central issue with the current switching process.

This led to a number of problems, including double billing, unexpected charges, and delays.

The difficulty of changing from one mobile phone provider to another is a form of switching cost. It's not a monetary cost, but the time, effort, and frustration experienced by consumers wanting to switch makes the process of switching costly. And because the process is costly, mobile phone consumers are locked into their current provider.

It is clear why a mobile phone provider would want to make it difficult (costly) for its consumers to switch away from it and use some other provider. However, why don't mobile phone providers try to make it easier to switch to using their service instead? Maybe they could have staff whose role is to help consumers to navigate the process of switching to their service. That would allow the mobile phone provider to attract consumers and capture a greater market share. The answer is provided by considering a little bit of game theory.

Consider the game below, with two mobile phone providers (A and B), each with two strategies ('Easy' to switch to, and 'Hard' to switch to). The payoffs are made-up numbers that might represent profits to the two providers.

To find the Nash equilibrium in this game, we use the 'best response method'. To do this, we track: for each player, for each strategy, what is the best response of the other player. Where both players are selecting a best response, they are doing the best they can, given the choice of the other player (this is the definition of Nash equilibrium). In this game, the best responses are:

  1. If Provider B chooses to make switching easy, Provider A's best response is to make switching easy (since 3 is a better payoff than 2) [we track the best responses with ticks, and not-best-responses with crosses; Note: I'm also tracking which payoffs I am comparing with numbers corresponding to the numbers in this list];
  2. If Provider B chooses to make switching hard, Provider A's best response is to make switching easy (since 8 is a better payoff than 6);
  3. If Provider A chooses to make switching easy, Provider B's best response is to make switching easy (since 3 is a better payoff than 2); and
  4. If Provider A chooses to make switching hard, Provider B's best response is to make switching easy (since 8 is a better payoff than 6).

Note that Provider A's best response is always to choose to make switching easy. This is their dominant strategy. Likewise, Provider B's best response is always to make switching easy, which makes it their dominant strategy as well. The single Nash equilibrium occurs where both players are playing a best response (where there are two ticks), which is where both providers make switching easy.
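If you want to check the best responses mechanically, here is a small Python sketch using the made-up payoffs from the example (3 each for Easy/Easy, 6 each for Hard/Hard, and 8 versus 2 when one provider chooses Easy against Hard):

```python
from itertools import product

# Payoffs (Provider A, Provider B) for each strategy pair, using the
# made-up numbers from the example above.
payoffs = {
    ("Easy", "Easy"): (3, 3),
    ("Easy", "Hard"): (8, 2),
    ("Hard", "Easy"): (2, 8),
    ("Hard", "Hard"): (6, 6),
}
strategies = ["Easy", "Hard"]

def is_best_response(player, own, other):
    """Is 'own' a best response for this player, given the other's strategy?"""
    if player == "A":
        return payoffs[(own, other)][0] == max(payoffs[(s, other)][0] for s in strategies)
    return payoffs[(other, own)][1] == max(payoffs[(other, s)][1] for s in strategies)

# A Nash equilibrium is a strategy pair where both players are playing a best response.
for a, b in product(strategies, strategies):
    if is_best_response("A", a, b) and is_best_response("B", b, a):
        print(f"Nash equilibrium: A plays {a}, B plays {b}, payoffs {payoffs[(a, b)]}")
# Output: Nash equilibrium: A plays Easy, B plays Easy, payoffs (3, 3)
```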

So, that seems to suggest that the mobile phone providers should be making switching to them easier. However, notice that both providers would be unambiguously better off if they chose to make switching hard (they would both receive a payoff of 6, instead of both receiving a payoff of 3). When both choose to make switching easy, both providers are made worse off. This is a prisoners' dilemma game (it's a dilemma because, when both players act in their own best interests, both are made worse off).

That's not the end of this story though, because the simple example above assumes that this is a non-repeated game. A non-repeated game is played once only, after which the two players go their separate ways, never to interact again. Most games in the real world are not like that - they are repeated games. In a repeated game, the outcome may differ from the equilibrium of the non-repeated game, because the players can learn to work together to obtain the best outcome.

So, given that this is a repeated game (because the providers are constantly deciding whether to make switching easier or not), both providers will realise that they are better off making switching harder, and receiving a higher payoff as a result. And unsurprisingly, that is what happens, and it doesn't require an explicit agreement between the players - the agreement is 'tacit' (it is understood by the providers without needing to be explicit). Each provider just needs to trust that the other providers will make switching hard (because there is an incentive for each provider to 'cheat' on this outcome). Any instance of cheating (by making switching easier) would be immediately known by the other providers, and the agreement would break down, making them all worse off. So, there is an incentive for all providers to keep switching hard for the consumers. Even a new entrant firm into the market, which might initially make it easy for consumers to switch to them in order to capture market share, would soon realise that they are then better off making switching more difficult (it is not so long ago (2009) that 2degrees was a new entrant in this market).
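As a rough back-of-the-envelope check on why the tacit agreement can hold, here is a small Python sketch using the made-up payoffs from the game above, and assuming a simple 'grim trigger' arrangement (any cheating triggers a permanent reversion to the easy-switching equilibrium) with a per-period discount factor delta:

```python
# Made-up payoffs from the game above: sticking with 'Hard' pays 6 per period,
# cheating with 'Easy' pays 8 once, after which the market reverts to the
# Nash equilibrium payoff of 3 per period forever.
cooperate, cheat, punish = 6, 8, 3

def collusion_sustainable(delta):
    """Cooperating pays if 6/(1-delta) >= 8 + 3*delta/(1-delta)."""
    return cooperate / (1 - delta) >= cheat + punish * delta / (1 - delta)

critical_delta = (cheat - cooperate) / (cheat - punish)  # = 0.4 with these payoffs
print(f"Tacit collusion is sustainable when delta >= {critical_delta}")
print(collusion_sustainable(0.9))   # True: patient providers keep switching hard
print(collusion_sustainable(0.2))   # False: impatient providers would cheat
```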

The Commerce Commission is correct that the difficulty of switching mobile phone providers (the switching cost) keeps consumers with their current provider (customer lock-in). The result is that the mobile phone providers can profit from increasing prices for their locked-in consumers. The only solution to this situation would be to find some way to force a breakdown of the tacit arrangement. Then the market would settle at the equilibrium of all providers making it easy to switch to them. This may be an instance where some regulation is necessary.

Tuesday, 1 April 2025

The emerging debate on Oprea's paper on complexity and Prospect Theory

Late last year, an article in the American Economic Review by Ryan Oprea caught my attention (and I blogged about it here). It purported to show that the key experimental results underlying Prospect Theory may in part be driven by the complexity of the experiments that are used to test them. These were extraordinary results. And when you publish a paper with extraordinary results, that could potentially overturn a large literature on a particular theory, then those results are going to attract substantial scrutiny. And indeed, that is what has happened with Oprea's paper.

The team at DataColada, best known for exposing the data fakery of Dan Ariely and Francesca Gino (and the resulting lawsuit, which was dismissed), have a new working paper, authored by Daniel Banki (ESADE Business School) and co-authors, looking at Oprea's results (see also the blog post on DataColada by Uri Simonsohn, one of the co-authors). To be clear, before I discuss Banki et al.'s critique: they don't accuse Oprea of any misconduct. They mostly present an alternative view of the data and results that appears to contradict key conclusions that Oprea reaches in his paper. Oprea has also provided a response to some of their critique.

I'm not going to summarise Oprea's original paper in detail, as you can read my comments on it here. However, the key result in the paper is that when presented with risky choices, research participants' behaviour was consistent with Prospect Theory, and when presented with choices that involved no risk at all but were complex in a similar way to the risky choices ('deterministic mirrors'), research participants' behaviour was also consistent with Prospect Theory. This suggests that a large part of the observed results that underlie Prospect Theory may arise because of the complexity of the choice tasks that research participants are presented with.

Banki et al. look at a number of 'comprehension questions' that Oprea presented research participants with, and note that:

...75% of participants made an error on at least one of the comprehension questions, such as erroneously indicating that the riskless mirror had risk.

Once the data from those research participants is excluded, Banki et al. show that research participant behaviour differs between lotteries and mirrors for the research participants who 'passed' the comprehension checks (by getting all four of the comprehension questions correct on their first try). This is captured in Figure 2 from Banki et al.'s paper:

The two panels on the left of Figure 2 show the results for the full sample; notice that both lotteries (top panel) and mirrors (bottom panel) look similar in terms of results. In contrast, when the sample is restricted to those who 'passed' the comprehension checks, the results for lotteries and mirrors look very different. That is what we would expect if research participants are not 'fooled' by the complexity of the task.

Banki et al. provide a compelling reason why the results for the research participants who failed the comprehension checks look the same for lotteries and mirrors: regression to the mean. As Simonsohn explains in the DataColada blog post, this arises because of the way that a multiple-price list works:

When the dependent variable is how much people value prospects, regression to the mean creates spurious evidence in line with prospect theory. When people answer randomly for 10% chance of $25, they overvalue it, because the “right” valuation is $2.50, and the scale mostly contains values that are higher than that. When people answer randomly for 90% chance of $25, they undervalue it, because the “right” valuation is $22.50 and the scale mostly contains values that are lower than that. Thus, random or careless responding will produce the same pattern predicted by prospect theory.
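To see how purely random responding can generate that pattern, here is a small simulation sketch in Python (the particular price list values are my assumption, not the exact scale that Oprea used):

```python
import random

random.seed(42)

# A hypothetical multiple-price list for a prospect paying $25: respondents pick
# one of these certain amounts as their valuation. Careless respondents pick at random.
price_list = [2.50 * i for i in range(11)]  # $0, $2.50, ..., $25

n = 10_000
avg_random = sum(random.choice(price_list) for _ in range(n)) / n

for p in (0.10, 0.90):
    expected_value = p * 25
    print(f"{int(p * 100)}% chance of $25: EV = ${expected_value:.2f}, "
          f"average random response = ${avg_random:.2f}")
# Random responders land near the middle of the list (about $12.50), so they appear
# to overvalue the 10% prospect and undervalue the 90% one - the pattern that
# Prospect Theory predicts.
```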

Oprea responds to both of these points, noting that:

...a range of imperfectly rational behaviors including noisy valuations, anchoring-and-adjustment heuristics, compromise heuristics and pull-to-the-center heuristics will all tend to produce prospect-theoretic patterns of behavior simply because of the nature of valuation. BSWW offer this possibility as an alternative to the Oprea (2024)’s account of his data, but in fact these are examples of exactly the types of cognitive shortcuts Oprea (2024) was designed to study.

In other words, Banki et al.'s results don't refute Oprea's results, but are very much in line with Oprea's. One thing that Oprea does take issue with is Banki et al.'s use of medians as the preferred measure of central tendency. Oprea uses the mean, and when reanalysing the data with the same exclusions as Banki et al., Oprea shows that the mean results look similar to the original paper. So, Banki et al.'s results are not simply driven by excluding the research participants who failed the comprehension checks, but also by switching from using the mean to using the median.

On that point, I'm inclined to agree with Banki et al. The median is often used in experimental economics, because it is less influenced by outliers. And if you look at Oprea's data, there are a lot of large outliers, which become quite influential observations when the mean is used as the summary statistic. However, the outliers are likely to be the observations you want to have the smallest effect on your results, not the largest effect.

Oprea also critiques Banki et al.'s interpretation of the comprehension questions. Oprea rightly notes that:

...it is important to emphasize that these training questions weren’t designed to measure beliefs (e.g., payoff confusion), and because of this they are poorly suited to the task BSWW repurpose it for, ex post. Indeed, evidence from the patterns of mistakes made in these questions suggests that overall training errors largely serve as a measure of the cognitive effort (an important ingredient in Oprea (2024)’s account) subjects apply to answering these questions, and that BSWW therefore substantially overestimate the level of payoff confusion with which subjects entered the experiment.

In other words, the 'comprehension questions' are not comprehension questions at all, but they are really 'training questions' that were used to train the research participants to understand the choice tasks that they would be presented with. And so, using those training questions overall as a measure of understanding misses the point, and seriously underestimates the amount of understanding of the task that research participants had by the time they had completed the training questions.

Oprea's response is good on this point. However, if the training questions had really done a good job of training the research participants, then all participants should have had a similar level of understanding by the end of the training questions, and there should be no detectable differences in behaviour between those with more, and those with fewer, 'failed' training questions. That wasn't the case - the behaviour of the research participants who made errors in training was much more likely to be the same for lotteries and mirrors than was the behaviour of research participants who made no errors. To clear this up, it would have been interesting to have research participants also complete 'comprehension questions' at the end of the experimental session, to see if they still understood the tasks they were being asked to complete. At that point, those failing the comprehension questions could be dropped from the dataset.

One point of Banki et al.'s critique that Oprea hasn't engaged with (yet, although he promises to do so in a future, more complete response), is their finding that a larger than 'usual' proportion of the research participants fail 'first order stochastic dominance' (FOSD). A failure of FOSD in this context means that a research participant valued a lottery (or mirror) lower than a similar lottery that was strictly better. For example, valuing a 90% chance of receiving $25 less than a 10% chance of receiving $25 is a failure of FOSD. Banki et al. show that:

We begin by examining G10 and G90. Violating FOSD here involves valuing the 10% prospect strictly more than the 90% one. Across all participants (N = 583), 14.8% violated FOSD for mirrors, and 13.9% for lotteries. These rates are quite high given that the prospects differ in expected value by a factor of nine.

Those failure rates are much higher than for other similar research studies. Banki et al. note an overall rate of 20.8 percent in the Oprea results, compared with an average of 3.4 percent across eight other highly cited studies. It will be interesting to see how Oprea responds to that point in the future.

This is an interesting debate so far. Oprea does a good job of summing up where this debate should probably go next:

Ultimately, however, these questions and ambiguities can only be fully resolved by further research. While BSWW’s critique has not convinced me that the interpretation offered in Oprea (2024) is mistaken, I am eager to see new experiments that deepen, alter, or even overturn this interpretation. First, concerns that the Oprea (2024)’s results are a consequence of the design being too confusing to yield insight can only really be resolved one way or another by followup experiments that vary his procedures, instructions and other design choices in such a way as to satisfy us that the Oprea (2024) results are (or are not) overfit to that design.

Indeed, more follow-up research is needed. Prospect Theory hasn't been overturned, yet (and as I noted in my earlier post, it is consistent with a lot of real-world behaviour). However, now we know that it may be vulnerable, and Oprea's paper provides a starting point for testing more thoroughly how much of the experimental results arise from complexity.

[HT: Riccardo Scarpa]


Monday, 31 March 2025

Pricing like Ferrari

This week my ECONS101 class is covering pricing strategy. Essentially, this topic is about a lot of situations (supported by real-world examples) where firms may choose not to price at the single profit-maximising price. Most of the time, deviations from the profit-maximising price involve the firm pricing at a lower price than the profit-maximising price. The firm might set a lower price in order to generate goodwill and a long-term relationship with consumers, or to sell a greater quantity so that it can take advantage of moving down the learning curve (and achieving lower costs quicker), or to keep competitors out of the market (what economists refer to as limit pricing). The thing about all of those situations is that, by setting a lower price now, the firm earns more profits in the long run. It seems to me to be less clear that firms would want to set a higher price than the profit-maximising price now. Unless they face consumers like this:

Perhaps some consumers are simply willing to buy a good because it has a higher price. That is the basis of conspicuous consumption (which I have written about before here). However, I want to take this in a different direction, because firms can set a high price without needing to rely on conspicuous consumption, even when it seems like a possible explanation for what the firm is doing. Consider this example from the Wall Street Journal last month (ungated version here):

With a list price of $3.7 million, Ferrari’s new “hypercar” was revealed to the public in October with a twist: It wasn’t available for sale.

All 799 units of the low-slung, high-haunched F80 model—the most expensive production vehicle in Ferrari’s history—had been promised to top customers like Luc Poirier.

The Montreal real estate entrepreneur already owns 42 Ferraris. He said he felt “lucky” to be allowed to buy yet another.

“To be chosen by Ferrari for one of their hypercars is a true milestone for any collector,” he said.

Money isn’t enough to buy a top-of-the-range Ferrari. You need to be in a long-term relationship with the company.

By leveraging the rabid fandom of its customers through a business model based on uber-scarcity, the storied Italian company is enjoying a new golden age.

When goods are scarcer, the marginal consumer is willing to pay more for them. This is the 'Law of Demand' working in reverse. If the firm restricts the quantity it sells, then it moves up the demand curve and can sell at a higher price. However, by definition, setting a higher price than the profit-maximising price decreases profits. And, there doesn't seem to be a mechanism where over-pricing their cars gives Ferrari a long-term increase in profits. So, let's consider what they are actually doing.

Consider the market for regular, run-of-the-mill Ferraris. Because there are lots of substitutes for a regular, run-of-the-mill Ferrari, the demand for Ferraris is relatively elastic (shown by the flat demand curve D1). When Ferrari prices its cars, it sets the price so that it will sell the quantity where marginal revenue is exactly equal to marginal cost. That is the quantity Q*, and the price P1. The mark-up for Ferrari is the difference between P1 and marginal cost (MC). 

Ferrari could try setting the price higher than P1, but as noted above, this would decrease the quantity sold below the profit-maximising quantity Q*, and by definition this would decrease Ferrari's profits. So, how could Ferrari increase its profits from selling run-of-the-mill Ferraris? One way is to make demand less elastic (making the demand curve steeper). If the demand curve was steeper, like D0, then the profit maximising price would be P0 rather than P1, and the mark-up on run-of-the-mill Ferraris would be much higher. Selling run-of-the-mill Ferraris would be much more profitable.
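As a rough numerical illustration of that point, here is a small Python sketch using the standard mark-up rule for a firm with market power (the marginal cost and elasticity numbers are made up, not actual Ferrari figures):

```python
# Mark-up pricing rule for a firm with market power: at the profit-maximising
# price, (P - MC) / P = 1 / |elasticity|, so P = MC / (1 - 1/|elasticity|).
# The numbers below are made up purely for illustration.

marginal_cost = 250_000  # hypothetical marginal cost of a 'run-of-the-mill' Ferrari ($)

def profit_max_price(mc, elasticity):
    """Profit-maximising price for a given (constant) price elasticity of demand, |e| > 1."""
    return mc / (1 - 1 / abs(elasticity))

for label, elasticity in [("relatively elastic demand (D1), |e| = 4", 4),
                          ("less elastic demand (D0), |e| = 2", 2)]:
    price = profit_max_price(marginal_cost, elasticity)
    markup = price - marginal_cost
    print(f"{label}: price = ${price:,.0f}, mark-up = ${markup:,.0f}")
# Making demand less elastic (fewer close substitutes) raises both the price and the mark-up.
```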

If you are a seller, how can you make demand for your good less elastic? One of the factors that affects the price elasticity of demand is the number of close substitutes. If the firm can decrease the number of substitutes, or make its good less substitutable by other goods (reducing the number of close substitutes), then demand will be less elastic.

This is what Ferrari is doing by selling its most premium cars only to consumers "in a long-term relationship with the company". If you really want a Ferrari hypercar (or whatever the latest release Ferrari is), then you need to be buying run-of-the-mill Ferraris. That makes other luxury cars less close substitutes for a run-of-the-mill Ferrari, making demand for run-of-the-mill Ferraris less elastic, and allowing Ferrari to set a higher price for run-of-the-mill Ferraris. Since Ferrari sells a lot more run-of-the-mill Ferraris than hypercars, this is likely to be much more profitable for Ferrari overall:

Anyone with a few hundred thousand dollars to spare can buy a regular Ferrari as long as they are willing to wait a couple of years. While the standard models aren’t subject to strictly limited runs, the company still lives by Enzo Ferrari’s scarcity dictum: “Ferrari will always deliver one car less than the market demands.”

Limited-edition Ferraris are even scarcer, and you can’t just walk into your local showroom and buy one. These range from special versions of regular models to the design-oriented “Icona” and, most exclusively, once-in-a-decade hypercars like LaFerrari and the F80.

Such models help keep orders flowing for the company’s entire product range even though they account for a fraction of deliveries—just 7% last year. Collectors had on average bought 10 new Ferraris before qualifying to buy LaFerrari or an Icona, which means icon in Italian, according to Hagerty.

Maybe buying a premium Ferrari is conspicuous consumption, and maybe Ferrari is taking advantage of that. However, it is also using its premium Ferraris to increase the price and profitability of a run-of-the-mill Ferrari.

[HT: Marginal Revolution]

Sunday, 30 March 2025

Another study of MasterChef that doesn't tell us much because of survivorship bias

Data from sports and games can tell us a lot about decision-making and behaviour. That's because the framework within which decisions are made, and behaviour takes place, is well defined by the rules of the sport or game. That's why I really like to read studies in sports economics, and often post about them here. I also like to read studies that use data from game shows, where the framework is clearly defined.

While those sorts of studies can tell us a lot, they still need to be executed well, and unfortunately, that isn't always the case. Consider this post, where I outlined a clear problem of survivorship bias in the analysis of a paper using data from MasterChef. Sadly, that paper is not alone, as one of its authors, Alberto Chong (Georgia State University), has made a similar mistake in a follow-up paper, again using the same dataset from MasterChef.

This new paper looks at the relationship between exposure to anger and performance. As Chong explains:

Being exposed to anger in others may provide a burst of energy and increase focus and determination, which maybe translated into increased performance. However, the opposite may also be true. Exposure to anger in others may cloud judgment, impair decision-making, and may end up decreasing performance. In short, understanding whether the link between these two variables is positive or negative is an empirical question.

And if you've ever watched MasterChef (the US version), you will know that anger is a key feature of the series. For example:

So, Chong looks at whether exposure to the angry reactions of the judges affects contestants' performance overall, including their final placement, as well as the number of challenges they placed in the top three, their probability of placing in the top three, and their probability of winning. The dataset covers all seasons of MasterChef from 2010 to 2020. Exposure to anger is measured as "the number of times that any of the contestants have been exposed to anger by any of the judges". Chong finds that:

...people who are exposed to anger appear to react positively to anger by improving their final placement in the competition likely as a result of increased focus and determination. In particular, we find that it is associated with contestants improving around 1.5 placement positions or higher in the final standings. We also find that the probability of winning the competition increases by around 2.2 percent.

However, there is a problem, and that problem is survivorship bias. Contestants who remain in the show for longer have more opportunity to be exposed to anger from the judges. So, even if angry reactions are completely randomly assigned to contestants, those who survive for more episodes will both attract more angry reactions and have a higher placing overall. There is a mechanistic relationship that drives a negative correlation between placing in the show and exposure to anger. The analysis needs to condition the exposure to anger on the number of opportunities for judges to be angry. So, rather than the number of times exposed to anger, the key explanatory variable should be the proportion of times the contestant is exposed to anger.

Now, in my previous post on analysis of this dataset I demonstrated using some randomly generated data why survivorship bias was a problem. I'm not going to do that this time, because the issue is substantively the same (even if the specific numbers will be different). However, as I noted then, this study is crying out for a replication along with the other one, and together they would make a great project for a motivated Honours or Masters student. Then these studies might live up to the ideal of telling us something about decision-making and behaviour.

[HT: Marginal Revolution]

Friday, 28 March 2025

This week in research #68

Here's what caught my eye in research over the past week (which, it seems, was a very quiet week!):

  • Baker et al. (ungated preprint) provide an excellent overview of difference-in-difference research designs (which I have referred to many times on this blog), including all of the issues that researchers need to be aware of when using this research design

Thursday, 27 March 2025

First results on the non-binary gender earnings gap in New Zealand

The gender gap in pay between men and women is well known. Much less is known about the gender gap (if any) between cisgender men and women, and gender diverse people. However, this new article by Christopher Carpenter (Vanderbilt University) and co-authors, published in the journal Economics Letters (open access, with non-technical summary here), gives us a starting point using novel data from New Zealand. Carpenter et al. make use of Stats NZ's Integrated Data Infrastructure (IDI), which links various administrative datasets. Specifically:

We use NZ Department of Internal Affairs (DIA) birth records to identify birth record sex, which only allow two options: ‘male’ or ‘female’. Next, we link DIA birth records with the NZTA Driver License Register and restrict our sample to individuals who had their driver license registration/renewal in 2021 or after when the NZTA driver license application allowed identification of men, women, and gender diverse people... We compare driver license gender with birth record sex to identify cisgender people (those whose birth record sex matches their driver license recorded gender), transgender people (those whose birth record sex does not match their driver license register gender and whose driver license register gender is either male or female), and gender diverse people (those whose driver license register gender indicates gender diverse).

This is a really smart approach to identifying gender diverse and transgender individuals in the administrative data. It will tend towards false negatives, because not everyone has a driver's licence, and not every gender diverse or transgender person will change their gender on the driver's licence. However, Carpenter et al. are up front about the measurement error that this creates.

Carpenter et al. then look at demographic and other characteristics and at labour market outcomes by gender, comparing transgender and gender diverse people with cisgender people, focusing on whether each person is NEET (not in employment, education, or training), and their taxable income reported to Inland Revenue. In terms of demographics, they find that:

...relative to cisgender women, transgender men are younger, less likely to be of European descent, less likely to be married or in a civil union, less likely to have children, more likely to live in Auckland or Wellington, less likely to have a tertiary qualification, and more likely to have a mental health prescription... gender diverse individuals whose birth record sex is female... are younger, less likely to be married, less likely to have had any children, more likely to have a mental health prescription, and more likely to be NEET than both transgender men and cisgender women. Regarding education, gender diverse individuals whose birth record sex is female are more likely than transgender men but less likely than cisgender women to have a tertiary qualification...

Relative to cisgender men, transgender women are younger, less likely to be of European descent, less likely to be married or in a civil union, less likely to have children, more likely to live in Auckland or Wellington, and more likely to have a mental health prescription than cisgender men... gender diverse individuals whose birth record sex is male... are younger, more likely to have tertiary education, and more likely to have a mental health prescription than both transgender women and cisgender men. Gender diverse individuals whose birth record sex is male are much more similar to transgender women than they are to cisgender men with respect to marital status, presence of children, and residence in Auckland or Wellington.

Interesting stuff, although superseded by data coming out of the 2023 Census, which for the first time collected comprehensive data on gender and sexual identity (more on that in a moment). Turning to labour market outcomes, Carpenter et al. find:

...strong evidence that gender minorities in New Zealand are much more likely to be NEET than otherwise similar cisgender people. We estimate that transgender women, gender diverse individuals whose birth record sex is male, and gender diverse individuals whose birth record sex is female are 10–12 percentage points more likely to be NEET than similarly situated cisgender men...

Turning to earnings... we again find that gender minorities earn significantly less than cisgender men with similar observable characteristics. Here, however, the differences for cisgender women – which indicate precise earnings gaps of about 33 % – are similar in magnitude to those estimated for transgender women and transgender men. In contrast, gender diverse individuals whose birth record sex is male and gender diverse individuals whose birth record sex is female both experience significantly larger earnings gaps compared to both cisgender men and cisgender women.

Those earnings gaps for gender diverse people are both over 50 percent. I don't think anyone will be particularly surprised by these results. It has long been suspected that gender diverse people face an earnings penalty, but there has been a lack of data to support this. However, the novel approach by Carpenter et al. has helped to fill in that particular research gap. The next step though, surely, must be to take advantage of the 2023 Census data, which gives much more detail on gender and sexual identity, with what is likely to be far less measurement error. According to the public Stats NZ data, there were over 17,000 people who were 'another gender' (other than male or female) in the 2023 Census (you can find this by browsing Aotearoa Data Explorer for 2023 Census data on gender). However, to disaggregate between different non-binary genders from there requires access to the data in the IDI. It will be interesting to see when the first analyses of that data come out. I'm very sure someone will be looking at it already.

Wednesday, 26 March 2025

GPT-4 tells us how literary characters would play the dictator game

Have you ever wondered what it would be like to interact with your favourite literary characters? What interesting conversation might we have with Elizabeth Bennet or Clarissa Dalloway? Or, who would win if we played a game of Monopoly with Ebenezer Scrooge or Jay Gatsby? Large language models like ChatGPT can provide us with a partial answer to that question, because they can be prompted to take on any persona. And because of the wealth of information available in their training data, LLMs are likely to be very convincing at cosplaying famous literary characters.

So, I was really interested to read this new article by Gabriel Abrams (Sidwell Friends High School), published in the journal Digital Scholarship in the Humanities (ungated earlier version here). Abrams asked GPT-4 to play the role of a large number of famous literary characters when playing the 'dictator game'. To review, in the dictator game the player is given an amount of money, and can choose how much of that money to keep for themselves, and how much to give to another player. Essentially, the dictator game provides an estimate of fairness and altruism.
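As an aside, the basic setup is straightforward to sketch. The snippet below uses the OpenAI Python client to ask GPT-4 to play the dictator game as a literary character; the prompt wording, the $100 stake, and the character are my own invention, not Abrams' actual design:

```python
from openai import OpenAI

client = OpenAI()  # assumes an OPENAI_API_KEY environment variable is set

# Hypothetical prompt, loosely in the spirit of Abrams' design (not his actual wording)
character = "Ebenezer Scrooge"
prompt = (
    f"You are {character}. You have been given $100 and must decide how much to keep "
    "for yourself and how much to give to an anonymous stranger, who otherwise gets "
    "nothing. Answer only with the dollar amount you give away, and one sentence "
    "explaining why, in character."
)

response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": prompt}],
)
print(response.choices[0].message.content)
```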

Abrams first asked GPT-4 to identify the 25 most well-known fictional characters in each century from the 17th Century to the 21st Century. Then, for each character, Abrams asked GPT-4 to play the dictator game, as well as to identify the particular personality traits that would affect the character's decision in the game. Abrams then took each personality trait and asked GPT-4 to assign it a valence (positive, neutral, or negative). Finally, Abrams summarised the results by Century, finding that:

There is a general and largely monotonic decrease in selfish behavior over centuries for literary characters. Fifty per cent of the decisions of characters from the 17th century are selfish compared to just 19 per cent from the 21st century...

Humans are more selfish than the AI characters with 51 per cent of humans making selfish decisions compared to 28 per cent of the characters...

So, over time literary characters have become less selfish, but overall the characters are more selfish than real humans. An interesting question, which can't be answered with this data, is whether the change in selfishness also reflects a decrease in selfishness in the population generally (because the selfishness of humans was measured in the 21st Century only). Interestingly, looking at personality traits:

Modeled characters’ personality traits generally have a strong positive valence. The weighted average valence across the 262 personality traits was a surprisingly high +0.47...

I associate many literary figures with their negative traits, and less so with positive traits. Maybe that's just me. Or maybe, the traits that GPT-4 thought were most relevant to the choice in the dictator game tended to be more positive traits. Given that the dictator game is really about altruism and fairness, then that might explain it. Over time, there hasn't been a clear trend in valence:

The 21st century had the highest valence at +0.74... The least positive centuries were the 17th and 19th with +0.28 and +0.29, respectively...

Abrams then turned to the specific personality traits, identifying the traits that were more common (overweighted) or less common (underweighted) in each century, compared with overall. This is summarised in Table 6 from the paper:

There are some interesting changes there, with empathetic shifting from being the most underweighted trait to being the most overweighted trait, while manipulative shifts in the opposite direction (from most overweighted to third-most underweighted). Interesting, and not necessarily what I would have expected. Abrams concludes that:

The Shakespearean characters of the 17th century make markedly more selfish decisions than those of Dickens, Dostoevsky, Hemingway and Joyce, who in turn are more selfish than those of Ishiguro and Ferrante in the 21st century.

Historical literary characters have a surprisingly strong net positive valence. It is possible that there is some selection bias. For instance, scholars or audiences may make classics of books with mainly attractive characters.

That makes sense. One thing that I found missing in the paper was a character-level assessment. It would have been interesting to see the results for favourite (and least favourite) characters individually, to see how they compare with what we might have expected. That could have been included in the supplementary materials for the paper, and would have made for an interesting read.

Nevertheless, this paper was an interesting exploration of just some of what LLMs can be used for in research. As I've noted before, LLMs have essentially killed off online data collection using tools like mTurk, because the mTurkers may simply use LLMs to respond to the survey or experiment. Researchers can now cut out the middleman, and use LLMs directly to cosplay as research participants based on any collection of characteristics (age, gender, ethnicity, location, etc.). The big question now is whether, when LLMs are used in this way, some of the real underlying variation in human responses is lost (because LLMs will tend to give a 'median' response for the group they are cosplaying). The answer to that question will become clear as researchers continue down this path.

[HT: Marginal Revolution, back in 2023]

Monday, 24 March 2025

New Zealand's supermarket sector needs a hero

In yesterday's post, I discussed market power and competition, noting that when there is a lack of competition, firms have more market power, and that means higher mark-ups and higher prices for consumers. An example of a market where there appears to be a high degree of market power is the supermarket sector in New Zealand.

It wasn't always this way. When I was young, I remember there being a large number of different supermarket brands. In Tauranga in the mid-1990s, along Cameron Road between the CBD and Greerton there was Price Chopper (which was previously 3 Guys), Pak'nSave, Big Fresh, Foodtown, New World, and Countdown (and there may be others that I've forgotten, as well as several smaller superettes).

One of the main ways that a market ends up highly concentrated is that a market which starts with some degree of competition sees some of its firms merge (or take each other over), leaving fewer firms and less competition. In the context of supermarkets in New Zealand, this process is outlined in this article in The Conversation by Lisa Asher, Catherine Sutton-Brady (both University of Sydney), and Drew Franklin (University of Auckland):

The current state of New Zealand’s supermarket sector – dominated by Woolworths (formerly Countdown), Foodstuffs North Island and Foodstuffs South Island – is a result of successive mergers and acquisitions along two tracks.

The first was Progressive Enterprises’ (owner of Foodtown, Countdown and 3 Guys banners) purchase of Woolworths New Zealand (which also owned Big Fresh and Price Chopper) in 2001.

Progressive Enterprises was sold to Woolworths Australia, its current owner, in 2005. In less than 25 years, six brands owned by multiple companies were whittled down to a single brand, Woolworths.

The second was the concentration of the “Foodstuffs cooperatives” network. This network once included four regional cooperatives and multiple banners including Mark'n Pak and Cut Price, as well as New World, PAK’nSave and Four Square.

The decision of the four legally separate cooperatives to include “Foodstuffs” in their company name blurred the lines between them. The companies looked similar but remained legally separate.

As a result of mergers, these four separate companies have now become Foodstuffs North Island – a franchise limited share company operating according to “cooperative principles” – and Foodstuffs South Island, a legal cooperative.

And so now we find ourselves in a situation with three large supermarket firms, two of which (Foodstuffs North Island and Foodstuffs South Island) are effectively two arms of the same firm, and certainly aren't competing with each other because they operate only on their 'own' islands. With such a lack of competition, many people, including Asher et al., are clamouring for change.

Increasing competition in the supermarket sector could take one of two forms. Asher et al. argue that inviting an international competitor into the market will take too long, citing the example of Aldi in Australia, which "took 20 years to reach scale as a third major player in that country". Their preference is 'forced divestiture', breaking up the existing supermarkets into smaller competing firms. Essentially, this would be something of a return to the situation prior to some of the mergers that have characterised the past 30 years of the supermarket sector in New Zealand, but would require a drastic legislative intervention from government.

However, before the government imposes such a dramatic change on this market, it really needs some solid analysis of the impacts of the change. Large supermarket firms benefit from economies of scale in purchasing, logistics and distribution, as well as in back-office functions (like payroll, marketing, and finance). If smaller supermarket firms face higher costs because they can't take advantage of the economies of scale available to larger supermarket firms, then breaking up the supermarket chains into smaller chains could lead to even higher prices for consumers. In addition, smaller supermarket chains would have less bargaining power with their suppliers, which might mean that suppliers receive better prices (but again, that likely means higher prices for consumers). Without some careful economic modelling, which has not been done to date, we can't make a clear-eyed assessment of the likely net change in consumer and producer welfare.

And we should be cautious. If forced divestiture starts to gain political traction, you can bet that the supermarket chains will release economic analysis supporting the position that breaking them up would be worse for consumers. And consumer advocates might well produce their own analyses showing the opposite. What is needed is a truly independent assessment. And before you raise it, I doubt that we would get that sort of independent assessment from the Commerce Commission. They know that their bills are paid by the government of the day, and they may respond accordingly.

Asher et al. ask us to "stop waiting for a foreign hero". What we need is an economist hero, with an independent analysis of the supermarket sector in hand.


Sunday, 23 March 2025

Market power, competition, and the collapses of Bonza and Rex

Last week, my ECONS101 class covered market power and competition (as part of a larger topic introducing some of the principles of firm behaviour). This coming week, we'll be covering elasticity, which is closely related and builds on the key ideas of firm behaviour.

Market power is the ability of a seller (or sometimes a buyer) to influence market prices. The greater the amount of market power the seller (or buyer) has, the more they can raise their price above marginal cost. That is, sellers with greater market power will have a higher mark-up (which is the difference between price and marginal cost).

How do firms get market power? There are several ways, but the greatest contributor to market power is the extent of competition in the market. When firms face a lot of competition in their market, they will compete vigorously on price, and so their mark-up will be lower. When firms face less competition in their market, they don't have to compete on price to the same degree, and so their mark-up will be higher.

Another way of seeing this is to consider the price elasticity of demand. When there are many substitutes for a good, the demand for that good will be more elastic. If the seller raises their price, many of their consumers will buy (one of the many) substitutes instead (because those substitutes are now relatively cheaper). So, firms selling a good that has many substitutes (a good that has more elastic demand) will have a lower mark-up. And if a firm's good has many substitutes, that means a lot of competition.

On the other hand, when there are few substitutes for a good, the demand for that good will be less elastic. If the seller raises their price, few of their consumers will buy substitutes instead (because there are few substitutes available). So, firms selling a good that has few substitutes (a good that has less elastic demand) will have a higher mark-up. And if a firm's good has few substitutes, that means less competition.
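For readers who like a bit of algebra, the reasoning in the last few paragraphs is usually summarised by the standard textbook mark-up rule (the Lerner index), which the post doesn't state explicitly but which underpins the argument: a profit-maximising firm's mark-up, as a share of its price, is the inverse of the absolute price elasticity of demand it faces.

```latex
% Standard Lerner index relationship (textbook result, not stated explicitly in the post)
\[
  \frac{P - MC}{P} \;=\; \frac{1}{|\varepsilon_d|}
\]
```

So a firm facing very elastic demand (say an elasticity of 5 in absolute value, because there are many substitutes) can only sustain a mark-up of 20% of its price, while a firm facing much less elastic demand (say an elasticity of 1.25, because there are few substitutes) can sustain a mark-up of 80% of its price.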

Taking all of this together, when a market loses one or more of its competitors and competition is reduced, we should expect to see an increase in prices. A clear example of this is what happened when the Australian regional airlines Bonza and Rex closed down last year. As Doug Drury (Central Queensland University) wrote in The Conversation last November:

In 2024 alone, we’ve seen the high-profile collapse of both Bonza and Rex, airlines that once ignited hopes for much greater competition in the sector. Now, we’re beginning to see the predictable effects of their exit.

According to a quarterly report released on Tuesday by the Australian Competition and Consumer Commission (ACCC), domestic airfares on major city routes increased by 13.3% to September after Rex Airlines halted its capital city services at the end of July.

The collapse of the two low-cost domestic airlines in Australia reduced competition on domestic routes. Unsurprisingly, the lower competition means more market power for the remaining airlines, Qantas and Jetstar. And that greater market power has translated into higher prices for domestic airfares in Australia.

The importance of competition for prices cannot be overstated. As one other example, a lack of competition has been implicated in the perceived high prices in New Zealand supermarkets (a point I will come back to in my next post, but this is a topic I have written about before here). The Commerce Commission is understandably concerned whenever there is a lack of competition, and whenever competition will be substantially reduced by the merger of two or more firms (for example, see here and here about the recently rejected Foodstuffs supermarket merger). When there is a lack of competition, sellers have more market power, demand will be less elastic, and for both of those reasons we can expect prices to be higher.

Friday, 21 March 2025

This week in research #67

Here's what caught my eye in research over the past week:

  • Schreyer and Singleton (open access) find that Cristiano Ronaldo increased stadium attendance in the Saudi league, by an additional 20% of the seats in his home team's stadium when he played, by 15% in the stadiums he visited, and by 3% even where he did not play
  • Harrison and Glaser find that laws that allow breweries to bypass distributors lead to higher brewery output and employment, and that this is primarily driven by a greater market entry of breweries
  • Ankel‐Peters, Fiala, and Neubauer (open access) review the impact of replications published as comments in the American Economic Review between 2010 and 2020, and find that the comments are barely cited, and they do not affect the original paper's citations, even when the replication diagnoses substantive problems with the original paper (does this show the level of revealed preference for replications?)

Wednesday, 19 March 2025

Book review: The Economist's View of the World

In 1973, the Western Economic Journal published a now-famous article (ungated here) by Axel Leijonhufvud (UCLA), entitled "Life Among the Econ". The article was an ethnographic study of the Econ tribe, and included such gems as:

Almost all of the travellers' reports that we have comment on the Econ as a "quarrelsome race" who "talk ill of their fellow behind his back," and so forth. Social cohesion is apparently maintained chiefly through shared distrust of outsiders.

And:

The young Econ, or "grad," is not admitted to adulthood until he has made a "modl" exhibiting a degree of workmanship acceptable to the elders of the "dept" in which he serves his apprenticeship. 

Obviously, Leijonhufvud was an economist, not an anthropologist, and his article was hilarious satire. However, there is something to be said for taking an outside view of the discipline, and such a view might really help non-economists to understand economists. This is where most popular economics books go wrong - they are written by economists.

That is not the case for The Economist's View of the World, written by political scientist Steven E. Rhoads (apparently, not to be confused with the current New York senator Steven Rhoads - thanks Google). Rhoads is professor emeritus at the University of Virginia, where he taught economics to students in public administration. So, not only does he know economics well, but he also brings an outsider view (of sorts) to the subject. The book was first published in 1985, but I read the revised and updated "35th Anniversary Edition", which was published in 2021, and includes reference to more recent developments such as the rise of behavioural economics, the increasing salience of income inequality, and current policy debates.

The book starts with a non-technical introduction to three important concepts in economics: (1) opportunity cost; (2) marginalism; and (3) incentives. Those topics also tend to appear at the start of introductory economics textbooks. However, Rhoads doesn't get bogged down in the details of theories and models, and instead focuses on applications and illustrations. For example, on the issue of incentives:

To be sure, policy makers should be careful not to just implement the first incentive that comes to mind. To do so means to risk the fate of the poor little town of Abruzzi, Italy. The city was plagued by vipers, and the city fathers determined to solve the problem by offering a reward for any viper killed. Alas, the supply of vipers increased. Townspeople had started breeding them in their basements...

This is another example of the 'cobra effect' (which I have written about here and here), interestingly also involving snakes. After these initial chapters on the basics of the economic way of thinking, the book then pivots to a more policy focus, with a strong emphasis on outlining economists' perspectives on various aspects of public policy. Like the initial chapters, this section presents the view of an outsider explaining economics to other outsiders, and is mostly successful at doing so. Consider this bit on profit-taking by middlemen:

Some readers will be skeptical: what about the unfair way that companies scoop profits from the people who actually produce the products? Look at farmers; they get a fraction of the profits from their hard work...

Economists offer reasonable defenses for all of these much-maligned groups... Middlemen are a further development of the wealth-increasing division of labor. If they do not provide services worth their costs, retailers that do not use them will put out of business not just the middlemen but also the retailers they supply.

The policy aspects of this middle section of the book are accompanied by a big dose of public choice theory, which is not something that I am accustomed to seeing in popular economics books. It is no doubt why this book is very popular among libertarian economists. However, as the policy and public choice aspects increase, the book loses some of the dispassionate outlining of economists' perspectives. Later sections increasingly come across as an attempt to convince the reader of those perspectives. This is where popular economics books tend to lose their audience, and I'm sad to say I think Rhoads also falls into this trap.

My other gripe about the book is that it is very US-centric. The book is not so much presenting the economist's view of the world, but rather the economist's view of the US. The rest of the world barely rates a mention. While Rhoads is clearly tailoring the book to a US audience, it does tend to present economists as favouring more libertarian and small-government outcomes to an extent that is not necessarily apparent among economists in the rest of the world.

The final section of the book is a gentle critique of economics, with a particular focus on measurement of wellbeing, and on political deliberation. Unlike other books that present critiques (such as the books I reviewed here and here), Rhoads does not rely on strawman arguments that bear little resemblance to economics as it is actually practiced. For that reason, his critiques can and should be taken much more seriously. This section could easily have been expanded (perhaps to an entire book), but nevertheless it seemed like a sensible way to finish.

I enjoyed reading this book, but the overly US-centric approach turned me off, and I wouldn't recommend it to non-US readers. For readers outside the US, popular economics books by Tim Harford (for example, see here) or Diane Coyle (for example, see here) would be much better, despite those authors' lack of an outsider perspective.

Monday, 17 March 2025

The potential contribution of generative AI to journal peer review

As the Managing Editor of a journal (the Australasian Journal of Regional Studies), I have been watching the artificial intelligence space with interest. One thing that AI could easily be used for is peer review. So far, I haven't seen any evidence that reviewers for my journal have been using AI to complete their reviews, but I know that it is becoming increasingly common (and was something that I did observe as a member of the Marsden Social Sciences panel, before it was disbanded). 

What could be so bad about AI completing peer review of research? The truth is that we don't know the answer to that question. As editors and researchers, we may have concerns about whether AI would do a quality job, or whether it would be biased for or against certain types of research and certain types of researchers. But really, there hasn't been much empirical evidence either for or against these concerns.

That's why I was really interested to read two papers that contribute to our understanding of AI in peer review. The first is this 2024 article by Weixin Liang (Stanford University) and co-authors, published in the journal NEJM AI (ungated earlier version here). Their dataset of human feedback was based on over 8700 reviews of over 3000 accepted papers from Nature family journals (which had published their peer review reports), and over 6500 reviews of 1700 papers from the International Conference on Learning Representations (ICLR), a large machine learning conference (and where the authors had access to review reports for accepted as well as rejected papers). They quantitatively compared the human feedback with feedback generated by GPT-4.
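To make the idea of 'overlap' concrete, here is a rough sketch of how a comment-matching share could be computed, using TF-IDF cosine similarity as a stand-in matching rule. This is purely illustrative and is not Liang et al.'s actual pipeline; the threshold and the example comments are my own assumptions.

```python
# Rough sketch of a comment-overlap metric (illustrative; not Liang et al.'s actual pipeline).
# Two comments are treated as "matched" if their TF-IDF cosine similarity exceeds a threshold.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def overlap_share(llm_comments: list[str], human_comments: list[str], threshold: float = 0.5) -> float:
    """Share of LLM comments matched by at least one human comment."""
    vectorizer = TfidfVectorizer().fit(llm_comments + human_comments)
    llm_vecs = vectorizer.transform(llm_comments)
    human_vecs = vectorizer.transform(human_comments)
    sims = cosine_similarity(llm_vecs, human_vecs)  # rows: LLM comments, columns: human comments
    matched = (sims.max(axis=1) >= threshold).sum()
    return matched / len(llm_comments)

llm = ["The sample size is small, limiting statistical power.",
       "The paper does not discuss policy implications."]
human = ["Statistical power is limited by the small sample.",
         "The robustness checks are unconvincing."]
print(overlap_share(llm, human))  # share of LLM comments with a close human counterpart
```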

Liang et al. found that, for the Nature journal dataset:

More than half (57.55%) of the comments raised by GPT-4 were raised by at least one human reviewer... This suggests a considerable overlap between LLM feedback and human feedback, indicating potential accuracy and usefulness of the system. When comparing LLM feedback with comments from each individual reviewer, approximately one third (30.85%) of GPT-4 raised comments overlapped with comments from an individual reviewer... The degree of overlap between two human reviewers was similar (28.58%), after controlling for the number of comments...

For the ICLR dataset, the results were similar, but the nature of the data allowed for more nuance:

Specifically, papers accepted with oral presentations (representing the top 5% of accepted papers) have an average overlap of 30.63% between LLM feedback and human feedback comments. The average overlap increases to 32.12% for papers accepted with a spotlight presentation (the top 25% of accepted papers), while rejected papers bear the highest average overlap at 47.09%. A similar trend was observed in the overlap between two human reviewers: 23.54% for papers accepted with oral presentations (top 5% accepted papers), 24.52% for papers accepted with spotlight presentations (top 25% accepted papers), and 43.80% for rejected papers.

So, GPT-4's feedback overlapped most strongly with human feedback on the worst papers (those that were rejected), and its comments overlapped with an individual human reviewer's comments to a similar extent as two human reviewers' comments overlapped with each other. Turning to the types of comments, Liang et al. find that:

LLM comments on the implications of research 7.27 times more frequently than humans do. Conversely, LLM is 10.69 times less likely to comment on novelty than humans are... This variation highlights the potential advantages that a human-AI collaboration could provide. Rather than having LLM fully automate the scientific feedback process, humans can raise important points that LLM may overlook. Similarly, LLM could supplement human feedback by providing more comprehensive comments.

The takeaway message here is that GPT-4 is not really a substitute for a human reviewer, but is a useful complement to human reviewing. Finally, Liang et al. conducted a survey of 308 researchers across 110 US universities, who could upload some research and receive AI feedback. As Liang et al. explain:

Participants were surveyed about the extent to which they found the LLM feedback helpful in improving their work or understanding of a subject. The majority responded positively, with over 50.3% considering the feedback to be helpful, and 7.1% considering it to be very helpful... When compared with human feedback, while 17.5% of participants considered it to be inferior to human feedback, 41.9% considered it to be less helpful than many, but more helpful than some human feedback. Additionally, 20.1% considered it to be about the same level of helpfulness as human feedback, and 20.4% considered it to be even more helpful than human feedback...

In line with the helpfulness of the system, 50.5% of survey participants further expressed their willingness to reuse the system...

And interestingly:

Another participant wrote, “After writing a paper or a review, GPT could help me gain another perspective to re-check the paper.”

I hadn't really considered running my research papers through generative AI to see if it could provide feedback. However, now that I've heard about it, it is completely obvious that I should do so. And so should other researchers. It's a low-cost form of internal feedback. Indeed, Liang et al. conclude that:

...LLM feedback should be primarily used by researchers to identify areas of improvements in their manuscripts prior to official submission.

The second paper is this new working paper by Pat Pataranutaporn (MIT), Nattavudh Powdthavee (Nanyang Technological University), and Pattie Maes (MIT). They undertook an experimental evaluation of AI peer review of economics research articles, in order to determine the ability of AI to distinguish the quality of research, and whether it would be biased by non-quality characteristics of the papers it reviewed.

To do this, Pataranutaporn et al.:

...randomly selected three papers each from Econometrica, Journal of Political Economy, and Quarterly Journal of Economics (“high-ranked journals” based on RePEc ranking) and three each from European Economic Review, Economica, and Oxford Bulletin of Economics and Statistics (“medium-ranked journals”). Additionally, we randomly selected three papers from each of the three lower-ranked journals not included in the RePEc ranking—Asian Economic and Financial Review, Journal of Applied Economics and Business, and Business and Economics Journal (“low-ranked journals”). To complete the dataset, we included three papers generated by GPT-o1 (“fake AI papers”), designed to match the standards of papers published in top-five economics journals.

They then:

...systematically varied each submission across three key dimensions: authors’ affiliation, prominence, and gender. For affiliation, each submission was attributed to authors affiliated with: i) top-ranked economics departments in the US and UK, including Harvard University, Massachusetts Institute of Technology (MIT), London School of Economics (LSE), and Warwick University, ii) leading universities outside the US and Europe, including Nanyang Technological University (NTU) in Singapore, University of Tokyo in Japan, University of Malaya in Malaysia, Chulalongkorn University in Thailand, and University of Cape Town in South Africa... and iii) no information about the authors’ affiliation, i.e., blind condition.

To introduce variation in academic reputation, we replaced the original authors of the base articles with a new set of authors categorized into the following groups: (i) prominent economists—the top 10 male and female economists from the RePEc top 25% list; (ii) lower-ranked economists—individuals ranked near the bottom of the RePEc top 25% list; (iii) non-academic individuals—randomly generated names with no professional affiliation; and (iv) anonymous authorship—papers where author names were omitted. For non-anonymous authorship, we further varied each submission by gender, ensuring an equal split (50% male, 50% female). Combining these variations resulted in 9,030 unique papers, each with distinct author characteristics...
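To see how a handful of base papers multiplies into thousands of evaluated submissions, a simple factorial grid does the job. The sketch below uses simplified factor levels (my own assumptions, not the paper's exact design), so the count deliberately does not match the 9,030 figure; it is only meant to illustrate the structure.

```python
# Illustrative factorial design for varying author characteristics across base papers.
# The factor levels are simplified assumptions, so the total differs from the paper's 9,030.
from itertools import product

base_papers = [f"paper_{i:02d}" for i in range(1, 31)]        # 30 base articles (real + AI-generated)
affiliations = ["top_us_uk", "leading_non_western", "blind"]  # affiliation condition
prominence = ["prominent", "lower_ranked", "non_academic", "anonymous"]
genders = ["male", "female"]

submissions = []
for paper, affiliation, author_type in product(base_papers, affiliations, prominence):
    if author_type == "anonymous":
        submissions.append((paper, affiliation, author_type, None))  # no gender when author names are omitted
    else:
        for gender in genders:
            submissions.append((paper, affiliation, author_type, gender))

print(len(submissions))  # 30 * 3 * (3 * 2 + 1) = 630 in this simplified grid
```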

Pataranutaporn et al. then asked GPT-4o-mini to evaluate each of the 9,030 unique papers across a number of dimensions, including whether it would be accepted or rejected at a top-five journal, the reviewer recommendation, the predicted number of citations, and whether the paper would attract research funding, result in a research award, strengthen an application for tenure, or be part of a research agenda worthy of a Nobel Prize in economics. They found that:

...LLM is highly effective at distinguishing between submissions published in low-, medium-, and high-quality journals. This result highlights the LLM’s potential to reduce editorial workload and expedite the initial screening process significantly. However, it struggles to differentiate high-quality papers from AI-generated submissions crafted to resemble “top five” journal standards. We also find compelling evidence of a modest but consistent premium—approximately 2–3%—associated with papers authored by prominent individuals, male economists, or those affiliated with elite institutions compared to blind submissions. While these effects might seem small, they may still influence marginal publication decisions, especially when journals face binding constraints on publication slots.

So, on the one hand, the AI tool does a good job of distinguishing high-quality submissions from lower-quality ones. However, it can't tell them apart from AI-generated submissions crafted to resemble top-journal papers. And it has a small but consistent bias in favour of prominent authors, male authors, and authors affiliated with elite institutions. Both of these latter points are worrying, but again they suggest that a combination of human and AI reviewers might be a suitable path forward.

Pataranutaporn et al.'s paper is focused on solving a "peer review crisis". It has become increasingly difficult to find peer reviewers who are willing to spend the time to generate a high-quality review that will in turn help to improve the quality of published research. Generative AI could help to alleviate this, but we're clearly not entirely there yet. There is still an important role for humans in the peer review process, at least for now.

[HT: Marginal Revolution, for the Pataranutaporn et al. paper]