Sunday, 30 March 2025

Another study of MasterChef that doesn't tell us much because of survivorship bias

Data from sports and games can tell us a lot about decision-making and behaviour. That's because the framework within which decisions are made, and behaviour takes place, is well defined by the rules of the sport or game. That's why I really like to read studies in sports economics, and often post about them here. I also like to read studies that use data from game shows, where the framework is clearly defined.

While those sorts of studies can tell us a lot, they still need to be executed well, and unfortunately, that isn't always the case. Consider this post, where I outlined a clear problem of survivorship bias in the analysis of a paper using data from MasterChef. Sadly, that paper is not alone, as one of its authors, Alberto Chong (Georgia State University), has made a similar mistake in a follow-up paper using the same MasterChef dataset.

This new paper intends to look at the relationship between exposure to anger and performance. As Chong explains:

Being exposed to anger in others may provide a burst of energy and increase focus and determination, which maybe translated into increased performance. However, the opposite may also be true. Exposure to anger in others may cloud judgment, impair decision-making, and may end up decreasing performance. In short, understanding whether the link between these two variables is positive or negative is an empirical question.

And if you've ever watched MasterChef (the US version), you will know that anger is a key feature of the series.

So, Chong looks at whether exposure to the angry reactions of the judges affects contestants' performance overall, including their final placement, as well as the number of challenges they placed in the top three, their probability of placing in the top three, and their probability of winning. The dataset covers all seasons of MasterChef from 2010 to 2020. Exposure to anger is measured as "the number of times that any of the contestants have been exposed to anger by any of the judges". Chong finds that:

...people who are exposed to anger appear to react positively to anger by improving their final placement in the competition likely as a result of increased focus and determination. In particular, we find that it is associated with contestants improving around 1.5 placement positions or higher in the final standings. We also find that the probability of winning the competition increases by around 2.2 percent.

However, there is a problem, and that problem is survivorship bias. Contestants who remain in the show for longer have more opportunities to be exposed to anger from the judges. So, even if angry reactions were assigned to contestants completely at random, those who survive for more episodes would both attract more angry reactions and achieve a better final placement. That mechanistic relationship alone generates a correlation between final placement and exposure to anger. The analysis needs to condition the exposure to anger on the number of opportunities for the judges to be angry. So, rather than the number of times the contestant is exposed to anger, the key explanatory variable should be the proportion of times the contestant is exposed to anger.
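To see the mechanics, here is a minimal simulation sketch (my own construction, not Chong's data): anger is assigned completely at random in every episode, yet the count of anger exposures still predicts final placement, while the per-episode rate does not.

    import random
    import statistics

    def simulate_season(n=20, p_anger=0.2):
        # One simulated season: placement 1 = winner, n = first eliminated.
        rows = []
        for place in range(1, n + 1):
            episodes = n - place + 1  # the winner appears in every episode
            anger = sum(random.random() < p_anger for _ in range(episodes))
            rows.append((place, anger, anger / episodes))
        return rows

    random.seed(1)
    data = [row for _ in range(500) for row in simulate_season()]
    place, count, rate = zip(*data)

    # Counts correlate with placement purely through survival...
    print(statistics.correlation(place, count))  # strongly negative
    # ...but the per-episode rate shows anger is unrelated to placement.
    print(statistics.correlation(place, rate))   # close to zero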

Now, in my previous post on the analysis of this dataset, I demonstrated why survivorship bias is a problem using randomly generated data, and the quick simulation above makes substantively the same point (even if the specific numbers differ). As I noted then, this study, along with the other one, is crying out for a replication, and together they would make a great project for a motivated Honours or Masters student. Then these studies might live up to the ideal of telling us something about decision-making and behaviour.

[HT: Marginal Revolution]

Friday, 28 March 2025

This week in research #68

Here's what caught my eye in research over the past week (which, it seems, was a very quiet week!):

  • Baker et al. (ungated preprint) provide an excellent overview of difference-in-difference research designs (which I have referred to many times on this blog), including all of the issues that researchers need to be aware of when using this research design

Thursday, 27 March 2025

First results on the non-binary gender earnings gap in New Zealand

The gender gap in pay between men and women is well known. Much less is known about the gender gap (if any) between cisgender men and women, and gender diverse people. However, this new article by Christopher Carpenter (Vanderbilt University) and co-authors, published in the journal Economics Letters (open access, with non-technical summary here), gives us a starting point using novel data from New Zealand. Carpenter et al. make use of Stats NZ's Integrated Data Infrastructure (IDI), which links various administrative datasets. Specifically:

We use NZ Department of Internal Affairs (DIA) birth records to identify birth record sex, which only allow two options: ‘male’ or ‘female’. Next, we link DIA birth records with the NZTA Driver License Register and restrict our sample to individuals who had their driver license registration/renewal in 2021 or after when the NZTA driver license application allowed identification of men, women, and gender diverse people... We compare driver license gender with birth record sex to identify cisgender people (those whose birth record sex matches their driver license recorded gender), transgender people (those whose birth record sex does not match their driver license register gender and whose driver license register gender is either male or female), and gender diverse people (those whose driver license register gender indicates gender diverse).

This is a really smart approach to identifying gender diverse and transgender individuals in the administrative data. It will tend towards false negatives, because not everyone has a driver's licence, and not every gender diverse or transgender person will change their gender on the driver's licence. However, Carpenter et al. are up front about the measurement error that this creates.

Carpenter et al. then look at demographic and other characteristics and at labour market outcomes by gender, comparing transgender and gender diverse people with cisgender people, focusing on whether each person is NEET (not in employment, education, or training), and their taxable income reported to Inland Revenue. In terms of demographics, they find that:

...relative to cisgender women, transgender men are younger, less likely to be of European descent, less likely to be married or in a civil union, less likely to have children, more likely to live in Auckland or Wellington, less likely to have a tertiary qualification, and more likely to have a mental health prescription... gender diverse individuals whose birth record sex is female... are younger, less likely to be married, less likely to have had any children, more likely to have a mental health prescription, and more likely to be NEET than both transgender men and cisgender women. Regarding education, gender diverse individuals whose birth record sex is female are more likely than transgender men but less likely than cisgender women to have a tertiary qualification...

Relative to cisgender men, transgender women are younger, less likely to be of European descent, less likely to be married or in a civil union, less likely to have children, more likely to live in Auckland or Wellington, and more likely to have a mental health prescription than cisgender men... gender diverse individuals whose birth record sex is male... are younger, more likely to have tertiary education, and more likely to have a mental health prescription than both transgender women and cisgender men. Gender diverse individuals whose birth record sex is male are much more similar to transgender women than they are to cisgender men with respect to marital status, presence of children, and residence in Auckland or Wellington.

Interesting stuff, although superseded by data coming out of the 2023 Census, which for the first time collected comprehensive data on gender and sexual identity (more on that in a moment). Turning to labour market outcomes, Carpenter et al. find:

...strong evidence that gender minorities in New Zealand are much more likely to be NEET than otherwise similar cisgender people. We estimate that transgender women, gender diverse individuals whose birth record sex is male, and gender diverse individuals whose birth record sex is female are 10–12 percentage points more likely to be NEET than similarly situated cisgender men...

Turning to earnings... we again find that gender minorities earn significantly less than cisgender men with similar observable characteristics. Here, however, the differences for cisgender women – which indicate precise earnings gaps of about 33 % – are similar in magnitude to those estimated for transgender women and transgender men. In contrast, gender diverse individuals whose birth record sex is male and gender diverse individuals whose birth record sex is female both experience significantly larger earnings gaps compared to both cisgender men and cisgender women.

Those earnings gaps for gender diverse people are both over 50 percent. I don't think anyone will be particularly surprised by these results. It has long been suspected that gender diverse people face an earnings penalty, but there has been a lack of data to support this. However, the novel approach by Carpenter et al. has helped to fill in that particular research gap. The next step though, surely, must be to take advantage of the 2023 Census data, which gives much more detail on gender and sexual identity, with what is likely to be far less measurement error. According to the public Stats NZ data, there were over 17,000 people who were 'another gender' (other than male or female) in the 2023 Census (you can find this by browsing Aotearoa Data Explorer for 2023 Census data on gender). However, to disaggregate between different non-binary genders from there requires access to the data in the IDI. It will be interesting to see when the first analyses of that data come out. I'm very sure someone will be looking at it already.

Wednesday, 26 March 2025

GPT-4 tells us how literary characters would play the dictator game

Have you ever wondered what it would be like to interact with your favourite literary characters? What interesting conversation might we have with Elizabeth Bennet or Clarissa Dalloway? Or, who would win if we played a game of Monopoly with Ebenezer Scrooge or Jay Gatsby? Large language models like ChatGPT can provide us with a partial answer to that question, because they can be prompted to take on any persona. And because of the wealth of information available in their training data, LLMs are likely to be very convincing at cosplaying famous literary characters.

So, I was really interested to read this new article by Gabriel Abrams (Sidwell Friends High School), published in the journal Digital Scholarship in the Humanities (ungated earlier version here). Abrams asked GPT-4 to play the 'dictator game' in the role of each of a large number of famous literary characters. To review: in the dictator game, the player is given an amount of money, and can choose how much of that money to keep for themselves, and how much to give to another player. Essentially, the dictator game provides an estimate of fairness and altruism.
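As a rough sketch of how one might have an LLM persona play this game (the ask_llm helper is a stand-in for whatever chat-completion call is used, and the prompt wording and the 'selfish' cut-off are my inventions, not Abrams' exact protocol):

    def ask_llm(prompt: str) -> str:
        # Stand-in for a chat-completion call to GPT-4 (or similar).
        raise NotImplementedError

    def dictator_give(character: str, endowment: int = 100) -> int:
        prompt = (f"You are {character}. You have ${endowment} and may give any "
                  f"amount of it to a stranger, keeping the rest. Reply with only "
                  f"the dollar amount you give.")
        return int(ask_llm(prompt))

    def is_selfish(character: str) -> bool:
        # Treat giving away less than 10% of the endowment as 'selfish'
        # (the cut-off is my assumption, for illustration only).
        return dictator_give(character) < 10

    # Usage: the share of selfish decisions among a century's characters is then
    # something like: mean(is_selfish(c) for c in ["Hamlet", "Don Quixote", ...])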

Abrams first asked GPT-4 to identify the 25 most well-known fictional characters in each century from the 17th Century to the 21st Century. Then, for each character, Abrams asked GPT-4 to play the dictator game, as well as to identify the particular personality traits that would affect the character's decision in the game. Abrams then took each personality trait and asked GPT-4 to assign it a valence (positive, neutral, or negative). Finally, Abrams summarised the results by Century, finding that:

There is a general and largely monotonic decrease in selfish behavior over centuries for literary characters. Fifty per cent of the decisions of characters from the 17th century are selfish compared to just 19 per cent from the 21st century...

Humans are more selfish than the AI characters with 51 per cent of humans making selfish decisions compared to 28 per cent of the characters...

So, over time literary characters have become less selfish, but overall the characters are more selfish than real humans. An interesting question, which can't be answered with this data, is whether the change in selfishness also reflects a decrease in selfishness in the population generally (because the selfishness of humans was measured in the 21st Century only). Interestingly, looking at personality traits:

Modeled characters’ personality traits generally have a strong positive valence. The weighted average valence across the 262 personality traits was a surprisingly high +0.47...

I associate many literary figures with their negative traits, and less so with positive traits. Maybe that's just me. Or maybe the traits that GPT-4 thought were most relevant to the choice in the dictator game tended to be the more positive traits. Given that the dictator game is really about altruism and fairness, that might explain it. Over time, there hasn't been a clear trend in valence:

The 21st century had the highest valence at +0.74... The least positive centuries were the 17th and 19th with +0.28 and +0.29, respectively...

Abrams then turned to the specific personality traits, identifying the traits that were more common (overweighted) or less common (underweighted) in each century, compared with overall. This is summarised in Table 6 of the paper.

There are some interesting changes there, with empathetic shifting from being the most underweighted trait to being the most overweighted trait, while manipulative shifts in the opposite direction (from most overweighted to third-most underweighted). Interesting, and not necessarily what I would have expected. Abrams concludes that:

The Shakespearean characters of the 17th century make markedly more selfish decisions than those of Dickens, Dostoevsky, Hemingway and Joyce, who in turn are more selfish than those of Ishiguro and Ferrante in the 21st century.

Historical literary characters have a surprisingly strong net positive valence. It is possible that there is some selection bias. For instance, scholars or audiences may make classics of books with mainly attractive characters.

That makes sense. One thing that I found missing in the paper was a character-level assessment. It would have been interesting to see the results for favourite (and least favourite) characters individually, and see how they compare with what we might have expected. That could have been added to supplementary materials for the paper, and would have been an interesting read.

Nevertheless, this paper was an interesting exploration of just some of what LLMs can be used for in research. As I've noted before, LLMs have essentially killed off online data collection using tools like mTurk, because the mTurkers may simply use LLMs to respond to the survey or experiment. Researchers can now cut out the middleman, and use LLMs directly to cosplay as research participants based on any collection of characteristics (age, gender, ethnicity, location, etc.). The big question now is, when LLMs are used in this way, is some of the real underlying variation in human responses lost (because LLMs will tend to give a 'median' response for the group they are cosplaying)? The answer to that question will become clear as researchers continue on this path.

[HT: Marginal Revolution, back in 2023]

Monday, 24 March 2025

New Zealand's supermarket sector needs a hero

In yesterday's post, I discussed market power and competition, noting that when there is a lack of competition, firms have more market power, and that means higher mark-ups and higher prices for consumers. An example of a market where there appears to be a high degree of market power is the supermarket sector in New Zealand.

It wasn't always this way. When I was young, I remember there being a large number of different supermarket brands. In Tauranga in the mid-1990s, along Cameron Road between the CBD and Greerton there was Price Chopper (which was previously 3 Guys), Pak'nSave, Big Fresh, Foodtown, New World, and Countdown (and there may be others that I've forgotten, as well as several smaller superettes).

One of the main ways that a market ends up highly concentrated is to start with a market that has some degree of competition, but then some of the firms merge (or take each other over), leaving fewer firms and less competition. In the context of supermarkets in New Zealand, this process is outlined in this article in The Conversation by Lisa Asher, Catherine Sutton-Brady (both University of Sydney), and Drew Franklin (University of Auckland):

The current state of New Zealand’s supermarket sector – dominated by Woolworths (formerly Countdown), Foodstuffs North Island and Foodstuffs South Island – is a result of successive mergers and acquisitions along two tracks.

The first was Progressive Enterprises’ (owner of Foodtown, Countdown and 3 Guys banners) purchase of Woolworths New Zealand (which also owned Big Fresh and Price Chopper) in 2001.

Progressive Enterprises was sold to Woolworths Australia, its’ current owner, in 2005. In less than 25 years, six brands owned by multiple companies were whittled down to a single brand, Woolworths.

The second was the concentration of the “Foodstuffs cooperatives” network. This network once included four regional cooperatives and multiple banners including Mark'n Pak and Cut Price, as well as New World, PAK’nSave and Four Square.

The decision of the four legally separate cooperatives to include “Foodstuffs” in their company name blurred the lines between them. The companies looked similar but remained legally separate.

As a result of mergers, these four separate companies have now become Foodstuffs North Island – franchise limited share company, operating according to “cooperative principles” and Foodstuffs South Island, a legal cooperative.

And so now we find ourselves in a situation with three large supermarket firms, two of which (Foodstuffs North Island and Foodstuffs South Island) are effectively two arms of the same firm, and certainly aren't competing with each other because they operate only on their 'own' islands. With such a lack of competition many people, including Asher et al., are clamouring for change.

Increasing competition in the supermarket sector could take one of two forms. Asher et al. argue that inviting an international competitor into the market will take too long, citing the example of Aldi in Australia, which "took 20 years to reach scale as a third major player in that country". Their preference is 'forced divestiture', breaking up the existing supermarkets into smaller competing firms. Essentially, this would be something of a return to the situation prior to some of the mergers that have characterised the past 30 years of the supermarket sector in New Zealand, but would require a drastic legislative intervention from government.

However, before the government imposes such a dramatic change on this market, it really needs some solid analysis of the impacts of the change. Large supermarket firms benefit from economies of scale in purchasing, logistics and distribution, as well as back-office functions (like payroll, marketing, and finance). If smaller supermarket firms face higher costs because they can't take advantage of the economies of scale available to larger supermarket firms, then breaking up the supermarket chains into smaller chains could lead to even higher prices for consumers. On the other hand, smaller supermarket chains have less bargaining power with suppliers, which might mean that the supermarket suppliers receive better prices (but again, that means higher prices for consumers). Without some careful economic modelling, which has not been done to date, we can't make a clear-eyed assessment of the likely net change in consumer and producer welfare.

And we should be cautious. If forced divestiture starts to gain some political traction, you can bet that the supermarket chains will release economic analysis that supports the position that breaking them up will be worse for consumers. And consumer advocates might well counter with their own analyses, showing the opposite. What is needed is a truly independent assessment. And before you raise it, I doubt that we would get that sort of independent assessment from the Commerce Commission. They know that their bills are paid by the government of the day, and they may respond accordingly.

Asher et al. ask us to "stop waiting for a foreign hero". What we need is an economist hero, with an independent analysis of the supermarket sector in hand.


Sunday, 23 March 2025

Market power, competition, and the collapses of Bonza and Rex

Last week, my ECONS101 class covered market power and competition (as part of a larger topic introducing some of the principles of firm behaviour). This coming week, we'll be covering elasticity, which is closely related and builds on the key ideas of firm behaviour.

Market power is the ability of a seller (or sometimes a buyer) to influence market prices. The greater the amount of market power the seller (or buyer) has, the more they can raise their price above marginal cost. That is, sellers with greater market power will have a higher mark-up (which is the difference between price and marginal cost).

How do firms get market power? There are several ways, but the greatest contributor to market power is the extent of competition in the market. When firms face a lot of competition in their market, they will compete vigorously on price, and so their mark-up will be lower. When firms face less competition in their market, they don't have to compete on price to the same degree, and so their mark-up will be higher.

Another way of seeing this is to consider the price elasticity of demand. When there are many substitutes for a good, the demand for that good will be more elastic. If the seller raises their price, many of their consumers will buy (one of the many) substitutes instead (because those substitutes are now relatively cheaper). So, firms selling a good that has many substitutes (a good that has more elastic demand) will have a lower mark-up. And if a firm's good has many substitutes, that means a lot of competition.

On the other hand, when there are few substitutes for a good, the demand for that good will be less elastic. If the seller raises their price, few of their consumers will buy substitutes instead (because there are few substitutes available). So, firms selling a good that has few substitutes (a good that has less elastic demand) will have a higher mark-up. And if a firm's good has few substitutes, that means less competition.
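The textbook way to tie these two paragraphs together is the Lerner index: a profit-maximising firm facing constant price elasticity of demand ε sets (P - MC)/P = -1/ε, so less elastic demand means a higher mark-up. A quick numeric sketch (the numbers are illustrative only, not drawn from any market discussed here):

    def optimal_price(mc, elasticity):
        # MR = MC with constant-elasticity demand implies P = MC * e / (1 + e).
        assert elasticity < -1, "requires elastic demand at the optimum"
        return mc * elasticity / (1 + elasticity)

    for e in (-5.0, -2.0, -1.2):  # from many substitutes to few substitutes
        p = optimal_price(10.0, e)
        print(f"elasticity {e}: price {p:.2f}, mark-up {(p - 10) / p:.0%} of price")

With marginal cost of $10, demand elasticity of -5 gives a price of $12.50 (a 20% mark-up), while elasticity of -1.2 gives a price of $60 (an 83% mark-up), exactly the pattern described above.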

Taking all of this together, when a market loses one or more competitors and competition falls, we should expect to see an increase in prices. A clear example of this is what happened when the Australian regional airlines Bonza and Rex closed down last year. As Doug Drury (Central Queensland University) wrote in The Conversation last November:

In 2024 alone, we’ve seen the high-profile collapse of both Bonza and Rex, airlines that once ignited hopes for much greater competition in the sector. Now, we’re beginning to see the predictable effects of their exit.

According to a quarterly report released on Tuesday by the Australian Competition and Consumer Commission (ACCC), domestic airfares on major city routes increased by 13.3% to September after Rex Airlines halted its capital city services at the end of July.

The collapse of the two low-cost domestic airlines in Australia reduced competition on domestic routes. Unsurprisingly, the lower competition means more market power for the remaining airlines, Qantas and Jetstar. And that greater market power has translated into higher prices for domestic airfares in Australia.

The importance of competition for prices cannot be overstated. As one other example, a lack of competition has been implicated in the perceived high prices in New Zealand supermarkets (a point I will come back to in my next post, but this is a topic I have written about before here). The Commerce Commission is understandably concerned whenever there is a lack of competition, and whenever competition will be substantially reduced by the merger of two or more firms (for example, see here and here about the recently rejected Foodstuffs supermarket merger). When there is a lack of competition, sellers have more market power, demand will be less elastic, and for both of those reasons we can expect prices to be higher.

Friday, 21 March 2025

This week in research #67

Here's what caught my eye in research over the past week:

  • Schreyer and Singleton (open access) find that Cristiano Ronaldo increased stadium attendance in the Saudi league: by an additional 20% of seats in his home team's stadium when he played, by 15% in the stadiums he visited, and by 3% even in matches where he did not play
  • Harrison and Glaser find that laws that allow breweries to bypass distributors lead to higher brewery output and employment, and that this is primarily driven by a greater market entry of breweries
  • Ankel‐Peters, Fiala, and Neubauer (open access) review the impact of replications published as comments in the American Economic Review between 2010 and 2020, and find that the comments are barely cited, and they do not affect the original paper's citations, even when the replication diagnoses substantive problems with the original paper (does this show the level of revealed preference for replications?)

Wednesday, 19 March 2025

Book review: The Economist's View of the World

In 1973, the Western Economic Journal published a now-famous article (ungated here) by Axel Leijonhufvud (UCLA), entitled "Life Among the Econ". The article was an ethnographic study of the Econ tribe, and included such gems as:

Almost all of the travellers' reports that we have comment on the Econ as a "quarrelsome race" who "talk ill of their fellow behind his back," and so forth. Social cohesion is apparently maintained chiefly through shared distrust of outsiders.

And:

The young Econ, or "grad," is not admitted to adulthood until he has made a "modl" exhibiting a degree of workmanship acceptable to the elders of the "dept" in which he serves his apprenticeship. 

Obviously, Leijonhufvud was an economist, not an anthropologist, and his article was hilarious satire. However, there is something to be said for taking an outside view of the discipline, and such a view might really help non-economists to understand economists. This is where most popular economics books go wrong - they are written by economists.

That is not the case for The Economist's View of the World, written by political scientist Steven E. Rhoads (apparently, not to be confused with the current New York senator Steven Rhoads - thanks Google). Rhoads is professor emeritus at the University of Virginia, where he taught economics to students in public administration. So, not only does he know economics well, but he also brings an outsider view (of sorts) to the subject. The book was first published in 1985, but I read the revised and updated "35th Anniversary Edition", which was published in 2021, and includes reference to more recent developments such as the rise of behavioural economics, the increasing salience of income inequality, and current policy debates.

The book starts with a non-technical introduction to three important concepts in economics: (1) opportunity cost; (2) marginalism; and (3) incentives. Those topics would also tend to be at the start of introductory economics textbooks as well. However, Rhoads doesn't get bogged down in the details of theories and models, and instead focuses on applications and illustrations. For example, on the issue of incentives:

To be sure, policy makers should be careful not to just implement the first incentive that comes to mind. To do so means to risk the fate of the poor little town of Abruzzi, Italy. The city was plagued by vipers, and the city fathers determined to solve the problem by offering a reward for any viper killed. Alas, the supply of vipers increased. Townspeople had started breeding them in their basements...

This is another example of the 'cobra effect' (which I have written about here and here), interestingly also involving snakes. After these initial chapters on the basics of the economic way of thinking, the book then pivots to a more policy focus, with a strong emphasis on outlining economists' perspectives on various aspects of public policy. Like the initial chapters, this section presents the view of an outsider explaining economics to other outsiders, and is mostly successful at doing so. Consider this bit on profit-taking by middlemen:

Some readers will be skeptical: what about the unfair way that companies scoop profits from the people who actually produce the products? Look at farmers; they get a fraction of the profits from their hard work...

Economists offer reasonable defenses for all of these much-maligned groups... Middlemen are a further development of the wealth-increasing division of labor. If they do not provide services worth their costs, retailers that do not use them will put out of business not just the middlemen but also the retailers they supply.

The policy aspects of this middle section of the book are accompanied by a big dose of public choice theory, which is not something that I am accustomed to seeing in popular economics books. It is no doubt why this book is very popular among libertarian economists. However, as the book ramps up the policy and public choice aspects, it loses some of the dispassionate outlining of economists' perspectives. Later sections increasingly come across as an attempt to convince the reader of those perspectives. This is where popular economics books lose their audience, and I'm sad to say I think Rhoads also falls into this trap.

My other gripe about the book is that it is very US-centric. The book is not so much presenting the economist's view of the world, but rather the economist's view of the US. The rest of the world barely rates a mention. While Rhoads is clearly tailoring the book to a US audience, it does tend to present economists as favouring more libertarian and small-government outcomes to an extent that is not necessarily apparent among economists around the rest of the world.

The final section of the book is a gentle critique of economics, with a particular focus on measurement of wellbeing, and on political deliberation. Unlike other books that present critiques (such as the books I reviewed here and here), Rhoads does not rely on strawman arguments that bear little resemblance to economics as it is actually practiced. For that reason, his critiques can and should be taken much more seriously. This section could easily have been expanded (perhaps to an entire book), but nevertheless it seemed like a sensible way to finish.

I enjoyed reading this book, but the overly US-centric approach turned me off, and I wouldn't recommend it to non-US readers. For those readers outside the US, despite their lack of outsider perspective, popular economics books by Tim Harford (for example, see here) or Diane Coyle (for example, see here) would be much better.

Monday, 17 March 2025

The potential contribution of generative AI to journal peer review

As the Managing Editor of a journal (the Australasian Journal of Regional Studies), I have been watching the artificial intelligence space with interest. One thing that AI could easily be used for is peer review. So far, I haven't seen any evidence that reviewers for my journal have been using AI to complete their reviews, but I know that it is becoming increasingly common (and was something that I did observe as a member of the Marsden Social Sciences panel, before it was disbanded). 

What could be so bad about AI completing peer review of research? The truth is that we don't know the answer to that question. As editors and researchers, we may have concerns about whether AI would do a quality job, and whether it would be biased for or against certain types of research or certain types of researchers. But really, there hasn't been much empirical evidence either for or against these concerns.

That's why I was really interested to read two papers that contribute to our understanding of AI in peer review. The first is this 2024 article by Weixin Liang (Stanford University) and co-authors, published in the journal NEJM AI (ungated earlier version here). Their dataset of human feedback was based on over 8700 reviews of over 3000 accepted papers from Nature family journals (which had published their peer review reports), and over 6500 reviews of 1700 papers from the International Conference on Learning Representations (ICLR), a large machine learning conference (and where the authors had access to review reports for accepted as well as rejected papers). They quantitatively compared the human feedback with feedback generated by GPT-4.

Liang et al. found that, for the Nature journal dataset:

More than half (57.55%) of the comments raised by GPT-4 were raised by at least one human reviewer... This suggests a considerable overlap between LLM feedback and human feedback, indicating potential accuracy and usefulness of the system. When comparing LLM feedback with comments from each individual reviewer, approximately one third (30.85%) of GPT-4 raised comments overlapped with comments from an individual reviewer... The degree of overlap between two human reviewers was similar (28.58%), after controlling for the number of comments...
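For intuition, the overlap statistic here is essentially a 'hit rate': the share of one reviewer's comments that at least one of the other reviewer's comments matches. A toy sketch (this assumes comments have already been mapped to shared issue labels, which Liang et al. actually do with an LLM matching pipeline; the labels below are invented):

    def hit_rate(comments_a, comments_b):
        # Share of A's comments matched by at least one of B's comments.
        a, b = set(comments_a), set(comments_b)
        return len(a & b) / len(a)

    gpt4 = {"small sample", "no baseline comparison", "implications overstated"}
    human = {"small sample", "novelty unclear", "missing robustness check"}
    print(f"{hit_rate(gpt4, human):.0%}")  # 33%: one of GPT-4's three points matched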

For the ICLR dataset, the results were similar, but the nature of the data allowed for more nuance:

Specifically, papers accepted with oral presentations (representing the top 5% of accepted papers) have an average overlap of 30.63% between LLM feedback and human feedback comments. The average overlap increases to 32.12% for papers accepted with a spotlight presentation (the top 25% of accepted papers), while rejected papers bear the highest average overlap at 47.09%. A similar trend was observed in the overlap between two human reviewers: 23.54% for papers accepted with oral presentations (top 5% accepted papers), 24.52% for papers accepted with spotlight presentations (top 25% accepted papers), and 43.80% for rejected papers.

So, GPT-4 was very good at identifying the worst papers (those that should be rejected), and had a similar extent of overlap in comments with a human reviewer as another human reviewer would. Turning to the types of comments, Liang et al. find that:

LLM comments on the implications of research 7.27 times more frequently than humans do. Conversely, LLM is 10.69 times less likely to comment on novelty than humans are... This variation highlights the potential advantages that a human-AI collaboration could provide. Rather than having LLM fully automate the scientific feedback process, humans can raise important points that LLM may overlook. Similarly, LLM could supplement human feedback by providing more comprehensive comments.

The takeaway message here is that GPT-4 is not really a substitute for a human reviewer, but is a useful complement to human reviewing. Finally, Liang et al. conducted a survey of 308 researchers across 110 US universities, who could upload some research and receive AI feedback. As Liang et al. explain:

Participants were surveyed about the extent to which they found the LLM feedback helpful in improving their work or understanding of a subject. The majority responded positively, with over 50.3% considering the feedback to be helpful, and 7.1% considering it to be very helpful... When compared with human feedback, while 17.5% of participants considered it to be inferior to human feedback, 41.9% considered it to be less helpful than many, but more helpful than some human feedback. Additionally, 20.1% considered it to be about the same level of helpfulness as human feedback, and 20.4% considered it to be even more helpful than human feedback...

In line with the helpfulness of the system, 50.5% of survey participants further expressed their willingness to reuse the system...

And interestingly:

Another participant wrote, “After writing a paper or a review, GPT could help me gain another perspective to re-check the paper.”

I hadn't really considered running my research papers through generative AI to see if it could provide feedback. However, now that I've heard about it, it is completely obvious that I should do so. And so should other researchers. It's a low-cost form of internal feedback. Indeed, Liang et al. conclude that:

...LLM feedback should be primarily used by researchers [to] identify areas of improvements in their manuscripts prior to official submission.

The second paper is this new working paper by Pat Pataranutaporn (MIT), Nattavudh Powdthavee (Nanyang Technological University), and Pattie Maes (MIT). They undertook an experimental evaluation of AI peer review of economics research articles, in order to determine the ability of AI to distinguish the quality of research, and whether it would be biased by non-quality characteristics of the papers it reviewed.

To do this, Pataranutaporn et al.:

...randomly selected three papers each from Econometrica, Journal of Political Economy, and Quarterly Journal of Economics (“high-ranked journals” based on RePEc ranking) and three each from European Economic Review, Economica, and Oxford Bulletin of Economics and Statistics (“medium-ranked journals”). Additionally, we randomly selected three papers from each of the three lower-ranked journals not included in the RePEc ranking—Asian Economic and Financial Review, Journal of Applied Economics and Business, and Business and Economics Journal (“low-ranked journals”). To complete the dataset, we included three papers generated by GPT-o1 (“fake AI papers”), designed to match the standards of papers published in top-five economics journals.

They then:

...systematically varied each submission across three key dimensions: authors’ affiliation, prominence, and gender. For affiliation, each submission was attributed to authors affiliated with: i) top-ranked economics departments in the US and UK, including Harvard University, Massachusetts Institute of Technology (MIT), London School of Economics (LSE), and Warwick University, ii) leading universities outside the US and Europe, including Nanyang Technological University (NTU) in Singapore, University of Tokyo in Japan, University of Malaya in Malaysia, Chulalongkorn University in Thailand, and University of Cape Town in South Africa... and iii) no information about the authors’ affiliation, i.e., blind condition.

To introduce variation in academic reputation, we replaced the original authors of the base articles with a new set of authors categorized into the following groups: (i) prominent economists—the top 10 male and female economists from the RePEc top 25% list; (ii) lower-ranked economists—individuals ranked near the bottom of the RePEc top 25% list; (iii) non-academic individuals—randomly generated names with no professional affiliation; and (iv) anonymous authorship—papers where author names were omitted. For non-anonymous authorship, we further varied each submission by gender, ensuring an equal split (50% male, 50% female). Combining these variations resulted in 9,030 unique papers, each with distinct author characteristics...

Pataranutaporn et al. then asked GPT-4o-mini to evaluate each of the 9,030 unique papers across a number of dimensions, including whether it would be accepted or rejected at a top-five journal, the reviewer recommendation, the predicted number of citations, and whether the paper would attract research funding, result in a research award, strengthen an application for tenure, or be part of a research agenda worthy of a Nobel Prize in economics. They found that:

...LLM is highly effective at distinguishing between submissions published in low-, medium-, and high-quality journals. This result highlights the LLM’s potential to reduce editorial workload and expedite the initial screening process significantly. However, it struggles to differentiate high-quality papers from AI-generated submissions crafted to resemble “top five” journal standards. We also find compelling evidence of a modest but consistent premium—approximately 2–3%—associated with papers authored by prominent individuals, male economists, or those affiliated with elite institutions compared to blind submissions. While these effects might seem small, they may still influence marginal publication decisions, especially when journals face binding constraints on publication slots.

So, on the one hand, the AI tool does a good job of identifying high-quality submissions. However, it can't tell them apart from high-quality AI-generated submissions. And it has a small but statistically significant bias in favour of prominent, male, or elite-institution authors. Both of these latter points are worrying, but again they suggest that a combination of human and AI reviewers might be a suitable path forward.

Pataranutaporn et al.'s paper is focused on solving a "peer review crisis". It has become increasingly difficult to find peer reviewers who are willing to spend the time to generate a high-quality review that will in turn help to improve the quality of published research. Generative AI could help to alleviate this, but we're clearly not entirely there yet. There is still an important role for humans in the peer review process, at least for now.

[HT: Marginal Revolution, for the Pataranutaporn et al. paper]

Sunday, 16 March 2025

Why tit-for-tat tariffs may not work against Trump

Last week, my ECONS101 class covered game theory. At the end of the final lecture, after we had been covering repeated games and tit-for-tat strategies, a really perceptive student asked me about Trump's tariffs. A lot of the rhetoric about tariffs has been posed in terms of tit-for-tat (see here and here, for example). The student's question got me thinking though, about why a tit-for-tat strategy may not work in this case.

Before we get that far though, we need to think about the tariff game, as outlined in the payoff table below. There are two players: USA and 'Other Country'. Each player has two strategies: high tariffs, or low tariffs (which includes no tariffs). The payoffs are expressed as "+" for good outcomes (and "++" is particularly good), and "-" for bad outcomes (and "--" is particularly bad), while zero is a neutral payoff.

To find the Nash equilibrium in this game, we use the 'best response method'. To do this, we track: for each player, for each strategy, what is the best response of the other player. Where both players are selecting a best response, they are doing the best they can, given the choice of the other player (this is the definition of Nash equilibrium). In this game, the best responses are:

  1. If the other country chooses high tariffs, USA's best response is to choose high tariffs (since "0" is better than "--" as a payoff) [we track the best responses with ticks, and not-best-responses with crosses; Note: I'm also tracking which payoffs I am comparing with numbers corresponding to the numbers in this list];
  2. If the other country chooses low tariffs, USA's best response is to choose high tariffs (since "++" is better than "+" as a payoff);
  3. If USA chooses high tariffs, the other country's best response is to choose high tariffs (since "0" is better than "--" as a payoff); and
  4. If USA chooses low tariffs, the other country's best response is to choose high tariffs (since "++" is better than "+" as a payoff).

Notice that, since USA chooses high tariffs no matter what the other country does, high tariffs is a dominant strategy for the USA. Similarly, since the other country chooses high tariffs no matter what the USA does, high tariffs is a dominant strategy for the other country. Both countries will choose to play their dominant strategy (because it is always better than the other strategy, no matter what the other country chooses to do). The outcome where both countries choose high tariffs is the Nash equilibrium in this game (it is also a dominant strategy equilibrium, because both countries have a dominant strategy).

However, I'm sure that you can clearly see that both countries would be better off with low tariffs (since the payoff for each country would be "+", instead of "0"). This game is an example of the prisoner's dilemma (it's a dilemma because, when both countries act in their own best interests, both are made worse off).
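Since the post only gives the payoffs as signs, here is a sketch that checks the best responses mechanically, using any numbers that preserve the ranking (++ = 2, + = 1, 0 = 0, -- = -2 are my choices):

    from itertools import product

    # (USA payoff, other country payoff) for each strategy pair.
    tariff_game = {
        ("high", "high"): (0, 0),
        ("high", "low"):  (2, -2),   # USA wins big, the other country loses big
        ("low",  "high"): (-2, 2),
        ("low",  "low"):  (1, 1),    # mutual gains from trade
    }

    def nash_equilibria(game, strategies=("high", "low")):
        equilibria = []
        for us, them in product(strategies, repeat=2):
            u, t = game[(us, them)]
            us_best = all(u >= game[(alt, them)][0] for alt in strategies)
            them_best = all(t >= game[(us, alt)][1] for alt in strategies)
            if us_best and them_best:
                equilibria.append((us, them))
        return equilibria

    print(nash_equilibria(tariff_game))  # [('high', 'high')]

Swapping the (low, low) payoff from (1, 1) to (0, 0), as in the zero-sum version that comes up below, returns exactly the same equilibrium.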

However, it is important to remember that this game is a repeated game. It is played more than once, with the same players, and the same strategy choices. When a game is repeated, then the outcome may differ from the equilibrium of the non-repeated game, because the players can learn to work together to obtain the best outcome.

In a repeated prisoner's dilemma game like this, each player can encourage the other to cooperate by using the tit-for-tat strategy. That strategy, identified by Robert Axelrod in the 1980s, works by initially cooperating (low tariffs), and then, in each play of the game after the first, doing whatever the other player did last time. So, if the USA chooses high tariffs, then the other country should punish them and choose high tariffs in the next play of the game. And if the USA chooses low tariffs, then the other country should reward them and choose low tariffs in the next play of the game. The tit-for-tat strategy works because it encourages the other player to cooperate. And that is what many people have been expecting of Trump: if you punish the USA for high tariffs by setting your own high tariffs, eventually they will realise their error and start cooperating with low tariffs again.

But there is a problem. The whole edifice of the tit-for-tat strategy assumes that Trump knows that he is playing the game outlined above, where there is an unambiguously better outcome for both countries, if they both choose low tariffs. By his past statements, it is absolutely clear that Trump thinks that trade is a zero-sum game (for example, see here or here).

So, what does the game look like if you believe that trade is zero-sum? Instead of there being gains from the low-tariff/low-tariff outcome, the payoffs become zero, as shown in the payoff table below.

Solving this game for the Nash equilibrium, the best responses are:

  1. If the other country chooses high tariffs, USA's best response is to choose high tariffs (since "0" is better than "--" as a payoff) [we track the best responses with ticks, and not-best-responses with crosses; Note: I'm also tracking which payoffs I am comparing with numbers corresponding to the numbers in this list];
  2. If the other country chooses low tariffs, USA's best response is to choose high tariffs (since "++" is better than "0" as a payoff);
  3. If USA chooses high tariffs, the other country's best response is to choose high tariffs (since "0" is better than "--" as a payoff); and
  4. If USA chooses low tariffs, the other country's best response is to choose high tariffs (since "++" is better than "0" as a payoff).

Notice that the solution to the game doesn't change. Imposing high tariffs is still a dominant strategy for both countries, and the outcome where both countries choose high tariffs is the only Nash equilibrium (and is also a dominant strategy equilibrium).

However, there is an important difference in this game when it is played as a repeated game. There is no incentive for players to cooperate. That's because cooperating results in a payoff of "0", just the same as not cooperating. And because of this, the tit-for-tat strategy would be pointless.
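A small repeated-game sketch makes the point (same ordinal payoffs as the sketch above; the numbers are my assumptions): two tit-for-tat players settle into mutual cooperation, which beats mutual high tariffs under the standard payoffs, but earns exactly nothing extra under the zero-sum payoffs.

    standard = {("high", "high"): (0, 0), ("high", "low"): (2, -2),
                ("low", "high"): (-2, 2), ("low", "low"): (1, 1)}
    zero_sum = {**standard, ("low", "low"): (0, 0)}  # trade seen as zero-sum

    def tit_for_tat(opponent_history):
        # Cooperate (low tariffs) first, then copy the opponent's last move.
        return "low" if not opponent_history else opponent_history[-1]

    def repeated(game, rounds=10):
        h1, h2, total1, total2 = [], [], 0, 0
        for _ in range(rounds):
            a1, a2 = tit_for_tat(h2), tit_for_tat(h1)
            u1, u2 = game[(a1, a2)]
            total1 += u1
            total2 += u2
            h1.append(a1)
            h2.append(a2)
        return total1, total2

    print(repeated(standard))  # (10, 10): cooperation is worth sustaining
    print(repeated(zero_sum))  # (0, 0): no better than mutual high tariffs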

Which seems to be what other countries are finding, when dealing with Trump. Other countries may think they are playing the first game, where a tit-for-tat strategy may get Trump to reconsider. But if Trump thinks they are playing the second (zero-sum) game (and it seems that he does), then the tit-for-tat strategy is simply not going to work.

[HT: Sarah from my ECONS101 class]

Friday, 14 March 2025

This week in research #66

Here's what caught my eye in research over the past week:

  • Clements (open access) provides notes on getting a job, for PhD graduates in economics and finance
  • Armstrong et al. (open access) find that legalisation of cannabis in Canada had no statistically significant effect on alcohol sales overall, but beer sales decreased significantly, offset by a significant increase in sales of other alcoholic beverages
  • Yechiam and Zeif provide another meta-analysis of loss aversion, concluding that findings of strong loss aversion are replicated when losses are smaller than gains and when gains and losses are presented in an ordered fashion, but for studies with symmetric gains and losses and no ordering of items, the loss aversion parameter is not statistically significant
  • Capponi and Frenken (open access) investigate the careers of 473 scientists at Dutch universities during the period 1815–1943, and find that 'inbreeding' (having a PhD supervised by a professor who holds a PhD from the same university and within the same discipline) generally enhances academic performance, but only in the early lifecycle stages of a new intellectual movement
  • Brooks et al. (open access) find that research produced by female finance academics is published in lower-rated journals and garners fewer citations, and that female-authored work in finance is ‘penalised’ more for its interdisciplinarity than similar research authored by men
  • El Tinay and Schor find that the top five economics journals have cumulatively only published 25 unique research articles on the topic of climate change from 1975 to 2023, and that they have failed to engage with the role and consequences of domestic and global inequality in the dynamics of climate change
  • Nielsen et al. estimate that Danes would be willing to pay €9.70 million to host a Formula One Grand Prix in Copenhagen, Denmark (not anywhere near what it costs, so don't expect a Danish Grand Prix again anytime soon)
  • Pitts and Evans examine the impact of name, image, and likeness (NIL) rights on college football recruiting, and find that the average of the top-ten NIL valuations for a university’s football players was correlated with the perceived quality of the players recruited (so good players are attracted to colleges that have higher valued players already)

Thursday, 13 March 2025

Hawks, doves, Israel and Iran

In The Conversation last October, Andrew Thomas (Deakin University) discussed the recent (at that time) military flare-up between Iran and Israel, likening it to a 'game of chicken':

Israel’s strike on military targets in Iran over the weekend is becoming a more routine occurrence in the decades-long rivalry between the two states...

There is a reason why direct military strikes between nations are rare, even between sworn enemies. When attacking another state, it is difficult to know exactly how they will respond, though a retaliatory strike is almost often expected.

This is because defence forces are not just used for fighting and winning wars – they are also vital to deterring them. When a fighting force is attacked, it’s important for it to strike back to maintain the perception it can deter future attacks and make a display of its capabilities. This is what is happening right now between Israel and Iran – neither side wants to appear weak.

If this is the case, where does the escalation end? De-escalation is essentially a game of chicken – one side has to be content with not responding to an attack to take the temperature down.

My ECONS101 class has been covering game theory this week, including the chicken game. In the traditional game of chicken there are two rivals in cars, one at each end of the same street. They drive towards each other at top speed, and whichever rival swerves away first loses the game. So, each rival can choose to speed ahead or swerve away, and each would prefer to speed ahead and win the game. However, the problem is that if both simply keep speeding ahead, it will end in a disastrous crash.

I also recently read the book Hidden Games, by Moshe Hoffman and Erez Yoeli (which I reviewed here). Hoffman and Yoeli have an interesting section in the book on the hawk-dove game, which is essentially the chicken game but with a slightly different motivating context. In the hawk-dove game, two rivals are competing over some resource. Each rival can choose to be aggressive or submissive, and whichever rival is more aggressive will win the resource. Each rival would prefer to be aggressive and win the resource. However, if both are aggressive, it ends in a massively disastrous battle.

Coming back to the case of Iran and Israel, this is clearly an example of the hawk-dove game (or the chicken game, if you prefer). This game is laid out in the payoff table below, where the strategies for Israel and Iran are to be aggressive, or submissive. The payoffs are expressed as "+" for good outcomes, and "-" for bad outcomes (and "--" is particularly bad), while zero is a neutral payoff.

To find the Nash equilibrium in this game, we use the 'best response method'. To do this, we track: for each player, for each strategy, what is the best response of the other player. Where both players are selecting a best response, they are doing the best they can, given the choice of the other player (this is the definition of Nash equilibrium). In this game, the best responses are:

  1. If Iran chooses to be aggressive, Israel's best response is to be submissive (since "-" is better than "--" as a payoff - in other words, taking a bit of punishment is better than a massively disastrous war) [we track the best responses with ticks, and not-best-responses with crosses; Note: I'm also tracking which payoffs I am comparing with numbers corresponding to the numbers in this list];
  2. If Iran chooses to be submissive, Israel's best response is to be aggressive (since "+" is better than "0" as a payoff);
  3. If Israel chooses to be aggressive, Iran's best response is to be submissive (since "-" is better than "--" as a payoff); and
  4. If Israel chooses to be submissive, Iran's best response is to be aggressive (since "+" is better than "0" as a payoff).

In this scenario, there are no dominant strategies. Neither country has a strategy that is always better for it, no matter what the other country chooses to do. However, there are two Nash equilibria (outcomes where both players are playing their best response), which occur when one country is aggressive, and the other is submissive.
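We can check those best responses mechanically. The payoffs below are my numbers, chosen only to preserve the ranking of the signs in the post (+ = 1, 0 = 0, - = -1, -- = -3):

    from itertools import product

    # (Israel payoff, Iran payoff) for each strategy pair.
    hawk_dove = {
        ("aggressive", "aggressive"): (-3, -3),  # massively disastrous battle
        ("aggressive", "submissive"): (1, -1),
        ("submissive", "aggressive"): (-1, 1),
        ("submissive", "submissive"): (0, 0),
    }

    strategies = ("aggressive", "submissive")
    for s1, s2 in product(strategies, repeat=2):
        u1, u2 = hawk_dove[(s1, s2)]
        if (all(u1 >= hawk_dove[(alt, s2)][0] for alt in strategies)
                and all(u2 >= hawk_dove[(s1, alt)][1] for alt in strategies)):
            print(s1, s2)  # prints the two asymmetric equilibria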

The thing about the Iran-Israel hawk-dove game is that it isn't really a simultaneous game, as shown in the table above. It is a sequential game. Each player chooses whether to be aggressive or submissive, knowing what the other player chose to do previously. That sequential game is shown below. [*]

We can solve a sequential game using 'backward induction', which is essentially the same as the best response method, except we make sure we start with the last player, and work our way backwards through the game to work out what the first player should do. The resulting equilibrium that we find will be a 'subgame perfect Nash equilibrium'. In this case:

  1. If Israel chooses to be aggressive, Iran's best response is to be submissive (since "-" is better than "--" as a payoff) - now, since Iran would never choose to be aggressive when Israel has already been aggressive, Israel knows that the outcome will be that Iran is submissive;
  2. If Israel chooses to be submissive, Iran's best response is to be aggressive (since "+" is better than "0" as a payoff) - now, since Iran would never choose to be submissive when Israel has already been submissive, Israel knows that the outcome will be that Iran is aggressive;
  3. Israel will choose to be aggressive (since "+" is better than "-" as a payoff).

The subgame perfect Nash equilibrium is that Israel is aggressive, and Iran is submissive. As it turns out, that's sort of what happened. After an initial flurry of missile attacks, Iran stopped escalating the conflict.
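Backward induction is easy to mechanise too. Here is a sketch with the same assumed payoffs as above, and Israel moving first (as the footnote below notes, the game looks identical if Iran moves first):

    # Same assumed payoffs as the sketch above: (Israel, Iran).
    hawk_dove = {
        ("aggressive", "aggressive"): (-3, -3),
        ("aggressive", "submissive"): (1, -1),
        ("submissive", "aggressive"): (-1, 1),
        ("submissive", "submissive"): (0, 0),
    }
    strategies = ("aggressive", "submissive")

    def backward_induction(game):
        # Iran observes Israel's move and best-responds...
        iran_response = {
            first: max(strategies, key=lambda second: game[(first, second)][1])
            for first in strategies
        }
        # ...and Israel, anticipating those responses, picks its best first move.
        israel = max(strategies,
                     key=lambda first: game[(first, iran_response[first])][0])
        return israel, iran_response[israel]

    print(backward_induction(hawk_dove))  # ('aggressive', 'submissive')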

One last thing to note is that this is a repeated game. Israel and Iran will find themselves in conflict often. The games as outlined above suggest that whichever country is the initial aggressor will end up getting their way, because the country moving second in the sequential game will be better off being submissive than retaliating. However, in repeated games, the outcome can often deviate from the equilibrium for strategic reasons.

Israel clearly doesn't want Iran to be aggressive, even though aggression would be good for Iran, if they moved first in this game. So, Israel wants to convince Iran not to make an aggressive first move. The only way that Israel can do that is to convince Iran that Iran would be worse off by making an aggressive first move. Israel needs to convince Iran that Israel will always retaliate with aggression. That would deter Iran from being aggressive as a first move. How does Israel achieve this? By developing a reputation for aggressively retaliating against any aggression. And indeed, that is what Israel has done (one need only look at Gaza or Lebanon for confirmation of this).
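
That deterrence argument can be made concrete with the same assumed payoffs. The sketch below compares Iran's first-move payoffs with and without a credible Israeli reputation for retaliation:

```python
# A sketch of the deterrence logic, with Iran as the first mover and the
# same assumed payoffs as above. payoffs_iran[(iran's move, israel's
# response)] gives Iran's payoff.
payoffs_iran = {
    ("aggressive", "submissive"): 1,    # "+": Iran gets its way
    ("aggressive", "aggressive"): -5,   # "--": disastrous war
    ("submissive", "aggressive"): -1,   # "-": Iran takes some punishment
    ("submissive", "submissive"): 0,    # "0": neutral
}

# Without a reputation, Iran expects a rational Israel to back down after
# Iranian aggression, so aggression pays:
print(payoffs_iran[("aggressive", "submissive")])   # 1

# With a credible "always retaliate" reputation, Iranian aggression means
# war, and submission (to an aggressive Israel) is the better option:
print(payoffs_iran[("aggressive", "aggressive")])   # -5
print(payoffs_iran[("submissive", "aggressive")])   # -1
```

With the reputation in place, Iran's best first move is submission (-1 beats -5), rather than the aggression that backward induction alone would recommend.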

Israel's aggressive response to attacks by Iran, Gaza, and Lebanon is part of a strategic plan to deter future aggression against Israel. Many of us may not like it, but it's strategically rational. Whether it has a lasting effect remains to be seen.

*****

[*] I'm showing the game as having Israel move first. However, if you read Thomas's article, you'll see that 'who started it' is actually contested. I'm not taking a stand on that here, and in fact the game looks identical if Iran moves first.

Tuesday, 11 March 2025

The negative incentives in driver licence testing

My son has had a motorbike and his motorbike learner's licence for the last six months or so. Last week, he attempted the practical driving (riding?) test to move to a restricted motorbike licence. The test cost him $175, paid to the testing agency. The good part was that at least he could do the test riding his own motorbike, which he is familiar with. He was looking forward to the additional freedom that the restricted licence allows.

My son failed the test. Apparently, part of the practical driving test involves being able to maintain stable control of the motorbike while travelling at 100km/h. The person administering the test asked my son to ride along a straight road uphill. My son's low-powered motorbike can only just maintain 100km/h on a straight flat road. It was never going to be able to do so on an incline, and it didn't. That's why he failed his test (although he also admits he didn't indicate when turning around at the end of a cul-de-sac).

So, now my son will have to attempt the test a second time. Unlike testing to get a car licence, the fee you pay does not include multiple attempts. My son will have to pay again for his next test. And because he may legitimately worry that his motorbike won't get to 100km/h in that part of the test again, he will need to use a bike on loan from the testing agency. The cost for that is $225.

What does this have to do with economics, you ask? I question the incentives here. The agency doing the testing benefits each time a driver fails the test, because the driver will need to sit an additional test at additional cost. That additional test means additional revenue for the testing agency. The agency therefore has an incentive to fail as many drivers as possible, to increase their revenue (and profits). I'm not saying that's what happened for sure in this case, but it's consistent with the incentives that the system creates.

And there are any number of ways that a testing agent can fail a driver. When I was young, I had two friends who both failed their restricted car licence test, because the testing agent leant over, honked the horn and waved to someone walking by. Because this was dangerous and the driver didn't prevent the testing agent from doing it, my friends both failed their test. At the time, we all just thought that it was messed up and that the testing agent was a bit of an ass. But again, it is consistent with the incentives that the system creates.

How could this problem be resolved? To some extent, it is now resolved for car licences. Unlike when I was young, the application fee for a car licence includes two tests (see here). If a driver fails their first test, they can sit a second test for free. With that system, there is little incentive for the testing agent to fail the driver, because having the driver come back for a second test costs the testing agency but provides no additional revenue. The incentive problem is alleviated. What I don't understand is why the system is different for motorbike and heavy vehicle licences, where drivers have to pay for every test.

Incentives matter. People will take advantage of a system that provides them with an avenue for additional gain, and the driver licensing system seems to do that for testing agencies. My son will no doubt pass his next licence test, but at significant extra cost (cost that is possibly unnecessary, depending on how seriously the failure to indicate at the end of a cul-de-sac is viewed). He won't be alone in this experience. Motorbike riders would be significantly better off if the system was changed.

Monday, 10 March 2025

Climate change denial is given a platform by the American Journal of Economics and Sociology

This editorial by the governing board of the American Journal of Economics and Sociology caught my attention, and was a real surprise. It says:

As part of a planned special section on climate change, an earlier AJES editor solicited an article by Andy May and Marcel Crok that was intended to counterbalance the scientific consensus that climate change is a serious problem caused largely by human activities. However, since the article was written more in a spirit of defiance or rebellion than as a contribution to a dialog, it has posed a number of problems for the journal.

So far, so problematic. But then it takes a turn for the worse:

The easiest decision would have been to simply pull the article and to publish the other articles in the issue. However, similar articles are ubiquitous on the Internet, and withholding publication will not restrict the flow of popular criticism of the climate consensus. Since the aim of AJES is to serve the public by discussing questions of social significance, we have chosen to make use of an awkward situation by confronting directly the abuse of research methods in this article. Rather than backtracking and refusing to publish an article that would normally be rejected, we prefer to publish the article in question in combination with (1) this explanation of why AJES has chosen to publish it, (2) a rebuttal by physicist Tinus Pulles, and (3) an examination by Clifford Cobb, a former AJES editor, of the social and political context of the climate debate. We hope that everyone might gain by opening climate-change denialism to scrutiny in this way.

I'm not a huge fan of de-platforming people. However, this is a case where the authors should have no expectation of being published. As the governing board says, the article would have normally been rejected. If you read the rest of the editorial, it is clear that the article fails to meet basic standards of scholarship, and for that reason alone it should not have been published. Because there should have been no expectation of being published, the authors aren't really being denied a platform that they otherwise should have had access to. The tricky thing is that the article was invited. But nevertheless, invited articles should still be subject to peer review and editorial discretion on their suitability for publication.

To make matters worse, the governing board's response is ineffective. While they have published a rebuttal and the editorial, most readers will access the May and Crok article directly from the web, and on the article's own page (here) there is no indication that the rebuttal or the editorial even exist. A small amount of comfort is provided by the article not being open access, and therefore sitting behind the Wiley paywall. Only readers with an institutional or other subscription will have access to it (although there are ungated versions available elsewhere). Fortunately, the rebuttal by retired environmental scientist Tinus Pulles is open access, and therefore available to everyone. But again, few who read the original article will know that the rebuttal exists, even though it is freely available.

Climate change is a serious issue, and requires serious scholarship. While it may be possible that the broad scientific consensus is wrong (it wouldn't be the first time), it would take some theoretically well-reasoned and empirically strong research to overturn it. This is not that research. And for that reason, it is incredibly disappointing that the governing board of the American Journal of Economics and Sociology has given it an airing. We should expect better from the authorities that are tasked with protecting the quality and integrity of published research.

Friday, 7 March 2025

This week in research #65

Here's what caught my eye in research over the past week:

  • Yildirim and Bilman look at how removing the away goals rule and the introduction of video assistant referees affected soccer (ok, football) game dynamics in the UEFA Champions League, finding that removing the away goals rule increased defensive play, and the introduction of VAR led to less referee bias against away teams
  • Nguyen, Ost, and Qureshi (with ungated earlier version here) find that recent generations of elementary school teachers are significantly more effective at raising math test scores for students than those from earlier generations, and that the effects are significantly larger for African American students
  • Sievertsen and Smith (open access) find that the opinions of individual expert economists affect the opinions expressed by the public, the opinions expressed by visibly senior female economists are more persuasive than the same opinions expressed by male economists, and that removing credentials (university and professor title) eliminates the gender difference in persuasiveness, suggesting that credentials act as a differential information signal about the credibility of female experts
  • Ersoy and Speer (with ungated earlier version here) find that university student choices of major depend on non-job-related factors, such as a major’s course difficulty and gender composition, and while female students tend to avoid majors that are more difficult than they originally believed, male students are averse to majors with more female faculty but prefer those with more female students
  • Qi, Wang, and Wang find that there is an inverted U-shaped relationship between income inequality and economic growth in the long run (so economic growth is lowest when inequality is low, and when inequality is high)
  • Gans (with ungated earlier version here) shows that time travel does not undermine but rather reinforces the no-arbitrage conditions at the heart of the Efficient Markets Hypothesis
  • Jansson and Tyrefors (open access) investigate a university-wide anonymous grading reform at Stockholm University, finding that it benefited female students more than male students
  • Abreha, Johnson, and Robertson (with ungated earlier version here) find that President Bukele’s 2022 crime crackdown in El Salvador reduced outward migration from El Salvador to the US by 45%-67%

Thursday, 6 March 2025

University students should probably take notes by hand, not on laptops

Moving around the lecture theatre while my students are working on some problem or exercise, it is surprising to see how many students persist with trying to use a laptop. I say surprising because most of the exercises I do with my class involve drawing diagrams, and drawing diagrams on a laptop is not easy. It would be much easier to draw the diagrams on paper, and that is what most students do.

The choice of note-taking medium is not benign. There have been a number of studies that have shown that hand-written notes are better for student learning than notes taken on devices such as laptops or tablets. In fact, I've blogged on this topic before (see here, as well as this post on laptop use more generally).

A recent meta-analysis makes the case even more strongly that hand-written notes are better than notes taken on devices. That meta-analysis is reported in this 2024 article by Abraham Flanigan (Georgia Southern University) and co-authors, published in the journal Educational Psychology Review (open access). They combined the results of 24 studies (published across 21 articles), limiting their scope to experimental or quasi-experimental studies that involved university students (therefore excluding studies on high school students, who may approach their studies in different ways). They focus on two outcomes: (1) student achievement; and (2) the volume of notes taken.

In relation to student achievement, Flanigan et al. find:

...a mean effect size of 0.248, p < .001 [95% CI: 0.181, 0.315], which was statistically significant. This finding indicates that the overall sample of studies found that handwritten note-taking had a positive effect on achievement, meaning that handwriting notes produced higher achievement than typing notes.

Students using hand-written notes perform nearly a quarter of a standard deviation better than students using laptops or tablets for note-taking. That's not a huge difference, but it could make a real difference for some students. In my ECONS101 class, a quarter standard deviation difference is about four percentage points in overall grade, enough to drop a student by a grade point (from a B+ to a B, for example).
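
To make that arithmetic explicit: the grade difference is just the effect size multiplied by the standard deviation of grades. The grade standard deviation of roughly 16 percentage points is my inference from those numbers, not a reported statistic:

\[
\Delta\text{grade} \approx d \times \sigma_{\text{grade}} \approx 0.248 \times 16\,\text{pp} \approx 4\,\text{pp}
\]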

Flanigan et al. then perform some further analysis, which shows that the positive effect of hand-written notes (compared with notes taken on a device) is statistically similar for immediate and delayed assessments, and for different measures of student achievement. When students are allowed to review their notes, the effect is larger than when they are not allowed to review them.

In terms of the volume of notes taken, Flanigan et al. find that:

...the overall mean effect size was statistically significant (0.919, p < 0.001 [95% CI: 0.679, 1.160]). These results indicate that typed notes contain more words and ideas than handwritten notes.

Taking those two results together, when students take notes on a device, they write a greater quantity of notes, but those notes are less effective for the students' learning. Flanigan et al. offer an explanation for these results:

...handwriting notes produce deeper processing than typing notes. Longhand notes tend to capture lecture ideas in a paraphrased and personalized style meaningful to the note-taker, whereas typed notes tend to capture lecture ideas in a verbatim, almost thoughtless way... Although typing leads to a greater quantity of recorded ideas than writing does, the shallow, verbatim nature of typing notes seems to hinder their external storage value, thereby rendering typed notes less useful during review than handwritten notes...

...handwritten notes contain more lecture images than typed notes. In studies measuring the number of images recorded in notes, college students typing notes recorded zero lecture images, whereas longhand notetakers recorded multiple images... According to dual-coding theory... learning occurs best when information is coded both verbally and visually.

That latter explanation is likely to be particularly important in economics. If a student doesn't take good notes of the diagrams, then they are likely to struggle to learn in class, and struggle to review later, leading overall to worse achievement in their assessments. For this reason, students should really be hand-writing their notes.

However, there is an exception, and it was good to see Flanigan et al. make this point:

Handwriting or typing lecture notes might not be an option for some students, whether their disabilities are physical or cognitive in nature. Other students might require note-taking assistance, such as having another student record notes for them or instructors providing notes for them.

Although the number of students accessing additional support for disabilities appears to have grown over time, I have noticed that the number of 'note takers' (employed to take notes for other students) has declined over time. That may be because devices are seen as alleviating the need for human note takers. However, Flanigan et al.'s results suggest that relying on devices rather than human note takers may be making students with a disability worse off.

Finally, there are two points that Flanigan et al. don't make in their article, which are useful to consider. First, not all laptops and tablets are created equal. Touch-screen devices that allow students to hand-write notes, and then convert those notes to text, probably produce notes closer to hand-written notes than to typed notes. However, that may depend on how the device is used, because such devices of course allow students to type as well. Some further study on this seems warranted. Second, I wonder how note-taking by AI would fare in comparison with notes taken by a student? I suspect that the AI would err on the side of verbatim notes, similar to a student's typed notes, rather than the paraphrased notes that a student hand-writing their notes would make. On the other hand, the AI could be explicitly instructed to paraphrase. Going back to the previous point about students with a disability, an appropriately instructed AI tool could make a substantial positive impact. That is definitely something worth exploring further.

How students take notes has an impact on their learning, and on their achievement in assessments. For now, it seems, hand-written notes remain best, at least until we can let an AI take over the note-taking.

[HT: This article in The Conversation last month]


Wednesday, 5 March 2025

Minimum wages and alcohol consumption

There are several reasons to believe that higher minimum wages will affect alcohol consumption. First, higher incomes for those on the minimum wage give them greater purchasing power. If alcohol is a normal good (which it is), then as their incomes increase people will consume more alcohol. On the other hand, higher minimum wages may lead to disemployment, especially among young people and those in the food and beverage industries. In that case, workers without jobs have lower incomes and would consume less alcohol. However, losing a job (or not having a job) can be a stressful experience, increasing the incentive to drink alcohol as a coping strategy. And having a job that pays more due to a higher minimum wage may reduce financial stress, and with it the incentive to drink. Overall, there is a lot going on, and it isn't at all clear whether a higher minimum wage should lead to more or less alcohol consumption.

That's where this new article by Yihong Bai (Western University in Ontario) and Michael Veall (McMaster University), published in the journal Economics and Human Biology (open access), comes in. Bai and Veall use longitudinal data from the Canadian National Population Health Survey (NPHS) from 1994/95 to 2010/11, and look at how alcohol consumption is related to the province-level minimum wage (adjusted for inflation). Their full sample includes over 18,000 observations for a little over 4000 individuals in Canada. They measure alcohol consumption in six different ways: (1) whether each person is a drinker or not; (2) whether they binge drink at least once per month on average; (3) whether they are a 'heavy drinker' (defined as binge drinking at least once per week on average, or having an average daily alcohol consumption [ADAC] of two drinks for men, or one drink for women); (4) average number of drinks over the last month; (5) number of binge drinking events over the last month; and (6) ADAC.

Bai and Veall apply a two-way fixed effects approach to identify the effects of the minimum wage on alcohol consumption.
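
For readers unfamiliar with the jargon, a two-way fixed effects specification in this setting would look roughly like the following (the notation is mine, and the paper's exact specification may differ):

\[
y_{ipt} = \beta \ln(\mathit{MinWage}_{pt}) + X_{ipt}'\gamma + \alpha_p + \lambda_t + \varepsilon_{ipt}
\]

where \(y_{ipt}\) is one of the six alcohol consumption measures for individual \(i\) in province \(p\) at time \(t\), \(\alpha_p\) and \(\lambda_t\) are province and survey-wave fixed effects, \(X_{ipt}\) is a set of individual controls, and \(\beta\) is the coefficient of interest. They find that: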

...almost all the estimated coefficients are very small with reasonably tight confidence intervals that cover zero.

In other words, there is very little evidence that higher minimum wages increase alcohol consumption. However, that is based on the whole population, and minimum wages are more likely to affect low-income workers. Rather than looking at low-income workers directly, Bai and Veall look at low-education workers (being those with high school education or less), who are also more likely to be affected by minimum wage changes. For that group, they also find that almost all of the coefficients are not statistically significant. Another group that tends to be more affected by minimum wage changes is young people, and Bai and Veall report that:

We also estimate using samples of ages 21–25 and ages 15–20 and find no evidence of minimum wages increasing drinking. However, our confidence intervals are wide and this finding must be treated with caution.

One of the key issues with this paper is that they perform a large number of regressions (with six dependent variables), but don't adjust for multiple comparisons. This is important because the more comparisons they make, the more likely it is that some will turn out to be statistically significant just by chance. That's why I discount their finding that the ADAC decreases when the minimum wage increases. I doubt that it would be robust to an adjustment for multiple comparisons, and the ADAC results are inconsistent with the effects in the other models.
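
To illustrate what such an adjustment does, here is a minimal sketch using made-up p-values (not the paper's actual p-values), with the multipletests function from the statsmodels library:

```python
# A minimal sketch of a Bonferroni adjustment for six outcome models.
# The p-values below are hypothetical, purely for illustration.
from statsmodels.stats.multitest import multipletests

p_values = [0.04, 0.12, 0.35, 0.51, 0.68, 0.81]  # one per outcome (made up)

reject, p_adjusted, _, _ = multipletests(p_values, alpha=0.05,
                                         method='bonferroni')

# Bonferroni effectively requires p < 0.05/6 (about 0.0083), so a nominally
# 'significant' p = 0.04 no longer clears the bar after adjustment.
print(list(zip(p_adjusted.round(3), reject)))
```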

Beyond that, the two-way fixed effects approach itself is problematic, and has attracted a lot of criticism recently (which is nicely outlined in two posts on the Development Impact blog, here and here, as well as this post). The short version is that the two-way fixed effects approach is likely to lead to biased estimates of the treatment effect - in this case, a biased estimate of the effect of minimum wages on alcohol consumption. It isn't clear in which direction the bias would go.

So, by itself this paper doesn't answer the question of whether minimum wages affect alcohol consumption, and if they do, whether they lead to an increase or a decrease. This research question is far from settled, and is an area where future research would be useful.

Tuesday, 4 March 2025

Local minimum wages and low-quality housing rents in Japan

In this 2023 post, I discussed the impact of higher minimum wages on homelessness. Part of the story related to housing rents:

There are a couple of reasons to expect that higher minimum wages might increase homelessness. If minimum wages decrease employment (a result that is contested, but I believe it is likely given the galaxy of literature we have to date; again, see the links at the end of this post), then higher minimum wages may directly increase the risk of people becoming homeless. That's because when low-income workers lose their jobs, they may no longer be able to afford to pay rent, and may lose their homes. Second, if minimum wages increase incomes for those that are not made unemployed, they may increase the demand for housing, pushing up rents. This may indirectly increase the risk of homelessness for people who can no longer afford the higher market rent.

The research that I referred to in that post found that higher minimum wages increased homelessness, and that they also increased housing rents, consistent with the mechanism outlined above. However, we shouldn't believe just a single paper's research findings. This 2021 article by Atsushi Yamagishi (Princeton University), published in the journal Regional Science and Urban Economics (ungated earlier version here), provides some additional evidence, this time from Japan.

Japan provides an interesting case study for examining the effects of the minimum wage on housing, because:

Japan has forty-seven prefectures and each has a different minimum wage rate. There is no difference in the minimum wage rate within a prefecture...

And on top of that, each prefecture has little control over its local minimum wage. Yamagishi notes that:

...the minimum wage setting in Japan is highly centralized and unresponsive to trends in local housing markets due to institutional features. Japanese prefectural minimum wages are determined by the following process. First, the central government classifies prefectures into four categories, and it assigns the targeted amount of minimum wage increase to each category. The categorization is reviewed only once every five years and changes in the classification are rare.

Yamagishi uses data from 2007 to 2013, and exploits an interesting natural experiment, where:

From 2007 to 2012, a new consideration took the primary role in setting minimum wages due to the national policy change... after the revision of the Minimum Wage Law in 2007, the primary consideration in setting the minimum wage rate became closing the gap between the quality of life of minimum wage workers and people relying on Public Assistance (seikatsu-hogo, PA henceforth)...

Since the gap was generally larger in urban areas, the policy resulted in a plausibly exogenous minimum wage increase in urban prefectures...

So, not only is there variation in minimum wages across prefectures that is unrelated to local housing markets, but there is also a change in that variation driven by the policy change. Yamagishi uses data on advertised apartment [*] rents from At Home, "one of the most popular online real estate search engines in Japan". Using both an event study research design and a difference-in-differences design, Yamagishi found that:

...low-quality apartments experience around a 2.5-4.5% rent increase in response to a 10% minimum wage increase.

When looking at differences by apartment quality (proxied by the age of the apartment, with 'old' apartments being over 25 years old and 'very old' apartments being over 35 years old), Yamagishi found that:

An old apartment experiences a rent increase of around 3.3% when the minimum wage increases by 10%, which is statistically significant at the 1% level. A very old apartment experiences an increase of around 4%, which is also significant at 1% level. Overall, the result reveals the larger impact on the rents of lower-quality apartments...

Of course, the key point here is that workers on the minimum wage are more likely to live in low-quality apartments than in higher-quality apartments. So, the takeaway from this paper is that higher minimum wages may make some workers better off in terms of higher wages (while also bearing in mind the disemployment effects of the higher minimum wage), but that the gains of those workers would be somewhat offset by higher rents. Yamagishi estimates that landlords capture between 7.5 and 13.5 percent of the increased minimum wage, although the assumptions necessary to arrive at that estimate are a little difficult to justify.
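
As a rough back-of-the-envelope check (the assumption that rent makes up about 30 percent of a minimum wage worker's income is mine, not the paper's): if the minimum wage rises by 10 percent and the rent on an old apartment rises by 3.3 percent, the share of the wage gain captured by the landlord is roughly:

\[
\frac{0.30 \times 3.3\%}{10\%} \approx 0.10
\]

which is about 10 percent, comfortably inside that 7.5 to 13.5 percent range.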

Nevertheless, the overall point stands. Minimum wage workers lose some of their higher minimum wage to higher rents.

[HT: Marginal Revolution, back in 2023]

*****

[*] As an interesting aside, Yamagishi notes that in Japan, 'apartments' are generally low quality. A high quality apartment is referred to as a 'mansion' (see here).
