Friday, 31 January 2025

This week in research #60

Here's what caught my eye in research over the past week:

  • Madsen (open access) shows that economic growth will not diminish overall across 21 OECD countries, because the educational and innovative expansion induced by the demographic transition will override the adverse income effects of the ageing population
  • Fang and Miao find that cities with higher exposure to industrial robots have lower overall crime rates in China, with evidence that industrial robot exposure increases employment opportunities for low-educated adults, who are more likely to engage in criminal activities
  • Huo finds that as workforce ageing intensifies, right partisanship becomes increasingly associated with more regulatory restrictions on competition, greater market concentration, and a greater pure profit share of income, using data from OECD countries

Wednesday, 29 January 2025

The gender gap in apologies

The gender gap in wages remains persistent. Even after controlling for a range of observable factors such as education, experience, time outside the workforce, choice of occupation and industry, women earn less than men. Some researchers have turned to looking at behavioural differences, such as differences in competitiveness (see here for example), negotiation, self-promotion, and so on. This job market paper by Lily Liu and Marshall Mo (both Stanford University) asks whether there is a gender gap in apologies. Their argument is simple:

Workers’ apologies may be interpreted by their superiors as a signal of incompetence. Some popular writers suggest that women apologize too readily for their transgressions, potentially holding them back in the workplace...

So, if women apologise more than men, they may receive lower pay as a result. Liu and Mo test this with a series of experiments. In the first experiment, they test whether there is a gender gap in apologies. As they explain, in their 'worker experiment':

We let workers perform a task where their performance affects the likelihood of their paired employer getting a high payment. Here, workers’ task performance is a proxy for their ability. If employers get a low payment, workers might feel the need to apologize. Workers can send a message to their paired employers. We use three types of messages to capture both the extensive and intensive margins of apologies: a free-form message where workers can send anything, a binary message where workers choose either “yes” or “no” to sending “I am sorry about this outcome.”, and a continuous message where workers indicate from a scale of 0 to 100, how much they agree with the statement “I am sorry about this outcome.”. We hypothesize that female workers apologize more conditional on having the same task performance.

In this worker experiment, they find that:

...the gender apology gap exists: conditional on the same task performance, female workers are twice as likely to apologize in the free-form message and apologize 14% more in the continuous apology message. The apology gap remains significant after controlling for confidence: after workers are informed of their absolute and relative performance, female workers apologize 12% more than male workers. This suggests that confidence is not the only reason behind the apology gap.

So, women apologise more. But does this matter? In a second experiment, Liu and Mo test the labour market implications of apologies. Specifically, in their 'employer experiment':

Each employer is paired with ten workers, and for each worker, employers will decide whether to promote that worker after learning the worker’s basic information (gender, age group, and education level), whether the task succeeded (i.e. if the employer got a high payment), and the continuous apology message. Employers are incentivized to promote workers who have above-average performance. Additionally, we ask employers to guess the worker’s task performance and report how warm they feel toward that worker. We test if employers infer lower ability from workers’ apologies and examine how apologies influence employers’ promotion decisions.

In this employer experiment, they find that:

...employers infer lower ability from apologies. Conditional on all the information employers learn about the worker (gender, age group, education level, and whether the task succeeded), employers think workers who apologize more in the continuous message have worse task performance. At the same time, apologies increase employers’ warmth toward the worker, which mitigates the negative effect of apologies on promotion. As for how the apology gap affects female workers’ labor market outcomes, we find that employers are aware that female workers apologize more when asked about it, but they do not take it into account when making ability inferences. As a result, employers infer a lower task performance from female workers.

Finally, Liu and Mo conducted a survey at the end of their worker experiment, to test whether their findings might hold in the real world. In this survey:

...we find that female workers apologize more than male workers.

They also back this up with a:

...textual analysis of 2 million congressional speeches and show that congresswomen are about 30% more likely to give speeches containing apologetic words.

There is much more detail on the specifics of the experiments in the paper. However, this is research with a clear and practical implication: Stop apologising! Especially if you are a woman.

[HT: Marginal Revolution, last November]

Tuesday, 28 January 2025

The 'old boys' club' that is the American Economic Association

Economics is well known to be an 'old boys' club'. Any doubt about that should be dispelled by looking at the list of past Nobel Prize winners (with apologies to Ostrom, Duflo, and Goldin). The main things that past winners have in common are past presidency of either the American Economic Association or the Econometric Society, and/or current or past affiliations to Harvard, MIT, or Chicago.

The extent of the 'old boys' club' within the American Economic Association (AEA) is laid bare in this 2023 article by Kevin Hoover (Duke University) and Andrej Svorenčík (University of Pennsylvania), published in the Journal of Economic Literature (ungated earlier version here). Somewhat ironically (given the topic of the paper), the JEL is one of the journals published by the AEA.

Hoover and Svorenčík start by drawing from the Professional Climate Survey of the AEA in 2019:

The report reveals that the feeling that the AEA leadership is insular and disconnected from the membership is a widely held view, but it is mainly impressionistic. Our goal is to go beyond impressions and to carefully document and analyze the hierarchical structure of the leadership of the AEA and how it has changed over time.

Hoover and Svorenčík document this using data on the executive leadership of the AEA (made up of the Executive Committee, as well as the Nominating Committee that determines nominations for the Executive Committee), covering the period from 1950 to 2019. They focus on:

...the university at which an AEA leader or losing candidate received his or her highest academic degree (typically doctorates) and his or her places of employment (academic or nonacademic) at the time of appointment or of standing for election (win or lose).

The education of the executive leadership of the AEA is summarised in Figure 2 from the paper:

There are several things to note from this figure, including the sheer dominance of a small number of institutions (Harvard, MIT, Chicago, and Stanford), as well as the rise of MIT (and Stanford) and the relative decline of Chicago. There is more diversity of institution apparent in the employment data, as shown in Figure 3 from the paper:

Harvard, Stanford, MIT, and Chicago are joined by Princeton, but together they make up nearly half of the executive leadership in the more recent years. This concentration has grown substantially over time (in contrast to educational concentration, which has grown more slowly).

Hoover and Svorenčík then conduct some further analyses, trying to tease out the mechanisms underlying this increasing concentration. Those analyses are less robust, but point towards privilege over merit. They conclude by saying:

What, in the end, have we learned about who runs the AEA? The most obvious lessons are, perhaps, hardly surprising: the AEA leadership is overwhelmingly drawn from a small group of elite, private research universities—in the sense that its leaders were educated at these universities and, to a lesser degree, employed by them. What is less well-known is that for much of the past 70 years, the AEA leadership has been drawn predominantly from just three universities—Harvard, MIT, and Chicago. The leadership is spread more widely among places of employment; but, here too, a small number of institutions dominate... the pattern has become more pronounced through time: even within the group of elite universities, the top group has become more important and the bottom group less so... The vast majority of American universities with graduate programs and employers of economists other than elite universities have, at best, enjoyed token representation among the leadership.

Finally, Hoover and Svorenčík make several suggestions for reform:

Our research suggests that democratization of the AEA leadership would probably require structural changes. In particular, measures that would break the dynamic of network formation would be necessary. Key points of leverage would be arrangements that forced the Executive Committee to be drawn from a broader spectrum of institutions and the establishment of a Nominating Committee with greater independence from the incumbent leadership. In particular abandoning the practices of having past Presidents serve as Chairs of the Nominating Committee and of requiring the slate of candidates to be approved jointly by the Nominating and Executive Committees would help to decouple nominations from the existing structure of the leadership...

I'm not well placed to comment on the specifics of their suggested reforms. However, anything that reduces the power and influence of the 'old boys' club', and opens the leadership to a greater diversity of members, would likely be a good thing.

[HT: This 2023 article in Slate]

Sunday, 26 January 2025

Try this: Treasury's Income Explorer shows effective marginal tax rates

A new Treasury Analytical Note by Meghan Stephens, Yvonne Wang, and Liam Barnes presents data on effective marginal tax rates for different families (more on that in a moment). However, one cool thing that the note points to is Treasury's Income Explorer tool, which allows you to graph effective marginal tax rates (EMTRs) based on different historical tax schedules (from 2014 to 2024) and forecast tax schedules (for 2025 to 2028). You choose the taxpayer's hourly wage, whether they are partnered (and the partner's work hours and pay rate), the number of children, and whether they are a homeowner or renting (which affects their eligibility for the accommodation supplement). This allows you to look at EMTRs for a whole variety of taxpayers in different situations.

As a quick reminder, the effective marginal tax rate for a taxpayer is the proportion of the next dollar earned that is lost to taxation and to decreases in government transfers, rebates, or subsidies. These rates can get quite high - sometimes over 100 percent (in which case, the taxpayer would be worse off in net terms by earning another dollar). As an example of a high EMTR, consider this graph (made in the Income Explorer tool) based on a single parent (with two children aged 0 and 2) earning $40 per hour in their job, and paying weekly rent of $450:

Notice that the EMTR varies depending on the number of hours worked (shown along the top x-axis) and annual income (shown along the bottom x-axis). There are points in the distribution where EMTRs are low, but others where they are very high. And for this taxpayer, working between 38 and 42 hours leads to an EMTR of 107.6 percent. The Income Explorer tool even breaks that EMTR down: it is made up of 34.6 percent wage tax (including ACC levies), 27 percent Working for Families abatement, 21 percent Best Start abatement, and 25 percent accommodation supplement abatement. Notice that most of the EMTR (73 percentage points out of 107.6) is made up of reductions in entitlements to government assistance for that taxpayer. This would clearly affect the incentives to work additional hours (at least, between 38 and 42 hours).
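
Just to make the arithmetic concrete, here is a minimal sketch (in Python) of how those components stack up into the 107.6 percent EMTR, and what that means for the next dollar earned. The component rates are the ones quoted above for this particular taxpayer:

```python
# Effective marginal tax rate (EMTR): the share of the next dollar earned that is
# lost to tax and to the abatement of government assistance. The component rates
# below are those quoted above for a single parent on $40/hour, working 38-42
# hours and renting at $450/week.
components = {
    "wage tax (incl. ACC levies)": 0.346,
    "Working for Families abatement": 0.27,
    "Best Start abatement": 0.21,
    "accommodation supplement abatement": 0.25,
}

emtr = sum(components.values())
print(f"EMTR: {emtr:.1%}")                                        # 107.6%
print(f"Change in net income from an extra $1 earned: {1 - emtr:+.3f} dollars")  # -0.076

# How much of the EMTR comes from abatement of assistance rather than income tax
abatement = emtr - components["wage tax (incl. ACC levies)"]
print(f"Portion due to abatement of assistance: {abatement:.0%} of each extra dollar")  # 73 points
```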

Fortunately, the results are not so bad across the board. Stephens et al. show that less than six percent of all taxpayers have EMTRs greater than 50 percent, as summarised in this table:

The majority face an EMTR between 25 percent and 50 percent, and for most people, their EMTR is equal to their marginal income tax rate (plus ACC levies).

Understanding EMTRs is important for understanding the incentive effects of the tax and transfer system. Treasury's Income Explorer is a great tool for visualising EMTRs (as well as replacement rates, participation tax rates, and other complementary measures). Try it out for yourself!

[HT: Les Oxley for the analytical note]

Friday, 24 January 2025

This week in research #59

Here's what caught my eye in research over the past week:

  • Saygin and Zhang (open access) find that the overall quality ratings on RateMyProfessor.com have a greater influence on course enrolment at a particular unnamed US research university than official teaching evaluations, particularly affecting the enrolment decisions of female students
  • McCannon (with ungated earlier version here) finds that adding an NCAA Division III college football team has no statistically significant effect on undergraduate enrolment, and that it reduces the proportion of enrolled students who are women
  • Dagorn and Moulin (open access) find a 3.7 percent decline in the probability of re-enrolment for the subsequent academic year among the first cohort of university students affected by the COVID-19 pandemic in France
  • Pan et al. (open access) asked research participants to evaluate the same football videos with either visible or obscured players, and find that when players are visible, superstars receive lower performance ratings than non-superstars, and that this effect is even larger when the players' identities are obscured (maybe superstars aren't that great after all?)

Thursday, 23 January 2025

Daron Acemoglu expects only a tiny macroeconomic impact of AI

It would be fair to say that 2024 Nobel Prize winner Daron Acemoglu has been a bit of a sceptic about the impacts of generative AI (for example, see here). This scepticism is exemplified in a new paper forthcoming in the journal Economic Policy (ungated earlier version here). Acemoglu first notes that:

Some experts believe that truly transformative implications, including artificial general intelligence enabling AI to perform essentially all human tasks, could be around the corner... Other forecasters are more grounded, but still predict big effects on output. Goldman Sachs (2023) predicts a 7% increase in global Gross Domestic Product (GDP), equivalent to $7 trillion and a 1.5% per annum increase in US productivity growth over a 10-year period. Recent forecasts from the McKinsey Global Institute suggest that generative AI could offer a boost as large as $17.1 to $25.6 trillion to the global economy, on top of the earlier estimates of economic growth from increased work automation (Chui et al., 2023). They reckon that the overall impact of AI and other automation technologies could produce up to a 1.5–3.4 percentage point rise in average annual GDP growth in advanced economies over the coming decade...

Acemoglu then asks: "Are such large effects plausible?" His answer is a resounding "no". In making this assessment, Acemoglu relies on a relatively straightforward approach, which essentially boils down to working out the proportion of job tasks that will be affected by AI (relying on the earlier research I discussed here), how much they will be affected (in terms of potential cost savings), and how much of the economy those jobs make up. Overall, he estimates that:

...TFP [Total Factor Productivity] effects within the next 10 years should be no more than 0.66% in total – or approximately a 0.064% increase in TFP growth annually.
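
To see how this kind of back-of-the-envelope calculation works, here is a minimal sketch of the logic described above: the share of tasks affected, times the cost savings on those tasks, times the share of the economy those jobs make up, gives the aggregate productivity gain, which can then be annualised. The input numbers are purely illustrative placeholders, not Acemoglu's estimates:

```python
# A stylised back-of-the-envelope sketch of the decomposition described above.
# All input values are illustrative placeholders, NOT Acemoglu's estimates.
share_of_tasks_affected = 0.20    # proportion of job tasks that AI will affect
avg_cost_saving = 0.15            # average cost saving on those affected tasks
gdp_share_of_those_jobs = 0.20    # share of the economy accounted for by those jobs

# Aggregate TFP gain over the full adoption horizon (taken here to be ten years)
tfp_gain_total = share_of_tasks_affected * avg_cost_saving * gdp_share_of_those_jobs
print(f"Total TFP gain over 10 years: {tfp_gain_total:.2%}")         # 0.60%

# Converting a ten-year total into an approximate annual growth-rate contribution
annual = (1 + tfp_gain_total) ** (1 / 10) - 1
print(f"Approximate annual TFP growth contribution: {annual:.3%}")   # ~0.060%
```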

The effect of TFP growth on overall GDP growth rates depends on what happens to capital investment. Taking that into account, Acemoglu estimates that:

With these investment effects incorporated, GDP is also estimated to grow by 0.93–1.16% over the next 10 years. When I assume that the investment response will be similar to those for earlier automation technologies and use the full framework from Acemoglu and Restrepo (2022) to estimate the increase in the capital stock, the upper bound on GDP effects rises to around 1.4–1.56%.

Acemoglu then notes that AI will have bigger impacts on tasks that are easier to learn. He assumes that:

...productivity gains in hard tasks will be approximately one-quarter of the easy ones. This leads to an updated, more modest increase in TFP and GDP in the next 10 years that can be upper bounded by 0.53% and 0.90%, respectively.

Finally, Acemoglu argues that even these tiny GDP growth rate estimates likely overestimate the impact on wellbeing, because of potential offsetting negative impacts of AI, such as AI-powered social media. However, I think that point fails to take account of the fact that consumer surplus is not included in GDP growth (a real problem when you consider the contribution of 'free' (zero price) goods such as social media to GDP).

Nobel Prize winner Robert Solow once famously remarked that "You can see the computer age everywhere except in the productivity statistics". Clearly, Acemoglu is expecting something similar in terms of AI. This is a very public prediction of future AI-related growth, and it will be interesting to see if it is close to accurate over time.

[HT: Marginal Revolution, in April last year]

Tuesday, 21 January 2025

Book review: Hidden games

Game theory has a lot of real-world applications. I am never short of good examples to use when teaching game theory in my ECONS101 class. However, I can always use more examples. And so, I was really interested to read Hidden Games, by Moshe Hoffman and Erez Yoeli. The subtitle promises: "The surprising power of game theory to explain irrational human behavior". I set aside the word 'surprising', as I wasn't expecting to be surprised, but I was expecting to be entertained.

And indeed, the book is entertaining. Hoffman and Yoeli use examples from several television series, including The Wire, The Sopranos, and even Star Trek: The Next Generation. I really enjoyed those examples, and many others. At one point, they use game theory to explain Elizabeth's decision about whether to marry Mr Darcy or not, in Jane Austen's Pride and Prejudice.

The overall aim of the book is to explain social puzzles, and Hoffman and Yoeli note that:

To explain all of our social puzzles, we will use game theory, but the game theory will often be hidden and will need to be interpreted through the lens of learning and evolutionary processes.

Moreover, they write that:

One of the premises on which the analyses in this book rest [sic] is that learning, regardless of whether it is from one's own experience via reinforcement or from others' via imitation and instruction, leads us to do what is good for us, at least on average, much of the time.

So, the overall theme of the book is that game theory can explain social puzzles, and that we humans (as well as other animals in some sections of the book) act as if we are solving these puzzles using game theory, and that is because of learning. Given that some of the learning is social learning, this is really an evolutionary argument. And it makes a good story.

The first few chapters are easy to read and follow, and will engage most readers. Even the maths (and game theory can have a lot of maths) is relatively straightforward, and in any case is explained in a way that is easy to understand. However, this takes a turn when the book gets to Chapter 8, where Hoffman and Yoeli introduce elements of Bayesian reasoning (and evidence) into the picture. The maths becomes more difficult, and the explanations are not as clear. From that point on, I understood the maths but still found the book to be heavy going. Overall, when Hoffman and Yoeli are using narrative examples, the book is good. When they resort to maths, which is all too frequent through the second half of the book, it is not so good. In fact, I think that the book would have been much better if the maths had been excised and the examples explained narratively, without the complicated technical details.

I also found several parts where I thought a bit more depth of narrative would have helped. For example, Chapter 7 discusses 'countersignalling' (signalling a positive attribute by not signalling that one has that attribute). However, Hoffman and Yoeli don't explain how it is that someone without the positive attribute couldn't simply pretend to be countersignalling. 

This is a good, if somewhat uneven, book. A game theory enthusiast would certainly enjoy it, as would the more maths-inclined reader. Those without a good understanding of maths will probably be turned off by the second half of the book, which, although understandable, is a bit of a shame.

Friday, 17 January 2025

This week in research #58

Here's what caught my eye in research over the past week (a very quiet week, it seems!):

  • Bowles, Carlin, and Subramanyam (open access) study the topics of over 27,000 papers published in the major economics journals in the UK and USA between 1900 and 2014, and find that there has been much increased attention to civil society over that time

Wednesday, 15 January 2025

Lab experimental vs. real-world measures of risky choice

Before Ryan Oprea caused us to question all lab experimental measures of risky choice behaviour (as noted in yesterday's post), one main concern about experiments in the lab was whether they accurately reflected real-world decisions. The news is somewhat mixed, as I've written about before (see here and here). And that is quite aside from concerns that experimental subject pools made up of (usually undergraduate) students are not representative of real-world populations (see here).

Nevertheless, the last somewhat hopeful point in yesterday's post suggested that real-world behaviour might still be consistent with the experimental results (even if we cannot believe the experiments). On that note, I recently read this 2016 article by Arjan Verschoor, Ben D’Exelle, and Borja Perez-Viana (all University of East Anglia), published in the Journal of Economic Behavior and Organization (open access). They compare a measure of risk preferences estimated from a 'lab-in-the-field' experiment among over 800 farmers in rural Uganda, with measures of risk taking based on their agricultural choices.

Verschoor et al. compare two types of decisions. The first they term 'narrow-bracketed', where:

...the decision-maker does not consider their consequences together with the consequences of other decisions)...

So, narrow-bracketed decisions are those that are quite independent of other decisions, and therefore simpler. In contrast, a decision that is interdependent with other decisions, which Verschoor et al. liken to "portfolio management", is more complex.

Comparing the results of the lab experiment with real-world decisions that are narrow-bracketed (a fertiliser purchase decision) and not (the decision on whether or not to grow and sell crops for the market), Verschoor et al. find that:

Controlling for other determinants of risk-taking in agriculture, we find that risk-taking in the experiment is associated with the relatively straightforward investment decision of fertiliser purchase. However, for more involved livelihoods strategies that call not only on willingness to take risks but also on other attributes of entrepreneurship, viz. moving away from subsistence farming to growing crops for the market (measured in two alternative ways), we find no evidence of an association with risk-taking in the experiment. By contrast, a hypothetical willingness to take large-scale risks, elicited through a questionnaire, is associated with both fertiliser purchase and growing crops for the market (however measured), suggesting that this is a better proxy for entrepreneurship broadly defined.

In other words, when the real-world decision was reasonably straightforward, and reasonably independent of other decisions ('narrowly bracketed') the farmers behaved in line with their risk preferences measured in the experiment. However, when the real-world decision was more complex and inter-related with other decisions, there is little association with the risk preferences measured in the experiment. Verschoor et al. note that:

The decision to buy fertiliser is a straightforward investment decision that raises both the expected profit and the spread of possible profits within an existing livelihoods strategy... Decisions to grow cash crops or to grow for the market more broadly, on the other hand, are complex, multi-dimensional decisions that invoke not only risk preferences but also the nebulous notion of entrepreneurship.

It is interesting to think about these results alongside those in the Ryan Oprea research I discussed yesterday. Oprea found that risk preferences consistent with behavioural economics (specifically Prospect Theory) only arose because of the complexity of the experimental task used to measure them. Verschoor et al. note in the discussion of their underlying theoretical model that:

...prospect theory, which only considers changes to wealth relative to a reference level, correspondence between the two domains [risky choices in lab experiments and risky choices in real life] is assured provided both are narrowly bracketed...

Combining the two sets of results (from Verschoor et al. and Oprea), I infer that perhaps narrowly-bracketed decisions, which nevertheless involve complex choices in the lab, may show preferences consistent with Prospect Theory. Real-world decisions that are narrowly bracketed demonstrate similar preferences. An open question is whether the real-world decisions are consistent with Prospect Theory because of the complexity of those decisions. Verschoor et al. don't give us any steer on that (and neither should they, given that their article was published eight years before Oprea's).

When you move beyond narrow bracketing, adding interdependence with other decisions and therefore even more complexity, then decisions are not consistent with the lab-estimated risk preferences. We don't know whether that makes them inconsistent with Prospect Theory, but I expect it does. Does that mean that adding even more complexity makes decision-makers more rational? Or perhaps, more salience of the decision (or higher stakes of the decision) makes them more rational? It's not simply the real-world context, or the fertiliser decisions would also be affected.

Now of course, I am trying to reconcile just two sets of results from a much larger literature here, and probably going too far in doing so. However, it will be interesting to see where the lab experiments vs. real-world research literature goes next.

Tuesday, 14 January 2025

Are experimental measures of loss aversion and behaviour under risk just an artefact of complexity?

Loss aversion has been under fire in the economics literature recently (see here and here). As one of the foundations of behavioural economics, this is a big deal. So, I was interested to read this recent paper by Ryan Oprea (University of California, Santa Barbara), published in the journal American Economic Review (ungated earlier version here). Oprea essentially tests the key tenets of Prospect Theory, that when faced with a risky choice such as a lottery, people are risk averse when it comes to gains, but risk seeking when it comes to losses. Oprea's argument is that we observe that behaviour in lottery experiments, not because it is real, but because it is an artefact of the complexity of the lotteries that the research participants are faced with.

Here's what Oprea did:

In each task in our experiment, we elicit subjects’ dollar valuations for a set of 100 “boxes,” each of which contains some dollar amount. For example, in one of our tasks (called G90), we ask subjects to value a set consisting of 90 boxes that each contain $25 and 10 boxes that each contain $0. Acquiring a set of boxes influences the subject’s earnings in the experiment according to a payoff rule, and we compare how subjects value these sets under two contrasting payoff rules.

By opening one of the boxes from the set at random and paying the subject the amount inside, we turn the set into a lottery (i.e., G90 becomes a risky prospect of earning $25 with probability 0.9), and the dollar value the subject attaches to it becomes a certainty equivalent: the certain dollar amount the subject judges to be equivalently valuable to the risky lottery.

Using those results, Oprea replicates the key results from Prospect Theory, which he refers to as the 'fourfold pattern' of risk (a term that actually comes from Kahneman and Tversky), as well as loss aversion. Then:

Our contribution is to compare these valuations to the valuations of what we call “deterministic mirrors” of the same lotteries. A deterministic mirror of a lottery consists of the same set of 100 boxes used to describe the lottery but is characterized by a different payoff rule: instead of paying the dollar amount in one of the 100 boxes selected at random as a lottery does, a mirror pays the sum of the rewards in all of the boxes, weighted by the total number of boxes. Thus, instead of paying $25 with probability 0.9 (as a lottery does), the mirror of G90 pays 0.9 × $25 = $22.50 with certainty.
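
To make the two payoff rules concrete, here is a minimal sketch of the G90 task and its deterministic mirror, as described in the quotes above:

```python
import random

# The G90 task described above: 90 boxes containing $25 and 10 boxes containing $0.
boxes = [25.0] * 90 + [0.0] * 10

# Payoff rule 1 - the lottery: open one box at random and pay whatever is inside.
def lottery_draw(boxes):
    return random.choice(boxes)

# Payoff rule 2 - the deterministic mirror: pay the average of all 100 boxes, with certainty.
def mirror_payoff(boxes):
    return sum(boxes) / len(boxes)

print(f"One random lottery draw:  ${lottery_draw(boxes):.2f}")       # $25 or $0
print(f"Lottery expected value:   ${sum(boxes) / len(boxes):.2f}")    # $22.50, but risky
print(f"Mirror payoff (certain):  ${mirror_payoff(boxes):.2f}")       # $22.50, no risk at all

# The point of the design: both tasks present the same 100-box description (the same
# complexity), but only the lottery involves risk. So if subjects' valuations of the
# mirror depart from $22.50 in the same 'fourfold pattern' as their lottery valuations,
# that pattern cannot be driven by risk preferences.
```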

In other words, the 'deterministic mirror' of a lottery retains all of the complexity associated with the choice, but eliminates all of the risk (because the amount received is certain, rather than risky). So, if the 'fourfold pattern' is real and arises from the riskiness of the lottery, it should disappear in these experiments. Instead, using data from 673 research participants (and with similar results in a second sample of 489 research participants):

...we find that

(i) The fourfold pattern arises in the valuations of deterministic mirrors just as it does in lotteries, and with roughly the same strength. Importantly, this means that we find strong evidence of what is usually called “probability weighting” in settings without probabilities.

(ii) Loss aversion arises in deterministic mirrors even though at the relevant margins they cannot actually produce losses. Thus, we find strong evidence of what is usually called “loss aversion” in settings without risk of loss.

(iii) Across subjects, the severity of each of these anomalies in lotteries is strongly predicted by their severity in deterministic mirrors, suggesting that the behaviors in the two settings are strongly linked, deriving from a common behavioral mechanism (which, clearly, cannot be grounded in risk or risk preferences).

In other words, Oprea finds strong evidence that it is complexity that drives the 'fourfold pattern' of risk in lottery experiments, because when risk is removed (but complexity remains), the 'fourfold pattern' is still there. On top of that, loss aversion remains even when there is no risk of loss. So, loss aversion may also be an artefact of complexity of lottery experiments. Oprea concludes that:

First, theories of risk preferences designed to explain these anomalies (e.g., prospect theory) are unlikely to contain much normative content and therefore should not be accommodated in the inference of welfare or the design of policy. Second, our finding of systematic departures from neoclassical benchmarks in perfectly deterministic settings suggests that many of our descriptive theories of preferences for risk are really descriptive theories of the way people evaluate complex things.

That's a really nice way of saying that behavioural economists may need to reconsider some of their key theories, because the lab experiments they have been using to verify them do not stand up to this scrutiny. And Oprea's results may also help to explain some of the recent anomalies in the loss aversion literature (see here and here).

Oprea's results are important, and even though the working paper version of this article has already been cited over 50 times, I still don't think this research has received the attention that it deserves (and see Eric Crampton's take here). However, it may not be time to throw away behavioural economics or loss aversion entirely. Oprea notes that:

We do not claim, for instance, on the basis of these data that risk preferences or even loss preferences do not exist but only that they are unlikely to be reliably revealed in lottery valuations.

That is an important caveat. Behavioural economists may simply need to find a new way of demonstrating the 'fourfold pattern' of risk, and loss aversion, without resorting to complex lotteries. These effects may still be real. After all, there is a lot of real-world behaviour that is very consistent with loss aversion (see my various posts on that topic here).

Monday, 13 January 2025

The £100 pineapple pizza, and conspicuous consumption

The New Zealand Herald reported yesterday (the original story on the Telegraph is here, but behind their paywall):

It is arguably the most divisive culinary combination.

Topping the traditional Italian favourite with pineapple now comes with a hefty price tag at one trendy pizzeria.

Lupa Pizza, in Norwich, is charging customers £100 ($220) for their Hawaiian pizza on food delivery service Deliveroo because they disapprove of the combination so strongly.

Lupa Pizza is demonstrating their knowledge that demand curves slope downwards. As the price increases, the quantity demanded decreases. A high price of £100 is likely to lead to a quantity demanded of zero. That is, no one buys the Hawaiian pizza.

However, the publicity generated by this stunt may perversely lead Lupa Pizza to sell their absurdly priced pizza. I can just imagine some wannabe social media influencer paying £100 for the memes. Any day now. That would be an example of conspicuous consumption - buying a high-price good simply to signal the wannabe influencer's high status as a purchaser (because who, other than someone of high status, would be willing to pay £100 for a pizza? - see here, for more on that point).

I wonder how much Lupa Pizza's owners would complain when they eventually sell a Hawaiian pizza? Probably not as much as the article would have you believe - the profit margin on a £100 pizza is likely to be substantial (even when you factor in the carrying cost of pineapple as an ingredient that they would rarely use, and probably have to run to the store to get if anyone orders the Hawaiian pizza!).

The pizza may even be underpriced. If lots of wannabe influencers start buying the pizza, Lupa Pizza might need to increase the price even further, in order to really price them out of the market. I wonder what the maximum willingness-to-pay for a Hawaiian pizza is among wannabe social media influencers? We may soon find out.

Saturday, 11 January 2025

The irony that is using ChatGPT to write a research paper about students' acceptance of ChatGPT

ChatGPT has been around a while now. One of the things that has been noticed about ChatGPT (and other large language models) is their preference for using rather flowery language. Certainly, they use more elaborate language than you would see in the average research paper. Forbes has a good article on over-used words and phrases in ChatGPT, where it cites:

...foster, delve, dive, landscape, dynamic, embark, realm, vital, transformative, it's important to note, and perhaps the cheesiest phrase of all: "It's a testament to..."

To that list should be added "tapestry", "discourse", "weave", and "mosaic". This Reddit thread from early last year also has some good examples. Once you know what to look for, you start to see these phrases becoming increasingly common.
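
If you want to look for yourself, a crude word count is enough to get a feel for how often these phrases turn up. Here is a minimal sketch, assuming you have the text of an article as a plain string (the stems below are just the flagged words from above, not a validated 'ChatGPT detector'):

```python
import re
from collections import Counter

# Crude stems for the over-used words flagged above. This is not a validated
# detector, just a rough way of spotting the vocabulary described in the post.
flagged_stems = ["foster", "delv", "landscape", "dynamic", "embark", "realm",
                 "vital", "transformative", "tapestr", "discours", "weav", "mosaic"]

def flag_counts(text: str) -> Counter:
    """Count occurrences of flagged word stems in a piece of text."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter()
    for word in words:
        for stem in flagged_stems:
            if word.startswith(stem):   # very crude stemming: catches 'delves', 'delving', etc.
                counts[stem] += 1
    return counts

sample = ("This investigation situates itself within this rich tapestry of debates, "
          "delving deeper into the realm of technology adoption.")
print(flag_counts(sample))   # Counter({'tapestr': 1, 'delv': 1, 'realm': 1})
```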

Why does ChatGPT use those phrases? ChatGPT is trained on the corpus of human writing, which includes academic articles, books (both non-fiction and fiction), and writing on the internet, among other things. If ChatGPT were only trained on academic writing, then its writing style would likely be quite formal and academic. Because ChatGPT is trained on a lot of more creative and informal writing, its style is much more creative than we would expect to see in an academic research paper.

I think this is quite well known. So, I wasn't surprised when I saw some of those words and phrases used in this 2024 journal article (open access) I read today, published in the journal Innovative Higher Education. Or maybe I should have been surprised, given that the article was about students' acceptance of ChatGPT. To be clear, the article includes the following phrases (emphasis is mine):

However, a marked improvement in AI’s capabilities, particularly in the realm of generative AI, became apparent in last decade... 

One can point to the surge of interest in AI’s educational potential over the last decade, juxtaposed with a landscape marked by exaggerated claims and often inconclusive findings...

This investigation, therefore, situates itself within this rich tapestry of debates, possibilities, and challenges. By delving deeper into the matrix of technology adoption, artificial intelligence, and the nuances presented by applications like ChatGPT, we aim to contribute to a more informed, equitable, and constructive discourse that respects both the transformative potential of AI and the foundational tenets of education in the twenty-first century...

We delve into the determinants that foster or inhibit its acceptance and utilization...

In the realm of higher education the UTAUT2 model is used to identify factors affecting students’ or teachers’ intentions to use different technology tools such as e-learning systems...

Recent research delves into the increasing discourse around the integration and application of AI-driven tools in education...

Our investigation pivots to a different facet of the educational journey, centering on students’ acceptance and utilization of generative AI utilities. This research’s value lies in shedding light on determinants shaping the embrace of such AI-driven tools.

There are other examples I could cite as well, but you get the idea. There is a certain vibe about those sentences, and that vibe is ChatGPT.

Perhaps I'm being unfair. Maybe some academics really do use those phrases. I checked this author's other publications, and in this 2024 article (also on students' acceptance of ChatGPT, but with a co-author this time), they also use "delve" (the third word of the abstract is "delves"), "foster" (also in the abstract), "discourse", and "realm", but at least there are no tapestries in sight. So, perhaps these words and phrases really are a key part of their vocabulary. However, if I go back to the author's pre-ChatGPT articles, like this one, and this one, there is no delving, weaving, or fostering, or any tapestries, landscapes or discourses. Now, our writing styles can change over time. But this change of writing style towards the ChatGPT style, at just the time that ChatGPT becomes available, is pretty fishy.

Having delved into the rich tapestry of this author's publications, there is multifaceted evidence that fosters a view that at least some parts of their papers are now being written by ChatGPT. Which, given the topic, is kind of ironic. [*] Especially when 'they' write that:

The role of ChatGPT in academic authorship raises concerns about integrity and recognition of contribution.

Indeed.

*****

[*] To be fair to the author, it isn't prohibited to use ChatGPT or other large language models in writing up research papers. However, like using research assistants, it is good practice to acknowledge them. And increasingly, journal editorial policies are mandating that the use of these tools is disclosed. I'm also not saying that their analysis is tainted by the use of ChatGPT. It is tainted by their use of structural equation modelling, but that is altogether a different story for another time.

Friday, 10 January 2025

This week in research #57

Here's what caught my eye in research over the past week:

  • Anderson et al. (with ungated earlier version here) look at whether the world’s best tennis pros play Nash equilibrium mixed strategies, finding that for most elite pro servers, a best-response strategy significantly increases their win probability relative to the mixed strategies they actually use
  • Aucejo and Wong conducted a randomized controlled trial involving approximately 3,000 students across 39 introductory economics classes at Arizona State University, and find that personalised feedback messages that were tailored using information from students’ initial academic performance and surveys completed at the beginning of the semester benefit first-generation students in synchronous classes significantly, while no such effects are observed in asynchronous classes
  • Roupakias and Chletsos (open access) find that the proportion of immigrants in a country is positively associated with the country's 'economic fitness', as measured by the World Bank's Economic Fitness index
  • Cawley et al. (with ungated earlier version here) investigate whether the Jared Fogle scandal in 2015 had any effect on patronage at Subway restaurants, and find no significant effect
  • Mohammed assesses the impact of Airbnb’s website redesign policy, which delayed the exposure of host profile photos, on racial discrimination, and finds a negligible reduction

Thursday, 9 January 2025

The impact of Fox News on American democracy

In yesterday's post, I noted a number of opportunities for research on the economics of social media. At least one of those opportunities intersected with the impact of traditional media. So, I was interested to read this new article by Elliott Ash, Sergio Galletta, Matteo Pinna (all ETH Zurich), and Christopher Warshaw (George Washington University), published in the Journal of Public Economics (open access). They look at the impact of Fox News Channel on the political ideology of voters, and on voting outcomes, in the US. Given the well-known right-wing nature of Fox News, it is reasonable to wonder whether it is having a political impact.

Identifying the causal impact of a television channel is somewhat challenging, because higher viewership in an area might simply reflect a larger number of right-leaning voters there. So, the causality might run from voters to viewership, rather than the other way around (reverse causality). However, Ash et al. make use of the fact that the channel number assigned to Fox News varies across markets, and that assignment is random (or, at least, it isn't related to the partisanship of the population in a particular television market). So, Ash et al. use channel position as an instrument for viewership (I'll come back to this point later). They then look at the political preferences of voters using data from:

...the 2000 and 2004 National Annenberg Election Survey (NAES) and the 2006–2020 Cooperative Congressional Election Study (CCES) surveys. Overall, we have data on the preferences of approximately 661,000 Americans.

They also look at the effect on presidential and down-ballot (senate, gubernatorial, and house) elections using a variety of election data sources. They find that:

...from 2006–2008 onwards, a lower Fox News Channel (FNC) position correlates with an increase in self-identified Republican viewers, significant at the 10% level initially and 5% in recent years. A one-standard-deviation drop in FNC’s channel position corresponds to a roughly one-percentage point rise in Republican self-identification... A one-standard-deviation decrease in FNC’s channel leads the average American’s ideological position to shift .03-.04 standard deviations to the right in recent years...

Looking across time, the results are not statistically significant (at the 5 percent level) until 2009-2012, or later, depending on the measure. Turning to presidential elections, Ash et al. find that:

Initially, in the 2000 and 2004 elections, FNC’s impact was minimal, likely due to its growing viewership. By 2008, a one-standard-deviation decrease in FNC’s channel position correlated with a 0.32 increase in the Republican vote share...

And on down-ballot elections:

Overall, the effects of FNC in down-ballot elections are qualitatively similar to, though less precise than, those in presidential elections. In House elections, we see a positive (Pro-Republican) coefficient on FNC for Republicans’ two-party vote share in 2012, and it becomes statistically significant starting 2018. Since around 2012, counties with one-standard-deviation lower FNC’s channel position have about .6 to 1.6 percentage point greater shares in Republican vote.

In Senate elections, we find a positive coefficient starting in 2006, which becomes statistically significant starting in 2014. Since then, there have been relatively consistent year-to-year effects of around .6 to .73 percentage points. In other words, a one standard deviation shift in FNC’s channel position increases Republican Senate candidates’ vote share by over half a percentage point.

In gubernatorial races, FNC had a small and statistically insignificant effect until the latter half of the 2010s. In recent elections, however, the effect of FNC in gubernatorial races is similar to Senate races — with a Republican vote share about .5 percentage points higher...

So, it is clear from the results that Fox News Channel is driving a rightward shift in the voting public, and this is having a clear effect on both presidential and down-ballot elections. Ash et al. conclude that:

Given the estimated effect sizes on presidential elections, for example, Fox News could have easily tipped the scales for Donald Trump in 2016.

Perhaps Donald Trump should think himself lucky that his 2022 feud with Fox News didn't escalate too much?

However, there is some reason for scepticism about these results. Although Ash et al. make a good case for their use of channel position as an instrument for viewership, it turns out that they didn't actually run a full instrumental variables analysis:

Ideally, we would provide first-stage and two-stage least-squares (2SLS) results for all years in our analysis. However, there are limitations in estimating and interpreting the 2SLS results. First, we only have data on both the endogenous regressor (FNC ratings) and channel positions for 2005, 2006, 2008, and 2020. Second, the first stage provides evidence that the typical variation induced by the channel positioning is somewhat limited, suggesting that the size of the 2SLS coefficients would be, by construction, unreasonable.

...our main analysis focuses on the reduced form, where the outcomes (e.g., vote shares) are regressed directly on the instrument (FNC channel position)...

So, while they have a good instrument, a lack of data prevents them from making the most of it. And so, I think we would need further evidence before we can conclude that these results demonstrate a causal effect of Fox News Channel on political outcomes in the US. It seems to me that the main holdup to doing a full instrumental variables analysis is not having all the viewership data from Nielsen. So perhaps some wealthy research institution needs to buy that data?
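
For readers less familiar with the distinction between the reduced form and a full instrumental variables analysis, here is a minimal simulated sketch. The variable names and data-generating process are invented purely for illustration, and this is not Ash et al.'s specification; the point is just that the reduced form estimates the effect of the instrument (channel position) on the outcome, while the IV estimate scales that by the first stage to recover the effect of viewership itself:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5_000

# Invented data-generating process for illustration only (not Ash et al.'s data):
# a lower channel position raises viewership, and viewership shifts vote share,
# while local partisanship confounds the naive comparison.
channel_position = rng.normal(size=n)                     # the instrument
confounder = rng.normal(size=n)                           # local partisanship
viewership = -0.5 * channel_position + 0.8 * confounder + rng.normal(size=n)
vote_share = 2.0 * viewership + 3.0 * confounder + rng.normal(size=n)

def ols_slope(x, y):
    """Slope from a simple bivariate OLS regression of y on x."""
    x = x - x.mean()
    return (x @ (y - y.mean())) / (x @ x)

reduced_form = ols_slope(channel_position, vote_share)    # effect of instrument on outcome
first_stage = ols_slope(channel_position, viewership)     # effect of instrument on viewership
wald_iv = reduced_form / first_stage                      # simple IV (Wald) estimate

print(f"Reduced form: {reduced_form:.2f}")   # ~ -1.0 (= 2.0 x -0.5)
print(f"First stage:  {first_stage:.2f}")    # ~ -0.5
print(f"IV (Wald):    {wald_iv:.2f}")        # ~ 2.0, the true effect of viewership
print(f"Naive OLS:    {ols_slope(viewership, vote_share):.2f}")  # biased upward by the confounder
```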

Wednesday, 8 January 2025

The economics of social media and opportunities for future research

There has been an explosion in research into social media over the last decade, as you might expect. In a new article published in the Journal of Economic Literature (ungated earlier version here), Guy Aridor (Northwestern University), Rafael Jiménez-Durán (Bocconi University), Ro’ee Levy (Tel Aviv University), and Lena Song (University of Illinois Urbana–Champaign) document the trend. Here's Panel A of Figure 1 from the paper:

The increase in published research is clear (the "General interest publications" they refer to are more-or-less the top general journals in economics). Aridor et al.'s article reviews this large and growing literature, highlighting what is already known about the economics of social media, as well as where they see gaps suitable for future research. The focus of the review often narrows to a case study related to social media and politics, which, while important in itself, is not the only important aspect of social media. So, there may be additional research opportunities outside of those that are identified in the review.

Aridor et al. define social media as "two-sided platforms that primarily host user-generated content distributed via algorithms, while allowing for interactions among users". They then structure their review to follow the "flow of content" in social media, starting with content production (generated by users), then content distribution (including both 'organic content' and advertising, distributed by platforms using algorithms), and finally content consumption (again, by users).

The review is difficult to summarise, so if you are interested, I encourage you to read it in detail. In this post, I just want to highlight some of the research opportunities in this space (but not all of the opportunities that Aridor et al. describe - I'll focus on what I think are the more important or interesting ones).

First, in terms of content production, Aridor et al. note that:

...content creation is increasingly viewed as a viable career or income source... A natural next step to the existing evidence on the elasticity of the content supply curve concerns the labor economics of this activity, studying questions such as the effects of unions for content creators or whether monetary incentives crowd out nonmonetary motives. Beyond ad revenue–sharing programs, other monetary incentives that have been increasingly used by platforms (for example, allowing users to subscribe to producers) remain understudied, perhaps due to missing data.

Also:

More research is needed on how social media algorithms could affect the production of other types of content beyond news. For example, it has been argued anecdotally that TikTok is driving songwriters to focus on brief, danceable 15-second snippets.

In general, TikTok has probably been under-studied relative to other platforms, particularly given its large user base. Turning to misinformation, Aridor et al. note that:

The literature has mostly focused on the sharing, as opposed to the production, of misinformation. One potential reason for this imbalance is the role of resharing in diffusing misinformation... An important gap in this literature is to understand the determinants of the production of misinformation, beyond the sharing of existing articles.

And on fact-checking (which may now be more difficult to research given that Meta is giving up on fact checking):

One gap in this literature is to disentangle a potential dual role of fact-checking interventions, which affect not only the users’ perceived veracity of the content they are about to share but also the perceived likelihood that they will be fact-checked by the platform in the future.

Aridor et al. then turn to sanctions on users, and note that:

Besides crowd-sourcing, platforms conduct other content moderation measures at scale, such as down-ranking or removing posts, banning groups, and suspending user accounts... However, more research is needed to understand the causal effects of these “harder” interventions on the production of misinformation and the mechanisms through which they operate, whether they crowd out fact-checking efforts by the users, and the net welfare effect of sanctions.

And on counter-speech to reduce the toxicity on platforms:

An open question is what determines the equilibrium provision of counterspeech and how to incentivize users to provide this public good (similarly to fact-checking).

Finally on content production, Aridor et al. note that:

More evidence is needed on the connection between content moderation and advertising. Specifically, there is limited research examining how content moderation influences advertisers and, conversely, how advertising dynamics influence content moderation decisions... more evidence is needed to understand the effect of hate speech and other types of content on user interactions with advertisements and whether content moderation policies can alleviate any potential negative effects.

There seem to be a number of promising research opportunities in the content distribution space (and that likely primarily relates to a lack of access to the algorithms, which is why these questions haven't already been answered). Specifically, Aridor et al. note that:

Future research could examine what drives demand for social media content and when and to what extent content consumption is driven by algorithms. 

And importantly:

An open question is how social media algorithms can optimally increase social welfare and what government incentives can encourage them to do so.

That last question is a big one, with huge potential for policy impact. I would be very surprised if there weren't already concerted research efforts in that space. Also:

...more research is needed on how users decide which pages to follow (for example, the accounts of media outlets or politicians), since those pages may be driving segregation.

Then, turning attention back to advertising, Aridor et al. note that:

Economic analysis of advertising... posits that advertising primarily works through the following channels: shifting beliefs through information (for example, product awareness, attribute information) or directly shifting consumer preferences (for example, increasing affinity to the brand)...

Empirical research suggests several unique aspects of social media advertising where these mechanisms interact with the “social” aspect of social media.... more work is needed to understand the relative role of each of the different mechanisms for ad effectiveness and how they interact with both the increased targeting abilities and the social aspect of social media advertising.

And finally:

...such a large amount of money is spent on digital advertising in national election campaigns is puzzling and deserves additional research, since either these ads are more effective than current research indicates or researchers are wrongly inferring the objectives that campaigns pursue with these ads (for example, fundraising rather than voter persuasion).

That question relates to some of my own earlier (and most cited) research on the impact (or rather, non-impact) of the number of social media followers on politicians' chances of being elected (see here, or here for an earlier ungated version). Aridor et al. also note that:

One direction for future work, even for studying these issues within the United States, is to better understand the impact of these types of ads in local elections, which is precisely where we may expect that social media advertising could have a larger impact.

I can say that in some follow-up work on local elections in New Zealand, which we haven't published, there was no significant effect of social media (which didn't surprise us at all, given the earlier results in national elections I pointed to above).

Aridor et al. then discuss opportunities in researching content consumption, and note that:

There are several avenues for future research. First, given the complex and fast-evolving nature of social media consumption, descriptive evidence detailing consumption behavior would be valuable... Second, while existing research highlights self-control problems among American adults, it is policy relevant to quantify the extent of these problems in the younger population. Finally, future work could look within platforms and quantify how different design features influence what and how users consume. Certain features (for example, content format or algorithms) may exacerbate self-control problems... Defining the key product characteristics and quantifying their effect on consumer choice is an important step forward in understanding the welfare implications of consumption.

And importantly:

There is convincing evidence that social media use—particularly the exposure to toxic content—can lead to offline hate crimes...

There is also some evidence that government regulation akin to a Pigouvian tax can mitigate this externality... Additional research is needed to study whether these policies have unintended consequences such as the silencing of political dissidents. Lastly, future research could explore how content moderation affects other offline harmful actions besides violence (for example, self-harm)...

And finally:

There are several interesting directions for future work. The first is that given the large informational externalities from consumption... an unexplored question is not only to measure market power in terms of time spent, but also to think of media power as Prat (2018) does for traditional media. Of particular interest is understanding whether social media increases or decreases the media power of existing large media organizations. The second is to explore the implications of habit formation... for competition among social media platforms.

The overall takeaway from this review is that, while there is a large (and quickly growing) literature on the economics of social media, there is still substantial scope for future research that will have real policy and practical impact. I look forward to seeing many of these research questions addressed in the near future.

Tuesday, 7 January 2025

How could a university create successful peer activities for students?

There is a robust literature on the effects of student peers on learning (see for example this 2011 review by Bruce Sacerdote (ungated version here)). If having successful peers can contribute to a student's success, then it may be attractive for universities to try and set up peer groups. There are various ways to do this, including early group work in classes, social events, and so on. It would be fair to say that the success of these attempts has been mixed.

So, I was interested to read this 2020 article by Thomas Fischer (Lund University) and Johannes Rode (Technische Universität Darmstadt), published in the Journal of Economic Behavior and Organization (ungated earlier version here). To be fair, what attracted me to this article was the question in the title: are persistent peer relationships formed in the classroom or in the pub? It turns out that isn't really the focus of the paper at all, and that comparison is only of secondary interest to the story in the article.

What Fischer and Rode did was look at the peer effects for industrial engineering students at Technische Universität Darmstadt, which is a somewhat different learning environment (compared to universities in the US, UK, or New Zealand). As they explain:

It is important to acknowledge that there is little interaction between students in our setting as compared to the Anglo-Saxon case mostly covered in the literature... This is not only a specific case of the institution TU Darmstadt and the study field, but more generally of the German university system. There is no pronounced campus live [sic], nor is there forced interaction in academic studies.

TU Darmstadt is not a campus university. This means above all that students do not usually live in university-provided housing...

The curriculum is built entirely around individual study results that are tested in the form of written exams and thus does not enforce interaction. Even the presence in lectures or lab sessions is not enforced. In fact, the growing online platforms make it more and more obsolete to attend physical meetings.

What this setting ensures (as much as is possible) is that there are unlikely to be any pre-existing peer groups among students, because their education is very individualistic (no group work) and doesn't involve interaction with their peers (inside or outside of class). Fischer and Rode then look at what happens after peer interaction is introduced, in two ways. First:

There is one exception from the focus on non-team work courses. A mandatory group work course labeled Projekt im Bachelor (engl. Project at Bachelor level) is part of the curriculum. The group work course intends to help students build soft skills and gain hands-on experience as group work is essential in future professional life. During the group work course, students have to come up with a business plan for some novel technological idea covering various aspects of marketing, budgeting, and legal implementation. After one week of intensive group work, all groups have to deliver a final report and pitch their results in front of a jury of professors and professionals. The task is deliberately designed with time pressure to induce cooperation between group members. Students are randomly assigned into groups...

Fischer and Rode then look, for each student, at the effect of the nearest-peer's quality (in terms of pre-university grades) in the same major within these groups (which are quite large, at 11 to 13 students) on the student's own grades in later courses. In other words, they look at the lasting effect of the nearest-peer on later grades. They find that an:

...increase in the peer quality by one standard deviation (0.5 grade steps) increases the performance of the individual by −0.17 standard deviations (equivalent to a grade step of approx. −0.1). Note that in the German grade system 1.0 is the best grade and 5.0 the worst grade.

Note that last sentence. The negative effect here is an improvement in grades. Having a better nearest-peer significantly improves a student's grades. In contrast, the average quality of the other group members does not have a statistically significant effect on a student's grades. Fischer and Rode attribute this to students having a preference to match up with students who are similar to themselves in terms of academic quality. These peer effects persist into the future because these students continue to work together. And Fischer and Rode document statistically significant peer effects three semesters, five semesters, and seven semesters after the group work project.

Why would students prefer to continue working with other students at a similar level of academic quality? Fischer and Rode use an elaborate mathematical model to explain this, but I have a much simpler explanation. If students have to work in pairs, and working with a student of higher quality improves Student A's grade, but working with a student of lower quality improves their grade by less (or decreases it), then Student A will want to work with the best student. The top students will pair up together, leaving the next-best students to pair up, and so on down the chain. Essentially, we end up with assortative matching of students into groups. And if working together improves grades, even outside of a group-work setting, those students will continue to work together.
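To make that story concrete, here is a minimal sketch of the idea (my illustration with made-up ability scores, not Fischer and Rode's model): if pairing with a stronger partner helps more, everyone prefers the best available partner, and the stable outcome is pairs sorted by ability.

```python
# A minimal sketch of assortative matching into pairs (illustrative only).
def assortative_pairs(abilities):
    """Pair students from strongest to weakest; higher score = stronger student."""
    ranked = sorted(abilities, reverse=True)
    return [(ranked[i], ranked[i + 1]) for i in range(0, len(ranked) - 1, 2)]

print(assortative_pairs([3.1, 1.2, 2.0, 1.5, 2.7, 1.0]))
# [(3.1, 2.7), (2.0, 1.5), (1.2, 1.0)] - each student is matched with the
# closest-ability partner still available, so pairs are similar in quality.
```

The point of the sketch is simply that no central planner is needed: sequential choice by students who all want the strongest available partner produces the similar-ability groups that Fischer and Rode observe.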

Fischer and Rode then look at a second setting where peer interaction was introduced:

When students enter university, they are invited to an orientation week (OW), which takes place in the very first week of studies... Students are randomly assigned to other students taking the same major with an average group size of eight students to spend one week together. During this week students are given an introduction to the university by senior students. The students’ union organizes the event... The week usually culminates in a big party and includes several gatherings in pubs.

Given that individuals start very likely without social networks in this new setting, this is the very first opportunity in which they can make new contacts.

The orientation week activity is voluntary, but has about 80 percent participation. Fischer and Rode repeat their analysis looking at students matched up in the orientation week groups, and find:

Neither the next-best peer effect... nor the standard mean peer effect... are significant...

They attribute these null results to:

During the social gathering individuals learn little if anything about the academic ability of their potential peers. Nevertheless, they might decide to form learning groups. Once these learning groups are formed, the true ability will be shown also revealing mismatches... As such, these unbalanced learning groups may not last long. It is likely that individuals match with others of similar ability they get to know in other settings such as exercise sessions.

So, since these students cannot easily match up with similar peers in the orientation week exercise, they don't maintain those peer connections, and no lasting peer effect emerges. Notice also that this orientation activity happens right at the beginning of the students' university career. Some students will drop out, which will also make it difficult to maintain peer connections.

There are two important lessons from this research for universities looking to generate peer effects. It will likely be more effective to generate these effects through classroom activities than through social events. And peer activities that come too early in the students' university journey are likely to be less effective than peer activities that occur a bit later. In other words, second-year group work will likely have a bigger effect than first-year group work. That is important to know.

Monday, 6 January 2025

The practical problem of Pigovian alcohol beverage taxes

I just finished reading this 2022 article by Preety Srivastava (RMIT University), Ou Yang (University of Melbourne), and Xueyan Zhao (Monash University), published in the journal Economic Record (open access), which had been sitting in my (virtual) to-be-read pile for far too long. They look at the negative consequences of consuming different alcoholic beverages in Australia, and what that implies for the taxation of different beverage types.

First, Srivastava et al. make a good case for why alcohol is taxed:

The economic argument for alcohol tax is the need for correction of market failures and negative externalities that are associated with alcohol consumption...

...there are many reasons to believe that serious market failure exists in alcohol consumption, and the scale of alcohol abuse we observe in many societies is a testament to this. One example of market failure is the incomplete information regarding the long-term health impact and addictive nature of alcohol consumption in binge drinkers’ private decision-making on consumption. Another example is the use of incorrect discount rates to future harms due to the problem of willpower. More importantly, significant external costs of excessive drinking are borne by society. These include health-care costs of alcohol abuse in publicly funded health systems (such as in Australia), road accidents from drink-driving, and antisocial behaviours when intoxicated, including public nuisances, damage to and stealing of property, and physical and verbal abuse of family members and the wider community.

An excise tax, such as the one that is imposed on alcohol, is therefore an example of a Pigovian tax (named after Arthur Pigou), because one of the purposes of the tax (aside from generating revenue for the government) is to correct for the negative externality (because a tax will reduce the quantity consumed - see here, for example). Now, ideally:

...alcohol tax should be targeted at those excessive consumers whose consumption creates negative external costs, and not at moderate consumers who do not generate such negative costs to others. However, ‘excessive consumption’ is hard to measure, and taxing directly by consumer type is difficult to implement due to potential ethical and privacy issues in obtaining the required information.

So, while the government should be aiming to tax consumers based on the amount of negative externality they generate, that isn't possible, because the government can't easily determine who the 'excessive drinkers' (those who generate the most negative externalities) are. So, a general alcohol tax might be applied to all alcoholic beverages. The consequence is that:

A general alcohol tax applied to all drink types will reduce consumption for all consumers, achieving an efficiency gain for consumers with excessive consumption. However, this will also result in efficiency loss for consumers with low to moderate levels of consumption who have already accounted for all negative impacts as private costs in their consumption decision-making. Given that taxing heterogeneous consumers is less feasible, taxing the products that are more likely to be associated with negative external costs or consumed by individuals who are more likely to be involved in risky and abusive behaviours would seem to be a more feasible and efficient approach.

It turns out that is what Australia (and to a lesser extent New Zealand) does. But ineffectively, as we will see. In their paper, Srivastava et al. look at the relationship between consumption of different beverage types and various antisocial behaviours. They use data from the National Drug Strategy Household Survey (NDSHS) between 2004 and 2019, which included over 113,000 people (after excluding abstainers). The survey asks respondents whether they engaged in a range of activities "while under the influence of or affected by alcohol". Srivastava et al. focus on a subset of these behaviours, being those likely to result in negative externalities:

In this study, we focus on the following eight antisocial and unlawful behaviours under the influence of alcohol: (1) driving a motor vehicle; (2) operating a boat; (3) operating hazardous machinery; (4) creating a public disturbance or nuisance; (5) causing damage to property; (6) stealing money, goods or property; (7) verbally abusing someone; (8) physically abusing someone.

They look at drink-driving as one category, and merge all of the others into a separate category that they call "hazardous, disturbing or abusive behaviour" or HDA. One thing that isn't clear from the paper is exactly how the HDA measure is constructed (presumably, it is just an indicator for whether the respondent engaged in any of the other seven behaviours while affected by alcohol - I sketch that presumption below). In terms of the different alcoholic beverages, the NDSHS:

...has several questions on individuals’ consumption of various drink types. One of the questions relates to their drinking preferences: ‘What types of alcohol do you usually drink? (Mark all the types of drinks that apply)’. We use this information to construct ten dichotomous variables to indicate respondents’ usual drinking preferences. These binary indicator variables are respectively regular-strength beer (RSB), middle strength beer (MSB), low-strength beer (LSB), cask wine (CW), bottled wine (BW), fortified wine (FW), pre-mixed spirits in a can (PMSC), pre-mixed spirits in a bottle (PMSB), bottled spirits and liqueurs (BS), and other alcohol (Other).

That's a lot of acronyms. And you have to pay attention to them if you're reading the paper, because Srivastava et al. keep using them, and don't usually remind you what they refer to (which was a bit exhausting!). 
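On the HDA measure, my presumption above amounts to something like the following (a hypothetical sketch with made-up column names; the paper does not spell out the exact construction):

```python
import pandas as pd

# Hypothetical NDSHS-style data: one binary column per behaviour "while
# affected by alcohol" (column names are my own, not from the paper).
df = pd.DataFrame({
    "operate_boat": [0, 0, 1],
    "operate_machinery": [0, 0, 0],
    "public_nuisance": [0, 1, 0],
    "damage_property": [0, 0, 0],
    "steal": [0, 0, 0],
    "verbal_abuse": [0, 1, 1],
    "physical_abuse": [0, 0, 0],
})

# HDA = engaged in any of the seven behaviours other than drink-driving.
df["hda"] = df.any(axis=1).astype(int)
print(df["hda"].tolist())  # [0, 1, 1]
```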

Now, obviously there are some endogeneity problems in a regression of alcohol harms on beverage consumption, because some of the variables that affect beverage consumption (like demographics - for example, younger drinkers have different drink preferences than older drinkers) also affect alcohol harm. This would bias any estimate of the effect of beverages on harm. Srivastava et al. mitigate this problem by using a Lewbel instrument (see here as well) - the price of beverages should affect beverage consumption, but not directly affect harm, so prices make a good instrument. Essentially, they replace beverage consumption in their main model with estimated beverage consumption based on prices, which won't have the same endogeneity problem. They also run models that don't use the instrument (but which will likely be affected by endogeneity).
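To illustrate the logic (and only the logic), here is a generic two-stage least squares sketch with a price instrument, using synthetic data and made-up variable names. It is a plain instrumental-variables sketch rather than the authors' specific Lewbel construction, but it captures the idea of using estimated (price-driven) beverage consumption in place of actual consumption.

```python
import numpy as np
import pandas as pd
from linearmodels.iv import IV2SLS

rng = np.random.default_rng(0)
n = 5000

# Unobserved taste for risky drinking drives both beverage choice and harm
# (the endogeneity problem); price shifts beverage choice but not harm directly.
risk_taste = rng.normal(size=n)
price = rng.normal(size=n)
rsb = (0.5 * risk_taste - 0.8 * price + rng.normal(size=n) > 0).astype(float)
dd = (0.3 * risk_taste + 0.4 * rsb + rng.normal(size=n) > 1).astype(float)

df = pd.DataFrame({"dd": dd, "rsb": rsb, "price": price, "const": 1.0})

# Instrumenting the beverage indicator with its price strips out the part of
# beverage choice that is correlated with the unobserved risk taste.
iv_res = IV2SLS(df["dd"], df[["const"]], df[["rsb"]], df[["price"]]).fit()
print(iv_res.params)
```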

Aside from endogeneity, there is one problem with the analysis. Looking at Table 1 in the paper, it is clear that the main dependent variables (drink-driving and HDA) trend downwards over time - there is lower prevalence of all of these behaviours over time. The alcohol beverage consumption data also appears to trend over time (from Table 2 in the paper). Srivastava et al. adjust all the prices to 2011/12 dollars, so the most obvious source of trending in the prices (inflation) is removed. However, when running a regression model with trended data in the dependent and explanatory variables, there is a risk of spurious correlation (which Tyler Vigen still does the best job of illustrating). That will clearly be a problem in the models without the Lewbel instruments, and it isn't clear to me whether the instrument solves this problem. Srivastava et al. don't mention using time fixed effects or corrections for serial correlation in their models, which would go some way towards mitigating this problem. Anyway, enough pointy-headedness.
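OK, one last pointy-headed sketch before moving on. The shared-trend problem, and the fixed-effects fix, are easy to illustrate with made-up data (the variable names are mine, not the authors'): two series that both drift down over time look related when pooled, and year dummies soak the common trend up.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 20000
year = rng.integers(2004, 2020, size=n)
trend = (year - 2004) / 15.0

# Both series drift down over time but are otherwise unrelated.
rsb = (rng.normal(size=n) > 0.5 * trend).astype(float)
dd = (rng.normal(size=n) > 1.0 + 0.7 * trend).astype(float)

df = pd.DataFrame({"dd": dd, "rsb": rsb, "year": year})

naive = smf.ols("dd ~ rsb", data=df).fit()              # picks up the shared trend
with_fe = smf.ols("dd ~ rsb + C(year)", data=df).fit()  # year dummies absorb it
print(naive.params["rsb"], with_fe.params["rsb"])
```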

Unsurprisingly, Srivastava et al. find that harms differ by beverage type. For drink-driving (DD - yes, another acronym):

RSB, PMSC, MSB, BW, CW and BS are all associated with a higher probability of DD, while LSB, PMSB, FW and Other are related to negative or insignificant effects on the probability of DD. Specifically, in terms of ranking, RSB has the highest positive impact, and is shown to be linked to a 6.5–9.5 percentage points higher probability of DD... Interestingly, CW and BS, the drinks that have drawn much attention in the tax debate, although having a positive impact on DD, both rank behind MSB, with a 1.7–3.3 percentage points higher probability of DD.

And for hazardous, disturbing, or abusive behaviour (HDA):

...RSB, PMSC and CW are ranked as the top three drinks that relate to the highest MEs [marginal effects, yet another acronym] for increasing the probability of HDA behaviours from all three models. CW currently has the lowest tax per LAL across all beverages, which has long been the focus of tax reform discussions...

Finally, LSB, BW and FW are shown to have negative MEs on the probability of HDA across all three models, with LSB having the largest negative effect. A noticeable result is for BW. In contrast to the results for DD, drinking BW is linked to a lower probability of HDA behaviours.

The takeaway from all this analysis is that different beverage types should have different tax rates, with those that are associated with the greatest harm having the highest tax rates. That would mean taxing regular strength beer, pre-mixed spirits in a can, and probably cask wine the most. Unfortunately, that's not at all what Australia does:

When converted to an effective rate per litre of alcohol (LAL), based on the 2007/08 data, the volumetric tax rates vary greatly by beverage... with cask wine paying effectively $3/LAL, bottled wines $14–$33/LAL by prices, beers $19–$31/LAL by alcohol strength, ready-to-drink (RTD) pre-mixed spirits $41–$43/LAL, and straight spirits $66/LAL.

So, while cask wine should be among the beverage types with the highest tax rate, it is taxed the lowest. As Yang and Srivastava note in this article in The Conversation about their paper, the Australian alcohol tax system is "incoherent". It is entirely correct for Australia to charge different tax rates for different beverage types, but those tax rates don't appear to bear any relationship to the harms arising from the different beverages. New Zealand's excise system is significantly simpler than Australia's, but again it is unlikely that it bears much resemblance to an optimal set of tax rates across different beverage types. To work efficiently in internalising the negative externalities, Pigovian taxes need to reflect the value of the externalities that they are meant to be correcting. We don't have that now, and some more investigation is needed so that we might hope to in the future.
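For completeness, the textbook condition being gestured at here is simply that the per-unit tax should equal the marginal external cost at the socially optimal quantity (my notation, not from the paper):

```latex
% Optimal Pigovian tax: the tax internalises the marginal external cost
MSC(Q) = MPC(Q) + MEC(Q), \qquad t^{*} = MEC(Q^{*}),
\quad \text{where } MB(Q^{*}) = MSC(Q^{*}).
```

Applied per beverage type, that would mean a higher rate per litre of alcohol on the beverages associated with larger external harms, which is roughly the opposite of taxing cask wine at $3/LAL.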

Sunday, 5 January 2025

Robert Bray's insights on using generative AI in teaching

I think we are barely scratching the surface of the ways that we can use generative AI for innovative teaching. However, new and exciting applications are increasingly surfacing. For example, I really enjoyed this new and reflective paper by Robert Bray (Northwestern University). Bray discusses how he has used large language models in the teaching of data analytics, but I think the examples can be adapted to work well in many other disciplines.

I recommend reading Bray's paper in its entirety (but skipping over the specifics of the R programming if that isn't your thing). However, there are a few key bits that I want to highlight here. Most importantly, generative AI is not something that teachers can ignore. This shouldn't need to be said, but I suspect that too many of my colleagues are trying to avoid engaging with it, or doing so in surface ways, or trying to play 'whack-a-mole' with their approaches to assessment in order to limit students' opportunities to cheat using generative AI. Incorporating AI into teaching is also going to change the way that teaching and learning needs to be framed. Bray notes that:

The most important new work introduced by AI is learning how to use AI.

I explain to my students that they can think of my R instruction as a pretext for the real lesson of the course, which is learning how to leverage AI. If you’re a regular ChatGPT user, you may wonder what there is to learn, as conversing with ChatGPT is so natural. Well, some people are natural in front of a camera, some are natural in front of an audience, and some are natural in front of an LLM. If you’re in this third category, count yourself lucky, because most people are not. Most people must explicitly learn how to use LLMs.

Using generative AI effectively is an important transferable skill. Our students will benefit in the labour market if we can get them interacting with generative AI and learning key skills in prompting and collaborating with generative AI tools. Bray gives many examples of how courses can include meaningful interactions with generative AI, but I particularly liked two of them: (1) turning homework into an AI tutoring session using an 'AI assistant'; (2) using AI to engage in learning by teaching.

On turning homework into an AI tutoring session, Bray notes that:

LLMs give rise to a new homework modality: the AI tutoring session. Rather than save a homework as a PDF or a Canvas assignment, you can embed it in an AI assistant that walks students through the assignment, like a tutor would... For example, I asked students to collaborate with a custom-made GPT on a set of study questions before each class in 2024. Students would “submit” these assignments by sending the grader a link to the chat transcripts.

This is definitely something that I have been considering for my classes, particularly for students who cannot attend on-campus tutorials. Bray gives fairly detailed instructions on how to set up ChatGPT as an AI assistant, and notes that students actually preferred the AI tutoring sessions to regular homework. However, there is a downside to this - not for the lecturer, but for the human tutoring team. Bray notes that:

I hired four tutors for my class in 2022 to help students work through the labs and master the R syntax. I didn’t hire any tutors in 2023, however, because I wanted my students to practice querying ChatGPT.

Bray also notes that:

ChatGPT is an ideal tutor: It provides immediate, thoughtful, and voluminous feedback on any topic; it has infinite patience and is incapable of scrutiny; and it’s superior at parsing and correcting sloppy code. In fact, ChatGPT’s one-on-one instruction is so good that my office-hours attendance fell from around six per week in 2022 to about three per quarter in 2023. And the textbook I wrote is even more obsolete: hardly any students download their free copy.

The days of humans tutoring other humans may well be numbered (and that number is rather small). I will be sorry to see the end of human tutors, and the negative here is that tutoring provided a key means for top students to signal their combined technical and interpersonal skills to employers. The loss of that signal will clearly make those students worse off.
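Coming back to the mechanics for a moment: Bray builds his assistant as a custom GPT inside ChatGPT, so no code is required. For anyone wanting to reproduce the same pattern programmatically, a minimal hypothetical sketch with the OpenAI Python client might look like the following (the model name, system prompt, and question are my own placeholders, not Bray's materials):

```python
from openai import OpenAI

client = OpenAI()  # assumes an OPENAI_API_KEY environment variable is set

# A system prompt that makes the model behave like a homework tutor rather
# than an answer machine (placeholder wording, not Bray's actual instructions).
TUTOR_PROMPT = (
    "You are a tutor for an introductory data analytics course taught in R. "
    "Walk the student through each study question step by step, ask them to "
    "attempt an answer first, and give hints before revealing full solutions."
)

response = client.chat.completions.create(
    model="gpt-4o",  # placeholder model name
    messages=[
        {"role": "system", "content": TUTOR_PROMPT},
        {"role": "user", "content": "Study question 1: what does dplyr::filter() do?"},
    ],
)
print(response.choices[0].message.content)
```

The transcript of a conversation like this is what students would "submit" under Bray's approach, so the marking burden shifts from checking answers to checking engagement.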

On learning by teaching, Bray notes that:

ChatGPT gives you a superpower: the ability to turn students into teachers... The best way to learn something is to teach it to someone else, but before now, there was no easy way to flip the roles during class and cast students as teachers. AI gives us three new techniques for doing so. First, you can use the chatbot to role play as a student: Give a lesson to the class and have students teach the AI what they learned. Second, you can use the chatbot to parallelize assessment: Have all students propose solutions to a problem and use AI to identify the students who should share their answers with the class. And third, you can use the chatbot to parallelize instruction: Have students learn different material with different chatbots and then reconvene to teach each other what they learned.

Sadly, this again highlights what is lost when we give up human tutoring. Tutors benefit from teaching because it improves their own skills. However, all is not lost, since students may still benefit from learning by teaching if it is incorporated into classes.

Finally, we may worry that generative AI will make our papers too 'easy', and reduce our ability to distinguish between good students and not-so-good students. Bray notes early in the paper that introducing generative AI into his courses didn't much affect the grade distribution. He offers a number of potential explanations:

Several factors muted ChatGPT’s effect on grades. First, the AI didn’t convert incorrect answers into correct answers as frequently as it converted egregiously incorrect answers into slightly incorrect answers. And since all wrong answers yielded zero points, most ChatGPT improvements didn’t increase student scores...

Second, correct AI code wouldn’t always translate into correct answers...

Third, whereas my pre-AI students learned to code, my post-AI students learned to code with ChatGPT, an entirely different proposition. Like second-generation immigrants who understand but can’t speak the mother tongue, my post-AI students could read but not write R code unassisted. Accordingly, most of my students were at the chatbot’s mercy...

Fourth, offloading the low-level details to a chatbot may have compromised the students’ high-level understanding... Several students expressed regret, in their course evaluations, for outsourcing so much of the thinking to the chatbot:

Since ChatGPT did most of the heavy lifting, I feel like I didn’t learn as much as I wanted. Especially in data–analytics.

Because we relied so heavily on ChatGPT—I truly don’t know what a lot of R even means or what I would use to complete tasks. As well, it was hard to stay engaged.

It was occasionally the case that I would mindlessly complete the quiz without fully knowing what I was doing due to the time constraint, but I got away with it since ChatGPT is so good at coding. If there is a way to effectively force students to think about how to use ChatGPT rather than simply pasting prompts, then that could prove more impactful...

Fifth, students often developed tunnel vision because crafting GPT prompts would command their undivided attention. Indeed, the students largely ignored the template solutions we covered in class, opting to spend their limited quiz time conversing with the chatbot rather than perusing their notes...

Sixth, echoing the Peltzman Effect, students used ChatGPT to improve their performance and to decrease their study time. The reported weekly study time in the compulsory and elective sections fell from an average of 3.88 and 4.85 hours in 2022 to an average of 2.62 and 3.57 hours in 2023 (the former drop is statistically insignificant, but the latter is statistically significant at the p = 0.01 level). Furthermore, 22% of students reported not studying for quizzes, which would have been inconceivable in 2022.

Those explanations are quite convincing, but the last one in particular (the Peltzman effect) is one that I have noticed over many years. Whenever I introduce some new innovation into the teaching of one of my papers, which I expect to make students' studying experience easier and more effective, the result is that many students spend less time on my paper and redistribute their scarce studying time to the other papers that they now find relatively more difficult. It is quite a dispiriting experience, but entirely understandable from the students' perspective. They have limited time available to spend on studying, so at the margin their next hour of study time is best spent where it will generate the greatest gains. That tends to be in the paper that is more difficult, rather than the one that is easier.

What we can take away from Bray's grade distributions is that we can't necessarily expect big learning gains from incorporating AI into teaching. So, why should we do it? Aside from wanting to set our students up with important transferable skills (as noted above), Bray notes in the conclusion to the paper that:

Simply put, ChatGPT made investing in my class fun again. AI allowed me to do things that had never before been done in the classroom. I got hooked on finding AI-empowered teaching innovations.

I want my teaching to be fun. Not just for the students, but for me.

[HT: Marginal Revolution]

Read more: