Saturday, 23 November 2024

Jared Cooney Horvath on how generative AI could harm learning

In a post last month about generative AI, I expressed some scepticism towards those among my colleagues who are trying to integrate generative AI into assessment (an "if you can't beat them, join them" solution to the impact of generative AI on assessment). I also expressed some hope that generative AI can be used in sensible ways to assist student learning. Both of those views are contested, and certainly not universally held among teachers.

In a recent article on the Harvard Business Publishing website, Jared Cooney Horvath outlines three critical problems generative AI poses for learning: (1) AI tools lack empathy, and an empathetic learner-teacher relationship is a strong contributor to learning; (2) while AI tools are good at retrieving information, in so doing they make having internal knowledge less important for students, and yet it is a broad internal knowledge that helps us to understand and solve problems; and (3) generative AI encourages multitasking, which is bad for learning.

On the latter point, Horvath concludes that:

It’s not that computers can’t be used for learning; it’s that they so often aren’t used for learning that whenever we attempt to shoehorn this function in, we place a very large (and unnecessary) obstacle between the learner and the desired outcome—one many struggle to overcome.

Finally, Horvath notes one positive for generative AI and learning:

There is one area of learning where generative AI may prove beneficial: cognitive offloading. This is a process whereby people employ an external tool to manage “grunt work” that would otherwise sap cognitive energy.

However, as noted above, when novices try to offload memorization and organization, learning is impaired, the emergence of higher-order thinking skills is stifled, and without deep-knowledge and skill, they’re unable to adequately vet outputs.

Experienced learners or experts can benefit from cognitive offloading. Imagine a mathematician using a calculator to avoid arithmetic, an event planner using a digital calendar to organize a busy conference schedule, or a lawyer using a digital index to alphabetize case files. In each of these scenarios, the individual has the requisite knowledge and skill to ensure the output meaningfully matches the desired outcome.

Horvath hasn't really changed my views on generative AI and learning. He does give some food for thought, though, especially in relation to the value of creating a fine-tuned AI designed to help with a particular course. If students use it as an interactive tutor, to help them develop their internal knowledge, then it is likely positive. However, if they use it purely to ask for answers on demand, it may impair their ability to develop that internal knowledge and make them worse off. I wonder whether there are particular learning tasks that can be used to encourage the former behaviour without too many students resorting to the latter. Clearly I have more thinking to do on this before I roll something like that out for my students.

[HT: Mary Low]


Friday, 22 November 2024

This week in research #50

As I mentioned, last week I was at the North American Regional Science Congress in New Orleans. This isn't a science conference per se. Regional science is essentially a mix of economics, geography, sociology, and political science (with a bunch of other fields mixed in as well). As is often the case, there were more sessions that I wanted to attend than I could possibly get to, but here are some highlights from the conference:

  • My long-time friend and collaborator Matt Roskruge presented on the challenges of developing quantitative measures of Māori social capital (my takeaway was that it may be best to throw away the Western conceptions of social capital, and start over with a Te Ao Māori (Māori worldview) perspective, but apparently that has been done several times already)
  • Steven Deller presented on elder care and female labour force participation, showing that female labour force participation is lower in counties that have less access to elder care
  • Rosella Nicolini presented data that showed immigrants in rural areas are associated with increased GDP growth in Spain, while immigrants in urban areas are associated with decreased GDP growth
  • Rafael González-Val presented analysis of the impacts of the Spanish Civil War, showing a large (12 percent) reduction in industrial employment in provinces aligned with the Republicans, compared to those aligned with the rebels (although it must be noted that all of Spain's main industrial centres were aligned with the Republicans, so it may be no surprise that they declined relative to other regions)
  • Aurelie Lalanne presented some amazingly detailed data on urban growth in France, drawn from historical censuses that have been harmonised, and covering the period from 1800 to 2015

Aside from the conference, here's what caught my eye in research over the past week:

  • Davis, Ghent, and Gregory (with ungated earlier version here) use a simulation model (calibrated to real-world data) to show that the pandemic induced a large change to the relative productivity of working from home that substantially increased home prices and will permanently affect incomes, income inequality, and city structure
  • Galasso and Profeta find that reducing or eliminating time pressure decreases the math gender gap by up to 40 percent, and that time pressure contributes to the gap through increased anxiety rather than through students modifying their test-taking strategies
  • Mizzi (open access) looks at how economics teachers develop and utilise pedagogical content knowledge (the intersection of pedagogical knowledge and content knowledge) to assist their students’ engagement with disciplinary knowledge in economics (I feel like we should know more about this topic)
  • Liu et al. find that large increases in minimum wages have significant adverse effects on workplace safety, increasing work accidents by 4.6 percent, based on US state-level data
  • Matthes and Piazolo (open access) analyse data from over 40 seasons of professional road cycling races, and find that having a teammate in a group behind positively impacts win probability
  • Fernando and George find that home-team cricket umpires are less biased when working with a neutral colleague (one who is neither a national of the home nor the foreign team)
  • Chilton et al. find that there are large potential gains in better identifying exceptional students in law schools, if changes were made to certain personnel, course, and grading policies to improve the signalling quality of grades (and yet to me, it seems like most universities are on a policy trajectory to reduce the quality of grades as a signal)

Thursday, 21 November 2024

Will New Zealand finally deal with excess demand for access to tourist destinations?

New Zealand has long had a problem with excess demand for access to tourist destinations. I've written about this before, using the Great Walks as an example (see here, and here). Because the price for access to these tourist destinations is too low, the demand for access far exceeds the supply. The consequence is a much-degraded experience for everyone.

The solution, as I have noted before, is to let the price increase. Charge more for access to the Great Walks, and other tourist destinations. And, finally, that may be about to happen. As the New Zealand Herald reported last week:

A $20 access fee for Cathedral Cove, the Tongariro Alpine Crossing, Franz Josef Glacier, Milford Sound, and Aoraki Mount Cook National Park?

The Government is floating the idea of charging visitors – including New Zealanders – as part of two discussion documents, released today, which Conservation Minister Tama Potaka calls the biggest potential changes in conservation in more than three decades...

Charging $20 per New Zealander and $30 per non-New Zealander for accessing those places would bring in an estimated $71 million a year. Charging only international visitors would yield about half that.
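A quick back-of-the-envelope sketch of what those quoted figures imply for visitor numbers (my own arithmetic, assuming the roughly $71 million splits about evenly between the two groups, as the "about half" from international visitors suggests):

```python
# Back-of-the-envelope check of the implied visitor numbers, using the
# figures quoted from the Herald: $20 per New Zealander, $30 per
# non-New Zealander, ~$71m total revenue.
# The 50/50 revenue split below is an assumption, not from the article.

total_revenue = 71_000_000
international_share = 0.5  # "charging only international visitors would yield about half"

intl_revenue = total_revenue * international_share
nz_revenue = total_revenue - intl_revenue

intl_visits = intl_revenue / 30  # implied international visits per year
nz_visits = nz_revenue / 20      # implied domestic visits per year

print(round(intl_visits), round(nz_visits))  # roughly 1.2m and 1.8m
```

On those (assumed) numbers, the fees imply around three million paid visits a year across the named sites, which gives a sense of the scale of demand being managed.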

Charging for access to these tourist destinations would go some way towards dealing with the excess demand. I'm totally ok with the differential price for New Zealanders and overseas travellers as well (which is something I have noted before, again in the context of the Great Walks). My main concern though is that the price of $20 for New Zealanders and $30 for non-New Zealanders may be too low. However, others have a different view:

But it has triggered a strong reaction from Forest and Bird, which said: “Connection to te Taiao (nature) is a fundamental part of being a New Zealander. All New Zealanders should be guaranteed the ability to connect with our natural environment regardless of how much money they earn.”

How easily can New Zealanders connect with their natural environment when it is thronged with tourists all visiting for free? Charging a price for access limits the numbers of tourists (including other New Zealanders), and makes it more likely, not less likely, that New Zealanders can get genuine access to these places. There is a meaningful difference between accessing a tourist location when there are hundreds of other tourists swarming all over it, and when few people are around and a peaceful engagement with nature is possible.

Quite aside from this being a way for the government to fund the Department of Conservation's operational costs, this proposal to charge a fee for access to these tourist locations is a sensible way to manage demand. Maybe we will finally have a working solution to the excess demand problem in these places.


Wednesday, 20 November 2024

Natural capital and the problematic measurement of GDP

I've been thinking a bit about GDP this year, and in particular about the weirdness of its measurement. One of the key problems that has occupied me has been an asymmetry in how capital is accounted for within GDP. When new capital is created, the spending on the new capital adds to GDP. However, when capital is depleted, that depletion does not subtract from GDP. That is why, following a large natural disaster, GDP might actually increase due to rebuilding activity (and because any destruction of capital is ignored).

With that in mind, I was interested to run across this 2019 article by Colin Mayer (Oxford University), published in the journal Oxford Review of Economic Policy (ungated earlier version here), deep down in my to-be-read pile of articles. Mayer was a member of the UK's Natural Capital Committee, which ran from 2012 to 2020, and this article considers how economists can, and should, approach accounting for natural capital. Mayer distinguishes between economists' traditional view of natural capital and an approach more similar to how an accountant would treat it:

To the economist, natural capital, like any other asset, is the plaything of humans, there to be treated as mankind sees fit. To the accountant, the firm is an entity of which the managers are the stewards. They are there to preserve the firm and to promote its flourishing. So, too, we should consider whether it is our right to employ nature in the way in which we see fit, or our obligation to act as its steward or trustee.

Mayer's solution is that we should revise how natural capital is treated, and should:

...incorporate a maintenance charge in the balance sheets and profit and loss statements of nations, municipalities, corporations, and landowners to reflect the liability associated with maintaining or restoring these assets.

I think that Mayer could have been much clearer in the explanation here. When natural capital is depleted, through pollution, or extractive industries, or carbon emissions, my view is that the cost of that depletion should directly reduce GDP (which is the equivalent of the 'profit and loss statements' that Mayer refers to). Instead, Mayer seems to be suggesting that this is a liability. Both of those approaches may be correct, given the simple accounting identity (Assets + Expenses = Liabilities + Proprietorship + Revenues). A liability on the right-hand side of that identity equation can arise because of an expense on the left-hand side. However, the labelling as a liability implies an obligation to repay, which may not be the case for all types of natural capital (how would one pay off the liability of mining extraction, for instance?).
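To make the two treatments concrete, here is a stylised numeric sketch (entirely made-up numbers, purely illustrative) of how depletion sits within that accounting identity under each approach:

```python
# Stylised illustration of the identity
#   Assets + Expenses = Liabilities + Proprietorship + Revenues
# under the two treatments of natural capital depletion.
# All figures are hypothetical.

production = 100.0  # output (revenue) for the year
depletion = 15.0    # natural capital used up in producing it

# Treatment 1: depletion as an expense, directly reducing GDP
# (the 'profit and loss' view I favour above)
gdp_expensed = production - depletion  # 85.0

# Treatment 2: depletion as a maintenance liability (Mayer's framing):
# the flow measure is unchanged, but a restoration obligation
# accrues on the balance sheet
gdp_with_liability = production        # 100.0
maintenance_liability = depletion      # 15.0

# The identity balances either way; the difference is whether the cost
# shows up in the flow measure (GDP) or as a stock obligation (liability)
assert gdp_expensed + maintenance_liability == gdp_with_liability
```

The sketch shows why both treatments are internally consistent: the choice between them is really a choice about whether depletion should bite in the flow accounts now, or sit as an obligation to be discharged later.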

Anyway, there is clearly more thinking to be done here. I don't think that economists' approach to natural capital is correct, and I think that the approach to other forms of capital (physical, social, and human capital) is similarly flawed. For example, decreasing social capital over time (as documented in Robert Putnam's 2000 book Bowling Alone, which I reviewed here) should also decrease GDP in my view. By correctly accounting for changes in capital (both upwards and downwards), GDP would better capture changes in societal-level wellbeing.

Saturday, 16 November 2024

This week in research #49

Another quiet blogging week for me, due to travel and the North American Regional Science conference in New Orleans (more on that in next week's post). However, I have been trying to keep up with research, and here's what caught my eye over the past week:

  • Mello (open access) finds that winning the FIFA World Cup increases a country's year-over-year GDP growth by at least 0.48 percentage points in the two subsequent quarters
  • Boyd et al. (open access) describe how an agent-based model could be used to evaluate the impact of minimum unit pricing of alcohol in Scotland (but they don't actually show the results of any such modelling, which is a bit disappointing)
  • Singleton et al. (open access) find that a university located in a town that loses an English Premier League team (due to relegation to the Championship) suffers a 4–8 percent reduction in year-to-year undergraduate admissions growth
  • Ozkes et al. find that human players of the ultimatum game do not differentiate between human and algorithmic opponents, or between different types of algorithms, but they are more willing to forgo higher payoffs when the algorithm’s earnings benefit a human (this has interesting implications for how humans interact with AI)
  • Gjerdseth (with ungated version here) finds that the destruction of ivory does not reduce elephant poaching rates, using CITES data from 2003 to 2019 (for more on this topic, see this post and the links at the end of it)
  • Hagen-Zanker et al. (open access) use data from a large-scale survey conducted in 25 communities in ten countries across Asia, Africa and the Middle East, and show that there is little consistency in the individual-level and community-level factors that are associated with migration intentions, although women are less likely to have migration intentions, while those with access to transnational social networks are more likely to have migration intentions

Saturday, 9 November 2024

This week in research #48

It's been a quiet week in terms of my keeping up with research, as I've been travelling. However, here's what caught my eye in research over the past week:

  • Rasmussen, Bor, and Petersen merge Twitter data with Danish administrative data, and find that individuals with more aggressive dispositions (as proxied by having more criminal verdicts) are more hostile in social media conversations, and that people from more resourceful childhood environments (those with better grades in primary school and higher parental socioeconomic status) are more hostile on average, because such people are more politically engaged

In other news, as I said above my wife and I have been travelling this week. We started in Texas, then Oklahoma, and now Arkansas (with Alabama, Mississippi, and Louisiana to come). While in Texas, I had the great pleasure of meeting Cyril Morong, The Dangerous Economist.

Next week may also be fairly quiet on the blog, as I'll be at the North American Regional Science Congress in New Orleans. And, of course, New Orleans itself.

Sunday, 3 November 2024

Book review: How Big Things Get Done

There are certain books that shouldn't need to be written. Inevitably, those are the books that, in reality, most need to be written. That is certainly the case for How Big Things Get Done, by Bent Flyvbjerg and Dan Gardner. This is a book about big projects, and importantly, how those projects succeed or, as is often the case, how they fail. As the authors note in the preface, it is a book that aims to answer a number of important questions:

Why is the track record of big projects so bad? Even more important, what about the rare, tantalizing exceptions? Why do they succeed where so many others fail?

The book draws on decades of Flyvbjerg's academic research on big projects, as well as his experience both consulting on, and being directly involved in, big projects. Through this work, Flyvbjerg has developed a massive database of projects, their cost and benefit estimates at the time the project began, and the cost over-runs and benefit shortfalls that so often resulted. The numbers do not make for easy reading, and the examples that Flyvbjerg uses range from transport infrastructure to IT projects to nuclear power stations to the Olympic Games. On the latter, the book is a useful complement to Andrew Zimbalist's book Circus Maximus (which I reviewed here).

Flyvbjerg and Gardner spend a lot of time discussing failed projects, but devote substantial space to discussing successes, such as Terminal 5 at Heathrow. Many of us will remember the opening of Terminal 5 for the terrible problems associated with baggage handling in its first few days, but the project itself was delivered on time and on budget. Once you read this book, you'll realise just how extraordinary that accomplishment is.

Flyvbjerg and Gardner use the comparison between successful projects and failures to draw a number of lessons. Most of the lessons seem obvious, but clearly those lessons have not been learned well enough in the 'big projects' space, because they are so often not heeded. The biggest lesson of all is to 'think slow, act fast'. Thinking slow means spending substantial time planning before the project begins, ensuring that the risks are well known and have been planned for, before the first spade turns the first sod. Acting fast means completing the project as quickly as possible, to avoid the 'unknown unknowns' from impacting the project - the more delays, the more time there is for something unforeseen to happen.

The 'think slow, act fast' approach seems inconsistent with Silicon Valley's approach to development (as ably described in Jonathan Taplin's 2017 book Move Fast and Break Things, which I reviewed here). Flyvbjerg and Gardner anticipate that counterexample, and note that the two are not inconsistent at all, because:

Planning is doing: Try something, see if it works, and try something else in light of what you've learned. Planning is iteration and learning before you deliver at full scale, with careful, demanding, extensive testing producing a plan that increases the odds of the delivery going smoothly and swiftly.

That is, more or less, what the big tech firms do. Flyvbjerg and Gardner note that iteration is key to those firms' development process, and is generally successful (or where it isn't, the firm can rapidly iterate to something new). In contrast, most big projects are delivered using a 'think fast, act slow' approach that is doomed to failure. 

I really enjoyed this book, even though it is quite depressing at times to see just how bad big projects are at delivering on their promises (both in terms of costs, and in terms of benefits). The book is not only well researched, but draws on many interviews that Flyvbjerg has conducted with people in the industry. The writing did make me wonder what Gardner's contribution was - the whole book is written as if by Flyvbjerg alone (with lots of "I" and "my"), which seems an odd stylistic choice for a co-authored book. Nevertheless, it is an enjoyable read, and definitely recommended.

Saturday, 2 November 2024

What does the Cantril Ladder really measure?

Imagine a ladder with steps numbered from 0 at the bottom to 10 at the top. The top of the ladder represents the best possible life for you and the bottom of the ladder represents the worst possible life for you. On which step of the ladder would you say you personally feel you stand at this time?

Now, consider the question you probably just answered. What factors played into your answer? What sorts of things contribute to the best possible life for you, compared with the worst possible life for you? If we used your answer to that question as a measure of life satisfaction, what is it really measuring?

That's not an unimportant question. The first paragraph of this post is a commonly used way of measuring life satisfaction, known as the Cantril ladder (see here). It is used in the Gallup World Poll, and is recommended by the OECD as a way of measuring subjective wellbeing. When researchers (or governments, or others) measure life satisfaction or happiness, it is often the Cantril ladder that is being used.

The question of what the Cantril ladder measures was explored in this recent article by August Nilsson (Lund University), Johannes Eichstaedt (Stanford University), Tim Lomas (Harvard University), Andrew Schwartz (Stony Brook University), and Oscar Kjell (Lund University), published in the journal Scientific Reports (open access, with non-technical summary on The Conversation). Nilsson et al. looked at the framing of the Cantril ladder, and investigated how nearly 1600 people responded to different framings of the question, the words that they used to describe the top and the bottom of the scale in those different framings, and where they would 'prefer to be' on the scale. The first framing was the traditional Cantril ladder. The second framing essentially replaced the ladder metaphor with the word "scale" (but left the rest intact). The third framing removed references to the "bottom" and "top" (as well as the ladder metaphor). The fourth framing did all of that plus changed "best possible life" to "happiest possible life" (and "worst possible life" to "unhappiest possible life"). And the fifth and final framing instead replaced "best possible life" with "most harmonious life" (and "worst possible life" with "least harmonious life").

Nilsson et al. found that:

The ladder and bottom-to-top scale anchor descriptions influenced respondents to use significantly more words from the LIWC dictionaries Power and Money when interpreting the Cantril Ladder... compared to when these anchors were removed. Of all the words respondents used to describe the top of the Cantril Ladder, 17.3% fell into the Power and Money dictionaries. This language was reduced by more than a third when the ladder was removed in the no-ladder condition (absolute difference of 6.0%, d = 0.35, p < 0.001), and more than halved when the bottom-to-top scale descriptions were removed too (absolute difference of 10.3%, d = 0.64, p < 0.001). Further, for the Cantril Ladder, words in the Power and Money dictionaries occurred 3.3 times as frequently compared to the alternative Harmony anchor condition (absolute difference of 12%, d = 0.77, p < 0.001).

They interpret those results as meaning that:

...the original Cantril Ladder influenced respondents to focus more on money in terms of wealth (whereas when the ladder framing was excluded, they focused more on financial security) than the other conditions.

Were you thinking about the financial aspects of life when you answered the question above? The results seem to suggest that is more common than thinking about social relationships or the various other contributors to our subjective wellbeing. Nilsson et al. don't explore the use of words other than in the 'Power' and 'Money' domains, but it would have been interesting to see some other word categories for comparison.

It's not surprising that financial security, income, or wealth are important contributors to subjective wellbeing or life satisfaction. We should expect people to be better able to satisfy their needs when they have greater financial resources available to them. However, the results on research participants' preferred level on the ladder are genuinely surprising, because:

...over 50% did not prefer the highest level (of 10) in any of the study conditions, and less than a third preferred the top of the Cantril Ladder, which had a significantly lower average preferred level than all the other study conditions.

In other words, even though the top of the Cantril ladder is framed as the 'best possible life', around two-thirds of research participants said that they would prefer not to be at the top of the ladder. This proportion was lower (but still not zero) for other framings, as shown in Figure 4 from the article (where the dark blue part of the bar shows the proportion of research participants who responded that 10 was their preference).

What was your preferred level on the ladder? Did you want to have the best possible life (that is, 10 on the scale)? Or would you prefer to be somewhere just below the best possible life? What did you think about in answering the question on your preferred level? Maybe research participants want 'room to grow' and become even happier or more satisfied with their lives? I have no idea. Nilsson et al. have given us something to really think about here, but unfortunately the article doesn't go far enough in exploring why people don't prefer the top of the ladder. There is definitely scope for further follow-up research on this point.

In addition to being surprising, that last result may call into question how the Cantril ladder is interpreted (on top of the arguments about the validity of happiness data generally - see here, and here, and here). If the top of the scale is not the top of the scale, or if it is different for different research participants, then how do we interpret an average across all people responding to the question? That should make researchers worry, and makes follow-up research even more important.

[HT: New Zealand Herald, back in April]


Friday, 1 November 2024

This week in research #47

Here's what caught my eye in research over the past week:

  • Geerling, Mateer, and Wooten (open access working paper) identify a group of “rising stars” in the economics teaching field (where I'm ranked #27 in the world, and #5 outside of the US)
  • Li and Xia find that students just above a letter-grade cutoff in an introductory course are 3.6% more likely to major in the same field as that course, using data from the National University of Singapore
  • Divle, Ertac and Gumren find in an online experiment that although working in a team is more profitable and participants also expect this, a large proportion shy away from teamwork, and that research participants primed with COVID-19 are less likely to self-select into teamwork
  • Dickinson and Waddell find, using data from GitHub, that the transition to Daylight Saving Time reduces worker activity, but that the effects are relatively short-lived, although more detailed hourly data show losses in the early working hours of workdays extending into a second week after the transition
  • Naidenova et al. look at twelve years of data from professional Counter-Strike: Global Offensive games and find that there is a substantial decrease in the performance of esports players during overtime, which they attribute to 'choking under pressure', although the impact is less in online competitions compared to live events
  • Martínez-Alfaro, Silverio-Murillo, and Balmori-de-la-Miyar (open access) find in an audit study that job applications from transgender candidates received 36% fewer positive responses than those from cisgender candidates in Mexico

Thursday, 31 October 2024

Book review: Wonderland (Steven Johnson)

When I think about the dramatic changes in society that have occurred since the end of the Industrial Revolution, one of the trends that stands out (to me) is the massive increase in leisure time. In the 19th Century, most people worked far more hours than they do today. The recent decades of that trend were well-described in Daniel Hamermesh's book Spending Time (which I reviewed here). What was left unexplored in that book was the way that leisure pursuits have affected the economy and society.

That is the purpose of Steven Johnson's book Wonderland, which is subtitled "How play made the modern world". Johnson describes the book as:

...a history of play, a history of the pastimes that human beings have concocted to amuse themselves as an escape from the daily grind of subsistence. This is a history of what we do for fun.

The book is comprised of chapters devoted to fashion and shopping, music, food, entertainment, games, and our use of public space. Each chapter is well written and well resourced, and a pleasure to read. Johnson is a great storyteller and the stories he presents are interesting and engaging.

However, from the first chapter, I struggled with the overall thesis of the book, which is that changes in leisure pursuits drove broader societal changes and economic changes. This is most glaringly demonstrated in the first chapter, where Johnson contends that it was the desire for fashion that drove the Industrial Revolution:

When historians have gone back to wrestle with the question of why the industrial revolution happened, when they have tried to define the forces that made it possible, their eyes have been drawn to more familiar culprits on the supply side: technological innovations that increased industrial productivity, the expansion of credit networks and financing structures; insurance markets that took significant risk out of global shipping channels. But the frivolities of shopping have long been considered a secondary effect of the industrial revolution itself, an effect, not a cause... But the Calico Madams suggest that the standard theory is, at the very least, more complicated than that: the "agreeable amusements" of shopping most likely came first, and set the thunderous chain of industrialization into motion with their seemingly trivial pursuits.

In spite of the excellent prose, I'm not persuaded by the demand-side argument for the Industrial Revolution, which flies in the face of lots of scholarship in economic history (as well as in history). Now, it may be that the first chapter just made me grumpy. But Johnson draws several conclusions which are, at best, a selective interpretation of the evidence. And at times, he makes comparisons that are somewhat odd, such as a comparison between the tools and technologies available to artists and scientists and those available to musicians in the 17th Century, concluding that there were fewer and less advanced tools available to artists and scientists than for musicians. There doesn't seem to be any firm basis to make such a comparison (how does one measure how advanced technologies in different disciplines are, in order to compare them?).

The final chapter, though, was a highlight to me. There was a really good discussion of the role of taverns in the American Revolution. And in that discussion, Johnson acknowledges that it is difficult to establish a causal relationship (which made me again wonder why he was unconcerned about the challenges of causality between shopping and the industrial revolution earlier in the book). I really appreciated the discussion of the work of Jürgen Habermas and Ray Oldenburg, and the idea of "third places" (places of gathering that are neither work, nor home). It reminded me of my wife's excellent PhD thesis on cafés.

Overall, I did enjoy the book in spite of my griping about the overall thesis and the way that Johnson sometimes draws conclusions from slim evidence. If you are interested in the history of leisure pursuits, I recommend it to you.

Wednesday, 30 October 2024

Some notes on generative AI and assessment (in higher education)

Last week, I posted some notes on generative AI in higher education, focusing on positive uses for AI for academic staff. Today, I want to follow up with a few notes on generative AI and assessment, based on some notes I made for a discussion at the School of Psychological and Social Sciences this afternoon. That discussion quickly evolved into more of a discussion on intentional design of assessment more generally, rather than focusing on the risks of generative AI to assessment more specifically. That's probably a good thing. Any time academic staff are thinking more intentionally about assessment design, the outcomes are likely to be better for students (and for the staff as well).

Anyway, here are a few notes that I made. Most importantly, the impact of generative AI on assessment, and the robustness of any particular item of assessment to generative AI, depends on context. As I see it, there are three main elements of the context of assessment that matter most.

First, assessment can be formative, or summative (see here, for example). The purpose of formative assessment is to promote student learning, and provide actionable feedback that students can use to improve. Formative assessment is typically low stakes, and the size and scope of any assessment item is usually quite small. Generative AI diminishes the potential for learning from formative assessment. If students are outsourcing (part of) their assessment to generative AI, then they aren't benefiting from the feedback or the opportunity for learning that this type of assessment provides.

Summative assessment, in contrast, is designed to evaluate learning, distinguish good students from not-so-good students from failing students, and award grades. Summative assessment is typically high stakes, with a larger size and scope of assessment than formative assessment. Generative AI is a problem in summative assessment because it may diminish the validity of the assessment, in terms of its ability to distinguish between good students and not-so-good students, or between not-so-good students and failing students, or (worst of all) between good students and failing students.

Second, the level of skills that are assessed is important. In this context, I am a fan of Bloom's taxonomy (which has many critics, but in my view still captures the key idea that there is a hierarchy of skills that students develop over the course of their studies). In Bloom's taxonomy, the 'cognitive domain' of learning objectives is separated into six levels (from lowest to highest): (1) Knowledge; (2) Comprehension; (3) Application; (4) Analysis; (5) Synthesis; and (6) Evaluation.

Typically, first-year papers (like ECONS101 or ECONS102 that I teach) predominantly assess skills and learning objectives in the first four levels. Senior undergraduate papers mostly assess skills and learning objectives in the last three levels. Teachers might hope that generative AI is better at the lower levels - things like definitions, classification, understanding and application of simple theories, models, and techniques. And indeed, it is. Teachers might also hope that generative AI is less good at the higher levels - things like synthesising papers, evaluating arguments, and presenting its own arguments. Unfortunately, it appears that generative AI is also good at those skills. However, context does matter. In my experience, and this is subject to change because generative AI models are improving rapidly, generative AI can mimic the ability of even good students at tasks at low levels of Bloom's taxonomy, which means that tasks at that end lack any robustness to generative AI. However, at tasks higher on Bloom's taxonomy, generative AI can mimic the ability of failing and not-so-good students, but is still outperformed by good students. So, many assessments like essays or assignments that require higher-level skills may still be a robust way of identifying the top students, but will be much less useful for distinguishing between students who are failing and students who are not-so-good.

Third, authenticity of assessment matters. Authentic assessment (see here, for example) is assessment that requires students to apply their knowledge in a real-world contextualised task. Writing a report or a policy brief is a more authentic assessment than answering a series of workbook problems, for example. Teachers might hope that authentic assessment would engage students more, and reduce the use of generative AI. I am quite sure that many students are more engaged when assessment is authentic. I am less sure that generative AI is used less when assessment is authentic. And, despite any hopes that teachers have, generative AI is just as good in an authentic assessment as it is in other assessments. In fact, it might be better. Consider the example of a report or a policy brief. The training datasets of generative AI no doubt contain lots of reports and policy briefs, so it has lots of experience with exactly the types of tasks we might ask students to complete in an authentic assessment.

So, given these contextual factors, what types of assessment are robust to generative AI? I hate to say it, and I'm sure many people will disagree, but in-person assessment cannot be beaten in terms of robustness to generative AI. In-person tests and examinations, in-person presentations, in-class exercises, class participation or contributions, and so on, are assessment types where it is not impossible for generative AI to have an influence, but where it is certainly very difficult for it to do so. Oral examinations are probably the most robust of all. It is impossible to hide your lack of knowledge in a conversation with your teacher. This is why universities often use oral examinations at the end of a PhD.

In-person assessment is valid for formative and summative assessment (although the specific assessments used will vary). It is valid at all levels of learning objectives that students are expected to meet. It is valid regardless of whether assessment is authentic or not. Yes, in case it's not clear, I am advocating for more in-person assessment.

After in-person assessment, I think the next best option is video assessment. But not for long. Using generative AI to create a video avatar to attend Zoom tutorials, or to make a presentation, is already possible (HeyGen is one example of this). In the meantime though, video reflections (as I use in ECONS101), interactive online tutorials or workshops, online presentations, or question-and-answer sessions, are all valid assessments that are somewhat robust to AI.

Next are group assessments, like group projects or group assignments, or group video presentations. The reason that I believe group assessments are somewhat robust is that they require a certain amount of group cohesion to make a sustained effort at 'cheating'. I don't believe that most groups that are formed within a single class are cohesive enough to maintain this (although I am probably too hopeful here!). Of course, there will be cases when just one group member's contribution to a larger project was created with generative AI, but generally it would take the entire group to do so. When generative AI for video becomes more widespread, group assessments will become a more valid assessment alternative than video assessment.

Next are long-form written assessments, like essays. I'm not a fan of essays, as I don't think they are authentic as assessment, and I don't think they assess skills that most students are likely to use in the real world (unless they are going on to graduate study). However, they might still be a valid way of distinguishing between good students and not-so-good students. To see why, read this New Yorker article by Cal Newport. Among other issues, the short context window of most generative AI models means that it is not great at long-form writing, at least compared with shorter pieces. However, generative AI's shortcomings here will not last, and that's why I've ranked long-form writing so low.

Finally, online tests, quizzes, and the like should no longer be used for assessment. The development of browser plug-ins that can be used to answer multiple-choice, true/false, fill-in-the-blanks, and short-answer-style questions automatically, with minimal student input (other than perhaps to hit the 'submit' button), makes these types of assessments invalid. Any attempts to thwart generative AI in this space (and I've seen things like using hidden text, using pictures rather than text, and other similar workarounds) are at best an arms race. Best to get out of that now, rather than wasting lots of time trying (but generally failing) to stay one step ahead of the generative AI tools.

Finally, I know that many of my colleagues have become attracted to getting students to use generative AI in assessment. This is the "if you can't beat them, join them" solution to generative AI's impact on assessment. I am not convinced that this is a solution, for two reasons.

First, as is well recognised, generative AI has a tendency to hallucinate. Users know this, and can recognise when a generative AI has hallucinated in a domain in which they (the user) have specific knowledge. If students, who are supposed to be developing their own knowledge, are being asked to use or work with generative AI in their assessment, at what point will those students develop their own knowledge that they can use to recognise when the generative AI tool that they are working with is hallucinating? Critical thinking is an important skill for students to develop, but criticality in relation to generative AI use often requires the application of domain-specific knowledge. So, at the least, I wouldn't like to see students encouraged to work with generative AI until they have a lot of the basics (skills that are low on Bloom's taxonomy) nailed first. Let generative AI help them with analysis, synthesis, or evaluation, while the student's own skills in knowledge, comprehension, and application allow them to identify generative AI hallucinations.

Second, the specific implementations of assessments that involve students working with generative AI are not often well thought through. One common example I have seen is to give students a passage of text that was written by AI in response to some prompt, and ask students to critique the AI response. I wonder, in that case, what stops the students from simply asking a different generative AI model to critique the first model's passage of text?

There are good examples of getting students to work with generative AI though. One involves asking students to write a prompt, retrieve the generative AI output, and then engage in a conversation with the generative AI model to improve the output, finally constructing an answer that combines both the generative AI output and the student's own ideas. The student then submits this final answer, along with the entire transcript of their conversation with the generative AI model. This type of assessment has the advantage of being very authentic, because it is likely that this is how most working people engage with generative AI for completing work tasks (I know that it's one of the ways that I engage with generative AI). Of course, it is then more work for the marker to look at both the answer and the transcript that led to that answer. But then again, as I noted in last week's post, generative AI may be able to help with the marking!

You can see that I'm trying to finish this post on a positive note. Generative AI is not all bad for assessment. It does create challenges. Those challenges are not insurmountable (unless you are offering purely online education, in which case good luck to you!). And it may be that generative AI can be used in sensible ways to assist in students' learning (as I noted last week), as well as in students completing assessment. However, we first need to ensure that students are given adequate opportunity to develop a grounding on which they can apply critical thinking skills to the output of generative AI models.

[HT: Devon Polaschek for the New Yorker article]

Read more:

Monday, 28 October 2024

Generative AI may increase global inequality

As I noted in a post earlier this month, the general public appears to be worried about the impact of generative artificial intelligence on jobs and inequality. Some economists are clearly worried as well. Consider this post on the Center for Global Development blog, by Philip Schellekens and David Skilling. They note three reasons why generative AI might increase global inequality: (1) richer countries are better equipped to harness AI’s benefits; (2) poorer countries may be less prepared to handle AI’s disruptions; and (3) AI is intensifying pressure on traditional development models.

I have a lot of sympathy for these arguments, but it is worth exploring them in a bit more detail. Here's part of what Schellekens and Skilling said on the first reason:

High-income countries, along with wealthier developing nations, hold a distinct advantage in capturing economic value from AI thanks to superior digital infrastructure, abundant AI development resources, and advanced data systems...

When many people think about economic growth, they think about catch-up growth. Developing countries often have growth rates that exceed those in developed countries. There are vivid examples of catch-up growth, like the way many developing countries were able to bypass copper telephone lines and move straight to mobile telecommunications. Could AI be like that? It's a hopeful vision. However, the problem with that argument is that AI isn't quite the same as the telecommunications example. There is no outdated technology that is being replaced by AI (unless humans count?). So, developing countries can't leapfrog technology and catch up. If a country doesn't have the technology infrastructure and capital necessary to develop their own AI models, they will be forced to use models developed in other countries. That creates problems for developing countries, and Schellekens and Skilling note two particular concerns:

First, AI could reinforce the dominance of wealthier nations in high-value sectors like finance, pharmaceuticals, advanced manufacturing, and defense. As richer countries use AI to enhance productivity and innovation, it becomes harder for poorer countries to penetrate these markets.

Second, while AI is poised to primarily disrupt skill-intensive jobs more prevalent in advanced economies, it can also undermine lower-cost labor in developing countries. Automation in manufacturing, logistics, and quality control would enable wealthier nations to produce goods more efficiently, reducing the need for low-wage foreign workers. This shift, supported by AI-driven predictive analytics and customization capabilities, may allow richer countries to outcompete on cost, speed, and product desirability.

Note that second argument says that in spite of any increase in inequality within developed countries (which is what the general public was most concerned about in my previous post), there would be increases in global inequality because of the differential impact on different labour markets. This is a consequence of past labour market polarisation, where different countries have become reliant on employment in different sectors.

On their second point, Schellekens and Skilling note that, while the social safety net in developed countries may insulate their populations from the negative impacts of AI (a point that I'm not sure that many would agree with), the situation in developing countries is quite different:

Limited resources and underdeveloped social protection systems mean they are less equipped to absorb the economic and social shocks caused by AI-driven disruptions. Many lower-income countries already struggle with high rates of informal employment and fragile labor markets, leaving workers highly vulnerable to sudden economic shifts.

The lack of fiscal space also restricts these countries from investing in crucial areas like reskilling programs, infrastructure upgrades, or targeted welfare schemes to support affected communities. Without such mechanisms, the impact of AI-related job losses could exacerbate unemployment and deepen poverty.

It would be interesting to see some research on the expected impact of generative AI on informal sector employment, but I expect that Schellekens and Skilling are largely correct about the impacts on formal sector employment in developing countries.

Finally, on their third point, Schellekens and Skilling note that the model of development that many countries have followed in recent decades, moving first from an agrarian economy, into low-technology manufacturing (like garments), and then into higher-technology manufacturing over time, has become less viable for developing countries, and that generative AI may impact the obvious alternative, which is export-oriented service industries:

Countries like the Philippines and India have seen success in business process outsourcing, thanks to booming call center industries and IT services. But AI poses a threat to this model as well. AI has the potential to reduce the labor intensity of these activities, eroding the competitive edge in the international marketplace of lower-cost service providers.

If AI were to undermine labor-intensive service industries, developing countries may find it harder to identify viable pathways for growth, posing a significant challenge to long-term development and dampening the prospects of convergence.

The conclusion here is that generative AI may not only increase within-country inequality, but because of the differential impact on developed and developing countries, it may increase between-country inequality as well. This would potentially reverse decades of declining global inequality (see here and here).

Sunday, 27 October 2024

Airlines have to pay more compensation for death or injury, but it probably still isn't enough

The value of a preventable fatality (a more palatable term than the value of a statistical life) for New Zealand was increased last year to $12.5 million (see here). That is the value that Waka Kotahi New Zealand Transport Agency uses in evaluating the benefits of road safety improvements, for example. The new value was a substantial increase from the previous value of $4.88 million.

So, I was interested to read this week that the International Civil Aviation Organisation (ICAO) has revised the amount that airlines must pay in compensation in the event of a death or injury, to just $335,000. As the New Zealand Herald reported:

Travellers will be eligible for higher compensation for international flights, with the International Civil Aviation Organisation (ICAO) setting new liability limits for death, injury, delays, baggage and cargo issues.

This means airlines must pay out at least $335,000 for death or “bodily injury” on flights as a result of the review of payment levels that come into force late this year.

While liability limits are set by the international Montreal Convention agreement, there are no financial limits to the liability for passenger injury or death if a court rules against an airline.

Why is the ICAO value so much lower? After some fruitless searching, I haven't been able to find anything to say how the ICAO sets its value. It dates back to 1999, when the value was set at 100,000 SDRs (Special Drawing Rights - an international reserve asset created by the International Monetary Fund, based on a basket of five currencies).

One reason that might account for this difference is the way that the two estimates are measured. The value of statistical life for New Zealand noted above is measured using the willingness-to-pay approach. Essentially, that method involves working out how much people are willing to pay for a small reduction in the risk of death, then scaling that value up to work out how much they would be willing to pay for a 100 percent reduction in the risk of death, which becomes the estimated value of a statistical life.
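The scaling up in the willingness-to-pay approach is just a ratio. As a sketch with made-up survey numbers (chosen purely for illustration, so that the result matches the $12.5 million New Zealand figure):

```python
# Hypothetical numbers for illustration (not from any actual NZ survey):
# suppose the average person is willing to pay $125 for a 1-in-100,000
# reduction in their risk of death. Scaling that willingness-to-pay up
# to a 100 percent risk reduction gives the implied value of a
# statistical life.
wtp = 125                      # willingness to pay for the risk reduction ($)
risk_reduction = 1 / 100_000   # size of that risk reduction

vsl = wtp / risk_reduction
print(f"Implied value of a statistical life: ${vsl:,.0f}")  # $12,500,000
```

In practice the per-person willingness-to-pay is estimated from stated-preference surveys or revealed-preference studies (such as wage premiums for risky jobs), but the scaling step is exactly this division.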

An alternative is to use the human capital approach, which involves estimating the value of life as the total amount of economic production remaining in the average person's life. The value of that production is estimated as their wages. Essentially then, this approach involves working out the total amount of wages that the average person will earn in their remaining lifetime. Typically, the human capital approach will lead to a much smaller estimate than the willingness-to-pay (WTP) approach (and for an unsurprising reason - people are worth more than just the value they generate in the labour market!).

So, this difference in approach might account for the different estimates. Why might the ICAO use the human capital approach? One reason may be that the human capital approach leads to lower liability for compensation (in cases where the airline is not found to be at fault - if the airline is found by courts to be at fault, then the compensation is uncapped). Given that many airlines that belong to ICAO are national carriers, each country has an incentive to try and limit the liability of their own airline to paying compensation. A second reason is explained in Kip Viscusi's book Pricing Lives (which I reviewed here). In the book, Viscusi argues that the WTP approach is more appropriate when considering what society is willing to pay to prevent deaths (e.g. in road safety improvements), and that the human capital approach is more appropriate when considering a particular life (e.g. in calculating a legal penalty for wrongful death). If we believe Viscusi's argument, then the human capital approach should be used by ICAO.

However, even if we believe that the human capital approach is the right approach (and I'm not convinced that it is), it probably still underestimates the compensation that should be paid, at least for New Zealanders. Consider the following details. The median age in New Zealand is 38.1 years (at the 2023 Census). Life expectancy (at birth) is 80 years for males, and 83.5 years for females. The median weekly earnings (from wages and salaries) were $1343 in June 2024, or $69,836 per year. Using those numbers, assuming that the median-aged person works only until age 65, and applying a social discount rate of 3 percent per year, the discounted value of future wages for the average New Zealander is $1.35 million. That is more than four times higher than ICAO's figure, and is estimated using the human capital approach. Even if we used a discount rate of 10 percent, rather than 3 percent, the value is still about $715,000, more than double the ICAO value.
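That back-of-the-envelope calculation is easy to reproduce. The sketch below treats the remaining wages as an ordinary annuity; the exact figures differ slightly from those above depending on rounding and the assumed timing of payments, but at either discount rate the result comfortably exceeds the ICAO figure of $335,000:

```python
# A rough reconstruction of the back-of-the-envelope calculation above.
# Assumptions: the median-aged person (38) earns the median annual wage
# each year until age 65 (27 years), with wages treated as an ordinary
# annuity (paid at the end of each year). Small differences from the
# figures in the text come down to rounding and payment timing.
annual_wage = 69_836   # median annual earnings, June 2024 ($)
years = 65 - 38        # remaining working years

def pv_of_wages(discount_rate, wage=annual_wage, n=years):
    """Present value of n years of constant wages (ordinary annuity)."""
    return wage * (1 - (1 + discount_rate) ** -n) / discount_rate

print(f"PV at 3%:  ${pv_of_wages(0.03):,.0f}")   # roughly $1.3 million
print(f"PV at 10%: ${pv_of_wages(0.10):,.0f}")   # roughly $0.65 million
```

Either way, the human capital approach applied to New Zealand data yields a value several times the ICAO compensation level.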

The ICAO is seriously understating the value of compensation that should be paid in the case of a death on a flight (and where the airline is not at fault). It's just as well that these are rare events!

Saturday, 26 October 2024

If airlines priced all tickets the same, then that would create other problems

Dynamic pricing has been in the news again this week, with Consumer NZ labelling Air New Zealand ticket prices a "rip off". As the New Zealand Herald reported:

Consumer NZ has found that Air New Zealand flights across the Tasman around school holidays increased 43% - almost twice the rate of rival Qantas.

It says it might not be worth flying Air New Zealand to Australia, with evidence that our national carrier is exploiting its market share and demand during the school holidays, giving travellers cause to question if what they’re paying is fair...

A recent Consumer investigation into domestic flights found dynamic pricing could increase the price of the same ticket from Auckland to Dunedin by up to four times as much...

Consumer says while supply and demand do impact dynamic pricing algorithms, “we’re not convinced it’s that simple. We think it’s likely that dynamic pricing allows Air New Zealand to make up profit margins, and it certainly looks like its practices are capitalising on New Zealanders wanting to travel during the school holidays.

“Compared to Qantas, which was consistently cheaper and didn’t have comparable price hikes during either New Zealand or Queensland school holidays, flying with our national carrier to Brisbane looks like a rip off.”

The issue here is the difference in price between a ticket purchased well in advance, and one purchased closer to the date of travel, with the latter being much more expensive. This is an example of price discrimination - selling the same good or service to different consumers for different prices. And price discrimination by airlines is a topic I have posted on before. Here's the explanation I gave then:

Some consumers will buy a ticket close to the date of the flight, while others buy far in advance. That is information the airline can use. If you are buying close to the date of the flight, the airline can assume that you really want to go to that destination on that date, and that few alternatives will satisfy you (maybe you really need to go to Canberra for a meeting that day, or to Christchurch for your aunt's funeral). Your demand will be relatively inelastic, so the airline can increase the mark-up on the ticket price. In contrast, if you buy a long time in advance, you probably have more choice over where you are going, and when. Your demand will be relatively elastic, so the airline will lower the mark-up on the ticket price. This intertemporal price discrimination is why airline ticket prices are low if you buy far in advance.

Similarly, if you buy a return ticket that stretches over a weekend, or a flight that leaves at 10am rather than 6:30am, you are more likely to be a leisure traveller (relatively more elastic demand) than a business traveller (relatively more inelastic demand), and will probably pay a lower price.
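The elasticity logic in that explanation maps onto the standard Lerner rule for a profit-maximising seller with market power: the mark-up satisfies (P - MC)/P = 1/|e|, so price rises as demand becomes less elastic. A sketch with purely illustrative numbers (the elasticities and the marginal cost are made up, not estimates for any actual airline):

```python
# Lerner rule: a profit-maximising seller sets (P - MC) / P = 1 / |e|,
# which rearranges to P = MC * |e| / (|e| - 1). The marginal cost and
# elasticities below are illustrative only.
def optimal_price(marginal_cost, elasticity):
    """Profit-maximising price for a constant price elasticity of demand."""
    e = abs(elasticity)
    if e <= 1:
        raise ValueError("profit maximisation requires elastic demand (|e| > 1)")
    return marginal_cost * e / (e - 1)

mc = 150  # marginal cost of carrying one extra passenger ($)
print(f"Leisure traveller  (e = -4):   ${optimal_price(mc, -4):.0f}")    # $200
print(f"Business traveller (e = -1.5): ${optimal_price(mc, -1.5):.0f}")  # $450
```

The more inelastic the demand (the business traveller booking at the last minute), the larger the mark-up over marginal cost, which is exactly the pattern dynamic pricing produces.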

The solution is simple. If you want to pay a lower price for an airline ticket, book in advance. That's the advice that Air New Zealand gives in the article:

Customers should book early to secure the best deals, said the [Air New Zealand] spokesperson.

Consumer NZ is of course trying to do the best by consumers. They want lower prices for airline tickets, even when purchased close to the date of travel. However, taking aim at dynamic pricing might be counterproductive. Even putting aside the infeasibility of regulating dynamic pricing, if airlines were to eliminate dynamic pricing, that isn't without cost to travellers.

One thing that an escalating ticket price over time does is manage demand for airline tickets. As price increases, fewer consumers are willing and able to buy tickets. That means that there will generally be more airline tickets available close to the date of travel than there would have been if airline ticket prices remained low all along. Would it be worse to have to pay a high price for an airline ticket purchased at the last minute, or to have no tickets available at all, because the low price encouraged more people to buy, selling out planes sooner? It's not clear to me that the latter is a better outcome.

Even in the case where tickets remain available, a second issue is that it isn't clear that ticket prices would remain low. A profit-maximising airline that no longer price discriminates would set a lower price for tickets purchased close to the date of travel, but a higher price for tickets purchased well in advance. Essentially, they would average the price out over time, meaning that some travellers would end up paying a higher price. Those would likely be the leisure travellers, purchasing their tickets well in advance. Business travellers, who are more likely to purchase tickets at the last minute, would benefit greatly from airlines no longer using dynamic pricing.

Consumer NZ is trying to look after the interests of airline travellers (it's not the first time either). However, it isn't clear that they have thought through all of the implications of their attack on dynamic pricing.

Read more:

Friday, 25 October 2024

This week in research #46

With my marking out of the way and provisional grades released to students, I can turn my attention to research once again. Sadly, it was a very quiet week, but here's what caught my eye in research:

  • Charmetant, Casara, and Arvaniti (open access) document the extent of treatment of climate change in introductory economics textbooks (and the CORE text The Economy, which I use in ECONS101, looks pretty good overall, ranking top among US textbooks, and second overall)
  • Miller, Shane, and Snipp (open access working paper) look at the impact of the 1887 Dawes Act in the US (which made Native Americans citizens of the United States with individually-titled plots of land rather than members of collective tribes with communal land), and find that it increased various measures of Native American child and adult mortality from nearly 20% to as much as one third (implying a decline in life expectancy at birth of about 20%)

Wednesday, 23 October 2024

Some notes on generative AI in higher education

I've been a little quiet on the blog this week, as I've been concentrating on reducing my end-of-trimester marking load. However, I came out of exile today to contribute to a discussion on generative artificial intelligence in higher education, for staff of Waikato Management School. The risk with any discussion of AI is that it degenerates into a series of gripes about the minority of students who are making extensive use of AI in completing their assessment. I was type-cast into being the person to talk about that aspect (in part because I will be doing that next week in a discussion at the School of Psychological and Social Sciences). However, I wanted to be a bit more upbeat, and focus on the positive aspects of AI for academics.

I don't consider myself an expert on AI. However, I have read a lot, and I pay attention to how others have been using AI. I've used it a little bit myself (and I'm sure there is much more use that I could make of it). I made some notes to use in the discussion, and thought I would share them. I link to a few different AI tools below, but those are by no means the only tools that can be used for those purposes. Where I haven't linked to a specific tool, then a general-purpose generative AI like ChatGPT, Claude, or Gemini will do the job.

I see opportunities for generative AI in four areas of academic work. First, and perhaps most obviously, generative AI can be used for improving productivity. There are many tasks that academics do that are essentially boring time-sinks. If we adopt the language from David Graeber's book Bullshit Jobs (which I reviewed here), these are tasks that are essentially 'box-ticking'. Where I am faced with a task that I really don't want to do, but I know that I can't really say no to, my first option is to outsource as much of it as possible to ChatGPT. "You want me to write a short marketing blurb for X? Sure, I can do that." [Opens ChatGPT].

Aside from avoiding bullshit tasks, there is lots of scope for using generative AI for improving productivity. I'm sure that a quick Google search (or a ChatGPT query) will find lots of ideas. A few that I have used, or advocated for others to use (because I'm useless at following advice that I freely give to others), are:

  • Brainstorming - coming up with some ideas to get you started on a project. If you have the idea, but are looking for some inspiration, generative AI will give you some starting points to get you underway.
  • Writing drafts - sometimes generative AI can be used to create the first draft of a common task, or to create templates for future use. For example, I got ChatGPT to re-write the templates that I use for reference letters for students, and for supervision reports for my postgraduate students. I can then adapt those templates as needed in the future.
  • Editing - sometimes you have an email that you need to send, and you want to use a particular tone. With a suitable prompt, generative AI can easily change the tone of your email from 'total dick' to 'critical but helpful' (I may need to use this much more!).
  • Condensing or expanding - Academics will often use ten words when four words would be enough. Generative AI can do a great job of condensing a long email or piece of text. On the other hand, if you need to expand on something, generative AI can help with that too.
  • Summarising or paraphrasing - On a similar note, generative AI can help with paraphrasing long pieces of text, or summarising one or more sources. Some good tools here are Quillbot for paraphrasing, Genei for summarising text, or summarize.ing for summarising YouTube videos.
  • Translation - Going from one language to another is a breeze. It may not always be 100% accurate, but it is close enough.

Second, generative AI can be used for teaching. Here's a few use cases in the teaching space:

  • Writing questions or problem sets - generative AI can write new questions, but they aren't always good questions. However, it can be used to generate new context or flavour text on which to base a question, which is pretty important if you want something new but are feeling uninspired. Also, creating a problem set or quiz questions (multiple choice, fill-in-the-blanks, true/false) is fairly straightforward, by uploading your notes or lecture slides. However, I wouldn't use those questions in an online testing format (more on that when I post about assessment next week).
  • Writing marking rubrics - With a short prompt outlining the task, the number of cut-points, and the marks available, ChatGPT created the first draft of all of the marking rubrics for the BUSAN205 paper in A Trimester this year. I had to cut back on the number of criteria that ChatGPT was using, and modify the language a little bit, but otherwise they were pretty good.
  • Marking to a rubric - Once you have the rubric and the student's submitted assessment, generative AI can easily mark the work against the rubric. You would want to check a good sample of the work to ensure you were getting what you expected, but this could be a huge time-saver for marking long written work (provided you can believe that the work is the student's own, and not written by generative AI!). In case you are wondering, I didn't do this in BUSAN205 (it didn't occur to me until this week!).
  • Lesson plans - Creating lesson plans (which is more often a primary or secondary school approach to teaching than in higher education) is a breeze with generative AI. Just tell it what you want, and how much time you have, and it can create the plan for you. One useful tool is lessonplans.ai.
  • Lecture slides - Most of us probably write our slides first, and write notes second. However, if you have the notes and want slides, then generative AI can save you the hassle. And the end product will likely be better than anything you or I could create (as well as conforming to recommendations like limits on the number of bullet points on a single page, etc.).
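For the question-writing and rubric-marking ideas above, the key is a well-structured prompt. Here is a minimal sketch in Python of assembling a rubric-marking prompt (the rubric, criteria, and wording are all hypothetical illustrations; you would paste the result into whichever generative AI tool you use):

```python
# A minimal sketch of building a rubric-marking prompt for a general-purpose
# generative AI. The rubric and criteria below are hypothetical examples.

def build_marking_prompt(task, rubric, submission):
    """Combine a task description, a rubric, and a student submission
    into a single prompt asking for a mark against each criterion."""
    criteria = "\n".join(
        f"- {name} (out of {marks}): {description}"
        for name, marks, description in rubric
    )
    return (
        f"You are marking a student submission for the following task:\n"
        f"{task}\n\n"
        f"Mark the submission against each criterion below, giving a mark "
        f"and one sentence of justification per criterion:\n{criteria}\n\n"
        f"Submission:\n{submission}"
    )

# Hypothetical rubric: (criterion, marks available, description)
rubric = [
    ("Economic reasoning", 5, "Correct application of relevant theory"),
    ("Use of evidence", 3, "Claims supported by data or citations"),
    ("Clarity", 2, "Well-structured and clearly written"),
]

prompt = build_marking_prompt(
    "Explain the welfare effects of a patent expiry.", rubric, "..."
)
print(prompt)
```

Keeping the rubric as structured data means the same function can generate marking prompts for every assessment task, which also makes it easier to check a sample of the AI's marking against your own.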

Third, generative AI can be used for assisting student learning (this is separate from students using it for completing assessment tasks). I can see two good use cases here:

  • As a personal tutor - Using a tool like coursable.io or yippity.ai, students can create personalised flash cards or quizzes for any content that they upload. A link from your learning management system could point students to these useful tools.
  • Creating your own finetuned AI - This is one use case where I am very excited. By uploading my lecture slides, tutorials, transcripts of my lecture recordings, and posts from my blog, I think I can probably finetune my own AI. What better way for students to learn than from a chatbot based on their lecturer? I will likely be playing with this option over the summer.
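On the finetuning idea, most providers expect the training examples in a structured file. Here is a minimal sketch of preparing question-and-answer pairs in the JSONL 'messages' format that several finetuning APIs use (check your provider's documentation for the exact schema; the Q&A pairs here are made-up examples):

```python
# A minimal sketch of preparing finetuning data from lecture material.
# The JSONL "messages" format is one common format used by finetuning APIs;
# the exact schema varies by provider. The Q&A pairs are hypothetical.
import json

qa_pairs = [
    ("What is consumer surplus?",
     "The difference between what consumers are willing to pay and what they actually pay."),
    ("What happens to price when a patent expires?",
     "Generic entry pushes the price down towards marginal cost."),
]

with open("finetune_data.jsonl", "w") as f:
    for question, answer in qa_pairs:
        record = {"messages": [
            {"role": "system", "content": "You are a tutor for an economics paper."},
            {"role": "user", "content": question},
            {"role": "assistant", "content": answer},
        ]}
        f.write(json.dumps(record) + "\n")
```

In practice you would generate hundreds of such pairs from lecture transcripts and notes, which is itself a task that generative AI can help with.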

Fourth, generative AI can be used as a credible research assistant. However, like any human research assistant, you would be wise not to accept anything that generative AI provides you uncritically. Applying high standards of due diligence will help to minimise problems of hallucination, for example (for comparison, I'm not sure what the rate of hallucination is among human research assistants, but I'm pretty sure it is not zero). Aside from some of the use cases above, which could apply to research as well, I can see these options:

  • Literature review - It's far from perfect, but tools like Elicit or Consensus do a credible job of drafting literature reviews. The output would provide a good base to build on, or a good way to identify literature that you might otherwise miss.
  • Qualitative data analysis - Some of the most time-consuming research is qualitative. However, using a tool like atlas.ti, you can automate (or semi-automate) thematic analysis, narrative analysis, or discourse analysis (and probably other qualitative methods that I don't know the names of).
  • Sentiment analysis - Sentiment analysis is increasingly being used in quantitative and qualitative research, and generative AI can be used to easily derive measures of sentiment from textual data. I'm sure there are lots of other use cases for textual data analysis as well.
  • Basic statistics - I've seen examples of generative AI being used to generate basic statistical analyses. This is particularly useful if you are not quantitatively inclined, and yet want to present some statistics to provide some additional context or additional support for your research.
  • Coding - Writing computer code has never been easier. This is particularly useful for users of one statistics package or language (like R, Stata, or Python) who want to write code to run in a different one.
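As an illustration of the basic statistics point, this is the sort of simple descriptive-statistics script that generative AI can readily produce (Python standard library only; the income figures are made up for illustration):

```python
# A sketch of basic descriptive statistics of the kind generative AI can
# readily write code for. The data values are hypothetical.
import statistics

incomes = [420, 515, 480, 610, 395, 540, 470, 505]  # hypothetical weekly incomes

summary = {
    "n": len(incomes),
    "mean": statistics.mean(incomes),
    "median": statistics.median(incomes),
    "std_dev": round(statistics.stdev(incomes), 2),
    "min": min(incomes),
    "max": max(incomes),
}
print(summary)
```

Even for those who are not quantitatively inclined, output like this is easy to check against the raw data, which is exactly the kind of vetting that cognitive offloading requires.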

Anyway, I'm sure that there are many other use cases as well, but those are the ones that I briefly touched on in the session today. I'll be talking about the negative case (the risks of generative AI for assessment, and how to make assessment more AI-robust) next week, and I'll post on that topic then. In the meantime, try out some of these tools, and enjoy the productivity and work quality benefits they provide.

Friday, 18 October 2024

This week in research #45

Here's what caught my eye in research over the past (fairly quiet) week:

  • Angrist et al. (open access) analyse the effectiveness and cost-effectiveness of education interventions from over 200 impact evaluations across 52 countries (using learning-adjusted years of schooling (LAYS) as a unified measure across all studies)
  • Grant and Üngör develop a theoretical model showing that rising use of automation in production will raise both the skill premium (the wages of high-skilled workers, whether traditionally trained or with an AI background, relative to low-skilled workers) and the AI skill premium (the wages of high-skilled workers with an AI-based education relative to those with a traditional education background), which will likely increase income inequality

Wednesday, 16 October 2024

The economic welfare gains from the introduction of generic weight-loss drugs

The Financial Times reported this week (paywalled):

India’s powerful copycat pharmaceutical industry is set to roll out generic weight-loss drugs in the UK within weeks, with one leading producer forecasting a “huge price war” that could widen access to the popular medicines.

Bengaluru-based Biocon is the first company to win UK authorisation to offer a generic version of Novo Nordisk’s Saxenda weight treatment and is ready to launch sales by November.

Saxenda is an older drug of the same GLP-1 drug class as the Danish company’s popular Ozempic diabetes treatment and Wegovy weight-loss medication.

In an interview with the Financial Times, Biocon chief executive Siddharth Mittal declined to comment on his pricing strategy for generic Saxenda, but predicted his company’s sales of the drug would reach £18mn annually in the UK after the expiry of its patent protection there next month. Mittal said he expected Biocon’s generic version of Saxenda to be approved by the EU this year and in the US by 2025.

“When the generics come in there will be a huge price war,” he said. “There is a huge demand for these drugs at the right price.”

To see how the introduction of generic medicines affects the market, consider the diagram of the market for Saxenda below. When the active ingredient in Saxenda is protected by a patent, the market is effectively a natural monopoly. That means that the average cost curve (AC in the diagram) is downward sloping for all levels of output. This is because, as the quantity sold increases, the large up-front cost of developing Saxenda (see here for example) will be spread over more and more sales, lowering the cost on average. If Novo Nordisk (the producer of Saxenda) is maximising its profits, it will operate at the quantity where marginal revenue meets marginal cost, i.e. at QM, which it can obtain by setting a price of PM (this is because at the price PM, consumers will demand the profit-maximising quantity QM). Novo Nordisk makes a profit from Saxenda that is equal to the area PMBKL. [*]


Now consider what happens in this market when the patent expires and generic versions of Saxenda enter the market. We end up with a market that is more competitive, which would operate at the point where supply (MC) meets demand. This is at a price of PC, and the quantity of QC. Notice that the price of Saxenda falls dramatically - this is how the price war that Mittal mentions will play out.

Now consider what happens to the other areas of economic welfare. Before the patent expires, the consumer surplus is equal to the area GBPM. After the patent expires, the consumer surplus increases to the area GEPC. Consumers are made much better off by the patent expiry, because they can buy Saxenda at a much lower price, and they respond by buying much more of it. The producer surplus, which was PMBHPC, becomes zero. [**] The competition between the producers drives this producer surplus down. Total welfare (the sum of consumer and producer surplus) increases from GBHPC to GEPC. So, society is better off after the patent expiry.

Now, you could argue based on this that letting the patent expire earlier would be even better, given the economic welfare gain that would result. And while I have some sympathy for that view, governments should be a little cautious here. The large producer surplus from having the patent in place creates the incentive for big pharmaceutical firms to develop these pharmaceuticals in the first place. So, an appropriate balance between patent protection and incentives for pharmaceutical development needs to be found. Nevertheless, it is clear that once patents expire, there is a large welfare gain to society.
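To put some hypothetical numbers on the diagram: assume a linear inverse demand curve P = 100 - Q and a constant marginal cost of 20 (the fixed development cost is left out, so producer surplus here ignores it, as in footnote [*] below). Then the monopoly and competitive outcomes, and the welfare gain from patent expiry, can be computed directly:

```python
# Numerical sketch of the patent-expiry welfare analysis.
# Assumptions (all hypothetical): inverse demand P = 100 - Q, constant MC = 20.
a, b = 100, 1    # inverse demand: P = a - b*Q
mc = 20          # constant marginal cost

# With the patent: the monopoly sets marginal revenue (a - 2*b*Q) equal to MC
q_m = (a - mc) / (2 * b)                 # monopoly quantity (QM)
p_m = a - b * q_m                        # monopoly price (PM)
cs_monopoly = 0.5 * (a - p_m) * q_m      # consumer surplus triangle
ps_monopoly = (p_m - mc) * q_m           # producer surplus rectangle

# After the patent expires: competition drives price down to marginal cost
p_c = mc
q_c = (a - p_c) / b                      # competitive quantity (QC)
cs_competition = 0.5 * (a - p_c) * q_c   # consumer surplus triangle
ps_competition = 0.0                     # price equals marginal cost

welfare_gain = (cs_competition + ps_competition) - (cs_monopoly + ps_monopoly)
print(p_m, q_m, p_c, q_c, welfare_gain)
```

With these numbers, the price falls from 60 to 20, the quantity doubles from 40 to 80, and total welfare rises by 800, which is the deadweight loss triangle that competitive pricing eliminates.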

*****

[*] This is different from the producer surplus, which is the area PMBHPC. The difference between producer surplus and profits arises because of the fixed cost - in this case, the cost of development of Saxenda.

[**] If we treat this as continuing to be a natural monopoly after the patent expiry, the market makes a negative profit of -JFEPC (because the price PC is less than the average cost of production ACC). However, you could argue that because the firms producing the generic version didn't face the up-front cost of development, this is no longer a natural monopoly once the patent has expired.

Tuesday, 15 October 2024

Nobel Prize for Daron Acemoglu, Simon Johnson, and James Robinson

Many economists had been picking this prize for a few years. Daron Acemoglu (MIT), Simon Johnson (MIT), and James Robinson (University of Chicago) were awarded the 2024 Sveriges Riksbank Prize in Economic Sciences in Memory of Alfred Nobel (aka the Nobel Prize in Economics) yesterday, "for studies of how institutions are formed and affect prosperity".

While many, if not most, Nobel Prize winners in economics are largely unknown outside the discipline, having toiled away publishing papers only read by other economists, this award recognises three academics whose key contributions, on which the award is based, are contained within several best-selling books, including Why Nations Fail (by Acemoglu and Robinson, which I reviewed here), The Narrow Corridor (by Acemoglu and Robinson, which I reviewed here), and Power and Progress (by Acemoglu and Johnson, which I haven't read yet, but it is close to the top of my pile of books to-be-read). The Nobel Prize Committee's citation noted:

The laureates have shown that one explanation for differences in countries’ prosperity is the societal institutions that were introduced during colonisation. Inclusive institutions were often introduced in countries that were poor when they were colonised, over time resulting in a generally prosperous population. This is an important reason for why former colonies that were once rich are now poor, and vice versa.

Some countries become trapped in a situation with extractive institutions and low economic growth. The introduction of inclusive institutions would create long-term benefits for everyone, but extractive institutions provide short-term gains for the people in power. As long as the political system guarantees they will remain in control, no one will trust their promises of future economic reforms. According to the laureates, this is why no improvement occurs.

However, this inability to make credible promises of positive change can also explain why democratisation sometimes occurs. When there is a threat of revolution, the people in power face a dilemma. They would prefer to remain in power and try to placate the masses by promising economic reforms, but the population are unlikely to believe that they will not return to the old system as soon as the situation settles down. In the end, the only option may be to transfer power and establish democracy.

Notice that the citation really captures the theme running across their three books. Of course, there is an academic base that those books are founded on as well, and which no doubt contributed to their prize. Alex Tabarrok at Marginal Revolution gives a good summary of their work, as does John Hawkins at The Conversation. As those two posts make clear, all three prize winners have made contributions beyond those in the citation.

However, Acemoglu is clearly a standout performer, and has been for a long time. He is one of the most cited economists in the world, with contributions across a number of areas. Tabarrok points to joint work between Acemoglu and Pascual Restrepo on technological change. I have on my list of interesting ideas to go back and look at a different paper by Acemoglu and Restrepo, on the impacts of population ageing on economic growth, but using different measures of population ageing (as in my article here). I also pointed to Acemoglu's views on the impact of generative AI on inequality yesterday, which he has also researched recently.

In my ECONS102 class, I've been including more of a focus on economic and political institutions over time, and this prize may prompt me to even include a bit more (or at least, to point more explicitly to the work of Acemoglu, Johnson, and Robinson). And hopefully it will encourage even more people to read their books.

Monday, 14 October 2024

Generative AI and expectations about inequality

In the last week of my ECONS102 class, we covered inequality. In discussing the structural causes of inequality, I go through a whole bunch of causes grouped together under a heading of 'structural changes in the labour market', one of which is skills-biased technological change. The basic idea is that over time, some technology (like computers) has made people in professional, managerial, technical, and creative occupations more productive or allowed them to reach larger audiences at low cost. However, other technology (like robots) has tended to replace routine jobs in sectors like manufacturing. This has increased the premium for skilled labour, increasing the ‘gap’ between skilled and unskilled wages.

In discussing this idea of skills-biased technological change this year, I mused about the potential impact of generative artificial intelligence, and whether skills-biased technological change was about to reverse, leading to job losses in professional, managerial, technical, and creative occupations, while jobs in activities that might broadly be grouped into manual and dexterous labour (like plumbers, electricians, or baristas) would remain. A change like that would likely reduce inequality (but not necessarily in a good way!).

The truth is, I don't think that economists have a good handle on what the impacts of generative AI will be on the labour market. On the one hand, you have some economists, like Stanford's Nick Bloom, claiming that a lot of jobs (in particular tasks or occupations or sectors) are at risk. The loss of low-productivity, low-wage jobs that Bloom considers at risk, like call centre work, would likely increase inequality further. On the other hand, you have other economists, like MIT's Daron Acemoglu, claiming that the impact of generative AI on inequality will be small.

Given that economists can't agree on this, it is interesting to know what the general public thinks. That's the question that this post on Liberty Street Economics by Natalia Emanuel and Emma Harrington addresses. Using data from the February 2024 Survey of Consumer Expectations, they report that:

In general, a substantial share of respondents did not anticipate that genAI tools would affect wages: 47 percent expected no wage changes. These beliefs did not differ significantly based on prior exposure to genAI tools.

However, respondents believed that genAI tools would reduce the number of jobs available. Forty-three percent of survey respondents overall thought that the tools would diminish jobs. This expectation was slightly more pronounced among those who had used genAI tools, a statistically significant difference.

And specifically in terms of inequality:

We find that those who have used genAI tools tend to be more pessimistic about future inequality. Specifically, we asked people whether they thought there would be more, less, or about the same amount of inequality as there is today for the next generation... while 33 percent of those who have not used genAI tools think there will be more inequality in the next generation, 53 percent of those who have used genAI tools think there will be more inequality. This gap persists and is statistically significant, even after controlling for other observable traits. 

So, a large minority of the general public seems to be concerned about generative AI's impact on inequality, and that concern is greater among those with experience of using it (where a small majority believe inequality will increase). Now, it could be that those with greater experience are better able to accurately assess the risks to their own (and others') jobs from generative AI. Or maybe people who use generative AI are simply more likely to have read the AI doomers' predictions of an AI apocalypse (or equally, they could be more likely to read the bullish views of AI proponents). The general public may not know that what they fear is skills-biased technological change, but they may intuitively understand the potential risks. The real question, which we still cannot answer, is whether those risks are real or not.