Saturday 31 December 2022

Doctor shortages and the medical licensing cartel

Market power is the ability of a seller (or sometimes a buyer) to influence market prices. More market power means more control over prices. For a seller, that means that they can push the price upwards, and increase their profits. In my ECONS101 and ECONS102 classes, we discuss several ways that firms can obtain market power, one of which occurs when the government grants a firm the exclusive right to produce and/or sell a particular good or service. One example of this is patents. Only the patent holder, or another firm that buys a licence from the patent holder, is allowed to produce and sell the patented product. Other sellers are excluded from the market.

Occupational licensing provides a similar source of market power to patents. When the government creates rules that stipulate that all sellers within a particular market must be licensed, then that excludes other potential sellers from operating in that market. Sometimes, the government handles the licensing process itself, and sometimes it outsources the licensing process to an industry body. The Medical Council handles licensing (registration) of doctors, for example. Unregistered doctors are prohibited from practising medicine and selling their services. Only those who are registered with the Medical Council are allowed to practise.

The incentives created by occupational licensing systems are obvious. From the perspective of the insiders (licensed professionals, like doctors), more market power means more profits (or higher salaries). So, the insiders will want the licensing system to increase their market power, by excluding as many competitors as possible from being licensed. They can achieve this by ensuring that there are many requirements for new licensees to meet, including training and examinations, knowledge of local context, supervised work requirements, and so on. All of this can be dressed up as 'protecting the public' from low-quality practitioners, when all it really does is limit competition.

If the government is handling the licensing system, the industry association will lobby for these protections to be in place. If the industry association is handling the process itself, there is little to stop them from enacting all sorts of spurious requirements in the interest of 'public safety'. And limiting the number of people who can achieve registration is an effective way of keeping the competition out. Since the real purpose of the licensing system, from the perspective of the industry association, is to limit competition, the effect of the licensing system is essentially government-sanctioned cartel behaviour.

And so we end up in this situation with doctors, as noted in The Conversation earlier this month by Johanna Thomas-Maude (Massey University):

Immigration New Zealand’s recent announcement that all medical doctors would be included on the straight-to-residence pathway doesn’t quite give the full picture. In fact, “all” only includes those doctors who can have their medical registration approved before coming to New Zealand.

For many foreign-trained doctors already living here, the obstacle preventing them from working isn’t immigration – it’s medical licensing. If more is not done to streamline and speed up the licensing process, New Zealand risks losing prospective doctors to countries that make the process easier.

Doctors trained in Australia, the United Kingdom and Ireland, or other “comparable health systems”, can usually register and receive a job offer before immigrating.

But as of mid-November, more than 50 foreign-trained doctors who have met the Medical Council’s standards are still caught in a bottleneck, waiting for supervised hospital positions that will allow them to be provisionally registered before their exam pass expires.

Yes, you read that right. In the midst of a doctor shortage, doctors trained internationally in medical systems that are substantially comparable to New Zealand's cannot get registered here due to a lack of supervised hospital positions. They need to spend some time in a supervised position before they can be registered, because it is a requirement of the licensing regime. And to make matters worse:

Potentially hundreds of other doctors already in New Zealand are also waiting to take the required local clinical skills exam (NZREX), which is only open to 30 people at a time. The exam has only been offered four times – instead of the usual nine – in the past three years, with only one currently scheduled for 2023.

More licensing rules, which simply serve to protect the market power of the medical licensing cartel by limiting incoming competition from overseas. By limiting the number of exam slots, and the availability of supervised hospital positions, current doctors will have more market power to push up their own salaries. The solution is obvious, as Thomas-Maude notes:

New Zealanders should be pushing for further change. At a minimum, there should be viable supervised pathways for all doctors who demonstrate the required knowledge through international and local exams, as well as more exam offerings.

I'd go even further. Registration in 'comparable health systems' should be deemed comparable enough that registration in New Zealand is automatic when a job offer is extended. We don't need to be excluding good doctors from Australia, the UK, or Ireland from working in New Zealand. The medical licensing cartel needs to be reined in.

[Update: Eric Crampton at Offsetting Behaviour makes some additional comments]

Friday 30 December 2022

Your eggs are going to cost you more next year

It really feels like 2022 has been the year of the shortage. Just in the last few months, I've posted about shortages of French mustard, CO2 for beer making, after-hours veterinarian services, dungeon masters, and Kobe beef croquettes (although this one was purposeful on the part of the seller). And now eggs, as reported by the New Zealand Herald earlier this week:

Supermarket shelves are bare of eggs while others are limiting the number of cartons customers can buy during a drop in supply...

A ban on battery-caged hens, announced in 2012, comes into effect on Saturday and over the past few years the deadline has caused turmoil in the industry.

Egg Producers Federation executive director Michael Brooks said more than 75 per cent of chicken farmers have had to change their farming methods or their career because of the ban.

“The supermarkets’ announcement to refuse colony cage eggs, the end of the cage system, plus Covid, plus the grain cost rising because of the Ukraine war have all come together,” he said.

“It’s led to a drop of about 600,000 or 700,000 hens in the commercial flock. That’s a lot of eggs that aren’t available.”...

Brooks predicted egg prices would also rise as it has cost farmers millions to change their practices.

When the Government announced the battery cage ban, it told farmers they would have to transition to colony, barn or free-range farming.

But in 2019, Foodstuffs said they would no longer accept colony eggs either, aiming to be fully cage-free by 2027, which Brooks described as a “bombshell”.

“That put real confusion into the industry. A number of people - in fact a third of the industry - had already gone to colony eggs. But to go free range, they’d have to buy a whole new farm and the barn system was one we hardly knew in New Zealand, so a lot of farmers were really thrown.”

The funny thing is, this is almost exactly as predicted in 2019, when the ban on cage eggs was announced. Let's reprise what I said would happen then, and how it applies again now, but with bans on both cage eggs and colony eggs.

Consider the market for eggs, as shown in the diagram below. Egg producers are facing increasing production costs, because of the need to move from cage egg production and colony egg production to free range egg production. When costs of production increase, that results in a decrease in supply, shown by the supply curve shifting up and to the left, from S0 to S1. If egg prices were to remain at the original equilibrium price (P0), then the quantity of eggs demanded (Q0) would exceed the quantity of eggs supplied (QS) at that price, because egg producers are only willing to produce QS eggs at the price of P0, after the supply curve shifts. There would be a shortage of eggs, as we are observing in the market.

Shortages don't tend to last forever though. At least, not if the market is allowed to adjust. How would the egg market adjust to the shortage? The price of eggs would increase. To see why, consider what happens when there is a shortage. Some buyers, who are willing to pay the market price (P0), are missing out on eggs. Some of them will find a willing seller, and offer the seller a little bit more, in order to avoid missing out. In other words, buyers bid up the price. The result is that the price increases, until the price is restored to equilibrium, at the new (higher) equilibrium price of P1. At the new equilibrium price of P1, the quantity of eggs demanded is exactly equal to the quantity of eggs supplied (both are equal to Q1). We can say that the market clears. There is no longer a shortage.
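To make that adjustment concrete, here is a minimal numerical sketch using made-up linear demand and supply curves (all of the numbers are purely illustrative, and not estimates of the actual egg market):

```python
# Illustrative linear egg market: demand Qd = 100 - 2P; supply shifts from
# S0: Qs = 10 + 4P to S1: Qs = -8 + 4P when production costs rise.
# All numbers are made up for illustration.

def demand(p):
    return 100 - 2 * p            # quantity of eggs demanded at price p

def supply_before(p):
    return 10 + 4 * p             # S0: original supply

def supply_after(p):
    return -8 + 4 * p             # S1: supply after the cost increase

def equilibrium(supply):
    # For these linear curves, demand(p) = supply(p) solves as:
    # 100 - 2p = intercept + 4p  =>  p = (100 - intercept) / 6
    intercept = supply(0)
    p = (100 - intercept) / 6
    return p, demand(p)

p0, q0 = equilibrium(supply_before)    # original equilibrium (P0, Q0)
p1, q1 = equilibrium(supply_after)     # new equilibrium (P1, Q1)

# At the old price P0, the shifted supply curve delivers fewer eggs than
# buyers demand - that gap is the shortage we observe before prices adjust.
shortage_at_p0 = demand(p0) - supply_after(p0)

print(f"P0 = {p0:.0f}, Q0 = {q0:.0f}")
print(f"Shortage at P0 after the supply shift: {shortage_at_p0:.0f}")
print(f"P1 = {p1:.0f}, Q1 = {q1:.0f}")
```

With these illustrative curves, the shortage at the original price is 18 units, and the market clears only once the price rises from 15 to 18 (with the quantity traded falling from 70 to 64) - exactly the adjustment described above.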

In the meantime though, there is a shortage of eggs. As many economists would tell you, the reason for the shortage of eggs is not that there are not enough eggs, but rather that the price of eggs has not yet adjusted sufficiently. Expect your eggs to cost you more in the future.

Wednesday 28 December 2022

We really need to move to research grant lotteries

The premier source of competitive research funding in New Zealand is the Royal Society of New Zealand's Marsden Fund, which "supports excellence in science, engineering, maths, social sciences and the humanities in New Zealand by providing grants for investigator-initiated research". Researchers submit proposals, which are assessed on the quality of the proposed research and research team, with typically less than 15 percent of proposals being funded each year. There is a two-round selection process. In the first round, an expert panel screens short proposals (one-page plus CVs and some additional details), with a small number invited to progress to the second round. In the second round, each full proposal (around five or six pages plus CVs and additional details) is assessed by several international referees, who each score the proposal. The proposals are then reviewed by the expert panel, and ranked, with the top-ranked proposals being funded. Typically about half of the proposals that go through to the second stage are ultimately funded. [*] That long process of selection of successful research proposals raises two obvious questions: (1) how well are the top proposals selected in this process; and (2) what is the consequence of being funded for the researchers involved?

Those are essentially the research questions addressed in this 2018 article by Jason Gush (Royal Society of New Zealand), Adam Jaffe (Motu Research), Victoria Larsen (University of Otago), and Athene Laws (Motu Research), published in the journal New Zealand Economic Papers (ungated earlier version here). They used data from 1263 proposals that made it to the second round, over the years from 2003 to 2008.

In terms of the second research question, their simplest test for an effect of funding on research output, while ignoring any selection bias, finds that:

...funding is associated with an increase in publications of about 6% and citations about 12% relative to what would have been predicted based on previous performance.

So, that suggests that being successfully funded is associated with greater research output in the future, which is what you would hope (although in other results Gush et al. find that there is no evidence that it generates 'home runs', in terms of very highly cited research). The next question is, how much of that is actually selection bias (better researchers, who would have published more anyway, are those that get funded)? In this case:

The surprising result... is that the coefficient on scaled rank is negative. This means that, controlling for the other regressors - including the effect of the funding itself - proposal teams that were highly ranked by the RSNZ panels actually performed worse than those that were ranked lower. Specifically, because the rank is scaled so that it is roughly one for the best-ranked proposal and zero for the worst, the coefficient of -.2 to -.3 means that the worst ranked proposal team got 20%-30% more output than the best team, after controlling for all other attributes, including previous performance.

Yikes! The selection process appears to do a negative job of selecting the best research teams. However, when Gush et al. change the model specification (to a counts-based model), they find that:

...the negative effect of scaled rank appears to be concentrated among the unfunded proposals, although there is still no evidence of the expected positive effect even for the funded proposals.

So, there isn't negative selection among the proposals that are actually funded, which should be a bit of a relief. However, there isn't positive selection either. You could do nearly as well in selecting the best proposals by simply selecting some at random (which earlier research has suggested might not be too bad as an option). Gush et al. conclude that:

Given the significant time and resources that both researchers and the RSNZ devote to the second-round selection ranking, its apparent ineffectiveness in predicting bibliometric outcomes suggests that the Fund could benefit from review of its selection processes.

That isn't far from what I have suggested before. Instead of expending substantial time, effort, and resources conducting a review process that is no better than random at selecting the best research proposals, run a research grant lottery instead.
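To illustrate, here is a toy simulation built on the stylised assumption, suggested by the Gush et al. results, that panel rankings carry no information about future research output beyond what past performance already predicts (all numbers are hypothetical):

```python
# Toy comparison of panel-ranked funding versus a funding lottery, under the
# stylised assumption that panel scores are pure noise conditional on the
# proposal team's track record. Entirely hypothetical numbers.
import random

random.seed(42)
N_PROPOSALS, N_FUNDED, N_ROUNDS = 100, 15, 2000

def one_round():
    past = [random.gauss(0, 1) for _ in range(N_PROPOSALS)]
    future = [p + random.gauss(0, 1) for p in past]            # future output
    panel = [random.gauss(0, 1) for _ in range(N_PROPOSALS)]   # uninformative scores

    ranked = sorted(range(N_PROPOSALS), key=lambda i: -panel[i])[:N_FUNDED]
    lottery = random.sample(range(N_PROPOSALS), N_FUNDED)

    avg = lambda chosen: sum(future[i] for i in chosen) / N_FUNDED
    return avg(ranked), avg(lottery)

results = [one_round() for _ in range(N_ROUNDS)]
print("Mean future output, panel ranking:", sum(r[0] for r in results) / N_ROUNDS)
print("Mean future output, lottery:      ", sum(r[1] for r in results) / N_ROUNDS)
# If the rankings are uninformative, both averages converge to the same value:
# the costly ranking process funds no better a set of proposals than a lottery.
```

If the panel scores did contain a predictive signal, the ranked average would pull ahead; the point of the Gush et al. results is that, conditional on track record, no such signal is apparent.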

Read more:

*****

[*] In the interests of full disclosure, I had an unsuccessful proposal in the second round of the Marsden Fund this year. You are welcome to interpret this entire post as a gripe against the system that thwarted my latest proposal. However, you should first note that I have been successful in other rounds, and from other funders, and I've written on this topic before. And, I've been on the other side of the funding decisions, being on panels for the Health Research Council the last two years (and my experience there suggests that we probably could have done just as well by randomising, after culling a few proposals that were clearly below par).

Friday 23 December 2022

Jason Collins on behavioural economics and heliocentrism

When I teach the concept of rational decision-making in my ECONS102 class, we quickly move on to talking about many of the ways in which rationality fails to represent 'real-world' decisions made by real people. This draws on decades of insights from behavioural economics, and I group those insights together into four areas: (1) heuristics (or rules of thumb); (2) present bias; (3) loss aversion; and (4) framing. Within each category there are several biases that we can discuss. However, that barely scratches the surface of the hundreds of biases that psychologists, behavioural scientists, and behavioural economists have identified (see this list on Wikipedia).

Yet, despite all of these known biases, the rational behaviour model persists in economics. The model persists because there isn't a better single model that captures both the biases and the occasions when people do act rationally. We need a better model.

On the Works in Progress site, Jason Collins has an incredibly insightful article along these lines. He draws a fascinating parallel with early astronomy:

From the time of Aristotle through to the 1500s, the dominant model of the universe had the sun, planets, and stars orbiting around the Earth.

This simple model, however, did not match what could be seen in the skies. Venus appears in the evening or morning. It never crosses the night sky as we would expect if it were orbiting the Earth. Jupiter moves across the night sky but will abruptly turn around and go back the other way.

To deal with these ‘anomalies’, Greek astronomers developed a model with planets orbiting around two spheres. A large sphere called the deferent is centered on the Earth, providing the classic geocentric orbit. The smaller spheres, called epicycles, are centered on the rim of the larger sphere. The planets orbit those epicycles on the rim. This combination of two orbits allowed planets to shift back and forth across the sky.

But epicycles were still not enough to describe what could be observed. Earth needed to be offset from the center of the deferent to generate the uneven length of seasons. The deferent had to rotate at varying speeds to capture the observed planetary orbits. And so on. The result was a complicated pattern of deviations and fixes to this model of the sun, planets, and stars orbiting around the Earth.

Instead of this model of deviations and epicycles, what about an alternative model? What about a model where the Earth and the planets travel in elliptical orbits around the sun?

By adopting this new model of the solar system, a large collection of deviations was shaped into a coherent model. The retrograde movements of the planets were given a simple explanation. The act of prediction became easier as a model that otherwise allowed astronomers to muddle through became more closely linked to the reality it was trying to describe...

Behavioral economics today is famous for its increasingly large collection of deviations from rationality, or, as they are often called, ‘biases’. While useful in applied work, it is time to shift our focus from collecting deviations from a model of rationality that we know is not true. Rather, we need to develop new theories of human decision to progress behavioral economics as a science. We need heliocentrism.

Once you hear it explained, the parallel between early astronomy and current economic theory is obvious. As Collins observes (and I encourage you to read his entire article), we need a new model. We can continue to investigate cognitive biases, and make minor ad hoc adjustments to models and policies to try to take account of the latest biases. However, as long as the underlying model is rational behaviour, we are going to continue to lack a proper understanding of human decision-making.

We need a model where cognitive biases are no longer exceptions to the model, but are instead explained by the model itself. It sounds obvious, but it's going to take a spark of genius. Where is the Copernicus of economics?

Thursday 22 December 2022

Online off-the-shelf lessons and student learning outcomes

There are plenty of online resources available to teachers, to help improve their teaching, or to reduce the up-front costs of developing lessons. At university level, most textbook publishers provide a plethora of additional teaching resources including pre-prepared PowerPoint slides, learning activities, readings, case studies, test banks, and more. There are also websites where teachers can share their lesson plans and other materials with other teachers.

For the most part, I have resisted the temptation to use all of the resources available to me as a teacher, and I suspect I am not alone. There are several reasons for this.

First, in relation to textbook resources, I feel like they are a thinly-disguised attempt to lock teachers into using the same textbook in perpetuity. They create a switching cost for lecturers wanting to change textbooks, since the lecturer would then need to develop new resources to replace those from the old textbook. And the textbook resources are really not that good, anyway. The PowerPoint slides simply parrot details that students can read for themselves from the textbook, reducing any value-add that students may get from lectures. The test banks and case studies tend to be error-laden, having largely been written by poorly-paid graduate students and subjected to minimal, if any, quality checks (and I say this as a former poorly-paid graduate student who updated the instructor resources for the Gans, King and Mankiw Principles of Economics textbook some years ago).

Second, the resources provided by teachers for other teachers are much better quality, but tend to be somewhat idiosyncratic and require tailoring to the class they will be used in, and that tailoring is not without cost to the lecturer. For my part, I have adapted a number of in-class exercises and experiments for my classes, as well as developing my own.

Finally, it is worth asking: does using pre-prepared resources affect student learning at all? Pre-prepared off-the-shelf lessons or activities may save the teacher some time, and that time might be used to improve the pedagogy that is employed in the classroom, including by improving the quality of the discussion, examples, or applications that they use (or, off-the-shelf lessons may free up the teacher to engage in more relaxing non-teaching activities, leaving them refreshed and engaged and better able to attend to students' learning needs). On the other hand, off-the-shelf lessons or activities may not be well adapted to the learning needs of a particular classroom, and therefore may not improve student learning at all.

The value of off-the-shelf lessons is tested in this 2018 article by Kirabo Jackson and Alexey Makarin (both Northwestern University), published in the American Economic Journal: Economic Policy (ungated earlier version here). Jackson and Makarin conducted a randomised experiment using Mathalicious resources for high school mathematics teachers. As they explain:

Under our experiment, teachers were randomly assigned to one of three treatment conditions. In the license-only condition, we informed teachers that these lessons were high quality and that they had free access to them. To promote lesson adoption, some teachers were randomly assigned to the full treatment condition in which teachers received email reminders to use the lessons and were invited to an online social media group focused on lesson implementation (in addition to the license-only offerings). Finally, teachers randomly assigned to the control condition continued business-as-usual.

The sample frame was large:

All three Virginia districts agreed to participate: Chesterfield, Henrico, and Hanover. Across all grade levels, 59,186 students were enrolled in 62 Chesterfield public schools, 50,569 students were enrolled in 82 Henrico public schools, and 18,264 students were enrolled in 26 Hanover public schools in the 2013–2014 school year (NCES). All grades 6 through 8 math teachers in these districts were part of the study.

The sample size was also large, covering 363 teachers and 27,613 students in the 2013-14 school year. The key outcome variable was standardised maths test scores. The random assignment of teachers to the treatment conditions (after some non-randomised teachers were excluded from the sample) allows the causal effects of the experimental treatments to be estimated. Jackson and Makarin find that:

Students of teachers in the license-only and the full treatment groups experienced a 0.06σ and 0.09σ test score increase relative to those in the control condition, respectively. The full treatment effect is statistically significant at the 1 percent level, and has a similarly sized effect as that of moving from an average teacher to one at the eightieth percentile of quality, or reducing class size by 15 percent...

Those are sizeable effects, but not all teachers improved outcomes for their students equally, as:

...the benefits of online lesson use are the largest for the least effective teachers (as measured by teacher/classroom value added). We theorize that this is due largely to lesson quality improvements being largest for weaker teachers. We also find suggestive evidence that lesson provision had larger effects for first-year teachers, implying that the off-the-shelf lessons may have provided some time savings for these teachers.

So far, so good. So, what was it about the off-the-shelf lessons that led to these improvements, especially for the least effective and first-year teachers? Exploring the mechanisms, Jackson and Makarin find that:

Students from the full treatment group are 0.175σ (p-value < 0.05) more likely to agree that their math teacher promotes deep understanding. Also, consistent with off-the-shelf lessons freeing up teacher time to exert more effort in complementary teaching tasks, student agreement with statements indicating that their math teacher spends more one-on-one time with them is 0.144σ higher in the full treatment condition than in the control condition (p-value < 0.05). While the results are consistent with the time savings hypothesis, we cannot rule out that the increases in one-on-one time are due to changes in classroom practices due to using the new lessons...

Interestingly, the off-the-shelf lessons appear to have given teachers either greater time, or greater confidence, to focus on the real-life applications of maths, as well as more time to spend one-on-one with students. 

Finally, Jackson and Makarin present some back-of-the-envelope calculations on the cost-effectiveness of the lessons, showing that there may be:

...a benefit-cost ratio of 939. Because of the low marginal cost of the intervention, it is extraordinarily cost effective.

Pre-prepared off-the-shelf lessons are not for every teacher (despite the evidence it provides, this study hasn't convinced me that I need to use them). But clearly, they have the potential for a positive impact on student learning. They may also have proven useful in the transition to online teaching and learning enforced by the pandemic, and in the current era of teaching both online and in-person. However, that would require a different evaluation.

Wednesday 21 December 2022

Learning communities and the academic performance of women and minorities in STEM

I'm a big fan of peer mentoring approaches in higher education. They can directly help students improve their grades by connecting them with senior peers in the same subject. However, the biggest impact on grades is probably indirect - having access to senior peers helps first-year students to understand the resources that are available to them, how to deal with particular styles of assessment, how best to allocate their time, and where to get additional help if they need it. These indirect impacts are likely to be largest for students who are first-in-family to attend university, because students who have parents or older siblings who have graduated already have access to that assistance.

A community of learning is a slightly broader concept than peer mentoring. There are various flavours, but mostly they involve connecting students more closely with peers, academic mentors, and study advisors. If resourced well, communities of learning have great potential to improve student outcomes, again particularly for first-in-family and disadvantaged students.

So, I was interested to recently read this 2017 article by Lauren Russell (Dartmouth College), published in the journal Economics of Education Review (sorry, I don't see an ungated version online). Russell investigates the impact of the Experimental Study Group (ESG) at MIT, which:

...aims to make the transition to MIT easier, especially for freshman who come from non-elite high schools and/or traditionally underrepresented groups in STEM. ESG features small classes and teaching methods that differ from mainstream versions of introductory subjects. Students co-enroll in courses and take advantage of dedicated study-spaces to foster peer networks. Finally, students are intentionally mentored by both MIT upperclassmen and ESG faculty. In this way, ESG combines a policy-relevant bundle of treatments designed to address obstacles to academic success at the undergraduate level.

ESG is over-subscribed, so the 55 available places each year are allocated by lottery. Russell uses that random assignment to extract the causal impact of ESG on academic outcomes, using data for 2011-15 (which are the only years for which full lottery results are available). ESG applicants, as you may expect, are different from the average MIT freshman, and:

...applicants are more likely to be female, international, or a first-generation college student. They are more likely to be black or Hispanic and less likely to be white or Asian. Finally, they are more likely to have applied for financial aid and are more likely to be Pell Grant recipients.

So, it is worth noting at the outset that ESG applicants are those most likely to benefit from a programme like this. They are also the right students to target the programme to, if it is effective. However, for all ESG enrolees on average, the results are not good:

The standardized treatment effect point estimate suggests that ESG participation increases academic performance by 0.07 of a standard deviation, though this effect is not statistically significant.

However, the results become substantially more positive when looking at subgroups, specifically female students, minority students, and low-income students:

ESG raises the academic performance of female students by 0.4-0.5 of a standard deviation with this effect statistically significant at the 5% level in the specification without controls... and the 1% level in the specification with full controls... The effects for the other two subgroups are similarly positive and large (0.3-0.4 of a standard deviation) but the standard errors are too large to infer much based on the statistical significance or lack thereof.

Those effects are large, but the lack of statistical significance for minority and low-income students should dampen our enthusiasm somewhat. That also leads to the obvious question: why is it female students that benefit? Russell looks at the interaction between student gender and instructor gender, and finds that:

...among non-ESG applicants, female students earn course letter grades 0.23 GPA points (0.29 of a standard deviation) higher when a GIR course is taught by a female instructor rather than a male instructor. They are also 11 percentage points more likely to take another course in the same subject area... The magnitude of the female student-female instructor interaction on course grade is even larger for ESG applicants. For female ESG applicants, the female instructor effect is 0.38 GPA points (0.53 of a standard deviation). This analysis suggests that the academic performance of these students may be even more sensitive to instructor gender than students who do not apply to ESG...

However, despite the apparent size of the effect, instructor-student gender interactions explain only 15 percent of the effect of ESG on grades for female students. So, we are kind of left in the dark as to how it is working so well for female students. It certainly isn't through 'excitement and the extent of hands-on coursework', 'reported self-confidence', or 'social connections and mental health', all of which are statistically insignificantly affected by ESG.

On the plus side, and going back to the full sample, ESG does appear to have an effect on choice of major, as it:

...increases the probability that a student will (single) major in math, computer science, or electrical engineering by about 10 percentage points...

The effect is particularly large for minority students (but not for female or low-income students). I don't think we should read too much into those results, particularly given that we are unsure about the mechanisms through which they are working.

So, what do we take away from this research? It provides some suggestive evidence of a positive effect of the ESG intervention on students, particularly female students. It is targeted at students who would be likely to benefit, but perhaps it is not targeted enough? Or perhaps, there is little to gain from this kind of intervention in this population of students. Remember that this is a sample of MIT students. They are already high-achievers, so the potential gains from a successful community of learning are likely to be modest. To give us a better understanding of how well these programmes work, we need to test them on a student population that has more potential to gain from them.

Tuesday 20 December 2022

The traditional essay is dead as an assessment tool, so what should we do instead?

Unless you've been hiding under a conveniently large rock over the last two weeks, you will no doubt have heard about ChatGPT, the latest AI offering (specifically, a 'large language model') from OpenAI. ChatGPT will take a text prompt and create a convincing text response that is at least as good as what you might get from a real human. The uses of ChatGPT are nearly endless, but one use case in particular should have teachers and lecturers very worried. As the Financial Times reported earlier this week (gated):

Universities are being urged to safeguard against the use of artificial intelligence to write essays after the emergence of a sophisticated chatbot that can imitate academic work, leading to a debate over better ways to evaluate students in the future.

ChatGPT, a program created by Microsoft-backed company OpenAI that can form arguments and write convincing swaths of text, has led to widespread concern that students will use the software to cheat on written assignments.

Academics, higher education consultants and cognitive scientists across the world have suggested universities develop new modes of assessment in response to the threat to academic integrity posed by AI...

Moving to more interactive assessments or reflective work could be costly and challenging for an already cash-strapped sector, said Charles Knight, a higher education consultant.

“The reason the written essay is so successful is partly economic,” he added. “If you do [other] assessment, the cost and the time needed increases.”

The traditional essay is dead as an assessment tool. Academic integrity cannot be assured in essay writing, when students can use ChatGPT to create a convincing essay with almost no effort. However, the death of the essay need not be a bad thing. The main problem with essays is that they are not authentic assessments. Almost no student, once they have graduated, is going to have to write an essay as part of their job. Asking students to write essays is asking them to develop skills that are mostly unnecessary for the real world that they will be graduating into. That's why I haven't used essays as an assessment in any of my papers since 2005.

Written communication skills remain valuable, but students are more likely to be asked to display them in writing reports, policy briefs, or memos, not essays. Any of those other written forms would be a more authentic assessment than an essay. However, they are all likely to be vulnerable to students' use of ChatGPT.

So, if all of those written forms of assessment are vulnerable, what to do instead? Teachers and lecturers will still want students to demonstrate their learning in a written format. One alternative is to get students to submit draft pieces of writing (essays, reports, etc.) and provide formative feedback. Then, rather than being assessed on the content itself, students are assessed on how well they responded to the feedback. That might even be an authentic assessment for the future of office and policy work. If we are worried now about students writing essays and reports using ChatGPT, you can bet that there are already government reports, policy briefs, or memos where the first draft has been written by ChatGPT. Teaching students how to use these tools and improve on them (even if that means encouraging their use) may well be the best way to prepare students for their future jobs.

A second option is to move from written assessment such as essays to a question-and-answer format like assignments. Of course, assignments are also vulnerable to ChatGPT. Maybe even more so than essays, unless the questions and the assessment criteria are carefully selected. In subjects where answers are entirely simple written responses, or are purely mathematical, this is likely to be a big problem. Economics is a bit lucky here, because we can write assignment questions that require a diagram and associated explanation. For the moment, ChatGPT cannot draw diagrams, and cannot refer to a particular diagram in a sensible way. That's why I continue to use this style of assessment in my ECONS102 class. However, the days of assignment questions may be numbered as well.

A third option is to eliminate the unsupervised written assessment entirely, and move to supervised writing. When I taught graduate development economics, part of the assessment was made up of short supervised open-book essays. Students, with no access to online sources including ChatGPT, could be asked to write about topics where they have been required to do some background reading (but without knowing exactly what they are going to be asked). Alternatively, we can move full circle, back to the majority of assessment being in-class written tests and exams (this is the approach that, until recently, we used in my ECONS101 class).

Finally, other in-person assessments, such as group or individual presentations, or class participation, remain viable options. However, the subjectivity of such assessment makes it difficult to justify as a large component of students' grades.

Technology is changing the face of education, and the way that we assess students needs to adapt. We shouldn't have to sacrifice academic integrity in the face of new technologies such as ChatGPT, but we do need to be smart about how we adapt teaching and assessment to take account of it. The worst thing that teachers and lecturers could do would be to try to ignore it and hope that it goes away. ChatGPT and other AI tools aren't going anywhere, their use can't easily be policed, and they are already widely available to (and no doubt already being used by) our students. If we want to ensure that our students are continuing to meet the learning objectives, it is time to kill off the traditional essay. We have other options available to us.

Sunday 18 December 2022

Gender bias and peer recognition at the top of scientific disciplines

The gender gap in economics is pervasive (see the links at the end of this post), and is most obvious at the top of the discipline (there have only ever been two female Nobel Prize winners in economics). There is already a distinctly larger share of males among economics undergraduates, and the share of males increases further as you move up to postgraduate, faculty, and senior faculty. The pipeline for ensuring adequate representation of women in economics is clearly broken (and some of the research discussed in the links at the end of this post looks at the reasons why).

However, the discipline has been taking small steps towards becoming more inclusive (although we have a long way to go). Even so, I was quite surprised by some of the results in this recent article by David Card, Stefano DellaVigna (both University of California, Berkeley), Patricia Funk (Università della Svizzera Italiana), and Nagore Iriberri (University of the Basque Country), published in the journal Econometrica (ungated earlier version here). Card et al. looked at the selection of Fellows of the Econometric Society over the period from 1933 to 2019. They fit separate models for 1933-79, 1980-99, and 2000-19, allowing the effects of gender to vary by decade within each period. The results are clear:

Across all three periods we find that publications and citations are strong predictors of election to Fellow. Cumulative Econometrica (EMA) publications play an especially large role, while those in the Review of Economic Studies (REStud) matter slightly less. Publications in the other top 5 journals and in the field journals also matter, as do citations.

While the effects of publications and citations are relatively consistent over time, the impact of author gender shifts dramatically. For the period up to 1979 we estimate a large negative impact of female gender on the probability of selection as a Fellow (145 log points – a penalty equivalent to about 1.5 extra EMA’s in models that control only for top 5 publications). For the 1980s, 1990s, and 2000s we estimate positive but more modest effects (all statistically insignificant). We then estimate a larger and highly significant effect (93 log points) in 2010-2019, equivalent to an additional EMA publication.

In other words, female economists were less likely to be selected as Fellows of the Econometric Society than equivalent male economists up to 1979, there was no statistically significant gender bias (in either direction) from 1980 to 2009, and then female economists were more likely to be selected than equivalent males from 2010 onwards. Looking at the mechanisms, Card et al. find that visibility (appointment to an editorial role at the journal Econometrica) and connections (co-authorships with current Fellows and members of the nominating committee) both matter for selection as a Fellow, but do not alter the significance of the effect of gender.
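As an aside on reading those 'log point' estimates: treating them as shifts in the log of the selection probability, the implied multiplicative effects come from exponentiating the coefficients (the same conversion Card et al. apply in the companion paper quoted below):

```python
# Rough conversion of the reported 'log point' coefficients into relative
# selection probabilities, assuming they are shifts in log probability.
import math

for label, b in [("1933-1979 female coefficient", -1.45),
                 ("2010-2019 female coefficient", 0.93)]:
    print(f"{label}: exp({b:+.2f}) = {math.exp(b):.2f}x the selection probability")

# 1933-1979: exp(-1.45) = 0.23x - a large penalty for female economists
# 2010-2019: exp(+0.93) = 2.53x - a large boost for female economists
```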

Is this recent positive gender bias in recognising the achievements of top scholars specific to economics, or more general across all disciplines? This recent NBER Working Paper (ungated version here) by the same co-authors performs a similar analysis for the National Academy of Sciences (NAS) and the American Academy of Arts and Sciences (AAAS), focusing on the fields of psychology, mathematics, and economics, and covering the period from 1960 to 2019. The choice of those three fields is purposeful:

The three fields of focus therefore include one field that is on the higher end of female representation - psychology - one at the bottom end - mathematics - and one that was historically at the bottom but has recently caught up, at least in AAAS - economics.

Again looking at the results by decade, Card et al. find that:

For the earliest time period, 1960-79, we estimate negative or small positive coefficients, though we can never reject the null of a zero difference in the probability of selection for females relative to males with the same record. It thus appears that in this period female candidates faced roughly similar or higher bars for selection as members of the two academies compared to males. We emphasize that this apparent “gender neutrality” should be interpreted in light of the fact that women selected for honors in these years likely faced many obstacles over their careers, and that no women were selected for the NAS in mathematics until 1975 and in economics until 1989.

From the 1990s onward the estimated female coefficients are all positive, and in the last two decades they become larger and often statistically significant. For example, for the 2010-19 period the coefficients for NAS are 1.056 (s.e.=0.364) for psychology, 1.897 (s.e.=0.518) for economics and 1.173 (s.e.=0.371) for mathematics. The latter coefficient implies that in mathematics a female candidate is exp(1.17) = 3.22 times more likely to be elected than a male with the same publication and citation record.

In other words, the selection of female scientists at a higher probability than equivalent male scientists is not specific to economics, but generalises to other disciplines (although it is stronger in economics than in either psychology or mathematics).

Is this turn of events something to be applauded (as it may address decades of under-recognition of female economists, and create new female role models at the top of the discipline) or to be concerned about (since gender bias in either direction means that top performers are not being rewarded based purely on merit)? Card et al. don't provide an answer to that subjective question. However, they do offer some caution on the interpretation of their results (from the more recent piece of research above):

A possible interpretation of this finding is that members of the academies may have decided to try to redress the past under-representation of female scholars and have aimed at election rates for new members that are similar for men and women. In fields with lower female representation, such as economics and mathematics, this requires a more sizable boost to the election probability of female candidates. Conversely, in a field with more equal representation as psychology, this does not require a large difference...

We caution that our estimates are subject to the criticism that female researchers may face a harder time publishing in top journals, or receiving credit for their work. In fact there is some evidence in the recent literature of such barriers. If so, women who succeed in publishing may in fact be better scholars than men with a similar record, potentially justifying a boost in their probabilities of selection as members of the academies.

With that in mind, it is an over-simplification to consider Card et al.'s results as demonstrating a reverse bias in favour of women in recent years. What we don't know from these analyses is the underlying quality of the research that led to their selection. If publication records, credit for co-authored work, and citations are biased against women, then controlling for those in an analysis and finding that women are over-represented in election as Fellows would not be a surprise. The quality of the work could simply be shining through, even if it is not captured in the 'usual' metrics of research performance.

One thing is clear though, for economics at least. We are continuing to take small steps towards a more gender-equal representation at the top of the discipline (although, we are still substantially short of female Nobel Prize winners).

[HT: Marginal Revolution, for the first Card et al. article]

Read more:

Saturday 17 December 2022

What artists tell us about cultural differences and the gender wage gap

Most of what we read about the gender wage gap is unhelpful, because it simply compares wages between genders. On the face of it, that seems like an obvious comparison. However, women and men tend to work in different occupations, and in different industries, and those differences go some way towards explaining the gender wage gap. But not entirely - there is a persistent gap even when we look within a particular occupation and industry. What explains that gap? There is evidence that suggests about half of the remaining gender wage gap might be explained by culture (see this post, but noting this counterpoint).

So, I was interested to read this new report (with additional summary here, and non-technical article in The Conversation here) by David Throsby, Katya Petetskaya, and Sunny Shin (all Macquarie University). There were a couple of reasons for my interest. First, it looks purely at earnings of artists, which mostly abstracts from any occupation differences or industry differences between the genders. To my mind, looking at artists also avoids any issues arising from the amount of time women may have taken outside the workforce, because the productivity of artists (as measured by their earnings) probably depends less on their labour force tenure than productivity does for other occupations. Second, Throsby et al. perform their comparison for Australian artists from an English speaking background (ESB, or 'mainstream artists') and a non-English speaking background (NESB) separately, allowing us to see the role that culture plays in the gender wage gap.

For ESB artists, Throsby et al. find that women earn between 22 and 27 percent less than men. However, for NESB artists:

We see no statistically significant income penalty in aggregate for female First Nations artists practising in remote areas of Australia. This result appears in contrast to the gender gap that we observe for mainstream artists.

Of course, we can't be sure as to the reason for the lack of gender wage gap among First Nations artists, but the difference with ESB artists is striking. Throsby et al. infer that:

The social structures and cultural norms within which First Nations artists in remote communities live and work reflect the long traditions of economic and social organisation that have evolved in Aboriginal and Torres Strait Islander society since before the colonial period. As such, the roles of men and women can be described as they have been in the past, namely distinctive but neither superior nor inferior. In this context, women occupy a strong and respected position...

When it comes to creative incomes, our results show that First Nations women artists practising in remote areas of Australia do not suffer from the same sorts of income disadvantage that is evident among mainstream artists... This equality could perhaps be explained by more equal incentives and opportunities for male and female artists in remote First Nations communities, but it also appears to reflect an absence of the sort of systemic gender-based discrimination that continues to affect women artists working in the mainstream.

One thing is clear. The gender wage gap is not inevitable, even though it persists even in settings where discrimination is not possible (see this post). If it is indeed culture that drives a substantial part of the gender wage gap, it will take long-term cultural change to eliminate the gap.

[HT: The Conversation]

Read more:


Friday 16 December 2022

More attractive people prefer less redistribution than less attractive people

The beauty premium is well established in labour economics (see the links at the end of this post for some examples). More attractive people earn more than less attractive people, ceteris paribus (holding everything else equal). That means that differences in attractiveness give rise to differences in income, and contribute to income inequality. The main policy means that governments use to deal with inequality is redistribution (and predistribution, but we need not get into the difference between those terms here). Redistribution would then tend to undo some of the beauty premium. So, it would be interesting to know how more attractive people feel about redistribution, compared with less attractive people.

That question is essentially what this recent article by Andrea Fazio (University of Pavia), published in the journal Economics and Human Biology (ungated earlier version here) looks at. Fazio uses data from the German General Social Survey (ALLBUS) over the period from 2008 to 2018. Survey respondents were asked about their level of agreement with each of the following statements:

• “Income and wealth should be redistributed towards ordinary people".

• “Income should not be based solely on individual achievement. Instead, everybody should have what they and their family need for a decent life".

• “The state must ensure that people can live on a decent income, even in illness, hardship, unemployment and old age".

• “What one gets in life depends not so much on one’s own efforts, but on the economic situation, the situation on the employment market, wage agreements, and the social benefits provided by the state".

The first statement was asked about in the 2008 and 2018 survey waves, and the last three statements were asked about in the 2010 and 2014 survey waves. Attractiveness was rated by the survey interviewers at the start of each interview, on a scale from 1 to 11. Regressing support for redistribution on attractiveness, while controlling for individual characteristics, reveals that:

...a one standard deviation increase in attractiveness is associated with a 0.08 decrease in preferences for redistribution, while a one standard deviation increase in household income is associated with a 0.15 decrease in preferences for redistribution. In other words, the magnitude of the association between beauty and support for redistribution is half the association between household income and support for redistribution... 

Most importantly, these results suggest that the correlation between preferences for redistribution and attractiveness is not fully explained by the beauty premium in the labor market...
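For readers unfamiliar with that kind of specification, here is a minimal sketch of the type of regression being described, using entirely simulated data and hypothetical variable names (this is not Fazio's actual ALLBUS data or code):

```python
# Sketch of an OLS regression of redistribution preferences on standardised
# attractiveness and household income, with simulated data for illustration.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 1000
attractiveness = rng.standard_normal(n)   # interviewer-rated, standardised
income = rng.standard_normal(n)           # household income, standardised
# Hypothetical data-generating process mirroring the reported magnitudes:
redistribution = -0.08 * attractiveness - 0.15 * income + rng.standard_normal(n)

X = sm.add_constant(np.column_stack([attractiveness, income]))
fit = sm.OLS(redistribution, X).fit()
print(fit.params)   # should recover slopes near -0.08 and -0.15, up to noise
```

In the actual paper the regressions also include a long list of individual controls; the sketch keeps only the two variables discussed above.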

Fazio finds similar results for the measures based on agreement with the other three statements, and some suggestive evidence that attractiveness is associated with voting behaviour (with more attractive people more likely to vote for the Free Democratic Party (FDP)), and:

...albeit being a minor party, the FDP is a liberal center-right party proposing a market-oriented economy. It opposes the state intervention in the economy and advocates for a radical tax reduction...

That seems somewhat self-serving, but Fazio goes a bit further exploring the mechanisms that might drive more attractive people to prefer less redistribution, and finds that personality differences (measured by the 'Big Five' personality traits) and self-esteem don't explain it. That leaves Fazio to conclude that:

Perhaps, the relationship between attractiveness and redistributive preferences might depend on how attractive individuals rationalize the success they gain thanks to their beauty. An example can be the self-serving bias, i.e., people tend to attribute success to their own actions and failure to external factors. Attractiveness improves a considerable number of socio-economic outcomes, but good-looking subjects might hardly recognize that part of their success depends on their beauty.

It's an interesting conjecture, but Fazio isn't able to test it. People are able to rationalise all sorts of things. Perhaps this is one that could be tested by follow-up experimental work? In the meantime, all that we can conclude is that attractive people earn more, they prefer less redistribution, and their preferences for less redistribution are over-and-above the effect of income on preferences for redistribution.

[HT: Marginal Revolution, via this PsyPost article]

Read more:

Thursday 15 December 2022

The price elasticity of demand for public transport

The cynical vote grab that was the petrol excise tax cut and public transport fare reduction is coming to an end. However, it has provided one piece of interesting information for us. As RNZ reported today:

A further extension of half-price public transport fares until March is being called "short-sighted" by public transport advocates...

Fares were halved in April, after Russia's invasion of Ukraine created a global energy crisis causing fuel prices to skyrocket.

The government hoped cheaper fares would encourage more people onto buses and trains - better for the planet, and for people's pockets.

A three-month survey run by Waka Kotahi, ending in August, found half-price fares moved 7 percent of journeys onto public transport, and about 3 percent of those journeys would otherwise have been made in cars.

The price elasticity of demand for a good or service is a measure of how responsive the quantity demanded is to a change in price. It is calculated as the percentage change in quantity demanded divided by the percentage change in price. It tells us whether demand is elastic (very responsive to a change in price) or inelastic (not very responsive to a change in price).

In this case, the price of public transport went down by 50 percent, and the quantity demanded increased by 7 percent. To a first approximation, that means that the price elasticity of demand for public transport is equal to [7/-50] = -0.14. That's very inelastic demand. The price halved, but public transport usage barely changed.
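The arithmetic, as a snippet (using the point estimates quoted above):

```python
# Back-of-the-envelope price elasticity of demand for public transport,
# using the figures quoted in the RNZ report above.
pct_change_quantity = 7      # journeys on public transport rose by about 7%
pct_change_price = -50       # fares were cut by 50%

ped = pct_change_quantity / pct_change_price
print(f"Price elasticity of demand = {ped:.2f}")   # -0.14

# |PED| is well below 1, so demand is inelastic: a very large price cut
# produced only a small increase in quantity demanded.
```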

Why would demand for public transport be very inelastic? In my ECONS101 class, we go through some of the factors that make demand more or less elastic. They include: (1) the availability of (close) substitutes; (2) the proportion of income spent on the good; (3) the significance of price in the total cost to the consumer; (4) the definition of the market; (5) time horizons; and (6) whether a good is normal or inferior. In most cases, it is the first two factors that have the biggest impact, and that is likely the case here.

First, are there many substitutes for public transport? Passengers could take their own car, or they could use some active mode of transport (for example walking, or cycling). Are they close substitutes to public transport? All of those options are a lot more flexible than public transport. They offer door-to-door (more or less) transport, whereas the timetable and route of public transport is set and inflexible. So, they may not be very close substitutes for public transport at all.

Second, the cost of a public transport fare is not high as a proportion of passengers' total income. I'm sure that some (maybe many) passengers spend a lot on public transport fares, but as a proportion of their income that spending probably isn't large. A change in fares probably doesn't affect disposable income by enough to change the number of public transport trips that each passenger takes. Moreover, halving the fare makes public transport take up even less of each passenger's income (which was the point), which would have made demand even less elastic.

So, demand for public transport is inelastic, because fare decreases aren't able to attract large increases in patronage. All of this is bad news for environmental and public transport activists, because if a halving of fares isn't going to generate a large shift in transport modes, it is hard to see that even making all public transport free would have much of an effect either. And then what options are left?

Tuesday 13 December 2022

Reasons for pre-drinking, and pre-drinking consequences

This week I'm busy with fieldwork, looking at alcohol home delivery (more on that in a future post). Home delivery is a potentially big issue because it has become much more of the norm in the time since lockdowns. Home delivery also changes the dynamics of pre-drinking (and may actually lead to pre-drinking being the whole drinking experience for a lot more people). That would be a big change from some of my earlier research on pre-drinking (see more on that here).

Putting aside for a moment how home delivery affects pre-drinking, I was interested last week to read this new article by Florian Labhart, Koen Smit, Dan Anderson-Luxford, and Emmanuel Kuntsche (all La Trobe University), published in the journal Addictive Behaviors (sorry, I don't see any ungated version online). Labhart et al. collected data from 193 Swiss drinkers (from Lausanne and Zurich) on their motivations for pre-drinking, and the consequences of their pre-drinking. In terms of motivations, they used the Pre-Drinking Motivations Questionnaire (PMQ):

The PMQ assesses motivations to engage in pre-drinking alongside three dimensions; a) fun/intoxication (e.g., ‘Because it makes the rest of the evening more fun’, ‘To go out while already being properly drunk’), b) conviviality (e.g., ‘To meet new people’, ‘To have enough space to all be together’), and facilitation (e.g., ‘To increase self-confidence before going out’, ‘Because I cannot drink alcohol during the rest of the night’).

Consequences included:

...whether or not each of ‘the following situations occurred during or since last night’. Response options were ‘yes’ (coded as 1) or ‘no’ (0). Assessed consequences included hangover (headache, upset stomach, etc.), impaired driving (driving after drinking three or more alcoholic drinks or consuming illegal substances), blackout (inability to remember what happened, even for a short period of time), risky sex (unintended or unprotected sex), fight (involvement in a fight or a quarrel), and injury (injury to yourself or someone else).

Then comparing the consequences for different pre-drinking motivations (PDM) using regression models, they found that:

...higher conviviality PDM were associated with higher odds of risky sex and with lower odds of blackouts. Additionally, higher fun/intoxication PDM were associated with lower odds of risky sex. However, no association was found between PDM and hangovers, fights or injuries, and impaired driving over and above night level alcohol use.

Pre-drinking motivations seem to matter, at least a little bit. Those who are drinking to be convivial ('to meet new people') are more likely to engage in risky sex, which makes sense (for certain values of 'convivial'!). They are also less likely to experience blackouts, which makes sense given that they are less likely to drink excessively than those with other motives. I am a little surprised about the negative association between risky sex and pre-drinking for fun/intoxication. Labhart et al. don't have a convincing explanation for it (in fact, they don't mention it in their discussion at all). It may be a bit of an artefact of the analysis though, because:

...of the 193 participants who reported pre-drinking at least ‘some of the time’ in the baseline questionnaire and reported the consumption of at least one alcoholic drink, 55 were excluded from the analyses since they did not report any experience of alcohol-related consequences during the event-level study.

That seems like a really odd choice to me. If you are interested in the chances of a particular consequence of pre-drinking occurring, there doesn't seem to be a good reason to exclude pre-drinkers who experienced no consequences at all. In fact, omitting them likely biases the results, and it is disappointing that we don't get to see the unconditional (or, at least, not conditional on at least one consequence) correlations between motivations and consequences. On top of that, of course, the analysis only shows us which motivations are correlated with which consequences, and doesn't really tell us anything causal about the relationships.

Pre-drinkers' motivations probably affect how they approach their night of drinking and partying, and different motivations probably do alter the risks of different negative consequences occurring. However, this paper sadly stops short of really giving us a clear understanding of those relationships.


Saturday 10 December 2022

Fleeing drivers respond to incentives, but that doesn't mean that the current policy settings are wrong

The New Zealand Herald reported earlier this week:

Nearly 10,000 people in vehicles fled from police in the past year - more than double the number recorded prior to the police changing their pursuit policy nearly two years ago.

At the same time, the number of those behind the wheel not identified has nearly tripled, while those being held accountable have stayed the same.

These are the key reasons Police Commissioner Andrew Coster referenced in announcing recently that the police pursuit policy would be reviewed next year and a Fleeing Driver Framework introduced...

Between 2010 and 2020, 75 people died in police chases, and two in incidents when police did not pursue.

In December 2020, after a major police review, staff were told a pursuit was only justified when the threat posed by the vehicle prior to failing to stop, and the necessity to immediately apprehend the driver and/or passengers, outweighed the risk of harm created by the pursuit.

In the nearly two years since, there have been no deaths during pursuits, while four people have died in incidents after fleeing from police.

Meanwhile, data released to the Herald shows that over the same period the number of fleeing driver incidents increased from 4846 - in the 12 months prior - to 9499 in the 12 months to November this year.

The number of incidents where the offender was not immediately identified nearly tripled, from 2419 to 6412, while police proceedings remained relatively steady, moving from 3374 to 3484. 

When the police decide to stop a vehicle, the driver of that vehicle has a choice: they can stop, or they can flee. The economist Gary Becker described a rational theory of crime that can help us understand the driver's choice. In Becker's theory, a rational driver would consider the costs and benefits of fleeing, and if the benefits outweigh the costs, they would choose to flee. When the police have a policy that reduces the chance that they will pursue a fleeing driver, that increases the chances that a driver will escape any punishment. That increases the benefits of fleeing, meaning that more drivers will choose to flee. Not all drivers will choose to flee, because some drivers are more susceptible to the social and moral costs of bad behaviour, but more drivers will flee than if the police were more willing to engage in pursuits. So, it should be no surprise that we have observed more fleeing drivers.
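
As a minimal sketch of Becker's logic, consider the following Python snippet. All of the dollar values and probabilities are invented for illustration; nothing here comes from actual data on fleeing drivers:

# A minimal sketch of Becker's rational-crime logic applied to the
# decision to flee. All numbers are illustrative assumptions, not data.
benefit_of_escaping = 2000   # value to the driver of avoiding arrest ($)
punishment_cost = 10000      # cost to the driver if pursued and caught ($)
moral_cost = 500             # social and moral cost of fleeing ($)

def expected_net_benefit_of_fleeing(p_pursuit):
    """Expected payoff of fleeing, relative to stopping (simplification:
    no pursuit means the driver escapes punishment entirely)."""
    p_escape = 1 - p_pursuit
    return p_escape * benefit_of_escaping - p_pursuit * punishment_cost - moral_cost

# A restrictive pursuit policy lowers the chance of pursuit, flipping
# the rational driver's choice from 'stop' to 'flee'.
print(expected_net_benefit_of_fleeing(p_pursuit=0.8))  # -8100.0: better to stop
print(expected_net_benefit_of_fleeing(p_pursuit=0.1))  # 300.0: better to flee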

However, just because there are more fleeing drivers, that doesn't mean that the policy is a bad policy. Each policy comes with benefits, and costs. The current (since December 2020) police policy has resulted in no deaths during police pursuits, compared with 75 over the prior ten years. That suggests a benefit of the policy of around 15 lives saved over the two years that it has been in place. How much benefit is that? The Ministry of Transport uses a value of a statistical life of $4.88 million in measuring the benefits of road safety. Using that value for each life saved as a result of police not engaging in pursuits, the benefits of the policy change come to around $73.2 million. That's not all of the benefits of the policy change though, as it doesn't take into account benefits from reductions in non-fatal injuries, or damage to vehicles and other property resulting from fewer pursuits. The estimate of $73.2 million in benefits is therefore likely an underestimate, but it is useful as a benchmark.
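
Here is the lives-saved arithmetic from the previous paragraph, written out:

# The back-of-the-envelope benefit calculation from the paragraph above.
pursuit_deaths_2010_to_2020 = 75
years_of_data = 10
policy_years = 2
vsl = 4.88e6    # Ministry of Transport value of a statistical life ($)

lives_saved = pursuit_deaths_2010_to_2020 / years_of_data * policy_years
benefit = lives_saved * vsl
print(f"{lives_saved:.0f} lives saved, benefit ~ ${benefit / 1e6:.1f} million")
# 15 lives saved, benefit ~ $73.2 million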

What about the costs of the policy? There have been more fleeing drivers, and police proceedings have not increased. So, that means that more drivers have gotten away with whatever it was they were going to be stopped by the police for doing. That leaves those drivers free to do more of those activities, and that has a social cost. Perhaps that means more speeding, unsafe driving, and other traffic violations, but also more stolen vehicles, which can be used in committing other crimes.

To assess the cost of the police non-pursuit policy, we need a sense of the costs of those additional social harms. One example that has been raised is ram raids, which have increased concurrently with the change in police pursuit policy. Focusing just on ram raids, this Police OIA response suggests that there were about 250 ram raids in the first six months of 2022, a 500 percent increase over 2018. This Stuff article provides the costs to a retailer of two instances of ram raids ($13,500 and $15,000). Combining those figures suggests a social cost of ram raids of about $12.5 million (scaled up to two years). That assumes that all of the ram raid increase is purely due to the change in police pursuit policy, which is unlikely. However, it also doesn't take into account any of the other non-ram-raid increases in social harm that may have arisen due to more fleeing drivers. Clearly, the total cost of the policy is going to be more than $12.5 million over two years. The key question is whether all of those other social harms add up to more than the roughly $60 million that would be needed to offset the value of lives saved in pursuit-related crashes that didn't happen.
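
For transparency, here is one way to reconstruct that arithmetic; treat it as an order-of-magnitude sketch only, since the exact rounding behind the ~$12.5 million figure isn't shown in the sources:

# Rough social cost of the increase in ram raids, using the inputs above.
raids_per_six_months = 250                 # first half of 2022 (Police OIA)
cost_per_raid = (13_500 + 15_000) / 2      # midpoint of the two Stuff figures
raids_over_two_years = raids_per_six_months * 4

# If 250 raids is a 500 percent increase over 2018, then about 5/6 of
# them are 'extra' raids over and above the pre-increase baseline.
extra_raids = raids_over_two_years * 5 / 6
print(f"~${extra_raids * cost_per_raid / 1e6:.1f} million over two years")
# ~$11.9 million, in the same ballpark as the ~$12.5 million in the text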

Coming back to the New Zealand Herald article, the Police are being forced into a rethink of their current approach to fleeing drivers:

[Police Association president Chris Cahill] said while the policy needed to be reviewed, it could not ignore the fact there were no deaths.

“We don’t want to go back to a situation where people are dying in fleeing driver incidents. Family of those that are killed are also seriously impacted, and so are our members involved.

“That’s why you can’t say the policies are a complete failure.”

But Cahill said it would appear the balance isn’t “quite right”.

It seems to me that the balance may already be right. Pursuing more drivers will increase the risk to other road users, and the benefits that arise from more pursuits seem far too low to be worthwhile. Perhaps the review of the policy will enable the Police to examine the costs and benefits of any change carefully. The numbers I pulled together in this post were based on 15 minutes of casual web searching. A more thorough approach would give much greater confidence (although I doubt that the conclusion would change).

Finally, I thought that as a country we were supposed to be taking a 'road to zero' approach to road safety - given that we have now experienced zero pursuit-related deaths for a couple of years, isn't a resumption of police pursuits a road away from zero?

Friday 9 December 2022

The $300,000 job that no one wants

Every time I see a story like this in the news, my first thought is compensating differentials. The New Zealand Herald reported yesterday:

A Perth mining company has been forced to look for workers from New Zealand after Aussies continually turned down a $300,000 job.

Mineral Resources has launched an advertising campaign targeting Kiwi tradies, guaranteeing “a great pay packet”, News Hub reported.

“We’re offering plenty,” Mineral Resources CEO Mike Grey told NZ programme AM.

“The incentives are amazing, and I have no doubt that our salaries double [New Zealand salaries], in some examples, they triple.”

The old adage applies here: if something seems too good to be true, it usually is. When a job offers a pay package far higher than comparable work elsewhere, you should be asking: what is wrong with that job? There must be something about the job that means the employer has to pay a much higher salary in order to attract people to work there.

Wages differ for the same job in different firms or locations. Consider the same job in two different locations. If the job in the first location has attractive non-monetary characteristics (e.g. it is in an area that has high amenity value, where people like to live), then more people will be willing to do that job. This leads to a higher supply of labour for that job, which leads to lower equilibrium wages. In contrast, if the job in the second location has negative non-monetary characteristics (e.g. it is in an area with lower amenity value, where fewer people like to live), then fewer people will be willing to do that job. This leads to a lower supply of labour for that job, which leads to higher equilibrium wages. The difference in wages between the attractive job that lots of people want to do and the unattractive job that fewer people want to do is called a compensating differential.
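
A stylised numerical example makes the point; the linear supply and demand curves below are invented purely for illustration:

# A stylised compensating differential: identical labour demand in two
# locations, but fewer people willing to supply labour in the remote one.
def equilibrium_wage(supply_intercept, supply_slope=2.0,
                     demand_intercept=100.0, demand_slope=-1.0):
    # Solve: demand_intercept + demand_slope * w = supply_intercept + supply_slope * w
    return (demand_intercept - supply_intercept) / (supply_slope - demand_slope)

w_high_amenity = equilibrium_wage(supply_intercept=-20)  # plentiful labour supply
w_remote = equilibrium_wage(supply_intercept=-50)        # scarce labour supply
print(w_high_amenity, w_remote, w_remote - w_high_amenity)
# 40.0 50.0 10.0 -- the 10.0 is the compensating differential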

Is the $300,000 salary for working in the Australian mines a compensating differential? It seems so:

Despite the high incomes and generous incentives, mining might not be the ideal career for everyone.

One of the downsides to the job is at least two-week blocks of work in isolated areas with sometimes basic living conditions.

The 12-hour shifts often involve heavy physical labour in a high-pressure environment where mistakes can cost a company millions and so result in instant redundancy, meaning a high turnover rate of staff.

The high salary is compensation for the negative non-monetary characteristics of the job. The worst negative characteristic (for most people) would be spending long periods of time in a remote location, far from nice amenities. The premium that the mining companies are offering isn't enough to entice Australians to work there. Australian wages are higher than New Zealand wages, so the pay premium is larger for a New Zealander working in the mines than for an Australian. By advertising in New Zealand, the mining companies are hoping to take advantage of that. Of course, remote Australia is even more remote from New Zealand than it is from the rest of Australia. Maybe they'll just have to offer to pay even more?


Wednesday 7 December 2022

Dietrich Vollrath on why the whole world isn't rich

You could argue that the original question of economics is why some countries are rich, and others are poor. After all, it was the topic of Adam Smith's 1776 book An Inquiry into the Nature and Causes of the Wealth of Nations. After 250 years of study, we must have a pretty good idea of the answer. And yet, as Dietrich Vollrath outlined in this article in Asterisk last month, we really don't. Vollrath (who blogs at growthecon.com) does a great job of summarising what we know and, importantly, what we don't know. The latter is, unfortunately, still a lot.

The causes of economic growth and development remain a bit of a mystery. Factors of production obviously matter, but why some countries have rapidly increased their factors of production and others have not is unclear. Institutions also seem to matter. I recommend that you read the whole article, as it gives a very clear sense of the state of knowledge. In terms of institutions though, this bit on democracy seems the most important in terms of what we know with some certainty:

A good example is from Acemoglu and Robinson along with coauthors Suresh Naidu and Pascual Restrepo... They show that the transition to democracy leads to higher economic growth in the future, finding GDP per capita is around 20% higher in a democracy compared to an otherwise identical nondemocracy. What they see is that countries that democratize invest significantly more in public health and education, consistent with the initial work that Mankiw, Romer and Weil and Alwyn Young did on economic growth.

They explicitly take on all of the empirical issues I complained about above. They do not try to quantify “democracy” along some arbitrary scale (e.g., North Korea is a one, the U.S. is a seven, etc.). They instead focus on a simple comparison of places that clearly democratized versus those that did not. They use several methods to try to assure themselves, and us, that their results are coming from the causal effect of democracy on growth, and not the other way around. This includes a sort of natural experiment where democratization is more likely to occur when more neighboring countries are democracies.

Some counterexamples may immediately come to mind. South Korea, whose economy took off in the ’60s, did not democratize until 1988, and China has undergone impressive economic growth without democratizing at all. But once Acemoglu, Naidu, Restrepo and Robinson make the comparison across all countries, it turns out that their experiences are something of an outlier, not the norm.

The research by Acemoglu et al. that Vollrath refers to is here (with ungated earlier version here). Development economics was where I started my journey as an economist. There has been a large-scale shift from 'macro' development towards 'micro' development in recent years. That may help to explain why we don't have answers to the macro questions of development (and is related to critiques that Lant Pritchett has made of randomised control trials in development - for example, see here). Perhaps economics needs to go back to its roots, and study the question that occupied Adam Smith nearly 250 years ago.

[HT: Ranil Dissanayake]

Monday 5 December 2022

The prevalence of cheating in online multiple-choice exams

In my ECONS101 class, we have weekly online tests, comprising multiple choice questions with a couple of additional calculation questions thrown in. While the online tests contribute to students' grades, the contribution is small (each test is worth about 1 percent of a student's grade). The purpose of the online tests in that paper is to get students to engage with learning each topic as we go, and to give them quick feedback on their learning, rather than to test their knowledge of the content and how to apply it. Even though the student code of conduct precludes it, no doubt some students work together on the online tests, and that is part of the reason why the tests are worth so little towards students' overall grades.

The pandemic caused an immediate change to assessment procedures. Many lecturers, who previously would have conducted in-class tests or exams, were forced to shift these assessments online. When the contribution of an online test to a student's grade is much greater, there is a much greater incentive for students to work together, and we should expect many more academic integrity issues. But how many more?

That is the question addressed in this new article by Flip Klijn (Barcelona School of Economics), Mehdi Mdaghri Alaoui (Universitat Pompeu Fabra), and Marc Vorsatz (Universidad Nacional de Educación a Distancia), published in the Journal of Economic Psychology (open access). Klijn et al. report on a randomised experiment they conducted when classes were rapidly moved online at Universitat Pompeu Fabra, only two weeks before the final exam. Their exam was 100 percent multiple choice, and when moved online they set it up so that students were able to view only one question at a time, and once they answered a particular question, they could not backtrack. By randomising the order in which students were shown particular questions, Klijn et al. test whether students who saw the same questions later are more likely to get those questions correct, and spend less time on them (both of which would indicate that some students who saw questions later were copying answers from those who saw them earlier).
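
To make the design concrete, here is a minimal simulation of the kind of order effect Klijn et al. look for. This is not their actual specification, and all of the parameters (including the copying rate) are invented:

# A minimal simulation of the order-effect test: if some students who see
# a question in the later round copy an earlier student's (correct) answer,
# later-round correctness will be higher. Parameters are invented.
import random
random.seed(1)

def simulate_exam(n_students=10_000, p_correct=0.6, p_copy=0.08):
    early, late = [], []
    for _ in range(n_students):
        in_late_round = random.random() < 0.5
        copies = in_late_round and random.random() < p_copy
        # Copiers are assumed to copy a correct answer (a simplification).
        correct = copies or random.random() < p_correct
        (late if in_late_round else early).append(correct)
    return sum(early) / len(early), sum(late) / len(late)

print(simulate_exam())  # later-round correctness is higher (~0.63 vs ~0.60)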

Their data come from an introductory game theory class of 494 students. They find that:

First, the students that received a given problem in the later round performed better in terms of higher correctness and shorter completion time. Second, with respect to the questions of the problem that was not subject to order randomization, no significant differences regarding correctness and completion time are found for the different exam versions.

Finally, they gave half of the students a reminder notice about academic integrity halfway through the online exam. However:

...the reminder of the university’s code of ethics... did not affect the correctness of the answers to nor the completion time of subsequent questions.

Of course, Klijn et al. don't know which students, if any, were actually cheating. However, they undertake a simulation exercise that establishes an upper bound of 8.7 percent of students copying from each other. I'm not sure whether that is higher or lower than I would have expected. Klijn et al. conclude with the suggestion that:

...giving all students the same questions seems a risky procedure for on-line exams, especially if there are no further measures to inhibit cheating. In fact, a fair and possibly more cheating-proof procedure in this case would be precisely the opposite of a unique list of questions: for each question, a sufficiently large number of different versions should be generated and then randomly assigned to students. Here, ‘‘different versions’’ refers to scaling, switching, etc. of numerical values, and depending on the permitted procedures by the university’s authorities, a potentially wider range of variations.
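
To make that suggestion concrete, here is a minimal sketch of generating randomised variants of a calculation question; the question template and numbers are invented, not taken from any actual test bank:

# Generating randomised variants of a calculation question by varying its
# numbers, along the lines Klijn et al. suggest.
import random

def elasticity_question(student_id):
    rng = random.Random(student_id)   # each student gets a stable variant
    price_change = rng.choice([10, 20, 25, 50])     # percent fall in price
    quantity_change = rng.choice([2, 5, 7, 10])     # percent rise in quantity
    question = (f"The price falls by {price_change}% and quantity demanded "
                f"rises by {quantity_change}%. What is the price elasticity?")
    answer = round(quantity_change / -price_change, 2)
    return question, answer

for student_id in range(3):
    print(elasticity_question(student_id))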

That seems feasible in the case of multiple choice (and calculation-style) questions, and in fact is something that I already apply in my ECONS101 weekly online tests (though not for all questions). However, not all online exams are 100 percent multiple choice, and nor should they be, as that limits the ability to test students' skills in applying what they have learned. It seems to me, though, that open-ended questions are even more susceptible to academic integrity issues in online tests (and that was our experience in 2020, when we had online tests and assignments, and I sent a large number of students to the student disciplinary committee).

Online assessment is rife with academic integrity issues, and I don't think we have found a good way to address them. I'll post sometime in the new year about our trial in the most recent trimester, where we gave students the option of completing weekly video reflections in place of tests and exams.