Monday, 28 February 2022

Do payday loans make consumers worse off?

Back in 2020, I wrote a post about the consequences of banning payday loans on the pawnbroking industry. The takeaway message was that banning payday loans simply shifted borrowers into borrowing from pawnbrokers, small-loan lenders and second-mortgage licensees, none of which was necessarily good for the borrowers. This new article by Hunt Allcott (Microsoft Research) and co-authors, forthcoming in the journal Review of Economic Studies (ungated earlier version here), looks at a related question: How would payday lending restrictions affect consumer welfare?

Allcott et al. start by undertaking a field experiment and related survey with clients of a large payday lending provider in Indiana. As Allcott et al. explain:

Our experiment ran from January to March 2019 in 41 of the Lender’s storefronts in Indiana, a state with fairly standard lending regulations. Customers taking out payday loans were asked to complete a survey on an iPad. The survey first elicited people’s predicted probability of getting another payday loan from any lender over the next eight weeks. We then introduced two different rewards: “$100 If You Are Debt-Free,” a no-borrowing incentive that they would receive in about 12 weeks only if they did not borrow from any payday lender over the next eight weeks, and “Money for Sure,” a certain cash payment that they would receive in about 12 weeks. We measured participants’ valuations of the no-borrowing incentive through an incentive-compatible adaptive multiple price list (MPL) in which they chose between the incentive and varying amounts of Money for Sure. We also used a second incentivized MPL between “Money for Sure” and a lottery to measure risk aversion. The 1,205 borrowers with valid survey responses were randomized to receive either the no-borrowing incentive, their choice on a randomly selected MPL question, or no reward (the Control group).
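To see how an adaptive MPL homes in on a valuation, here is a minimal sketch of the idea as a bisection over the 'Money for Sure' amounts. This is purely illustrative: the dollar range, number of questions, and function names are my own assumptions, not the authors' actual instrument.

```python
def elicit_valuation(prefers_incentive, low=0.0, high=200.0, rounds=5):
    """Adaptive multiple price list (MPL), sketched as a bisection.

    Repeatedly offer a choice between the no-borrowing incentive and a
    certain cash amount ('Money for Sure'), narrowing the bracket around
    the switching point where the respondent becomes indifferent.
    prefers_incentive(cash) returns True if the respondent picks the
    incentive over receiving `cash` dollars for sure.
    """
    for _ in range(rounds):
        cash = (low + high) / 2
        if prefers_incentive(cash):
            low = cash   # incentive still worth more than this cash offer
        else:
            high = cash  # the sure cash beats the incentive
    return (low + high) / 2  # approximate dollar valuation of the incentive


# Example: a respondent who values the incentive at $130
valuation = elicit_valuation(lambda cash: cash < 130)
```

Because each answer halves the bracket, five questions pin the valuation down to within a few dollars on a $0-$200 range, which is what makes the adaptive version of the MPL so efficient.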

Allcott et al. use the field experiment to determine how well borrowers anticipate the extent of their repeat borrowing, and whether they perceive themselves to be time consistent. On those questions, they find that:

...on average, people almost fully anticipate their high likelihood of repeat borrowing. The average borrower perceives a 70% probability of borrowing in the next eight weeks without the incentive, only slightly lower than the Control group’s actual borrowing probability of 74 percent. Experience matters. People who had taken out three or fewer loans from the lender in the six months before the survey - approximately the bottom experience quartile in our sample - under-estimate their future borrowing probability by 20 percentage points. By contrast, more experienced borrowers predict correctly on average...

On average, borrowers value the no-borrowing incentive 30 percent more than they would if they were time consistent and risk neutral. And since their valuations of our survey lottery reveal that they are in fact risk averse, their valuation of the future borrowing reduction induced by the incentive is even larger than this 30 percent “premium” suggests.
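The intuition behind that 'premium' can be illustrated with a stylised two-period quasi-hyperbolic (beta-delta) model, a standard way of modelling this kind of time inconsistency. This is only a sketch with made-up numbers (the borrowing benefit, repayment cost, and reward values are hypothetical, and delta is set to one), not the authors' calibration:

```python
def incentive_valuation(beta, b=100.0, c=130.0, reward=100.0):
    """Certainty-equivalent value of a 'reward if you don't borrow'
    incentive in a stylised two-period beta-delta model (delta = 1).

    Tomorrow's self gets immediate benefit b from borrowing, but
    discounts the later repayment cost c (and the later reward) by
    beta. Today's self discounts everything after today by beta.
    """
    # Does tomorrow's self still borrow once the incentive is on offer?
    borrows = (b - beta * c) > beta * reward

    if borrows:
        with_incentive = beta * (b - c)  # incentive never paid out
        baseline = beta * (b - c)        # same behaviour without it
    else:
        with_incentive = beta * reward
        # Without the incentive, tomorrow's self borrows iff b > beta*c
        baseline = beta * (b - c) if b > beta * c else 0.0

    # Indifference between the incentive and certain future cash Z:
    # beta*Z + baseline = with_incentive, so Z = (with_incentive - baseline) / beta
    return (with_incentive - baseline) / beta


time_consistent = incentive_valuation(beta=1.0)  # = 100.0 (face value)
present_biased = incentive_valuation(beta=0.7)   # ~ 130.0 (a premium over face value)
```

With beta = 1 (time consistent), the borrower would not borrow anyway, so the incentive is worth exactly its face value. With beta = 0.7, tomorrow's self would borrow without the incentive, so the incentive also buys commitment against that borrowing, and a sophisticated borrower values it above face value.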

So, borrowers on average anticipate their repeat borrowing, and they recognise that they are time inconsistent. Allcott et al. then use their experimental results, along with the results of the associated survey, to construct a theoretical model of payday loan borrowing. They then use their model to simulate the effect of various payday lending restrictions on borrower welfare, and find that:

Because borrowers are close to fully sophisticated about repayment costs, payday loan bans and tighter loan size caps reduce welfare in our model. Limits on repeat borrowing increase welfare in some (but not all) specifications, by inducing faster repayment that is more consistent with long-run preferences.

In other words, banning payday loans, or reducing the maximum size of payday loans, makes borrowers worse off. The flipside of that result is that the availability of payday loans actually makes borrowers better off. However, if policymakers are concerned about payday loans' potential negative effects, the most effective policy (in terms of borrower welfare) is to restrict the number of repeat loans that borrowers can take out. I suspect many policymakers would be surprised by that. However, an open question that is not addressed by this research is to what extent repeat lending restrictions simply force borrowers to alternative lenders like pawnbrokers (as the earlier research I discussed found).

Finally, Allcott et al. fire some shots at 'expert' economists:

Before we released the article, we surveyed academics and non-academics who are knowledgeable about payday lending to elicit their policy views and predictions of our empirical results. We use the 103 responses as a rough measure of “expert” opinion, with the caveat that other experts not in our survey might have different views. The average expert did not correctly predict our main results. For example, the average expert predicted that borrowers would underestimate future borrowing probability by 30 percentage points, which would imply much more naivete than our actual estimate of 4 percentage points.

Ouch! But it does illustrate the unanticipated nature of Allcott et al.'s results. If your model of payday loan borrowing starts from an assumption that borrowers don't anticipate their future borrowing behaviour, then you are more likely to support strong restrictions on payday lending. The experts' poor performance in anticipating borrowers' naivete also suggests that Allcott et al.'s results should be given more weight than those experts' priors. It is unusual to include results like these in a paper (or even to do this sort of analysis). I wonder whether Allcott et al. would have presented these results if the experts had agreed with them.

[HT: Marginal Revolution, last year]

Sunday, 27 February 2022

A sweet way for teachers to improve their student evaluations

A couple of weeks ago, I wrote a series of posts on student evaluations of teaching (see here for the latest one, and the links at the bottom of this post for more). The overall conclusion from the literature on student evaluations of teaching is that they are biased (against women and minorities), they don't measure teacher quality very well (if at all), and many teachers (especially male teachers) don't respond to the feedback from the evaluations by improving their teaching.

As if things couldn't get any worse for student evaluations of teaching, I just read this 2007 article by Robert Youmans and Benjamin Jee (both University of Illinois at Chicago), published in the journal Teaching of Psychology (ungated version here). Youmans and Jee investigate the impact of giving students chocolate on the student evaluations of three courses (two statistics courses, and one research methods course, all at the University of Illinois at Chicago). Their experiment was straightforward:

During the ninth week of instruction, all participants completed an informal midsemester evaluation about the lecture section of their course (the discussion sections were not evaluated at this time)... All students received the same nine-question form... For each question, the student provided a rating from 1 (very poor) to 5 (excellent)... 

The experimental manipulation involved the treatment of students prior to their completion of the evaluation forms. In half of the sections, students had the opportunity to take a small bar of chocolate from a bag passed around by the experimenter before the evaluations began. Importantly, the experimenter told participants in these sections that he had chocolate left from another function that he “just wanted to get rid of” and that the students were welcome to take a piece. The fact that the chocolate was the property of the experimenter was emphasized so participants would not misattribute the chocolate as a gift from their instructor or teaching assistant.

Now, if students are providing an unbiased evaluation of the teacher in their course, then giving them chocolate should make no difference. However, Youmans and Jee find that:

Participants who were offered chocolate gave higher ratings on average (M = 4.07, SD = .88) than participants who were not offered chocolate (M = 3.85, SD = .89), F(1, 92) = 3.85, p = .05, d = 0.33.

The effect is relatively small, but statistically significant. Giving students chocolate increases student evaluations of teaching. Sweet!

Saturday, 26 February 2022

Book review: An Economist Walks into a Brothel

I just finished reading Alison Schrager's book An Economist Walks into a Brothel. Don't let the title fool you though - this isn't a book about sex, it is a book about risk. Specifically:

This book will show you how to mindfully take a risk and minimise the possibility that the worst will happen.

The book outlines five rules for better assessing and employing risk in our lives:

  1. No risk, no reward - "Risking loss is the price we pay for the chance of getting more";
  2. I am irrational and I know it - "We don't always behave the way economic and financial models predict when faced with a risky decision";
  3. Get the biggest bang for your risk buck - "...diversify to reduce unnecessary risk and keep your potential for more reward intact";
  4. Be the master of your domain - Hedging and insurance; and
  5. Uncertainty happens - "Even the best risk assessment can't account for everything that might happen".

Schrager is a financial economist and journalist, and the five rules are illustrated with a range of excellent stories drawn from Schrager's extensive research with sex workers in Nevada (for which the book is named), the paparazzi, professional poker players, and Kentucky horse breeders, to name a few. However, I found that the best chapter of all illustrated many of the concepts in relation to risk management by big wave surfers:

Instead of going into the ocean and hoping for the best, surfers are schooled in the "art" of risk: how to form calculated, informed risk assessments. The risk mitigation tools appear to be different from those used in financial markets, but they serve a similar purpose. Surfers form well-trained teams to raise the odds of a successful rescue (diversification). They monitor wave conditions, identify hazards (sharks, crowds, rocks, deep water, cold), and make probability estimates on the odds things will go wrong. This is so the surfers can make informed trade-offs about the thrill of riding a big wave safely (hedging). And they use the latest technology to rescue them when they wipe out (insurance).

This book could easily have been titled "An Economist Goes Big Wave Surfing", but perhaps that wouldn't have caught as much attention. Popular finance books are not plentiful, and those that are available are nowhere near as easy to read as this one. I really enjoyed the clear explanations of several finance concepts, especially the difference between hedging and diversification, which is something that many students struggle with. The only bit that I found really challenging was recognising what the 'risk-free' option is, and I still don't think I have a handle on it. Some more examples on that would have been welcome.

Overall, Schrager writes in an easy style, and the range of illustrative examples she uses (as noted above) enhances the appeal of this book, and I love the way that she concludes, noting that readers will have learned some financial economics by stealth!:

We all are smart risk takers in at least one aspect of our lives and have the potential to apply the same reasoning to every decision we make.

We can do this by understanding the science behind risk: how to define risk, how to measure it, how to identify the type of risk we face, and how to manage it. Financial economics is the science of risk, and it provides a structure to help us understand what makes a good risk to take.

If you want to understand a little bit of finance, a little bit about how to manage risk, or you just want to read some interesting stories about how other people manage risk in their work and daily lives, this is a great book to read. Highly recommended!

Friday, 25 February 2022

University tuition fees and student effort

For the last few years, tertiary students in New Zealand have been able to study for their first full year without paying tuition fees. Anecdotal evidence on the impacts on student effort is mixed. Some lecturers believe that, with the financial pressures of a large student loan lessened somewhat, students are making the most of the opportunity to study. Others suggest that it has opened up the possibility of tertiary study to some students who wouldn't otherwise have had the opportunity. Yet others have reported increases in the number of 'ghost students', who enrol in classes but then do nothing at all (no attendance, no assessment, no nothing). My own experience is that there may have been a slight increase in ghosting, but I'd hesitate to extrapolate from my experience (even putting aside pandemic-related disruptions and simply concentrating on the semesters before coronavirus was even a word most people knew).

So, I've kept my eye out for any 'real' evidence of the impacts of eliminating tuition fees on student effort or academic performance. There hasn't been anything obvious from New Zealand so far, but I did see this 2018 article by Pilar Beneito, José Boscá and Javier Ferri (all University of Valencia), published in the journal Economics of Education Review (ungated earlier version here). Beneito et al. use data from students at the University of Valencia (UV), exploiting a law change where:

In 2012, the Spanish government passed Law 14/2012, in a self-declared attempt to rationalise public expenditure by shifting a higher part of the costs of education onto the students. In the Spanish case, depending on the university and on the number of times the student has registered for the same module, tuition fees may be almost triple those charged before 2012.

So, this isn't investigating the case where tuition fees were removed, but instead investigating what happens when tuition fees are increased substantially. However, the change didn't apply equally to all students. As Beneito et al. explain:

...UV students do not pay a fixed amount per year of enrolment. Instead, they register and pay tuition fees for a number of modules, and the price of each one is set according to the number of credits the module offers and the number of times the student has taken the module. Students are allowed to take the final examination of a module twice per paid registration (i.e., they have two ‘chances’ per payment). At the beginning of the course 2012-13, Law 14/2012 established a price increase that depends on the number of times the student has registered for a particular module before passing the corresponding exam. According to the law, university tuition fees should cover between 15% and 25% of the total cost of education for a first-time registration, between 30% and 40% for a second registration, between 65% and 75% if a student registers for a third time, and between 90% and 100% for fourth and subsequent registrations.

In other words, each time a student failed or did not complete a given module, the fee they would pay the next time they enrolled in that module would increase. Now, in theory that should provide a strong incentive for students to increase their effort and pass more modules sooner, and Beneito et al. provide a theoretical model that demonstrates that. Then, turning to the data from all students enrolled in business, economics, or medicine between 2010 and 2014 (and employing a difference-in-differences analysis that exploits the fact that a small number of students were exempt from the fees policy), they find:

...positive effects of the fee rise on UV students’ level of effort, reflected in a lower number of registrations required to pass a module and a higher probability of passing with the first registration. The results are more visible in the case of average-ability students, and also in economics and business than in medicine. Positive effects on grades are also evident, although in this case top students show larger estimated responses.

Specifically, they find that the number of module registrations per student dropped by around 0.2 in 2014, and was only statistically significant in that year (the final year of the analysis). The effect was larger for business and economics students than it was for medicine students. However, these results are weakened by the small and very selected nature of the control group - 'exempt' students include:

...students belonging to a particular category of large families (those with 5 or more siblings), students with an officially recognised degree of disability equal to or greater than 33%, and victims of terrorism (this last category fortunately provides very few observations).

I'm not convinced that control group necessarily provides for a robust analysis. However, taken at face value this provides some weak evidence that higher tuition fees increase student effort. By extrapolation, and reversing the direction of change, we might infer that students in New Zealand exert less effort in response to a lowering of tuition fees (to zero in their first year). However, we still really need to see some analysis of the New Zealand case.

Wednesday, 23 February 2022

When there are no exams, students' exam nervousness is lower

I just read this new article by Yoosik Shin (Korea University), published in the journal Economics and Human Biology (sorry I don't see an ungated version online). It might win a prize for the most obvious conclusion, but I'll get to that in a moment. Shin investigates the impact of South Korea's 'Free Semester' (FS) in middle school (and if you're wondering why I read the paper, it's because I misinterpreted the title, and thought it might provide some insights into the impacts of making a semester free in monetary terms, like New Zealand's first year fees free at university). The FS is:

...a one-semester program targeting middle school students. During this semester, students experience a variety of learning modalities and career exploration activities... One of the key features of this program is that written tests or examinations are not taken during this semester. That is, there is no formal grading of exams, which might affect students’ entry to high school, during this semester.

The FS is usually implemented in the second semester of the first year of middle school. The purpose of the FS is to:

...enhance the happiness of students by providing them the opportunities to explore their dreams and aptitudes, without being overwhelmed by the burden of exams...

Apparently, there are debates as to whether the FS increases students' mental wellbeing, or whether it simply increases pressures in the remaining middle school years. Shin uses data from the Korean Education Longitudinal Study 2013, which followed students who were in fifth grade (the second-to-last year of elementary school) in 2013. The dataset includes over 5700 students. When the FS was rolled out, these students were in middle school, but not all encountered the FS. Shin uses this difference in exposure to the FS to implement a difference-in-differences analysis (comparing the difference in exam nervousness, which is measured each year in the sample, before and after the FS was implemented, between students who were and were not affected by it).

Shin's main findings are easily summarised in Figure 1 from the paper:

Treated and control students had similar trends in exam nervousness at elementary school, but then in the first year of middle school (when they were exposed to the FS), exam nervousness was lower for those that had the FS than for those that did not. However, there was no medium-term impact, as by the second year of middle school both treatment and control students were similar in exam nervousness again. As you can see, it seems obvious. Interestingly, a subgroup analysis shows that the effects are only statistically significant for high-achieving students. When you consider that high-achieving students are more likely to be anxious about their exam performance, that result seems obvious too.

What is interesting is this bit from the conclusion:

...middle schools are now encouraged to apply the learning systems and progress-based evaluation philosophy of the FS to regular semesters.

It would be interesting to see whether students in schools that applied the FS system across multiple semesters (rather than one) differed in high school academic performance from those that experienced only one FS semester. That is a more interesting question, with a potentially more ambiguous, and more policy-relevant, answer.

Monday, 21 February 2022

What is the optimal academic calendar for universities?

Different universities employ different academic calendars. Many universities have a semester calendar. Other universities have a trimester calendar (the University of Waikato fits somewhat in-between these two, with two normal-length trimesters, and a summer trimester that is slightly shorter). Still other universities (although none in New Zealand) employ a calendar of four terms (or quarters). Which option is best for student learning? Some recent research may help us get part of the way to answering that question.

This new article by Valerie Bostwick (Kansas State University), Stefanie Fischer (Monash University), and Matthew Lang (University of California, Riverside), published in the American Economic Journal: Economic Policy (ungated earlier version here) compares outcomes between semesters and quarters. Bostwick et al. first note the theoretical ambiguity in which system is best:

A priori, the effects of the calendar system on student outcomes are ambiguous. A semester calendar has longer terms, requires one to take more courses per term to remain a full-time student, and operates over a different set of months than a quarter calendar. As such, semesters may be more conducive to learning and/or degree attainment, as there is a longer time horizon to master complex material. They may also provide more summer internship opportunities due to their earlier end dates in the spring term. On the other hand, it is possible that the longer terms unique to semesters may allow one to become complacent or procrastinate between exams, leading to poorer performance. Moreover, the greater number of simultaneous courses in a semester term may be difficult to juggle and/or pose scheduling challenges.

Bostwick et al. then use two datasets to investigate the differences in student outcomes between a semester academic calendar and a quarters academic calendar. The first dataset covers nearly all non-profit four-year colleges and universities in the US, covering all cohorts of students that began their studies between 1991 and 2010 (except for the 1994 cohort, where graduation rates were not available in the dataset). Their data has nearly 14,000 annual observations, from over 700 institutions. They then make use of the fact that some universities changed their academic calendar from quarters to semesters, and run both an event study analysis and a difference-in-differences analysis. Both analyses essentially compare the difference in four-year and six-year graduation rates between the period before and after the change in calendar, between universities that changed to semesters and those that did not.
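The difference-in-differences logic here can be sketched as a toy calculation on four group means. All of the numbers below are hypothetical; the paper's actual specifications include institution and cohort fixed effects and a range of controls.

```python
# Toy difference-in-differences on four group means of a four-year
# graduation rate (hypothetical numbers, for illustration only).
rates = {
    ("switched", "before"): 0.32,  # quarters, before moving to semesters
    ("switched", "after"): 0.28,   # after the switch to semesters
    ("control", "before"): 0.33,   # universities that never switched
    ("control", "after"): 0.31,
}

change_treated = rates[("switched", "after")] - rates[("switched", "before")]
change_control = rates[("control", "after")] - rates[("control", "before")]

# The DiD estimate nets out the trend shared with the control group,
# attributing only the remaining change to the calendar switch.
did = change_treated - change_control
print(round(did, 2))  # -0.02: a 2 percentage point drop attributable to the switch
```

The event study version does the same comparison separately for each year relative to the switch, which is what lets Bostwick et al. check for pre-trends and trace out the 'partially treated' cohorts.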

Their results from this first analysis are neatly summarised in Figure 2 from the paper:

The years from -9 to -4 are the time before the shift to semesters for the treated universities. In the years from -3 to -1, students did part of their studies under the quarters calendar and part under the semesters calendar (the universities are 'partially treated'), while in the years from 0 onwards, all students did all of their studies under the semesters calendar. Notice in the top panel (four-year graduation rates), there is a clear downward shift once universities make the shift to semesters (fewer students graduate within four years). In the bottom panel (six-year graduation rates), the effect is smaller and not statistically significant.

Bostwick et al. then go on to look at individual student-level data from the Ohio university system, which allows them to explore the mechanisms that underlie the effects observed at the university level. Specifically, they have data from all students enrolled at 37 campuses for the first time between 1999 and 2015 (over 700,000 observations). Using these data allows them to look at a greater range of student outcomes, including graduation rates, credit workload, grades, and whether they switch major. They also link the student data to employment data from the Ohio Department of Job and Family Services, which allows them to look at in-semester employment and summer internships (which they proxy by summer employment not in the retail or food services industries). The analysis approach is similar to that employed at the university level (event study, and difference-in-differences), which is supported by variation across campuses:

There are 16 campuses in the data that were already on a semester calendar at the start of the sample in 1999. Four campuses switched from a quarter calendar to semesters over the course of the following decade. All of the remaining campuses in the state switched to a semester calendar in the fall of 2012 by mandate of the Ohio Department of Higher Education...

In their student-level analysis, Bostwick et al. first establish that the university-level results (lower rates of successful graduation within four years, or five years) still hold. They then explore the mechanisms for these results noting that:

There are a number of potential channels that could drive dropping-out behavior and/or an increase in time to degree. First, students and advisors may have difficulty navigating the transition to a new calendar system. We rule out this proposed channel because the estimated effects... are clearly evident in the long term...

A second potential channel is that students may find it challenging to juggle more simultaneous courses per term, as is required with a semester calendar. If this is a primary channel, students may earn lower grades or underenroll - that is, take fewer credits per term than what constitutes a full load... Lower grades could lead to an increase in time to degree if students are retaking courses for a better grade. Furthermore, if a student’s grades are low enough, they may face academic probation and potential dismissal from the university.

Finally, reduced scheduling flexibility associated with semesters caused by the longer-term length and higher number of required courses per term may be an important channel. Students might opt to take fewer courses per term to avoid unappealing class times (e.g., early morning classes)... It is also possible that scheduling flexibility impacts the timing and/or likelihood that a student switches majors, as major exploration is more costly under a semester calendar. Students who take longer to settle on a major are likely to experience a longer time to degree.

Bostwick et al. investigate all of these potential mechanisms, finding some weak evidence for a reduction in taking a full course load, but stronger evidence of a reduction in GPA (and increased risk of academic probation) and lower probability of switching major. Discussing this mechanism analysis, they note that (emphasis is theirs):

First, the higher number of courses per term may produce several of our findings. Students may find it difficult to balance more courses and topics simultaneously. This could explain the increase in the probability of falling below the 2.0 GPA cutoff. At the same time, some students may simply enroll in fewer credits per term (i.e., four courses instead of five) to avoid taking too many different courses at once... It is also possible that the higher number of courses in a term presents more of a scheduling challenge, particularly if a student wishes to avoid class times outside of the standard 9-5 school day...

Second, the increased length of the term may be at play. Longer terms could incentivize procrastination. There are longer periods between exams and more time to put off studying. It is possible that this type of behavior leads to lower grades and an increased probability of earning a GPA below a 2.0...

Additionally, longer/fewer terms mean that experimenting with a major takes more time. If, for instance, there are a set number of courses needed to learn about the match between one’s skills/interests and major, then this learning is more costly in a semester calendar, as one must commit to at least half a year in a major, compared to only a third of the year in a quarter system. Our findings on the timing of major switching are consistent with this proposed mechanism: students are no less likely to switch majors overall, but they are doing so later on in their college careers. 

Finally, Bostwick et al. look at employment outcomes, and find that:

This analysis does not provide compelling evidence that the switch to a semester calendar improves summer employment in the types of jobs that are most likely to represent internship employment...

There is also some evidence of reduced in-semester employment, consistent with higher study workload. Overall, this study provides some compelling evidence that the shift from quarters to semesters has negative impacts on students. The disappointing thing is that there was no analysis of universities that have shifted in the opposite direction (from semesters to quarters), because very few have done so. Also, there is little guidance for universities with a trimester system, which were included in the quarters sample (although the analysis is apparently robust to changes in how those universities are treated).

Now, if we should prefer quarters over semesters, it is worth considering how far universities should shift in that direction. MBA or other executive education classes, for example, are often taught as block courses, over just a few weeks (I have taught on courses like this in the past, both at Waikato and through the New Zealand Institute of Highway Technology).

Victoria University in Melbourne recently moved to a block course format for their first year, and that change was investigated in this recent report (which was also discussed on The Conversation). As part of the change, they abolished traditional lectures, and moved to a format that emphasised smaller class groups. Unfortunately, the analysis in the report lacks the thoroughness and analytical rigour of Bostwick et al., being based mostly on qualitative data drawn from surveys with a small number of staff, academic leadership, and students. The few quantitative results that are presented lack any statistical tests, so we are left to guess how meaningful the changes are. Nevertheless, the study does provide some interesting results that should encourage further exploration of this model, particularly:

...[fail] grades had dropped by 9.2 percentage points from the most recent pre-Block (2017) to post-Block (2019) cohorts, dropping 9.8 percentage points for equity students...

From 2017 to 2018, pass rates increased by 9 per cent for students in the highest socioeconomic status (SES) group, and 15 per cent for those in the lowest. Pass rates for students who were first in their family to attend university increased by 13 per cent, compared to 11 per cent for those who were not...

Now, pass rates can change for a number of reasons, so are not necessarily a reliable indicator of student learning. However, the bigger impacts on pass rates for students from lower socioeconomic groups and those who are first in family to attend university are encouraging (since these are groups that face substantial challenges in their transition to university study). The qualitative analysis also identified that:

Relationships between teaching staff and students were enhanced by the small Block classes...

Of course, small-group learning is not entirely absent in traditional university models, and some noted its similarity with tutorials... or Foundation Studies classes, which typically have small numbers of students... The difference with the Block was that academics were also immersed in the small-group format, for intensive periods with the same students.

Besides making learning more “enjoyable”... the closer relationship has positioned staff and students as partners in the learning experience.

Of course, not all of the positive improvement in staff-student relationships comes from the block format per se, since the smaller class size will facilitate that as well. However, anything that breaks down the divide between students and staff is a good thing, and will help create a more welcoming environment (it is something that I work hard on in my large classes, but it is much easier in a smaller class). On that note as well:

The transfer of energy and enthusiasm from teaching staff to students was a clear, simple factor in the Block Model’s success. For equity group students, the teaching staff embodied a highly motivating university environment that welcomes, supports and believes in them.

The main (perhaps only) negative thing that comes out of the report is the effect on workloads. Students reported higher satisfaction with their classes in all aspects except for workload, and for staff:

...the intensive structure places high demands on staff workloads, especially in delivering assessment results in an extremely short timeframe...

So, the block model sounds interesting, and may have positive effects on students, particularly for equity groups. It is definitely worth exploring further, with a more rigorous study of its effects. Overall, it is clear that the academic calendar is an important consideration for universities, and along with the role of online learning, should be considered carefully in the future.

Sunday, 20 February 2022

The willingness to pay for wine bullshit

Consumers often can't tell the difference between two similar substitute products. I've blogged previously about bottled water, but people can't even tell the difference between dog food and pâté (ungated earlier version here). In the bottled water research, people couldn't match different bottled waters to their descriptions. That may be because water descriptions are mostly bullshit - after all, this article by Richard Quandt (Princeton University) notes that wine descriptions are mostly bullshit too.

That brings me to this recent article by Kevin Capehart (California State University), published in the Journal of Wine Economics (sorry, I don't see an ungated version online). Capehart looks at the wine descriptors that Quandt's article singled out as bullshit (descriptors such as 'silky tannins', 'velvety tannins', 'brawny', and a flavour of 'smoked game'). He then uses various methods to estimate consumers' willingness to pay for bullshit. Specifically, Capehart employs three methods:

I start by using a hedonic regression similar to regressions used by previous studies on wine prices and descriptions...

The second method uses the same dataset used for my hedonic regression, but I draw on approaches for matching rich texts... in order to obtain matching estimates of consumers’ MWTP [marginal willingness-to-pay] for select descriptors. My third method is a stated-preference survey in which I directly ask approximately 500 wine consumers about their MWTP for select descriptors.

For the hedonic regression model, Capehart draws on the descriptions in various online wine catalogues, leading to a dataset of over 51,000 wines. Looking at the effect of different descriptors on wine prices, he finds that:

Despite their joint significance, many of the descriptors have effects that are not statistically significantly different from zero at conventional levels. Examples of descriptors with statistically insignificant effects include “silky” and “silky tannins.”...

Some descriptors do have effects that are statistically different from zero. Of the 106 descriptors, 43 have effects that are statistically significant at the 10% level. Yet, some of those statistically significant effects are not substantively significant. For example, the effect of “velvety tannins” is statistically different from zero (p-value = 0.03), but it is arguably small at only 2.5% (se = 1.1%). A 2.5% change is less than a $1 change for any bottle under $40.

So, some descriptors are associated with higher-priced wines, and some with lower-priced wines. However, mostly the effects are small. Unfortunately, overall the hedonic regression model poses more questions than it answers. As Capehart notes:

If those hedonic results are momentarily accepted, there is much to puzzle over. Why would consumers be willing to pay so much more for wine if an expert described it in terms of the smoked game? How much more or less would they be willing to pay if the expert described the game as being prepared differently, such as by roasting, steaming, or boiling? And what if the expert described the game as a specific type of animal such as a pheasant, boar, deer, squirrel, or some elusive or imaginary creature that few if any have tasted? Questions abound.
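Capehart's exact specification isn't reproduced here, but the basic logic of a hedonic regression - regressing log price on indicator variables for which descriptors appear in a wine's description - can be sketched on simulated data. Everything in the sketch below (the variable names, the descriptor frequencies, and the 2.5% 'velvety tannins' premium built into the simulated prices) is assumed purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000  # Capehart's dataset has over 51,000 wines; 5,000 here for speed

# Indicator variables: does each wine's description contain the phrase?
velvety = rng.binomial(1, 0.1, n)
silky = rng.binomial(1, 0.1, n)

# Simulate log prices with a built-in 2.5% premium for 'velvety tannins'
# and no premium for 'silky tannins' (both assumed, for illustration only)
log_price = 3.0 + 0.025 * velvety + 0.0 * silky + rng.normal(0, 0.2, n)

# Hedonic regression: log price on the descriptor indicators plus a constant
X = np.column_stack([np.ones(n), velvety, silky])
beta, *_ = np.linalg.lstsq(X, log_price, rcond=None)

# beta[1] estimates the proportional price premium for 'velvety tannins',
# and should recover something close to the simulated 2.5%
print(round(beta[1], 3))
```

Because the outcome is the log of price, each coefficient is (approximately) the percentage price premium associated with a descriptor, which is how a 2.5% effect translates into less than a $1 change for any bottle under $40.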

Capehart then moves onto using a text-matching estimator:

Any matching estimator tries to match subjects who have received a treatment to control subjects who are as similar as possible, except they did not receive the treatment. After matching, the effect of the treatment on an outcome of interest can be estimated by comparing the outcomes of the matched subjects. Here, the “subjects” are wines, the “treatment” is whether a given Quandt descriptor appears in a wine’s description, and the outcome of interest is the wine’s price.

He essentially uses the 'bag-of-words' approach, which represents each description simply by the words it contains (that is, the actual context of their use is ignored). This analysis basically compares wines with very similar descriptions, one of which contains the particular descriptor and one of which does not. In this analysis, Capehart finds that the results are:

...generally consistent or not inconsistent with my hedonic estimates.
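The flavour of that matching exercise can be sketched in a few lines. The toy wines, prices, and treatment descriptor below are invented for illustration - Capehart's text-matching approach is more sophisticated than the simple word-count and cosine-similarity matching used here:

```python
from collections import Counter
import math

# Toy data: (description, price) pairs, invented for illustration
wines = [
    ("rich dark cherry with velvety tannins", 32.0),
    ("rich dark cherry with firm tannins", 30.0),
    ("bright citrus and green apple", 15.0),
    ("crisp citrus and green apple notes", 14.0),
]

def bag_of_words(text, drop=()):
    # Count word occurrences, ignoring order and context, optionally
    # dropping the 'treatment' descriptor itself before matching
    return Counter(w for w in text.split() if w not in drop)

def similarity(a, b):
    # Cosine similarity between two word-count vectors
    dot = sum(a[w] * b[w] for w in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)

descriptor = "velvety"
treated = [(d, p) for d, p in wines if descriptor in d.split()]
controls = [(d, p) for d, p in wines if descriptor not in d.split()]

# Match each 'treated' wine to the control wine with the most similar
# description, then compare prices
for desc, price in treated:
    words = bag_of_words(desc, drop={descriptor})
    match = max(controls, key=lambda c: similarity(words, bag_of_words(c[0])))
    print(desc, "->", match[0], "| price difference:", round(price - match[1], 2))
```

The velvety-tannins wine is matched to the near-identical firm-tannins wine rather than either of the citrus wines, and the price difference within the matched pair is the estimate of the descriptor's premium.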

Ok, so again wine consumers appear to be willing to pay for some descriptors. Why not ask them about it? That's what the final analysis does, based on a stated preference survey of 469 US wine consumers, conducted online. Essentially, each research participant was asked to choose between two wines, with different prices and descriptions. Based on those hypothetical (stated preference) choices, Capehart finds that:

...most consumers have a zero or near-zero MWTP for velvety rather than silky tannins; that would be consistent with “velvety” and “silky” being synonyms and not inconsistent with my hedonic and matching estimates that suggested at most a small price premium for velvety over silky tannins.

Overall, across the three methods, Capehart concludes that (emphasis is his):

One conclusion is that most consumers seem to have little if any MWTP for wines described by most of the Quandt descriptors. My hedonic approach suggested the majority of the descriptors have a price premium of zero or near-zero. My matching and survey estimates were generally consistent or not inconsistent with my hedonic estimates, at least for the select descriptors considered.

The other conclusion is that some consumers have a non-zero MWTP for wines described by some of the Quandt descriptors. The hedonic approach suggested some descriptors have price premiums (or discounts) that are significant in the statistical sense and arguably significant in the substantive sense. My matching approach suggested the same. And my survey approach suggested some expert and novice wine consumers are willing to pay more than nothing for some descriptors.

In other words, most wine consumers are not willing to pay anything for wine bullshit, but some consumers are. If wine descriptors are mostly bullshit (as Quandt claimed), why would any consumers be willing to pay for wines that have those descriptors? That is the question that Capehart doesn't answer. Perhaps those consumers who are willing to pay a positive amount for a particular descriptor simply feel better for knowing that they are consuming something that has been described as having 'velvety tannins' or the flavour of 'smoked game'? We don't know, so more research will be required to answer the question of why.

Friday, 18 February 2022

Book review: Economics in Two Lessons

Yesterday I reviewed Henry Hazlitt's Economics in One Lesson. The takeaway from that review should be that something pretty important was missing: a recognition of market failure. And that is essentially the second lesson that John Quiggin's Economics in Two Lessons provides. Quiggin wrote the book as a (somewhat belated, but still important) response to Hazlitt's book. Quiggin's two lessons are:

Lesson One: Market prices reflect and determine opportunity costs faced by consumers and producers...

Lesson Two: Market prices don't reflect all the opportunity costs we face as a society.

In essence, Quiggin's book takes Hazlitt's first lesson, and adds in a second (and probably more important) lesson about market failure. This echoes something that Paul Samuelson wrote in this 2009 article:

When someone preaches "Economics in One Lesson," I advise: Go back for the second lesson.

Quiggin provides that second lesson, along with an extensive critique of Hazlitt's book. I really enjoyed the framing of things in terms of opportunity cost, which is something that is often not made explicit beyond the first week or two of introductory economics. For example:

Mass unemployment is an example, and arguably the most important example, of Lesson Two. The prevailing wage does not reflect the opportunity cost faced by unemployed workers, who would willingly work at this wage and could, under full employment conditions, produce enough to justify their employment.

I also liked Quiggin's focus on the importance of property rights, and, as in my review yesterday, the recognition that government sets the property rights and legal regime, but that doesn't mean that it cannot be changed. The chapter on redistribution (and predistribution) was also very interesting, and many bits of it will make an appearance in my ECONS102 class in future.

The book is very readable, and like Hazlitt's book, it has its share of humour:

Expecting economic benefits from a natural disaster is like hoping that a car crash will fix your wheel alignment.

The book is not all perfect though. Quiggin goes through in excruciating detail Hazlitt's description of the 'parable of the broken window', which was introduced to economics by Frederic Bastiat. However, I don't think Quiggin is correct in his critique, and neither is Hazlitt correct in his interpretation. Bastiat's point was about the stock of wealth in society, and not about employment or production, which both Hazlitt and Quiggin get hung up on. Breaking windows doesn't make us better off (Quiggin's argument in the case of a recession) or no worse off (Hazlitt's argument), because if the window hadn't been broken at all, society would have the window plus more. Or maybe I have misunderstood both of them - this was the most unclear section of an otherwise very readable book.

Market fundamentalists won't enjoy this book. But for the rest of us, it provides some useful reminders of things that we should keep front of mind.

Thursday, 17 February 2022

Book review: Economics in One Lesson

It's been a while since I last posted a book review. That isn't because I haven't been reading, but rather I have two books that I wanted to review consecutively (one today, and one tomorrow). The first book is Economics in One Lesson, by Henry Hazlitt. This book was first published in 1946, and I believe that it is still in print. I read the 'New Edition' from 1979.

The reason that this book is still in print is that it is a mainstay of laissez faire market fundamentalist economics. Before we come to that point, here's the one lesson for which the book is named:

The art of economics consists in looking not merely at the immediate but at the longer effects of any act or policy; it consists in tracing the consequences of that policy not merely for one group but for all groups.

On the face of it, that doesn't seem too controversial. In fact, understanding that policies have positive effects on some groups, and negative effects on other groups, is fundamental to economics. That is the essence of the trade-offs that policymakers have to evaluate in deciding on policy. The problem comes when Hazlitt applies that rule to argue that essentially any policy that moves a market away from the laissez faire outcome makes some group worse off, and therefore should not be enacted. That approach rather misses the point of recognising the trade-offs, and fails to consider the distributional consequences of policy. That is exemplified in the final part of the 1979 edition, where Hazlitt asserts that:

Government's main economic function is to encourage and preserve a free market.

I'm sure that assertion would come as a surprise to, and be rightly ridiculed by, the majority of political scientists, as well as many economists. A more sensible consideration of the role of markets is to be found in John McMillan's 2002 book Reinventing the Bazaar (which I reviewed here). In McMillan's view, governments have important roles in market design, but that doesn't mean that in all circumstances the free market is the optimal solution. It certainly isn't the only solution, and it definitively is not the main economic function of governments.

The free market ideal has its place, but it is in most cases an unrealistic ideal. That doesn't mean, as some critics believe, that markets are the worst way to organise economic activity. Instead, it means that we should recognise market failures, and that a necessary function of government is to ameliorate the negative impacts of those failures.

However, even in terms of markets, I was given cause to question how Hazlitt believes that markets work. Consider this passage:

If because of unusual weather conditions there is a sudden increase in the crop of oranges, all the consumers will benefit. The world will be richer by that many more oranges. Oranges will be cheaper. But that very fact may make the orange growers as a group poorer than before, unless the greater supply of oranges compensates or more than compensates for the lower price.

I fail to see how "orange growers as a group" could be poorer than before if they are growing more oranges, unless the demand for oranges is very inelastic. Producer surplus (farmer profits in aggregate) would usually increase. Hazlitt gives the example of a farmer whose crop is no larger than before, being worse off because of the lower orange price. That is correct, but that is not the same as all growers as a group being worse off.

The book does have some good parts, and it has aged well, with many of the examples remaining relevant today. It also has some funny bits, including this passage (comparing the efficiency of private lenders with government lenders):

Thus private lenders (except the relatively small proportion that have got their funds through inheritance) are rigidly selected by a process of survival of the fittest. The government lenders, on the other hand, are either those who have passed civil service examinations, and know how to answer hypothetical questions hypothetically, or they are those who can give the most plausible reasons for making loans and the most plausible explanations of why it wasn't their fault that the loans failed.

Of course, many readers will disagree with the 'rigid selection' of private lenders, given that it was those 'rigidly selected' lenders who paved the way for the Global Financial Crisis. Also, many readers will disagree strongly with Hazlitt's firmly negative position on any form of the welfare state, and on public housing.

Overall, it was useful for me to read this book, given that it continues to hold a lot of weight among market fundamentalists. However, it didn't persuade me on any particular point, and as I'll note in my next review (as well as my review above), there are several obvious aspects on which it can be criticised.

Tuesday, 15 February 2022

Hardly the final words on student evaluations of teaching

I've written a few posts this week about student evaluations of teaching (see here and here), and a few others in previous years. One of those earlier posts asked whether student evaluations of teaching are even measuring teaching quality. The research I cited there was a meta-analysis that suggested that there was no correlation between teaching quality (as measured by final grade or final exam mark or similar) and student evaluations of teaching. However, measuring teaching quality objectively using grades could be problematic if there is reverse causation (for example, teachers give higher grades in hopes of receiving better teaching evaluations). A better approach may be to use some measure of teacher value-added, such as students' grades in subsequent classes (with different teachers), or grades in standardised tests (that the teacher doesn't grade themselves).

The former approach, based on teacher value-added, is the one adopted in this 2014 article by Michela Braga (Bocconi University), Marco Paccagnella (Bank of Italy), and Michele Pellizzari (University of Geneva), published in the journal Economics of Education Review (ungated earlier version here). They use data from students in the 1998/99 incoming cohort at Bocconi University, where the students were randomly allocated to teaching classes in all of their compulsory courses (which eliminates problems of selection bias). Looking at the relationship between teachers' effects on their students' subsequent performance and those teachers' current evaluations, Braga et al. find that:

Our benchmark class effects are negatively associated with all the items that we consider, suggesting that teachers who are more effective in promoting future performance receive worse evaluations from their students. This relationship is statistically significant for all items (but logistics), and is of sizable magnitude. For example, a one-standard deviation increase in teacher effectiveness reduces the students’ evaluations of overall teaching quality by about 50% of a standard deviation. Such an effect could move a teacher who would otherwise receive a median evaluation down to the 31st percentile of the distribution.

Those results are consistent with the meta-analysis results, that teachers who do a better job of preparing students for their future studies receive worse teaching evaluations. However, when looking at exam performance in the current class, Braga et al. find that:

...the estimated coefficients turn positive and highly significant for all items (but workload). In other words, the teachers of classes that are associated with higher grades in their own exam receive better evaluations from their students. The magnitudes of these effects are smaller than those estimated for our benchmark measures: one standard deviation change in the contemporaneous teacher effect increases the evaluation of overall teaching quality by 24% of a standard deviation and the evaluation of lecturing clarity by 11%.

They interpret those results as showing that teachers who 'teach to the test' for the current semester receive better teaching evaluations. Braga et al. conclude, unsurprisingly, that:

Overall, our results cast serious doubts on the validity of students’ evaluations of professors as measures of teaching quality or effort.

Aside from being a measure of teaching quality or effort, perhaps student evaluations of teaching provide useful information that teachers use to improve? This 2020 article by Margaretha Buurman (Free University Amsterdam) and co-authors, published in the journal Labour Economics (ungated earlier version here), addresses that question using a field experiment. Specifically, from 2011 to 2013 Buurman et al.:

...set up a field experiment at a large Dutch school for intermediate vocational education. Student evaluations were introduced for all teachers in the form of an electronic questionnaire consisting of 19 items. We implemented a feedback treatment where a randomly chosen group of teachers received the outcomes of their students’ evaluations. The other group of teachers was evaluated as well but did not receive any personal feedback. We examine the effect of receiving feedback on student evaluations a year later...

They find that:

...receiving feedback has on average no effect on feedback scores of teachers a year later. We find a precisely estimated zero average treatment effect of 0.04 on a 5-point scale with a standard error of 0.05...

Buurman et al. suggest that this may be because they estimate the effect a year later, and that the evaluations feedback may have shorter run effects. I don't find that convincing. However, there were differences by gender:

Whereas male teachers hardly respond to feedback independent of the content, we find that female teachers’ student evaluation scores increase significantly after learning that their student evaluation score falls below their self-assessment score as well as when they learn their score is worse than that of their team. Moreover, in contrast to male teachers, female teachers adjust their self-assessment downwards after learning that students rate them less favorably than they rated themselves.

That should perhaps worry us, given the gender bias in evaluations. If it causes female teachers to expend additional effort in trying to improve their teaching evaluations to match those of male teachers, then they will be expending more effort on teaching than the male teachers will for the same outcome. In a university context, that would likely have a negative impact on female teachers' research productivity, with negative consequences for their career. This might be an intervention that is best avoided, unless the gender bias in student evaluations of teaching is first addressed (see yesterday's post for one idea).

As the title of this post suggests, these are hardly the final words on student evaluations of teaching. However, we need to understand what works best, what avoids (or minimises) biases against female or minority teachers, and how teachers can best use the outcomes of evaluations to improve their teaching.

Monday, 14 February 2022

Can a simple intervention reduce gender bias in student evaluations of teaching?

Following on from yesterday's post, which discussed research demonstrating bias against female teachers in student evaluations of teaching [SETs] (see also this post, although the meta-analysis in yesterday's post suggested that the only social science to have such a bias is economics), it is reasonable to wonder how the bias can be addressed. Would simply drawing students' attention to the bias be enough, or do we need to moderate student evaluations in some way?

The question of whether a simple intervention would work is addressed in this recent article by Anne Boring (Erasmus School of Economics) and Arnaud Philippe (University of Bristol), published in the Journal of Public Economics (ungated earlier version here). Boring and Philippe conducted an experiment in the 2015-16 academic year across seven campuses of Sciences Po in France, where each campus was assigned to one of three groups: (1) a control group; (2) a "purely normative" treatment; or (3) an "informational" treatment. As Boring and Philippe explain:

The administration sent two different emails to students during the evaluation period. One email—the ‘‘purely normative” treatment—encouraged students to be careful not to discriminate in SETs. The other email—the ‘‘informational” treatment—added information to trigger bias consciousness. It included the same statement as the purely normative treatment, plus information from the study on gender biases in SETs. The message contained precise information on the presence of gender biases in SET scores in previous years at that university, including the fact that male students were particularly biased in favor of male teachers.

Of students at the treated campuses, half received the email and half did not. No students at the control campuses received an email. In addition, the emails were sent after the period for students to complete evaluations had already started, so some students completed their evaluations before the treatment, and some after. This design allows Boring and Philippe to adopt a difference-in-differences analysis, comparing the difference in evaluations before and after the email for students at treatment campuses who were assigned to receive the email, with the difference in evaluations before and after the email for students at control campuses. The difference in those two differences is the effect of the intervention. Conducting this analysis, they find that:

...the purely normative treatment had no significant impact on reducing biases in SET scores. However, the informational treatment significantly reduced the gender gap in SET scores, by increasing the scores of female teachers. Overall satisfaction scores for female teachers increased by about 0.30 points (between 0.08 and 0.52 for the confidence interval at 5%), which represents around 30% of a standard error. The informational treatment did not have a significant impact on the scores of male teachers...

The reduction in the gender gap following the informational email seems to be driven by male students increasing their scores for female teachers. On the informational treatment campuses, male students’ mean ratings of female teachers increased from 2.89 to 3.20 after the emails were sent... Furthermore, the scores of the higher quality female teachers (those who generated more learning) seem to have been more positively impacted by the informational email.
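The difference-in-differences calculation itself is straightforward. The sketch below uses the 2.89 and 3.20 figures quoted above for male students' ratings of female teachers on informational treatment campuses; the control-campus numbers are invented purely for illustration:

```python
# Mean SET scores for female teachers, before and after the emails were
# sent. The treated values are the 2.89 -> 3.20 shift quoted above;
# the control values are assumed, for illustration only
treated_before, treated_after = 2.89, 3.20
control_before, control_after = 2.90, 2.92

# Difference-in-differences: the before-after change at treated campuses,
# minus the before-after change at control campuses (which nets out any
# general drift in scores over the evaluation period)
did = (treated_after - treated_before) - (control_after - control_before)
print(round(did, 2))
```

In practice this is estimated in a regression framework with controls, but the identifying comparison is exactly this difference of two differences.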

That all seems very positive. Also, comparing evaluations from students at control campuses with evaluations from students in the control group at treated campuses before and after the email was sent allows Boring and Philippe to investigate whether the interventions had a spillover effect on those who did not receive the email. They find that:

...the informational treatment had important spillover effects. On informational treatment campuses, we find an impact on students who received the email and on students who did not receive the email. Anecdotal evidence suggests that this email sparked conversations between students within campuses, de facto treating other students.

The anecdotal evidence (based on responses to an email asking students about whether they had discussed the informational email with others) both provides a plausible mechanism to explain the spillover effects, and suggests that the emails may have been effective in spurring important conversations on gender bias. Also, importantly, the informational emails had an enduring effect. Looking at evaluations one semester later, Boring and Philippe find that:

The effect of the informational treatment remains significant during the spring semester: female teachers improved their scores. The normative treatment remained ineffective.

So, it appears that it is possible to reduce gender bias in student evaluations of teaching with a simple intervention.

Sunday, 13 February 2022

More on gender bias in student evaluations of teaching

Back in 2020, I wrote a post on gender biases in student evaluations of teaching, highlighting five research papers that showed pretty clearly that student evaluations of teaching (SET) are biased against female teachers. I've recently read some further research on this topic that I thought I would share, some of which supports my original conclusion, and some of which should make us pause, or at least draw a more nuanced conclusion.

The first article is this 2020 one by Shao-Hsun Keng (National University of Kaohsiung), published in the journal Labour Economics (sorry, I don't see an ungated version online). Keng uses data from 2002 to 2015 from the National University of Kaohsiung, covering all departments. They have data on student evaluations, and on student grades, which they use to measure teacher value-added. In the simplest analysis, they find that:

...both male and female students give higher teaching evaluations to male instructors. Female students rate male instructors 11% of a standard deviation higher than female instructors. The effect is even stronger for male students. Male students evaluate male instructors 15% (0.109 + 0.041) of a standard deviation higher than female instructors.

Interestingly, Keng also finds that:

Students who spend more time studying give higher scores to instructors, while those cutting more classes give lower ratings to instructors.

It is difficult to know which way the causality would run there though. Do students who are doing better in the class recognise the higher-quality teaching with better evaluations? Or do students who are enjoying the teaching more spend more time studying? Also:

Instructors who have higher MOST [Ministry of Sciences and Technology] grants receive 1.2% standard deviation lower teaching evaluations, suggesting that there might be a trade-off between research and teaching.

That suggests that research and teaching are substitutes (see my earlier post on this topic). Keng then goes on to separately analyse STEM and non-STEM departments, and finds that:

Gender bias in favor of male instructors is more prominent among male students, especially in STEM departments. Female students in non-STEM departments, however, show a greater gender bias against female instructors, compared to their counterparts in STEM departments.

In other words, both male and female students are biased against female teachers, but male students are more biased. Male STEM students are more biased than male non-STEM students, but female non-STEM students are more biased than female STEM students. Interesting. Keng then goes on to show that:

...the gender gap in SET grows as the departments become more gender imbalanced.

This effect is greater for female students than for male students, so female students appear to be more sensitive to gender imbalances. This is not as good as it may sound - it means that female students are more biased against female teachers in departments that have a greater proportion of male teachers (such as STEM departments). Finally, Keng uses their measure of value-added to make an argument that the bias against female teachers is related to statistical discrimination. However, I don't find those results persuasive, as they seem to rely on an assumption that as teachers remain at the institution longer, students learn about their quality. Yet students are only at the institution for three or four years, and don't typically see the same teachers across multiple years, so it is hard to see that this is a learning effect on the students' side. I'd attribute it more to the teachers better understanding what it takes to get good teaching evaluations.

Moving on, the second article is this 2021 article by Amanda Felkey and Cassondra Batz-Barbarich (both Lake Forest College), published in the AEA Papers and Proceedings (sorry, I don't see an ungated version online). Felkey and Batz-Barbarich conduct a meta-analysis of gender bias in student evaluations of teaching. A meta-analysis combines the results across many studies, allowing us to (hopefully) overcome statistical biases arising from looking at a single study. Felkey and Batz-Barbarich base their meta-analysis on US studies covering the period from 1987 to 2017, which includes 15 studies and 39 estimated effect sizes. They also compare economics with other social sciences. They find that:

In the 30 years spanned by our metadata, there was significant gender difference in SETs that favored men for economics courses... Gender difference in the rest of the social sciences favored women on average but was statistically insignificant...

The p-value for other social sciences is 0.734, so is clearly statistically insignificant. The p-value for economics is 0.051, which many would argue is also statistically insignificant (although barely so). However, in a footnote, Felkey and Batz-Barbarich note that:

We found evidence that our results for economics were impacted by publication bias such that the gender difference is actually greater...than our included studies and analyses suggest.

They don't present an analysis that accounts for the publication bias, which might have shown a more statistically significant gender bias. This is bad news for economics, but might it be good news for other disciplines? It's not consistent with the results of other analyses of gender bias in SETs, where the bias appears across all disciplines (see the Keng study above, or the studies in this earlier post). Usually, I would strongly favour the evidence in a meta-analysis over individual studies, but it is difficult when the meta-analysis seems to show something different from the studies I have read. Moreover, Felkey and Batz-Barbarich don't find any evidence of publication bias in disciplines other than economics, which suggests that the null finding for those disciplines is robust. Perhaps gender bias in teaching evaluations is really just a feature of economics and STEM disciplines? I'd want to see a more detailed analysis (the AEA Papers and Proceedings don't offer the opportunity for authors to include a lot of detail) before drawing a strong conclusion, but this should make us evaluate the evidence on gender bias more carefully, especially outside of the STEM disciplines.
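For readers unfamiliar with how a meta-analysis pools results, here is a minimal fixed-effect sketch. The effect sizes and standard errors below are entirely hypothetical (they are not the estimates from Felkey and Batz-Barbarich); the point is just to show how inverse-variance weighting combines noisy individual estimates into one pooled effect and p-value:

```python
import math

def fixed_effect_meta(effects, std_errors):
    # Inverse-variance weighted fixed-effect meta-analysis:
    # returns the pooled effect, its standard error, and a two-sided p-value
    weights = [1 / se ** 2 for se in std_errors]
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    pooled_se = math.sqrt(1 / sum(weights))
    z = pooled / pooled_se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return pooled, pooled_se, p_value

# Hypothetical male-minus-female SET gaps from four studies, with standard errors
effects = [0.15, 0.08, 0.20, 0.05]
ses = [0.06, 0.05, 0.09, 0.07]
pooled, pooled_se, p = fixed_effect_meta(effects, ses)
print(round(pooled, 3), round(p, 4))
```

Notice that more precise studies (smaller standard errors) get more weight, and the pooled estimate can be statistically significant even when several of the individual estimates are not - which is exactly why a meta-analysis can be more informative than any single study.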

Friday, 11 February 2022

Gender bias in principals' evaluations of primary teachers in Ghana

I've written before about gender bias in student evaluations of teaching (see here, with more to come soon in a future post). There is good reason to worry that student evaluations don't even measure teaching quality (see here, with more on that to come too). However, it turns out that it isn't just students that are biased in evaluating teachers. This article by Sabrin Beg (University of Delaware), Anne Fitzpatrick (University of Massachusetts, Boston), and Adrienne Lucas (University of Delaware), published in the AEA Papers and Proceedings last year (ungated version here), shows that primary school principals, at least in Ghana, are biased as well.

Their data come from the Strengthening Teacher Accountability to Reach All Students (STARS) project, a randomised trial that collected data from 210 schools in 20 districts in Ghana. They asked fourth and fifth-grade teachers to rate their own performance (by comparing themselves to teachers at similar schools), and asked principals to rate their teachers. They also presented teachers and principals with vignettes, where the gender of the teacher described was randomised, and asked the principals to rate the teacher described in the vignette. Finally, they measured the 'teacher value-added' using standardised tests administered at the beginning and end of the year. Comparing ratings between male and female teachers, Beg et al. find that:

Female and male teachers were equally likely to assess themselves as at least more effective than other teachers at similar schools... In contrast, principals were about 11 percentage points less likely to assess female teachers this highly relative to male teachers...

The gender bias of principals was not statistically significant though (a point that Beg et al. do not note in the paper, preferring to concentrate on the magnitude of the coefficient). They also:

...test for gender differences in the objective measure of effectiveness based on student test scores and find that female teachers had on average 0.28 standard deviations higher effectiveness than their male peers...

This puts the statistical insignificance of the principals' gender bias into more context. Using an objective measure of teacher value-added, female teachers are better teachers than male teachers, and yet principals do not rate female teachers statistically significantly better than male teachers. Finding no difference in subjective assessments, when objective assessments show that female teachers are better, is itself evidence of bias.

Coming to the vignettes though, Beg et al. find that:

Principals further demonstrated evidence of bias against women in their hypothetical assessments. Principals rated individuals 0.12 standard deviations less effective when they had a female name instead of a male one...

Again, the difference is not statistically significant (and, since there was by construction no difference in teacher quality between the male and female teachers in the vignettes, an insignificant gap here doesn't provide strong evidence of bias). Overall, I was a bit surprised by this study, because when Beg et al. graph principals' subjective assessments of male and female teachers against the teacher value-added, you get this (their Figure 1):

At every level of objective teacher value-added, male teachers (the black line) are subjectively rated better than female teachers (the blue line) by principals. And yet, the difference is statistically insignificant in Beg et al.'s regression model. Perhaps Beg et al. should have included the objective measure in their models (or included confidence intervals in their Figure 1). Overall, this provides some weak additional support for gender bias in the evaluation of teachers.

Wednesday, 9 February 2022

Genetic diversity and its enduring effect on economic development in the US

How important (or otherwise) is ethnic diversity for economic development? This is a genuinely difficult question to answer. However, this 2017 article by Philipp Ager (University of Southern Denmark) and Markus Brueckner (Australian National University), published in the journal Economic Inquiry (ungated earlier version here), makes a good attempt in one context. Ager and Brueckner use US county-level data, looking at how genetic diversity in 1870 (measured using the number of European-born people from different countries) was related to the change in output per capita (the sum of agricultural output and manufacturing output, and a proxy for GDP) over the period from 1870 to 1920. In their simplest regression models, they find that:

The point estimates on genetic diversity are positive and significantly different from zero at the 1% level in all regressions; quantitatively, they range between 0.09 and 0.12... the coefficient of 0.09... suggests that, on average, a one unit increase in genetic diversity increased U.S. counties’ output by around 10%. An alternative interpretation is that a one standard deviation increase in immigrants’ genetic diversity increased output per capita growth during the 1870–1920 period by around 20% (equivalent to around 0.25 standard deviations; or alternatively, 0.4% per annum).
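As a quick sanity check on the "per annum" figure in that quote (using only the numbers quoted, not anything else from the paper), a 20% cumulative increase over the 50 years from 1870 to 1920 compounds to roughly 0.4% per year:

```python
# Convert a 20% cumulative increase over 50 years (1870-1920)
# into the equivalent compound annual growth rate
cumulative_increase = 0.20
years = 50
annual_rate = (1 + cumulative_increase) ** (1 / years) - 1
print(f"{annual_rate:.2%}")  # a little under 0.4% per annum
```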

That is quite a sizeable positive effect. Ager and Brueckner then go on to show that there are enduring effects, even more than a century later. The impact on 2010 income per capita is:

...around 0.04 and significantly different from zero at the 1% level.

There is also a statistically significant and positive (and somewhat larger) relationship with income per capita in 2000, 1990, 1980, and 1970. Ager and Brueckner then go even further, looking at the relationship between genetic diversity in 1790 and income per capita in 2000, finding that:

The estimated coefficient on 1790 genetic diversity is around 3.1... and statistically significant at the 1% level. Quantitatively, the estimated coefficient on immigrants’ 1790 genetic diversity suggests that a one standard deviation higher genetic diversity in 1790 was associated with a higher value of income per capita in the year 2000 of around 0.3 standard deviations.

Their results are also robust to a variety of alternative measures of development, non-linearity, controlling for pre-trends, and excluding outlier observations. However, they fall a bit short of demonstrating a causal impact, even when accounting for pre-trends, since counties that are initially more diverse may differ in meaningful ways from counties that are less diverse, not least in the types and variety of immigrants they attract. Nevertheless, the results are interesting, especially when Ager and Brueckner consider a potential mechanism, finding that, at the state level:

...the genetic diversity variable has a significant positive effect on patents. On the other hand, there is no significant effect of genetic diversity on conflict for the sample at hand...

They don't dwell on these results too much, but I think they demonstrate that diversity is not associated with greater conflict (which would reduce productivity), but is associated with more innovation (which would increase productivity). Greater diversity leading to more innovation and therefore greater development does seem like a plausible (and good news) story. However, it would be interesting to see some more research corroborating these results in other (and perhaps more contemporary) contexts.

Tuesday, 8 February 2022

Experimental evidence on the relevance of migration models

One of the problems with empirical studies of migration (as with a lot of empirical research) is that we don't observe the counterfactual. In the data, we see people migrating, and where they migrate to, but we don't see what they don't do. For migrants, we don't see where they would otherwise have migrated to, or if they wouldn't have migrated at all. For non-migrants, we don't see where they would have migrated to. What would be ideal would be to see potential migrants making lots of migration decisions, or ranking the available options. Lab experiments may help here, since we can ask research participants about a lot of hypothetical choices, where the parameters of the choices are controlled by the researcher. Because the choices are hypothetical, they may not line up with what potential migrants would actually do, but that illustrates the trade-off we face in this sort of research.

This recent working paper by Catia Batista (Universidade Nova de Lisboa) and David McKenzie (World Bank) provides some lab experiment evidence on migration decision-making, and uses the data to test some classic migration theories (the Roy model, which was critiqued by Michael Clemens, as I noted in yesterday's post, and the Sjaastad model, which I briefly discussed in that post as well). Batista and McKenzie collected data from university students in their final year of study - 154 students in Lisbon, and 265 students in Nairobi. Since university graduates are among the population groups most likely to migrate, these are appropriate research participants for asking hypothetical migration questions.

The experimental study asked research participants to choose between two (or sometimes three, or five) destinations, where the destinations had differences in wages, cost of relocating, the risk of unemployment, and the availability (and generosity) of social insurance in case of unemployment. Some scenarios also had incomplete information, and the research participants could buy information before making their decision. Research participants also had randomly allocated observed and unobserved skill levels (which determined the wages) and wealth endowments (which determined whether they could afford the costs of moving, and pay for information). In all, each research participant made 24 choices under different conditions.

There is far too much detail across the individual games (or groups of games) for me to summarise them all here (for that, I recommend reading the working paper). However, the paper did have some important takeaways. First:

Our results show that adding real-world features which take account of liquidity constraints, risk and uncertainty, and incomplete information to the classical Sjaastad/Borjas migration model makes a huge difference in terms of predicting the rate of migration and the selection pattern. The largest impact comes from adding risk (of unemployment) to the migration outcome.

The relative impact of the different features added to the migration decision-making process was based on a regression model that included the various characteristics of the game as explanatory variables for the decision to migrate (rather than not migrate). This is interesting, but I wonder about the external validity of the results beyond the particular values of the features used in the experimental choices. While the values that Batista and McKenzie use are plausible and based on real-world equivalents, I think we need more studies of this type with a wider range of values (and across more contexts) before we can derive some strong conclusions about which real-world features of the migration decision have the largest effect.

One of the most interesting parts of the paper was where Batista and McKenzie test the independence of irrelevant alternatives (IIA) assumption that is inherent in many migration models. That assumption essentially says that the relative probability of choosing to migrate to a destination x1 rather than x2 doesn't depend on whether the potential migrant could have chosen x3 or not. Batista and McKenzie look at this by comparing the relative choices when there are two options, three options, or five options available (and where some of the options are clearly inferior to the original two options). They find that:

...the independence of irrelevant alternatives (IIA) assumption, which underlies many models of multi-destination migration choices, holds well for simple migration decisions that just involve comparisons of costs and wages in a developed country setting, but even in these simple cases, violations occur for 14-21 percent of our sample in Nairobi. Moreover, when the risk of unemployment and incomplete information are added, IIA no longer holds for 20 percent of the people in our game in Lisbon. Since most real-world migration decisions involve considerable risk and incomplete information, real-world violations of IIA are likely to be non-trivial.

Whether you interpret those results as being favourable for the IIA assumption or not is pretty subjective (and probably depends on some motivated reasoning depending on what migration model you want to use!). It is interesting to me that in the simplest decisions, there was a non-trivial level of violations of the IIA assumption in the Nairobi sample. Violations of IIA were not systematically different based on any of the characteristics of the research participants, including their migration preferences, level of risk aversion, or ambiguity aversion. Again, this would be worth following up further in other contexts.
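For those unfamiliar with why IIA is baked into so many migration models, it is a mechanical property of the multinomial logit model that typically underlies them. A minimal sketch (with made-up utilities, not values from the paper) shows that adding a third destination leaves the relative odds of the first two completely unchanged - which is exactly what Batista and McKenzie test against their participants' actual choices:

```python
import math

def choice_probs(utilities):
    # Multinomial logit choice probabilities (softmax over utilities)
    exps = [math.exp(u) for u in utilities]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical utilities for destinations x1, x2, and an added option x3
u1, u2, u3 = 1.0, 0.5, -0.3

p_two = choice_probs([u1, u2])
p_three = choice_probs([u1, u2, u3])

ratio_two = p_two[0] / p_two[1]      # odds of x1 vs x2 with two options
ratio_three = p_three[0] / p_three[1]  # odds of x1 vs x2 after adding x3
print(ratio_two, ratio_three)  # identical under IIA
```

If participants' observed choice ratios shift when inferior options are added, as they did for a non-trivial share of the Nairobi sample, the logit structure is violated.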

Finally, and probably most importantly (and as a useful caution) for those who prefer the Sjaastad model of migration (like Michael Clemens and myself), income maximisation was not universal, as:

...even in the simplest settings, people do not always make the destination choice which maximizes net income. Instead, cost minimization seems to be a key decision factor in the migration decision – particularly for individuals in the Nairobi sample.

It would have been nice to see another regression model demonstrating the factors that were associated with making cost-minimising, rather than income-maximising, choices. I wonder about the extent to which risk aversion was playing a role here (the Nairobi sample appeared to be more risk averse on two of the three measures than the Lisbon sample), especially as Batista and McKenzie note that:

Seventy-five percent in Lisbon and 84 percent in Nairobi of those who do not migrate in Game 6, even though they have positive expected returns from doing so, are risk averse according to the Dohmen et al. (2011) measure.

A risk averse potential migrant might be more inclined to cost-minimise, rather than income-maximise, especially as in Game 6 that Batista and McKenzie refer to, the chance of unemployment was endogenous, and depended on the number of other research participants in the same session who chose to migrate to that destination. If you are worried about unemployment, and risk averse, then you are more likely not to migrate. Again, something to explore more, and in additional contexts.
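The intuition can be illustrated with a simple expected-utility sketch (the payoffs below are entirely hypothetical, not the parameters of Game 6): a risk-averse person, with a concave utility function, can rationally decline a migration option that has a higher expected income:

```python
import math

def expected_utility(outcomes, utility):
    # Expected utility over a list of (probability, wealth) outcomes
    return sum(p * utility(w) for p, w in outcomes)

utility = math.sqrt  # concave utility, so the agent is risk averse

# Hypothetical final-wealth outcomes (in $000s), net of moving costs
stay = [(1.0, 100)]                # certain wealth at home
migrate = [(0.7, 160), (0.3, 10)]  # employed vs unemployed abroad

ev_stay = sum(p * w for p, w in stay)
ev_migrate = sum(p * w for p, w in migrate)
eu_stay = expected_utility(stay, utility)
eu_migrate = expected_utility(migrate, utility)

print(ev_migrate, ev_stay)  # migration has the higher expected income
print(eu_migrate, eu_stay)  # but the risk-averse agent prefers to stay
```

With these numbers, migrating offers a higher expected income but a lower expected utility, so choosing not to migrate (or choosing the cheaper, safer destination) is entirely consistent with rational behaviour under risk aversion.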

Overall, I think there is a lot more that can be done to explore migration experimentally. This paper provides a useful starting point for what will, hopefully, build into an interesting contribution to our understanding of migration decision-making.

[HT: David McKenzie on the Development Impact blog]

Monday, 7 February 2022

Michael Clemens vs. the Roy model of migration

In a new working paper, Michael Clemens (Center for Global Development) presents a convincing case against the Roy model of migration. Summarising the implications of the Roy model, he notes that:

As workers in the Roy (1951) model choose an occupation, so they are assumed to choose a country of residence, where they swell a labor aggregate in a fixed production function with diminishing returns. These assumptions usefully predict some facts at partial equilibrium (Ariu 2018): poor countries tend to experience net emigration while rich countries experience net immigration, but not everyone migrates and not all at once...

The model predicts causes: Rising trade and capital flows should substitute for migration, and vice versa, via factor price equalization... Migration today should reduce migration tomorrow, as the gain is arbitraged away.

The model also predicts effects: Typical migrants from poor, unequal countries should be the least productive workers, as they have the most to gain. Native workers should be directly harmed, through labor-market competition and fiscal redistribution—until capital accumulation merely leaves them where they started (but leaves capital owners even wealthier). Skill-selective immigration restrictions should simply shift the harm to the world’s most vulnerable, impoverishing poor countries by ‘brain drain’.

Clemens notes the substantial rise in global migration over the last fifty years, then summarises the evidence against the Roy model:

Did the advance of migration result from a retreat of its theoretical substitutes—trade and capital flows? Just the opposite. Global flows of goods and capital exploded during the same years...

Did migration arise from failed economic development in poor countries? Just the opposite. Global migration surged from the European Core in the mid-19th century, the European Periphery around the turn of the 20th century, and from Latin America and Asia in the second half of the 20th... These surges coincided with the arrival of modern economic growth and each region’s ascent out of poverty.

Did initial waves of migration reduce the incentive for further migration, by spatially equilibrating the labor market? Just the opposite. Migration tends to beget even more migration, for generations. Prior migrants raise the net benefits and incidence of new migration by providing information, capital, and inspiration...

Did the large rise in migration from the developing world broadly substitute for workers in the destinations? No. As several prior reviews have found, Edo et al. (2020, 1367) conclude that “the impact of immigration on the average wage and employment of native-born workers is zero or slightly positive in the medium to long term”...

Did more migration broadly substitute for other forms of globalization, through factor-price equalization? Certainly not. More migration has raised the volume and scope of trade... and flows of capital and technology...

In other words, Clemens argues that it is time to retire the Roy model of migration. The need to make this argument was a little bit of a surprise to me. Most of the literature on migration I have read since completing my PhD, including much of the literature that I have cited in my own work on migration (including with my PhD students), actually relies on the neoclassical (and arguably more realistic) model of Larry Sjaastad (see here). The seminal paper by Sjaastad dates from the 1960s, so this model is not a new insight! The Sjaastad model relates migration flows to the global supply and demand for labour. Individual migration decisions in the model are likened to an investment in human capital, where a person migrates if the benefits from migration outweigh the costs.
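The human-capital framing of the Sjaastad model can be sketched as a simple present-value calculation (the figures below are illustrative, not from Sjaastad's paper): migrate if the discounted stream of wage gains exceeds the up-front cost of moving:

```python
def migration_net_gain(wage_gap, years, discount_rate, moving_cost):
    # Present value of the annual wage gain from migrating,
    # minus the up-front cost of moving (a Sjaastad-style calculation)
    pv_benefits = sum(wage_gap / (1 + discount_rate) ** t
                      for t in range(1, years + 1))
    return pv_benefits - moving_cost

# Hypothetical: a $5,000/year wage gain over a 20-year horizon,
# a 5% discount rate, and a $30,000 cost of moving
gain = migration_net_gain(5000, 20, 0.05, 30000)
print(round(gain, 2))  # positive, so this person migrates
```

The same calculation also rationalises non-migration: a shorter remaining working life, a higher discount rate, or higher moving costs can flip the net gain negative, which is why the model predicts that not everyone migrates even when wage gaps are large.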

Anyway, it's not clear to me that the Roy model is much used any more. To my mind, the Roy model has seen more extensive use in labour economics, but its failures in explaining migration flows might give us cause to question its continued use there as well.