Wednesday, 27 October 2021

The pandemic may have revealed all we need to know about online learning

Regular readers of this blog will know that I am highly sceptical of online learning, blended learning, flipped classrooms, and the like. That scepticism comes from a nuanced understanding of the research literature, and especially from a concern about heterogeneity. Students respond differently to learning in the online environment, and in ways that I believe are unhelpful. Students who are motivated and engaged and/or have a high level of 'self-regulation' perform at least as well in online learning as they do in a traditional face-to-face setting, and sometimes perform better. Students who lack motivation, are disengaged, and/or have a low level of self-regulation flounder in online learning, and perform much worse.

The problem with much of the research literature, though, is a lack of randomisation. Even when a particular study employs randomisation, randomisation into online learning occurs at the level of the student, not the level of the section or course. That is, particular lecturers opt in to being part of the study (often, they are the researchers who are undertaking the study), so instructor effects are not randomised away.

An alternative to a pure, randomised experiment is a natural experiment - where some unexpected change in a real-world setting provides a way of comparing those in online and traditional face-to-face learning. That's where the pandemic comes in. Before lockdowns and stay-at-home orders prevented in-person teaching, some students were already studying online. Other students had been studying in person, but were forced into online learning. Comparing the two groups can give us some idea of the effect of online learning on student performance, and a number of studies are starting to appear that do just that. I'm going to focus this post on four such studies.

The first study is this NBER working paper (ungated) by Duha Altindag (Auburn University), Elif Filiz (University of Southern Mississippi), and Erdal Tekin (American University). Altindag was one of the co-authors on the article I discussed on Sunday. Their data come from a "medium-sized, public R1 university" (probably Auburn University), and include a sample of over 18,000 students and over 1,000 instructors. They essentially compare student performance in classes in Spring and Fall 2019 with the same students' performance in classes in Spring 2020, when pandemic restrictions shut the campus down partway through the semester, forcing all in-person teaching online. Importantly:

This shift occurred after the midterm grades were assigned. Therefore, students obtained a set of midterm grades with F2F [face-to-face] instruction and another set (Final Grades) after the switch to online instruction.

Altindag et al. find that, once they account for heterogeneity across instructors:

...students in F2F instruction are 2.4 percentage points (69% of the mean of the online classes) less likely to withdraw from a course than those in online instruction in Fall 2019... Moreover, students in F2F courses are 4.1 percentage points (4 percent) more likely to receive a passing grade, i.e., A, B, C, or D, than their counterparts in online courses.

However, importantly, Altindag et al. go on to look at heterogeneous effects for different student groups, and find that:

Strikingly, for honor students, there seems to be no difference between online and F2F instruction... Students in the Honors program perform equally well regardless of whether the course is offered online or in person... When we turn to students in regular courses, however, the results are very different and resembles the earlier pattern that we discussed in the previous results...

So, the negative impacts of online learning were concentrated among non-honours students, as I suggested at the start of this post. Better students are not advantaged by online learning in an absolute sense, but they are advantaged relatively, because less-able students do much worse in an online setting. Also interestingly, in this study there were no statistically significant differences in the impact of online learning by gender or race. Altindag et al. also show some suggestive evidence that having access to better broadband internet reduces the negative impact of online learning (which should not be surprising), but does not eliminate it.

Altindag et al. also show that the negative impact of online learning was concentrated in courses where instructors were more vigilant about academic integrity and cheating, which suggests that we should be cautious about taking for granted that grades in an online setting are always a good measure of student learning. 

The second study is this working paper by Kelli Bird, Benjamin Castleman, and Gabrielle Lohner (all University of Virginia). They used data from over 295,000 students enrolled in the Virginia Community College System over the five Spring terms from 2016 to 2020 (with the last one being affected by the pandemic). As this is a community college sample, it is older than the sample in the first study, more likely to be working and studying part-time, and has lower high school academic performance. However, the results are eerily similar:

The move from in-person to virtual instruction resulted in a 6.7 percentage point decrease in course completion. This translates to a 8.5 percent decrease when compared to the pre-COVID course completion rate for in-person students of 79.4 percent. This decrease in course completion was due to a relative increase in both course withdrawal (5.2 pp) and course failure (1.4 pp). We find very similar point estimates when we estimate models separately for instructors teaching both modalities versus only one modality, suggesting that faculty experience teaching a given course online does not mitigate the negative effects of students abruptly switching to online instruction. The negative impacts are largest for students with lower GPAs or no prior credit accumulation.

Notice that, not only are the effects negative, they are more negative for students with lower GPAs. Again, Bird et al. note that:

One caveat is that VCCS implemented an emergency grading policy during Spring 2020 designed to minimize the negative impact of COVID on student grades; instructors may have been more lenient with their grading. As such, we view these estimates as a lower-bound of the negative impact of the shift to virtual instruction.
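The percentage-point and percent figures that Bird et al. report can be cross-checked with simple arithmetic. A quick sketch, using only the numbers quoted above:

```python
# Cross-check the percentage-point vs. percent figures quoted from Bird et al.
baseline_completion = 79.4  # pre-COVID in-person course completion rate (%)
drop_pp = 6.7               # reported fall in course completion (percentage points)

relative_drop = drop_pp / baseline_completion * 100
print(f"Relative decrease: {relative_drop:.1f}%")  # ~8.4%, in line with the reported 8.5%

# The headline drop decomposes into withdrawal and failure:
withdrawal_pp = 5.2
failure_pp = 1.4
print(f"Components sum to: {withdrawal_pp + failure_pp:.1f} pp")  # ~6.6 pp, close to 6.7 pp
```

The small gaps (8.4% vs. 8.5%, 6.6 pp vs. 6.7 pp) presumably reflect rounding in the reported point estimates.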

The third study is this IZA Discussion Paper by Michael Kofoed (United States Military Academy) and co-authors. The setting for this study is again different, being based on students from the US Military Academy at West Point. This provides some advantages, though. As Kofoed et al. explain:

Generally, West Point students have little control over their daily academic schedules. This policy did not change during the COVID-19 pandemic. We received permission to use this already existing random assignment to assign students to either an in-person or online class section. In addition, to allow for in-person instruction, each instructor agreed to teach half of their four section teaching load... online and half in-person.

This provides a 'cleaner' experiment for the effect of online learning, because students were randomised to either online or in-person instruction, and almost all instructors taught in both formats, which allows Kofoed et al. to avoid any problems of instructors self-selecting into one mode or the other. However, their sample is more limited in size, at 551 students enrolled in introductory microeconomics. Based on this sample, they find that:

...online instruction reduced a student's final grade by 0.236 standard deviations or around 1.650 percentage points (out of 100). This result corresponds to about one half of a +/- grade. Next, to control for differences in instructor talent, attentiveness, or experience, we add instructor fixed effects to our model. This addition reduces the estimated treatment effect to -0.220 standard deviations; a slight decrease in magnitude....

Importantly, the results when disaggregated by student ability are similar to the other studies:

...learning gaps are greater for those students whose high school academic preparation was in the bottom quarter of the distribution. Here, we find that being in an online class section reduced their final grades by 0.267 standard deviations, translating to around 1.869 percentage points of the student’s final grade.
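As a quick consistency check, the two pairs of effect sizes quoted from Kofoed et al. imply the same underlying grade distribution:

```python
# The standard-deviation and percentage-point effects reported by Kofoed et al.
# imply a consistent standard deviation of final grades of about 7 points (out of 100).
sd_all, pp_all = 0.236, 1.650        # effect for the full sample
sd_bottom, pp_bottom = 0.267, 1.869  # effect for bottom-quarter students

print(f"Implied SD (full sample): {pp_all / sd_all:.2f} points")
print(f"Implied SD (bottom quarter): {pp_bottom / sd_bottom:.2f} points")
```

Both ratios come out at very close to 7 points, so the reported numbers hang together.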

Unlike Altindag et al., Kofoed et al. find that online learning is worse for male students, but there are no significant differences by race. Kofoed et al. also ran a post-term survey to investigate the mechanisms underlying their results. The survey showed that:

...students felt less connected to their instructors and peers and claimed that their instructors cared less about them.

This highlights the importance of social connections within the learning context, regardless of whether learning is online or in-person. Online, those opportunities can easily be lost (which relates back to this post from earlier this month), and it appears that not only does online education reduce the value of the broader education experience, it may reduce the quality of the learning as well.

Kofoed et al. were clearly very concerned about their results, as:

From an ethical perspective, we should note that while it is Academy-wide policy to randomly assign students to classes, we did adjust the final grade of students in online sections according to our findings and prioritized lower [College Entrance Examination Rank] score students for in-person classes during Spring Semester 2021.

Finally, the fourth study is this recent article by Erik Merkus and Felix Schafmeister (both Stockholm School of Economics), published in the journal Economics Letters (open access). The setting for this study is again different, being students enrolled in an international trade course at a Swedish university. The focus is also different - it compares in-person and online tutorials. That is, rather than the entire class being online, each student experienced some of the tutorials online and other tutorials in person, over the course of the semester. As Merkus and Schafmeister explain:

...due to capacity constraints of available lecture rooms, in any given week only two thirds of students were allowed to attend in person, while the remaining third was assigned to follow online. To ensure fair treatment, students could attend the in-class sessions on a rolling basis, with each student attending some tutorials in person and others online. The allocation was done on a first-name basis to limit self-selection of students into online or in-person teaching in specific weeks.

They then link student performance for the 258 students in their sample in the final examination questions with whether the student was assigned to an in-person tutorial for that particular week (they don't compare whether students actually attended or not - this is an 'intent-to-treat' analysis). Unlike the other three studies, Merkus and Schafmeister find that:

...having the tutorial online is associated with a reduction in test scores of around 4% of a standard deviation, but this effect does not reach statistical significance.

That may suggest that it is not all bad news for online learning, but notice that they compare online and in-person tutorials only, while the rest of the course is conducted online. There is no comparison group of students who studied the entire course in person. These results are difficult to reconcile with Kofoed et al., because tutorials should be the most socially interactive component of classroom learning, so if students feel that the social element is much weaker (per Kofoed et al.), then why would the effect be negligible (per Merkus and Schafmeister)? The setting clearly matters, and perhaps that is enough to explain these differences. However, Merkus and Schafmeister didn't look at heterogeneity by student ability, which (as I have noted many times before) is a problem.

Many universities (including my own) are seizing the opportunity presented by the pandemic to push forward plans to move a much greater share of teaching into online settings. I strongly believe that we need to pause and evaluate before we move too far ahead with those plans. To me, the research is continuing to suggest that, by adopting online learning modes, we create a learning environment that is hostile to disengaged, less-motivated students. You might argue that those are the students we should care least about. However, the real problem is that the online learning environment itself might increase or exacerbate feelings of disengagement (as the Kofoed et al. survey results show). If universities really care about the learning outcomes of students, then we're not at the point where they should be going 'all in' on online education.


Sunday, 24 October 2021

COVID-19 risk and compensating differentials in a university setting

A compensating differential is the difference in the wage between a job with desirable non-monetary characteristics and a job with undesirable non-monetary characteristics, holding all other factors (like human capital or skill requirements, experience of the worker, etc.) constant. When a job has attractive non-monetary characteristics (e.g. it is clean, safe, or fun), then more people will be willing to do that job. This leads to a higher supply of labour for that job, which leads to lower equilibrium wages. In contrast, when a job has negative non-monetary characteristics (e.g. it is dirty, dangerous, or boring), then fewer people will be willing to do that job. This leads to a lower supply of labour for that job, which leads to higher equilibrium wages. It is the difference in wages between jobs with attractive non-monetary characteristics and jobs with negative non-monetary characteristics that we refer to as a compensating differential (essentially, workers are being compensated for taking on jobs with negative non-monetary characteristics, through higher wages).
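To make the concept concrete, here is a minimal sketch with purely hypothetical wages (none of these numbers come from any study):

```python
# Two jobs identical in skill requirements and worker characteristics; Job B has
# an undesirable non-monetary characteristic (say, higher risk), so fewer people
# are willing to do it, labour supply is lower, and the equilibrium wage is higher.
wage_attractive_job = 25.00    # hypothetical hourly wage, clean/safe/fun job
wage_unattractive_job = 29.50  # hypothetical hourly wage, dirty/dangerous/boring job

compensating_differential = wage_unattractive_job - wage_attractive_job
print(f"Compensating differential: ${compensating_differential:.2f} per hour")
# The gap is the extra pay that compensates workers for the negative
# non-monetary characteristic, holding everything else constant.
```

The key point is that the differential is a wage gap between otherwise-identical jobs, so measuring it requires holding worker and job characteristics constant.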

The current pandemic presents a situation where many jobs have suddenly had a new and negative non-monetary characteristic added to them - the risk of becoming infected with the coronavirus. The idea of compensating differentials suggests that workers who suddenly face a job that is riskier than before should receive an increase in wages (and indeed, we have seen that, such as the pay bonus that some supermarket workers have received).

There hasn't been much in the way of systematic research on the compensating differentials arising from the pandemic. No doubt we can expect some in the future. An early example is this new paper by Duha Altindag, Samuel Cole, and Alan Seals (all Auburn University). It turns out that Auburn University didn't strictly follow the CDC requirements for safe social distancing in class, leading to some classes having too many students, and therefore being higher risk. As Altindag et al. note:

Possibly due to the cost concerns, Auburn University did not implement any policy about maintaining six feet of distance between students within the classrooms... Instead, the university set an enrollment limit of half of the normal seating capacity in classrooms, despite the Center for Disease Control (CDC) guidelines and the public health orders of the state... This practice of the university led to about 50% of all face-to-face (F2F) courses in Spring 2021 being delivered in “risky” classrooms, in that the number of enrolled students in classes exceeded their classrooms’ CDC-prescribed safe capacity (the maximum number of students that can be seated in the room while allowing a six-foot distance between all students).

Altindag et al. looked at differences in which staff taught the risky (or 'very risky' - classes where the number of enrolled students was more than double the safe room capacity) classes, and then looked at differences in pay between those teaching risky classes and those teaching less risky classes. For the differences in pay, they are able to adopt an instrumental variables approach, using the presence of fixed furniture in the teaching room as an instrument. As they explain:

Our instrument, Dispersible Class, is an indicator for whether students in a classroom can spread away from each other while attending the lectures. This can only happen in in-person classes that take place in rooms with movable furniture or in online courses in which students have already spread away from one another.

I worry a little about the sensitivity of the results to the inclusion of fully online classes. By construction, the instrument (Dispersible Class) always takes a value of one for online classes, and the online classes are by definition non-risky, so the variation the instrument picks up is entirely driven by the riskiness of the face-to-face classes. That is what you want in this analysis, but why include the online classes at all, since they aren't contributing any variation?

Anyway, nit-picking aside, when Altindag et al. look at who teaches the risky classes, they find that:

...GTAs [Graduate Teaching Assistants] and adjunct instructors, who are ranked low within the University hierarchy, are about eight to ten percentage points more likely to teach a risky class compared to the tenured faculty (full and associate professors) and administrators (such as the department chairs, deans, and others) who teach courses in the same department...

...female instructors are more likely to be teaching risky classes. Additionally... younger faculty face higher risk in their classrooms.

The results are similar for 'very risky' classes. Young faculty and low-ranked faculty (and, possibly, female faculty) have less bargaining power with departmental chairs, so are more likely to acquiesce to a request to teach particular classes, and that is what Altindag et al. find. Those academics have consequently taken on more COVID-19 risk. But, are they compensated for this risk? In their instrumental variables analysis, Altindag et al. find that:

...instructors who teach at least one risky class earn 22.5 percent more than their counterparts who deliver only safe course sections... Relative to the average monthly wage of an instructor in our sample, this effect corresponds to approximately $2,100. In a four-month semester, this impact corresponds to $8,400.

Again, the results are similar for 'very risky' classes. So, even though junior faculty and female faculty take on more risky classes, they are compensated for that additional risk. Note that the estimates of the compensating differential control for the instructor's demographic characteristics, academic level, and experience at Auburn. Altindag et al. find that the compensating differential is roughly the same at all academic levels.
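The dollar figures that Altindag et al. report hang together arithmetically. A quick check, using only the quoted numbers:

```python
# Cross-check the risky-class premium figures from Altindag et al.
premium_share = 0.225    # instructors teaching a risky class earn 22.5% more
monthly_premium = 2100   # reported dollar value of that premium per month

# The quoted premium implies an average monthly instructor wage of about $9,333.
implied_avg_monthly_wage = monthly_premium / premium_share
print(f"Implied average monthly wage: ${implied_avg_monthly_wage:,.0f}")

# Over a four-month semester, the premium accumulates to the reported $8,400.
semester_premium = monthly_premium * 4
print(f"Semester premium: ${semester_premium:,}")
```

So the monthly and semester figures are consistent with each other and with the 22.5 percent estimate.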

One other criticism is that perhaps the types of classes that junior faculty typically teach happen to be those that are riskier. Altindag et al. address this by running their analysis using data on classes from the previous year, when COVID-19 was not a thing. They find no statistically significant differences in who teaches 'hypothetically-riskier' classes, and no statistically significant wage premium for those teaching 'hypothetically-riskier' classes. That provides some confidence that the effects they pick up in their main analysis relate to risk in pandemic times.

This paper raises an interesting question. Faculty are compensated for coronavirus risk, through higher wages. However, it isn't only faculty who face higher risk. Students attending those classes are at higher risk as well. Is there a compensating differential for students, and if so, how would we measure it? That is a question for future research.

[HT: Marginal Revolution]


Saturday, 23 October 2021

The drinking age, prohibition, and alcohol-related harm in India

India is an interesting research setting for investigating the effects of policies, because states can have very different policies in place. Consider alcohol: According to Wikipedia, alcohol is banned in the states of Bihar, Gujarat, Mizoram, and Nagaland, as well as most of the union territory of Lakshadweep. In states where alcohol is legal, the minimum legal drinking age (MLDA) varies from 18 years to 25 years. And the laws change relatively frequently. Mizoram banned alcohol most recently in 2019.

Indian states provide a lot of variation to use for testing the effects of alcohol regulation. And that is what this 2019 article by Dara Lee Luca (Mathematica Policy Research), Emily Owens (University of California, Irvine), and Gunjan Sharma (Sacred Heart University), published in the IZA Journal of Development and Migration (open access), takes advantage of. They first collated exhaustive data on changes in alcohol regulations at the state level, focusing on prohibition and changes in the MLDA. They note that:

Between 1980 and 2008, the time frame for our analysis, the MLDA ranged from 18 to 25 years across the country, and some states had blanket prohibition policies. In addition, we identified six states that changed their MLDA at least once; Bihar increased its MLDA from 18 to 21 in 1985, and Tamil Nadu repealed prohibition and enacted an 18-year-old MLDA in 1990, then subsequently increased it to 21 in 2005. Andhra Pradesh and Haryana both enacted prohibitionary policies in 1995 (the MLDA in Andhra Pradesh had been 21, and 25 in Haryana) only to later repeal them in 1998 and 1999.

In all, Luca et al. have data on law changes in 18 states over the period from 1980 to 2009, and for 19 states in a more limited number of years. They then look at a number of different outcome variables, drawn from the 1998-1999 and 2005-2006 waves of the National Family Health Survey, as well as crime and mortality data. They first show that:

...men who are legally allowed to drink are more likely to report drinking, and the relationship is statistically significant. Given that the mean of alcohol consumption for men in the data is approximately 24%, this 5 percentage point change in likelihood of drinking is substantial, representing a 22% increase in the likelihood of drinking.
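The back-of-the-envelope conversion in that quote can be reproduced; the rounded inputs give a figure slightly below the reported 22%, presumably because the authors used unrounded estimates:

```python
# Convert the effect of being legally allowed to drink into a relative change.
mean_drinking_men = 24.0  # approximate % of men reporting alcohol consumption
effect_pp = 5.0           # percentage-point increase when legally allowed to drink

relative_increase = effect_pp / mean_drinking_men * 100
print(f"Relative increase: {relative_increase:.0f}%")  # ~21%, vs. the reported 22%
```

Either way, the substantive point stands: the effect is large relative to the baseline rate of drinking.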

So, alcohol regulation does affect drinking behaviour (which seems obvious, but is much less obvious for a developing country like India than it would be for most developed countries). Having established that alcohol consumption is related to regulation, Luca et al. then go on to find that:

...husbands who are legally allowed to drink are both substantially more likely to consume alcohol and commit domestic violence against their partners...

...policies restricting alcohol access may have a secondary social benefit of reducing some forms of violence against women, including molestation, sexual harassment, and cruelty by husband and relatives. At the same time, changes in the MLDA do not appear to be associated with reductions in criminal behavior more broadly. We find suggestive evidence that stricter regulation is associated with lower fatalities rates from motor vehicle accidents and alcohol consumption, but also deaths due to consuming spurious liquor (alcohol that is produced illicitly).

In other words, there is evidence that stricter alcohol regulations are associated with lower levels of alcohol-related harm, particularly domestic violence and violence against women. Now, these results aren't causal, although they are consistent with a causal story. Interestingly, Luca et al. chose not to use instrumental variables analysis (which could provide causal evidence), because the regulations proved to be only weak instruments (and they were also worried about violations of the exclusion restriction, because changes in alcohol regulation might have direct impacts on criminal behaviour). Luca et al. still assert that their results 'suggest a causal channel', and to the extent that we accept that, it highlights the importance of alcohol regulation in minimising alcohol-related harm in a developing country context.

Friday, 22 October 2021

The beauty premium at the intersection of race and gender

I've written a lot about the beauty premium in labour markets (see the links at the end of this post), including most recently earlier this week. However, most studies that I am aware of look at the beauty premium for a single ethnic group, or even a single gender, and don't consider that the premium might differ systematically between ethnicity-gender groups. So, I was interested to read this recent article by Ellis Monk (Harvard University), Michael Esposito, and Hedwig Lee (both Washington University in St. Louis), published in the American Journal of Sociology (ungated version here). Their premise is simple (emphasis is theirs):

Given the racialization and gendering of perceived beauty, we should expect such interactions. In short, while Black men may face double jeopardy on the labor market (race and beauty)... Black women may face triple jeopardy (race, gender, and beauty).

Monk et al. use data from the first four waves of the National Longitudinal Study of Adolescent to Adult Health (Add Health), for 6090 White, Black or Hispanic working people who appeared in all four waves of the survey. Like other studies, they measure beauty based on the ratings given by the interviewers, and then they derive an overall beauty score for each research participant. You might be concerned that the race/gender of the interviewer matters. However, Monk et al. note that:

...the vast majority of interviewers (with measured demographic characteristics) are White (just under 70%), and female. Furthermore, the sample of interviewers was highly educated, with 20% having a postgraduate degree, 28% having a college degree, and 31% having some college experience. Again, this interviewer pool represents actors that respondents may typically encounter as gatekeepers in the labor market. This is helpful for the purposes of our study...

They find few differences in attractiveness ratings given by different race/gender interviewers, although:

Black female interviewers appeared to give slightly lower ratings overall than White women (except for when evaluating Hispanic respondents)... Black male interviewers tended to give lower scores to male respondents regardless of their race/ethnicity.

Monk et al. don't feel a need to condition their results on the race/gender of the interviewers, but since they are evaluating beauty based on four different interviewers' ratings, it probably isn't a big deal.

Anyway, onto the results, which can be neatly summarised by their Figure 2:

Looking at the figure, they find that there is a beauty premium for all six race/gender groups, but the beauty premium differs among those groups. In particular, the beauty premium is largest for Black men and women. However, expressing it that way doesn't quite capture what is going on. It isn't so much a larger positive beauty premium, as a larger penalty for unattractiveness. Notice that the disparity in income between Blacks and Whites is much smaller for the most attractive people than for the least attractive people. In fact, the incomes of the most attractive Black women are higher on average than the incomes of the most attractive White women (controlling for age, education, marital status, and other characteristics). The differences are quite substantial. Monk et al. note that (emphasis is theirs):

White males with very low levels of perceived attractiveness are estimated to earn 88 cents to every dollar likely to be paid to White males who are perceived to possess very high levels of attractiveness. This is similar in magnitude to the canonical Black-White race gap, wherein using the same set of controls we find that a Black person earns 87 cents to every dollar a white person makes.

Sizable income disparities are observed among subjects judged to be least and most physically attractive in each other subpopulation analyzed as well. The ratio of predicted earnings of individuals at the 5th percentile of perceived attractiveness compared to individuals at the 95th percentile of perceived attractiveness is 0.83 among White females; 0.78 among Hispanic males; and 0.80 among Hispanic females. Again, note that returns to attractiveness are most pronounced among Black respondents: Black females at the 5th percentile of attractiveness ratings are estimated to earn 63 cents to every dollar of Black females at the 95th percentile of attractiveness. Black males at the 5th percentile of attractiveness are expected to earn 61 cents to every dollar earned by Black males at the 95th percentile of attractiveness.
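The earnings ratios quoted above translate directly into percentage earnings penalties for being at the 5th (versus the 95th) percentile of rated attractiveness. A quick sketch, using only the quoted ratios:

```python
# Convert Monk et al.'s quoted earnings ratios into percentage earnings penalties.
ratios = {
    "White males": 0.88,
    "White females": 0.83,
    "Hispanic males": 0.78,
    "Hispanic females": 0.80,
    "Black females": 0.63,
    "Black males": 0.61,
}
for group, ratio in ratios.items():
    penalty = (1 - ratio) * 100
    print(f"{group}: {penalty:.0f}% lower predicted earnings")
# The penalty for Black men and women (37-39%) is roughly three times the
# penalty for White men (12%), which is the intersectional pattern in Figure 2.
```

Laying the ratios out this way makes the heterogeneity easy to see at a glance.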

Clearly, these results are important for the future measurement and understanding of the beauty premium in labour markets. As Monk et al. note, they also contradict Daniel Hamermesh's speculative comment in his book Beauty Pays (which I reviewed here) that:

...the effects of beauty within the African-American population might be smaller [than among whites]...

Monk et al. conclude that:

...perceived physical attractiveness is a powerful, yet often sociopolitically neglected and underappreciated dimension of social difference and inequality regardless of race and gender. Further still, its consequences are intersectional...

We should be accounting for that intersectionality in future studies of the beauty premium.

[HT: Marginal Revolution]
