Sunday, 13 February 2022

More on gender bias in student evaluations of teaching

Back in 2020, I wrote a post on gender biases in student evaluations of teaching, highlighting five research papers that showed pretty clearly that student evaluations of teaching (SET) are biased against female teachers. I've recently read some further research on this topic that I thought I would share, some of which supports my original conclusion, and some of which should make us pause, or at least draw a more nuanced conclusion.

The first article is this 2020 one by Shao-Hsun Keng (National University of Kaohsiung), published in the journal Labour Economics (sorry, I don't see an ungated version online). Keng uses data from the National University of Kaohsiung for 2002 to 2015, covering all departments. They have data on student evaluations, and on student grades, which they use to measure teacher value-added. In the simplest analysis, they find that:

...both male and female students give higher teaching evaluations to male instructors. Female students rate male instructors 11% of a standard deviation higher than female instructors. The effect is even stronger for male students. Male students evaluate male instructors 15% (0.109 + 0.041) of a standard deviation higher than female instructors.
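
To unpack the arithmetic in that quote: my reading of the coefficients (an inference from the quote, not notation taken from the paper) is that 0.109 is the male-instructor effect for female students, and 0.041 is the additional male-student interaction, so that:

```latex
% My reading of the coefficients quoted above (an inference, not Keng's notation):
% 0.109 = male-instructor effect for female students (about 11% of a SD)
% 0.041 = additional effect for male students (the interaction term)
\underbrace{0.109}_{\text{male instructor}}
+ \underbrace{0.041}_{\text{male student} \times \text{male instructor}}
= 0.150 \approx 15\% \text{ of a standard deviation}
```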

Interestingly, Keng also finds that:

Students who spend more time studying give higher scores to instructors, while those cutting more classes give lower ratings to instructors.

It is difficult to know which way the causality runs there, though. Do students who are doing better in the class reward the higher-quality teaching with better evaluations? Or do students who are enjoying the teaching more spend more time studying? Also:

Instructors who have higher MOST [Ministry of Science and Technology] grants receive 1.2% standard deviation lower teaching evaluations, suggesting that there might be a trade-off between research and teaching.

That suggests that research and teaching are substitutes (see my earlier post on this topic). Keng then goes on to analyse STEM and non-STEM departments separately, and finds that:

Gender bias in favor of male instructors is more prominent among male students, especially in STEM departments. Female students in non-STEM departments, however, show a greater gender bias against female instructors, compared to their counterparts in STEM departments.

In other words, both male and female students are biased against female teachers, but male students are more biased. Male STEM students are more biased than male non-STEM students, but female non-STEM students are more biased than female STEM students. Interesting. Keng then goes on to show that:

...the gender gap in SET grows as the departments become more gender imbalanced.

This effect is greater for female students than for male students, so female students appear to be more sensitive to gender imbalances. This is not as good as it may sound: it means that female students are more biased against female teachers in departments that have a greater proportion of male teachers (such as STEM departments). Finally, Keng uses their measure of value-added to argue that the bias against female teachers is related to statistical discrimination. However, I don't find those results persuasive, as they seem to rely on an assumption that, as teachers remain at the institution longer, students learn about their quality. Students are only at the institution for three or four years, and don't typically see the same teachers across multiple years, so it is hard to see how this could be a learning effect on the students' side. I'd attribute it more to teachers better understanding what it takes to get good teaching evaluations.
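
As an aside, for readers unfamiliar with how teacher value-added is measured: Keng's exact specification is in the paper, but a stylised version (a sketch under my own assumptions, not necessarily Keng's model) recovers each teacher's value-added as a teacher fixed effect in a regression of standardised student grades on student controls:

```latex
% Stylised teacher value-added model (illustrative; not Keng's exact specification)
% y_{ijt} = standardised grade of student i with teacher j in term t
% X_{it}  = student characteristics and other controls
% mu_j    = teacher fixed effect, interpreted as teacher j's value-added
y_{ijt} = X_{it}\beta + \mu_j + \varepsilon_{ijt}
```

The estimated fixed effect for each teacher then measures how much they raise grades relative to other teachers, holding student characteristics constant.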

Moving on, the second article is this 2021 one by Amanda Felkey and Cassondra Batz-Barbarich (both Lake Forest College), published in the AEA Papers and Proceedings (sorry, I don't see an ungated version online). Felkey and Batz-Barbarich conduct a meta-analysis of gender bias in student evaluations of teaching. A meta-analysis combines the results across many studies (I sketch the standard approach below), allowing us to (hopefully) overcome the statistical noise and biases that can arise from looking at any single study. Felkey and Batz-Barbarich base their meta-analysis on US studies covering the period from 1987 to 2017: 15 studies in all, providing 39 estimated effect sizes. They also compare economics with the other social sciences. They find that:

In the 30 years spanned by our metadata, there was significant gender difference in SETs that favored men for economics courses... Gender difference in the rest of the social sciences favored women on average but was statistically insignificant...
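
As an aside on the mechanics: a standard inverse-variance-weighted meta-analysis (the textbook approach; Felkey and Batz-Barbarich's exact estimator may differ) pools the estimated effect sizes by weighting each study by the precision of its estimate:

```latex
% Pooled effect across k studies, weighted by precision (textbook fixed-effect
% version; a random-effects model adds between-study variance tau^2 to the weights)
\hat{\theta} = \frac{\sum_{i=1}^{k} w_i \, \hat{\theta}_i}{\sum_{i=1}^{k} w_i},
\qquad w_i = \frac{1}{\mathrm{SE}_i^{2}}
```

More precise studies therefore count for more in the pooled estimate.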

The p-value for other social sciences is 0.734, so is clearly statistically insignificant. The p-value for economics is 0.051, which many would argue is also statistically insignificant (although barely so). However, in a footnote, Felkey and Batz-Barbarich note that:

We found evidence that our results for economics were impacted by publication bias such that the gender difference is actually greater...than our included studies and analyses suggest.
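
How would one detect publication bias? Felkey and Batz-Barbarich don't say in the footnote which test they used, but a common diagnostic is Egger's regression test: regress each study's standardised effect size on its precision, and an intercept significantly different from zero signals funnel-plot asymmetry. Here is a minimal sketch in Python, with made-up numbers (illustrative only, not their data):

```python
import numpy as np
import statsmodels.api as sm

# Hypothetical effect sizes (in SD units) and standard errors for six studies.
# These numbers are made up for illustration; they are not the meta-analysis data.
effects = np.array([0.15, 0.22, 0.08, 0.30, 0.12, 0.25])
ses = np.array([0.06, 0.10, 0.04, 0.14, 0.05, 0.12])

# Egger's test: regress the standardised effect (effect / SE) on precision (1 / SE).
# With no publication bias, the intercept should be close to zero.
y = effects / ses
X = sm.add_constant(1.0 / ses)
fit = sm.OLS(y, X).fit()

print(f"Egger intercept: {fit.params[0]:.3f} (p = {fit.pvalues[0]:.3f})")
```

A significant positive intercept would suggest that small, imprecise studies report systematically larger effects, which is the usual fingerprint of publication bias.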

They don't present an analysis that accounts for the publication bias, which might have shown a more statistically significant gender bias. This is bad news for economics, but might it be good news for other disciplines? The null result for the other social sciences is not consistent with other analyses of gender bias in SETs, where bias against female teachers appears across all disciplines (see the Keng study above, or the studies in this earlier post). Usually, I would strongly favour the evidence from a meta-analysis over individual studies, but that is difficult when the meta-analysis seems to show something different from the studies I have read. Moreover, Felkey and Batz-Barbarich don't find any evidence of publication bias in disciplines other than economics, which suggests that the null finding for those disciplines is robust. Perhaps gender bias in teaching evaluations really is just a feature of economics and the STEM disciplines? I'd want to see a more detailed analysis before drawing a strong conclusion (the AEA Papers and Proceedings format doesn't give authors the opportunity to include a lot of detail), but this should make us evaluate the evidence on gender bias more carefully, especially outside of the STEM disciplines.

