Friday, 8 March 2019

Teaching evaluations and grade inflation

Teaching evaluations (or student evaluations of teaching, SETs) are the most common way in which university lecturers' teaching performance is measured. Students are asked, at the end of a course, to evaluate their lecturer against a bunch of criteria, and then those results are used to derive some summary statistic that is supposed to represent the quality of teaching for that semester. If students are genuinely rating their lecturers on teaching quality, then the summary statistic should reflect actual teaching quality.

There is some evidence for this. In a 2012 article (gated, sorry, I don't see an ungated version anywhere online) by Trinidad Beleche (Food and Drug Administration) and David Fairris and Mindy Marks (both University of California, Riverside), published in the journal Economics of Education Review, the authors demonstrate that students do reward genuine learning with higher teaching evaluations. They used data from 1,106 students attending a four-year public university in the U.S. over 2007-2009. Importantly, they had a pre-test and post-test evaluation of student learning, so they could measure students' learning objectively (separately from grades). However, while evaluations were positively associated with greater (objectively measured) learning, the authors found that:
...the relationship between knowledge gain and evaluation scores is very small. A one standard deviation increase in learning (as measured by the post-test) is associated with a 0.05–0.065 increase in course evaluation scores on a five point scale.
So, when students learn more (on average), teaching evaluations are slightly higher. However, although the relationship is statistically significant, it is very small. That shouldn't fill us with confidence that teaching evaluations accurately capture teaching quality.
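To get a sense of just how small that effect is, here is a minimal simulation sketch (the 0.06 slope is taken from the quoted range; the baseline rating of 4.0 and the noise level are my own illustrative assumptions, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(42)

n = 1106  # sample size from Beleche et al. (2012)
# Standardised learning gains (post-test): mean 0, SD 1
learning = rng.normal(0.0, 1.0, n)
# Assumed relationship: ~0.06 evaluation points (on a 5-point scale) per
# 1 SD of learning, plus unexplained noise (the 0.8 SD is my assumption)
evaluation = 4.0 + 0.06 * learning + rng.normal(0.0, 0.8, n)
evaluation = evaluation.clip(1.0, 5.0)  # ratings are bounded on a 5-point scale

# Recover the slope with a simple least-squares fit
slope, intercept = np.polyfit(learning, evaluation, 1)
print(f"Estimated slope: {slope:.3f} eval points per SD of learning")
# Even a 2-SD gap in learning moves the expected rating by only ~0.12 points
print(f"Expected rating gap between students 2 SDs apart: {2 * slope:.2f}")
```

Under these assumptions, even a very large difference in learning barely registers in the evaluation score, which is lost in the noise of everything else students are responding to.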

In fact, there is good reason to question whether teaching evaluations reflect teaching quality at all (many of the problems are summarised in this article by John Lawrence). For instance, students might rate lecturers based on how much they liked the course or lecturer, rather than the quality of teaching. For difficult courses, this could lead to a low rating even if the teacher is excellent. Similarly, students might not recognise the amount of genuine learning they have experienced during the semester, and so their rating does not reflect actual teaching quality. Perhaps they base their evaluation of their learning on the grade they receive in the course, or the grade they expect to receive at the end of the course (if the evaluation is done before final grades are available). If the student receives (or expects to receive) a higher grade, then they infer that they must have learned more, and so they reward the lecturer with a better teaching evaluation.

If this is what is happening, then that creates some interesting incentives for lecturers. Lecturers' teaching performance may affect their chance of promotion, their chance of tenure, or their future salary. So, there is an incentive to get better teaching evaluations. One way to achieve better teaching evaluations is to put more effort into teaching. However, that is costly to the lecturer, in terms of time and cognitive effort, as well as incurring an opportunity cost of time spent away from research.

However, maybe there is an easier way? If teaching evaluations were a function of grades, then a higher grade distribution would lead to a higher teaching evaluation, holding all else equal. So, if a lecturer ensures they give students higher grades, they can increase their teaching evaluation, without having to go to the effort of increasing their teaching quality. We would see grade inflation.
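Before looking at the evidence, here is a stylised sketch of the lecturer's choice (all of the numbers below are hypothetical, purely to illustrate the incentive, except the learning effect from Beleche et al.; the papers discussed below put real estimates on the grade effect):

```python
# Toy model (hypothetical numbers): the evaluation score responds to both
# genuine teaching quality and the grades awarded, but the two routes to
# a higher score have very different costs in lecturer time.

QUALITY_EFFECT = 0.06  # eval points per SD of extra learning (Beleche et al.)
GRADE_EFFECT = 0.2     # eval points per extra grade point (hypothetical)

HOURS_PER_SD_OF_LEARNING = 50  # assumed prep time to lift learning by 1 SD
HOURS_TO_INFLATE_GRADES = 1    # assumed time to shift the grade distribution

def eval_gain_per_hour(effect: float, hours: float) -> float:
    """Evaluation points gained per hour of lecturer effort."""
    return effect / hours

print(f"Quality route: {eval_gain_per_hour(QUALITY_EFFECT, HOURS_PER_SD_OF_LEARNING):.4f} points/hour")
print(f"Grade route:   {eval_gain_per_hour(GRADE_EFFECT, HOURS_TO_INFLATE_GRADES):.4f} points/hour")
# Under these (made-up) numbers the grade route pays off far better per
# hour of effort, which is exactly the incentive problem described above.
```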

Is there any evidence for this? Over the summer, I read a couple of papers that suggest there is. First, this 2012 article (also no ungated version), published in the Economics of Education Review by Andrew Ewing (Eckerd College), provides an initial answer. Ewing used data from the College of Arts and Sciences at the University of Washington over the period 1996 to 2006, which included over 5,400 lecturers teaching over 53,000 courses. He found that:
...no matter the estimation procedure, there is a significant positive effect of relative expected grades on evaluation scores. The magnitude of this impact ranges from a 0.167 point increase in SET score for every point increase on the relative expected grade scale for a particular sub-college to a 0.701 point increase for the Economics department.
So, when students are expecting higher grades (on average), they give higher teaching evaluations. It is interesting that this effect appears to be largest for economics, which suggests that economics lecturers were rewarded the most for inflated grade distributions. It's hard to see why that would be the case, though. You might expect economists to be more likely to respond to incentives such as high grades leading to better teaching evaluations, but that doesn't explain why they would be rewarded more for doing so (that might be interesting to reflect on later).
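To put those coefficients in perspective, a quick back-of-the-envelope calculation (the half-grade-point inflation scenario is my own illustrative assumption):

```python
# Ewing (2012): SET points per 1-point rise in relative expected grades
LOW, HIGH = 0.167, 0.701  # sub-college low end vs the Economics department

INFLATION = 0.5  # assumed: lecturer lifts expected grades by half a point

print(f"SET gain at the low end:  {INFLATION * LOW:.2f} points (5-point scale)")
print(f"SET gain for economics:   {INFLATION * HIGH:.2f} points (5-point scale)")
# 0.08 to 0.35 points; the economics payoff alone dwarfs the entire
# learning effect (0.05-0.065 per SD) from Beleche et al.
```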

A more recent (2017) article, also published in the Economics of Education Review (ungated earlier version here), by Devon Gorry (Utah State University), provides further evidence for the effect of grades on teaching evaluations. This paper is interesting because it evaluates what happened to grades (and teaching evaluations) when a policy regarding the recommended grade point average (GPA) for each course was changed. Specifically:
Starting in the spring of 2014, the business school implemented a policy that established a recommended average grade in required business courses... The policy states that grades in required business courses “typically should not exceed a class average of 2.8 in [introductory] courses and 3.2 in [intermediate] courses.”
Courses that typically had GPAs higher than those ceilings were affected by the policy change, and Gorry used that to evaluate the policy's impact on teaching evaluations. He found that:
...the policy did lead to a decrease in class GPA by 0.132 points for treated classes. When broken out by introductory and intermediate courses, the results are similar with a decrease in GPA of 0.155 points in introductory courses and 0.132 points in intermediate courses... professors gave significantly fewer As. There were insignificant increases in Bs, Cs, and Fs...
In introductory courses, professors met the ceiling of 2.8 by giving fewer As and Bs and substituting with significantly more Cs and a meaningful but insignificant increase in Fs... In intermediate courses, professors met the ceiling of 3.2 by giving fewer As and more Bs...
The overall results show that the grade ceiling policy is associated with a statistically significant decrease in teaching ratings by 0.150 points on a 5 point scale. This corresponds to about a quarter of a standard deviation on the teaching rating scale... 
In other words, the policy did reduce grades, and students responded by giving worse teaching evaluations. The implication is that lecturers could increase their teaching evaluations by giving higher grades. Teaching evaluations provide incentives for grade inflation.
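Taking Gorry's headline figures at face value, we can back out a couple of useful numbers (the linearity assumption in the second calculation is mine, not the paper's):

```python
RATING_DROP = 0.150  # decrease in teaching rating (5-point scale)
SD_FRACTION = 0.25   # "about a quarter of a standard deviation"
GPA_DROP = 0.132     # decrease in class GPA for treated classes

# Implied standard deviation of the teaching rating scale
print(f"Implied rating SD: {RATING_DROP / SD_FRACTION:.2f} points")  # ~0.60

# Rating change per GPA point, assuming a linear relationship (my assumption)
print(f"Rating points per GPA point: {RATING_DROP / GPA_DROP:.2f}")  # ~1.14
# That is above even Ewing's economics estimate, though the grade scales
# in the two studies are not directly comparable.
```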

[HT for the John Lawrence article: Tony Smith]
