Monday, 5 June 2017

On multiple choice vs. constructed response in tests and exams

Periodically I run into arguments about whether it is appropriate to use multiple choice questions as a form of assessment, or as a large component of assessment that combines multiple choice and constructed response (such as fill-in-the-blanks, calculations, short answer questions, or short or long essays) questions. Before I get into those debates though, it is worth considering what the purpose of assessment is.

From the perspective of the institution, assessment can be seen to have one or both of two main purposes. The first purpose is to certify that students have some minimum standard of knowledge (what is termed criterion-referencing), for which a pass-fail assessment is all that is required. The second purpose is to rank students in order to assign grades (what is termed norm-referencing), for which an assessment that facilitates ranking students (thus providing a range of grades) is required.

However, from the perspective of the student (and the lecturer or teacher), the purpose of assessment is to signal the students' ability in the subject matter that is being assessed. Signalling is necessary to overcome the information asymmetry between student and lecturer or teacher - students know how competent they are, but the lecturer doesn't (or at least, the students should have a better idea than the lecturer does prior to the assessment). Students signal their ability by how well they answer the questions they are faced with. A signal is effective if it is costly, and if it is costly in a way that makes it unattractive to those with lower quality attributes to attempt (or makes it impossible for them to do so). In this case, it is difficult to do well in an assessment, and more difficult (or impossible) for less able students, which makes assessment an effective signal of student ability.

Now that we recognise that, for students, assessment is all about signalling, we can see why most lecturers would probably prefer to construct assessments based purely on constructed response questions. It is difficult for less able students to hide from a short answer or essay question, whereas it may be possible for them to do somewhat better on multiple choice questions, given that the answer is provided for them (they just have to select it). The quality of the signal (how well the assessment reveals student ability) is much greater for constructed response questions than it is for multiple choice questions. The trade-off for lecturers is that constructed response questions are more time intensive (costly) to mark, and the marking is more subjective.

The arguments about the use of multiple choice questions I encounter usually come from both directions - there should be more multiple choice (and less constructed response), and there should be less multiple choice (and more constructed response). The former argument says that, since assessment is about assigning grades, all you need from assessment is some way to rank students. Since both multiple choice and constructed response rank students, provided the rankings are fairly similar we should prefer multiple choice questions (and potentially eliminate constructed response questions entirely). The latter argument says that multiple choice questions are biased in favour of certain groups of students (usually male students), or that they are poor predictors of student ability since students can simply guess the answer.

It is difficult to assess the merits of these arguments. To date I have resisted the first argument (in fact, ECON110 has no multiple choice questions at all), and the second argument is best answered empirically. Fortunately, Stephen Hickson (University of Canterbury) has done some of the leg work in terms of answering questions about the impact of question format in introductory economics classes, published in two papers in 2010 and 2011. The analyses are based on data from over 8000 students in introductory economics classes at the University of Canterbury over the period from 2002 to 2007.

In the first paper, published in the journal New Zealand Economic Papers (ungated earlier version here), Hickson looks at which students are advantaged (or disadvantaged) by the question format (multiple choice, MC, or constructed response, CR) used in tests and exams. He finds that, when students' characteristics are investigated individually:
...females perform worse in both MC and CR but the disadvantage is significantly greater in MC compared to CR... All ethnicities perform worse compared with the European group in both MC and CR with a greater relative disadvantage in CR. The same is true for students whose first language is not English and for international students compared to domestic.
However, when controlling for all characteristics simultaneously (and controlling for student ability), only gender and language remain statistically significant. Specifically, it appears that female students have a relative advantage in constructed response (and a relative disadvantage in multiple choice), while the reverse is true for students whose first language is not English. Hickson goes on to show that the female advantage in constructed response applies for macroeconomics but not microeconomics. I find those results less believable, though, since they also show that students with Asian, Pacific, and 'other' ethnicities have an advantage in constructed response in microeconomics.

The takeaway from that first paper is that a mix of multiple choice and constructed response is probably preferable to avoid gender bias in the assessment, and that the greater the proportion of the assessment that is constructed response, the more non-native English speakers will be disadvantaged.

The second paper, which Hickson co-authored with Bob Reed (also University of Canterbury), was published in the International Review of Economics Education (ungated earlier version here). This paper addresses the question of whether multiple choice and constructed response questions test the same underlying knowledge (if they did, the ranking of students would be much the same regardless of question type). In the paper, they essentially run regressions that use multiple choice results to predict constructed response results, and vice versa. They find that:
...the regression of CR scores on MC scores leaves a substantial residual. The first innovation of our study is that we are able to demonstrate that this residual is empirically linked to student achievement. Since the residual represents the component of CR scores that cannot be explained by MC scores, and since it is significantly correlated with learning outcomes, we infer that CR questions contain “new information about student achievement” and therefore do not measure the same thing as MC questions...
...we exploit the panel nature of our data to construct a quasi-counterfactual experiment. We show that combining one CR and one MC component always predicts student achievement better than combining two MC components.
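The logic of those two results can be illustrated with a toy simulation. This is only a sketch under made-up assumptions, not Hickson and Reed's data or specification: the two latent skills ("recall" and "synthesis"), the noise levels, and every function name below are hypothetical. It regresses CR scores on MC scores, checks whether the leftover residual is correlated with an achievement measure (a stand-in for GPA), and then compares an MC-plus-CR composite with an MC-plus-MC composite.

```python
import random
from math import sqrt


def mean(xs):
    return sum(xs) / len(xs)


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def ols_residuals(xs, ys):
    """Residuals from a simple least-squares regression of ys on xs."""
    mx, my = mean(xs), mean(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    intercept = my - slope * mx
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]


def simulate(n=2000, seed=0):
    """Toy data: MC reflects recall only; CR reflects recall and synthesis.

    All of these relationships are assumptions made for illustration.
    """
    rng = random.Random(seed)
    mc1, mc2, cr, gpa = [], [], [], []
    for _ in range(n):
        recall = rng.gauss(0, 1)      # hypothetical lower-level skill
        synthesis = rng.gauss(0, 1)   # hypothetical higher-level skill
        mc1.append(recall + rng.gauss(0, 0.5))
        mc2.append(recall + rng.gauss(0, 0.5))
        cr.append(recall + synthesis + rng.gauss(0, 0.5))
        gpa.append(recall + synthesis + rng.gauss(0, 0.5))

    # The part of CR scores that MC scores cannot explain.
    residual = ols_residuals(mc1, cr)

    return {
        "residual_vs_mc": pearson(residual, mc1),
        "residual_vs_gpa": pearson(residual, gpa),
        "mc_plus_cr_vs_gpa": pearson([a + b for a, b in zip(mc1, cr)], gpa),
        "mc_plus_mc_vs_gpa": pearson([a + b for a, b in zip(mc1, mc2)], gpa),
    }


results = simulate()
for name, value in results.items():
    print(f"{name}: {value:.3f}")
```

In this toy setup the residual is uncorrelated with the MC scores by construction, yet it is still clearly positively correlated with the GPA stand-in, and the MC-plus-CR composite tracks GPA more closely than the MC-plus-MC composite does. That is the pattern the paper reports, although the simulation builds it in by assumption rather than demonstrating it.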
In the latter experiment, they found that a combination of constructed response and multiple choice always does a better job of predicting students' GPA than does a combination of multiple choice and more multiple choice. They didn't test whether a combination of constructed response and more constructed response was better still, which would have been interesting. However, the takeaway from this paper is that multiple choice and constructed response questions are measuring different things, and a mix is certainly preferable to asking only multiple choice questions. On a related point, Hickson and Reed note:
Bloom’s (1956) taxonomy predicts that MC questions are more likely to test the lower levels of educational objectives (i.e., Knowledge, Comprehension, Application, and, perhaps, Analysis). While CR questions test these as well, they are uniquely suited for assessing the more advanced learning goals (Synthesis and Evaluation).
The more of the higher-level skills you want to assess, the more constructed response should probably be used. This provides another argument for using a combination of multiple choice (to test lower levels on Bloom's taxonomy) and constructed response (to test the higher levels), which is what we do in ECON100.
