Generative AI should be changing the way that universities assess students. I say "should be", rather than "is", because it seems to me that a lot of teaching staff really have their heads in the sand on this, continuing to assess in much the same way as before, and simply attaching a warning label ("thou shalt not use generative AI") to each assessment, as if that will make a difference. The futility of that approach is the topic of this new article by Thomas Corbin, Phillip Dawson (both Deakin University), and Danny Liu (University of Sydney), published in the journal Assessment and Evaluation in Higher Education (open access).
Corbin et al. focus attention on university-level frameworks, but much of what they say applies equally at the level of individual papers. When considering how assessment needs to change as a result of generative AI, Corbin et al. distinguish two approaches: (1) discursive changes, which involve telling students what is and what is not permitted; and (2) structural changes, which involve changing the assessment itself so that the way that students may use AI (or not) is specifically factored into the assessment.
Corbin et al. make the important point that:
...existing frameworks predominantly rely on merely discursive methods which introduces significant vulnerabilities related to compliance and enforceability, ultimately undermining assessment validity and institutional reputation. Although these systems may have value in other areas, for example by assisting teachers to conceptualise the different ways AI may be used in a task, from a validity standpoint any change which is merely discursive and not structural is likely to cause more harm than good.
Discursive approaches include 'traffic light' systems, or various assessment scales, where teachers communicate to students what generative AI use is or is not allowed. They also include requirements for students to disclose the use of generative AI in their assessments. The problem with discursive changes to assessment is obvious:
Without reliable detection mechanisms, prohibitions against AI use remain merely discursive. This technological limitation exposes a more fundamental issue with discursive approaches. That is, they rely entirely on student compliance with rules that cannot be enforced.
There is no reliable way of detecting generative AI use in student assessment. The best that teachers can do is to rely on vibes, or on telltale phrases, such as when a student writes in their essay that they are 'delving' into a 'rich tapestry' or a 'multifaceted realm', or trying to find the 'intricate balance' or a 'symbiotic relationship'.
Corbin et al. instead advocate for structural changes, which they define as:
Modifications that directly alter the nature, format, or mechanics of how a task must be completed, such that the success of these changes is not reliant on the student’s understanding, interpretation, or compliance with instructions. Instead, these changes reshape the underlying framework of the task, constraining or opening the student’s approach in ways that are built into the assessment itself.
They illustrate with some examples, starting with:
A traditional take-home essay (asynchronous) provides students with ample opportunity to use AI without detection, regardless of what instructions are provided. In contrast, a supervised in-class writing exercise (synchronous) inherently limits AI assistance by its very structure.
Justin Wolfers would approve. However, Corbin et al. rightly note that:
This doesn’t mean that all assessment should become synchronous and supervised; certainly, asynchronous assessment has valuable benefits for developing certain skills. The key is aligning the assessment structure with what we genuinely want to measure. If we want to develop a student’s ability to think deeply and develop complex arguments over time, an asynchronous format may be appropriate, but we would need to build in structural assessment elements that capture the development process rather than just the final product.
Corbin et al. don't leave us hanging. Even though they can't solve all of our AI-related assessment issues, they do offer some suggestions:
First, structural changes frequently involve reorienting assessment from output to process. Rather than evaluating only the final product, which could potentially be AI-generated, assessment may be designed to capture the student’s development and attainment of understanding and skill over time. This might mean building in authenticated checkpoints where students must demonstrate their evolving thinking. For instance, rather than simply submitting a final essay, students might need to participate in live discussions about their developing ideas or demonstrate how their thinking evolved through structured peer feedback sessions...
Second, structural changes often involve viewing assessment validity at the unit or module level rather than the task level. Instead of trying to ensure each individual assignment is AI-proof (an increasingly futile endeavour), educators can design interconnected assessments where later tasks explicitly build on a student’s earlier work.
This relates back to two earlier posts of mine. This post talks about assessment specifically, while this post talks about changing the way that students interact with generative AI in learning and assessment tasks, so that their skills are scaffolded through their degree. We do need to make changes to assessment practices. It is possible for assessments to change in ways that take account of students' access to generative AI. It is not necessary to forbid generative AI use in all situations, but it is probably equally unhelpful to declare 'open season' on generative AI use. As with all things, there is a balance to be struck, and university teachers need to find that balance.
Corbin et al. conclude that discursive changes to assessment:
...remain powerless to prevent AI use when they rely solely on student compliance. They say much but change little. They direct behaviour they cannot monitor. They prohibit actions they cannot detect. In other words, when it comes to appropriate assessment change for a time of AI, talk is cheap.
Simply relying on a set of written rules about when and how students can use generative AI is ineffective at best and, at worst, may actively harm student learning. Those rules are cheap talk. We can do much better.
[HT: Maria Neal]