Sunday, 13 April 2025

Anthropic on how university students are using generative AI

This week Anthropic released a fascinating and important report on university students' use of generative AI (specifically, how they are using Anthropic's Claude AI). The report is based on anonymised data from over 570,000 conversations (by people with a university-affiliated email address, who the AI judged to be students, not staff) over an 18-day period. 

The report has a number of important insights, but I want to focus on two in particular. First, it tells us how students are interacting with Claude. Anthropic summarises this with a taxonomy that crosses the style of interaction (Direct or Collaborative) with the type of task (Problem Solving or Output Creation), giving four categories: Direct Problem Solving, Direct Output Creation, Collaborative Problem Solving, and Collaborative Output Creation.

Anthropic then note that:

These four interaction styles were represented at similar rates (each between 23% and 29% of conversations), showing the range of uses students have for AI.

Now, most people are probably interested in how students are interacting with Claude because they want to gauge the extent of cheating in assessment. Across the four categories, I'd suggest there is a clear hierarchy of concern. Direct output creation is the most likely to involve cheating, since asking AI to write an essay or project report fits in there. Next is direct problem solving, since asking AI to provide answers to take-home tests and multiple-choice quizzes falls into that category. However, students asking direct questions, using generative AI in place of a search engine, would also be captured, and that isn't cheating and probably contributes to student learning (indeed, that is how we encourage students to use Harriet, our ECONS101 AI tutor). Third is collaborative output creation, which may involve rewriting an essay to thwart plagiarism tools, using AI to critique other output, or debugging code. However, a lot of genuine collaboration that is allowed under assessment guidelines will also fit into that category. Finally, collaborative problem solving is likely to be the least problematic; a 'Socratic tutor' or guided learning approach would fit in here, for example.

In relation to cheating, Anthropic notes that:

...nearly half (~47%) of student-AI conversations were Direct—that is, seeking answers or content with minimal engagement. Whereas many of these serve legitimate learning purposes (like asking conceptual questions or generating study guides), we did find concerning Direct conversation examples including:

  • Provide answers to machine learning multiple-choice questions
  • Provide direct answers to English language test questions
  • Rewrite marketing and business texts to avoid plagiarism detection

These raise important questions about academic integrity, the development of critical thinking skills, and how to best assess student learning. Even Collaborative conversations can have questionable learning outcomes. For example, “solve probability and statistics homework problems with explanations,” might involve multiple conversational turns between AI and student, but still offloads significant thinking to the AI.

They are absolutely right that how to best assess student learning is an important question. And as Justin Wolfers notes, any high-stakes at-home assessment is essentially a non-starter in terms of credibility. That certainly limits the options available to lecturers and teachers.

The second important insight from the report is the cognitive level at which students are engaging with Claude. This is the really worrying aspect (and the modes of interaction are worrying enough already), because:

We saw an inverted pattern of Bloom's Taxonomy domains exhibited by the AI:

  • Claude was primarily completing higher-order cognitive functions, with Creating (39.8%) and Analyzing (30.2%) being the most common operations from Bloom’s Taxonomy.
  • Lower-order cognitive tasks were less prevalent: Applying (10.9%), Understanding (10.0%), and Remembering (1.8%).

In other words, students were outsourcing tasks that were higher on Bloom's taxonomy. As I noted in this post last year:

Teachers might hope that generative AI is better at the lower levels - things like definitions, classification, understanding and application of simple theories, models, and techniques. And indeed, it is. Teachers might also hope that generative AI is less good at the higher levels - things like synthesising papers, evaluating arguments, and presenting its own arguments. Unfortunately, it appears that generative AI is also good at those skills. However, context does matter. In my experience, and this is subject to change because generative AI models are improving rapidly, generative AI can mimic the ability of even good students at tasks at low levels of Bloom's taxonomy, which means that tasks at that end lack any robustness to generative AI. However, at tasks higher on Bloom's taxonomy, generative AI can mimic the ability of failing and not-so-good students, but is still outperformed by good students. So, many assessments like essays or assignments that require higher-level skills may still be a robust way of identifying the top students, but will be much less useful for distinguishing between students who are failing and students who are not-so-good.

It seems that either I was wrong in my assessment of the strengths of generative AI at different levels of Bloom's taxonomy, or that, despite the weaknesses of generative AI at the higher levels, students still prefer to use it that way. That might reflect comparative advantage. Perhaps generative AI is better at both lower-level and higher-level tasks (it has absolute advantage in both), but has comparative advantage in the higher-level tasks? In that case, students may find it more useful to have the generative AI work on the higher-level tasks while they complete the lower-level tasks themselves (there is a simple numerical sketch of that idea below, after the next quote, and it would be a useful topic to explore more fully in a future post). Anyway, Anthropic point to the underlying worry when they note that:

...AI systems may provide a crutch for students, stifling the development of foundational skills needed to support higher-order thinking.
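To make the comparative advantage idea concrete, here is a minimal numeric sketch in Python. The productivity figures (marks per hour of effort on lower-level and higher-level tasks, for a student and for the AI) are entirely invented for illustration; they are not from Anthropic's report or from any measurement of actual model performance.

```python
# A purely hypothetical numeric sketch of the comparative advantage idea.
# The productivity numbers are invented for illustration only - they are not
# taken from Anthropic's report or from any measurement of model performance.

# Marks produced per hour of effort on each type of task
productivity = {
    "student": {"lower_level": 8, "higher_level": 2},
    "ai":      {"lower_level": 10, "higher_level": 6},
}

def opportunity_cost(agent, task):
    """Marks of the other task type forgone per mark of `task` produced."""
    other = "lower_level" if task == "higher_level" else "higher_level"
    return productivity[agent][other] / productivity[agent][task]

for agent in productivity:
    cost = opportunity_cost(agent, "higher_level")
    print(f"{agent}: each higher-level mark costs {cost:.2f} lower-level marks")

# Output:
# student: each higher-level mark costs 4.00 lower-level marks
# ai: each higher-level mark costs 1.67 lower-level marks
```

In this made-up example the AI is more productive at both types of task (absolute advantage in both), but its opportunity cost of higher-level work is much lower, so the gains-from-trade logic pushes the higher-level tasks onto the AI and the lower-level tasks onto the student - which is at least consistent with the pattern in Anthropic's data.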

This is definitely an interesting report, and it provides some genuine insights into university students' use of generative AI. I feel like there is much more to learn here, including how to steer students towards more collaborative modes of engagement with the AI, which are likely to lead to more learning. It also highlights (yet again, as if we need further reminders) the vulnerability of much assessment to students' use of AI.

[HT: Marginal Revolution]
