Wednesday, 26 March 2025

GPT-4 tells us how literary characters would play the dictator game

Have you ever wondered what it would be like to interact with your favourite literary characters? What interesting conversations might we have with Elizabeth Bennet or Clarissa Dalloway? Or, who would win if we played a game of Monopoly with Ebenezer Scrooge or Jay Gatsby? Large language models like ChatGPT can provide us with a partial answer to those questions, because they can be prompted to take on any persona. And because of the wealth of information available in their training data, LLMs are likely to be very convincing at cosplaying famous literary characters.

So, I was really interested to read this new article by Gabriel Abrams (Sidwell Friends High School), published in the journal Digital Scholarship in the Humanities (ungated earlier version here). Abrams asked GPT-4 to play the role of a large number of famous literary characters when playing the 'dictator game'. To review, in the dictator game the player is given an amount of money, and can choose how much of that money to keep for themselves, and how much to give to another player. Essentially, the dictator game provides an estimate of fairness and altruism.
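The mechanics of the game are simple enough to sketch in a few lines of code. In this minimal sketch, the $100 endowment and the 20 per cent 'selfish' cutoff are illustrative assumptions of mine, not values taken from Abrams' paper.

```python
def dictator_game(endowment: float, amount_given: float) -> dict:
    """Return both players' payoffs for a single dictator-game decision."""
    if not 0 <= amount_given <= endowment:
        raise ValueError("amount given must be between 0 and the endowment")
    return {"dictator": endowment - amount_given, "recipient": amount_given}

def is_selfish(endowment: float, amount_given: float, cutoff: float = 0.2) -> bool:
    """Classify a decision as selfish if the dictator gives less than
    `cutoff` (here 20%, an assumed threshold) of the endowment."""
    return amount_given < cutoff * endowment

payoffs = dictator_game(100, 30)  # give $30 of a $100 endowment
print(payoffs)                    # {'dictator': 70, 'recipient': 30}
print(is_selfish(100, 30))        # False: giving 30% exceeds the 20% cutoff
print(is_selfish(100, 10))        # True: giving 10% falls below the cutoff
```

The interesting part, of course, is not the payoff arithmetic but how much a given player (or character) chooses to give.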

Abrams first asked GPT-4 to identify the 25 most well-known fictional characters in each century from the 17th to the 21st. Then, for each character, Abrams asked GPT-4 to play the dictator game, as well as to identify the particular personality traits that would affect the character's decision in the game. Abrams then took each personality trait and asked GPT-4 to assign it a valence (positive, neutral, or negative). Finally, Abrams summarised the results by century, finding that:

There is a general and largely monotonic decrease in selfish behavior over centuries for literary characters. Fifty per cent of the decisions of characters from the 17th century are selfish compared to just 19 per cent from the 21st century...

Humans are more selfish than the AI characters with 51 per cent of humans making selfish decisions compared to 28 per cent of the characters...

So, over time literary characters have become less selfish, but overall the characters are more selfish than real humans. An interesting question, which can't be answered with this data, is whether the change in selfishness among characters also reflects a decrease in selfishness in the population generally (because the selfishness of humans was measured in the 21st century only). Interestingly, looking at personality traits:

Modeled characters’ personality traits generally have a strong positive valence. The weighted average valence across the 262 personality traits was a surprisingly high +0.47...

I associate many literary figures with their negative traits, and less so with their positive traits. Maybe that's just me. Or maybe the traits that GPT-4 thought were most relevant to the choice in the dictator game tended to be more positive traits. Given that the dictator game is really about altruism and fairness, that might explain it. Over time, there hasn't been a clear trend in valence:

The 21st century had the highest valence at +0.74... The least positive centuries were the 17th and 19th with +0.28 and +0.29, respectively...
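A weighted average valence like the one quoted above is straightforward to compute. In this sketch, the trait counts are invented for illustration, and the +1/0/-1 scoring is my assumption about how the positive/neutral/negative labels map to numbers.

```python
from collections import Counter

# Hypothetical trait frequencies for one century (not from the paper)
trait_counts = Counter({"empathetic": 6, "ambitious": 3,
                        "manipulative": 2, "stoic": 1})
# Assumed mapping of valence labels to scores: positive=+1, neutral=0, negative=-1
valence = {"empathetic": 1, "ambitious": 0, "manipulative": -1, "stoic": 0}

def weighted_average_valence(counts, valence):
    """Average trait valence, weighted by how often each trait appears."""
    total = sum(counts.values())
    return sum(valence[t] * n for t, n in counts.items()) / total

print(round(weighted_average_valence(trait_counts, valence), 2))  # → 0.33
```

On this made-up data, positive traits outnumber negative ones, so the average comes out positive, just as it did for Abrams' characters.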

Abrams then turned to the specific personality traits, identifying the traits that were more common (overweighted) or less common (underweighted) in each century, compared with overall. This is summarised in Table 6 from the paper:

There are some interesting changes there, with empathetic shifting from being the most underweighted trait to being the most overweighted trait, while manipulative shifts in the opposite direction (from most overweighted to third-most underweighted). Interesting, and not necessarily what I would have expected. Abrams concludes that:

The Shakespearean characters of the 17th century make markedly more selfish decisions than those of Dickens, Dostoevsky, Hemingway and Joyce, who in turn are more selfish than those of Ishiguro and Ferrante in the 21st century.

Historical literary characters have a surprisingly strong net positive valence. It is possible that there is some selection bias. For instance, scholars or audiences may make classics of books with mainly attractive characters.

That makes sense. One thing that I found missing in the paper was a character-level assessment. It would have been interesting to see the results for favourite (and least favourite) characters individually, and see how they compare with what we might have expected. That could have been added to supplementary materials for the paper, and would have been an interesting read.

Nevertheless, this paper was an interesting exploration of just some of what LLMs can be used for in research. As I've noted before, LLMs have essentially killed off online data collection using tools like mTurk, because the mTurkers may simply use LLMs to respond to the survey or experiment. Researchers can now cut out the middleman, and use LLMs directly to cosplay as research participants based on any collection of characteristics (age, gender, ethnicity, location, etc.). The big question now is, when LLMs are used in this way, is some of the real underlying variation in human responses lost (because LLMs will tend to give a 'median' response for the group they are cosplaying)? The answer to that question will become clear as researchers continue on this path.
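Cutting out the middleman might look something like the sketch below: building a persona prompt from a set of characteristics and a survey question. The template wording and the characteristic fields are my own illustrative choices, and the actual LLM API call is deliberately left out.

```python
def persona_prompt(characteristics: dict, question: str) -> str:
    """Construct a prompt asking an LLM to answer a survey question in persona.
    The wording of the template is a made-up example, not from the paper."""
    profile = ", ".join(f"{k}: {v}" for k, v in characteristics.items())
    return (
        f"You are a survey respondent with these characteristics: {profile}. "
        f"Answer the following question in character.\n\n"
        f"Question: {question}"
    )

prompt = persona_prompt(
    {"age": 34, "gender": "female", "location": "Hamilton, New Zealand"},
    "You have $100 to split with a stranger. How much do you give them?",
)
print(prompt)
```

Whether responses generated this way preserve the real variation in human answers is exactly the open question raised above.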

[HT: Marginal Revolution, back in 2023]
