Thursday, 23 November 2023

New results on the bat-and-ball problem

A bat and a ball cost $1.10 in total. The bat costs $1.00 more than the ball. How much does the ball cost?

If you guessed ten cents, you would be in the majority. You would also be quite wrong. The correct answer is five cents. This 'bat-and-ball' problem is quite famous (and you may have seen it, or a question like it, before - a variant was in a pub quiz that I competed in a few weeks ago, for example). The problem is one of three questions included in the Cognitive Reflection Test, which purports to measure whether people engage in cognitive reflection, or are more prone to give in to 'intuitive thinking'. It also relates to what Daniel Kahneman referred to in his book Thinking, Fast and Slow as System 1 and System 2 thinking. System 1 is intuitive and automatic (and gives a ready answer of ten cents to the bat-and-ball problem), while System 2 is slower and reflective (and is more likely to lead to the correct answer of five cents).
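For anyone who wants to see why five cents is right and ten cents is not, here is a minimal sketch of the arithmetic in Python. The function names (solve_bat_and_ball and check_answer) are my own, invented for illustration, and do not come from any paper. If the ball costs x, then the bat costs x + $1.00, so x + (x + $1.00) = $1.10, which gives x = $0.05. The sketch uses Decimal rather than floats so that the comparisons of dollar amounts are exact.

```python
from decimal import Decimal


def solve_bat_and_ball(total, difference):
    """Return (bat, ball) prices given their sum and their difference."""
    # ball + (ball + difference) = total  =>  ball = (total - difference) / 2
    ball = (total - difference) / 2
    bat = ball + difference
    return bat, ball


def check_answer(bat, ball, total, difference):
    """Check a proposed pair of prices against both conditions of the problem."""
    sums_correctly = (bat + ball == total)
    differs_correctly = (bat - ball == difference)
    return sums_correctly and differs_correctly


total = Decimal("1.10")
difference = Decimal("1.00")

bat, ball = solve_bat_and_ball(total, difference)
print("bat:", bat, "ball:", ball)

# The intuitive response (bat $1.00, ball $0.10) passes the sum test
# but fails the difference test, because $1.00 - $0.10 = $0.90.
print("intuitive answer passes:", check_answer(Decimal("1.00"), Decimal("0.10"), total, difference))
print("solved answer passes:", check_answer(bat, ball, total, difference))
```

The point of the second function is that checking a proposed answer takes only one addition and one subtraction, which is much less work than solving the problem from scratch.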

However, a new article by Andrew Meyer (Chinese University of Hong Kong) and Shane Frederick (Yale University), published in the journal Cognition (open access), may give us reason to question the theory of System 1 and System 2 thinking (or reason to question the validity of the bat-and-ball question). Frederick is the author who introduced the Cognitive Reflection Test, so the results reported in this paper should be considered especially notable.

Meyer and Frederick conducted a series of studies of the bat-and-ball problem, with increasingly disquieting results. First:

...verifying the intuitive response requires nothing more than adding $1.00 and $0.10 to ensure that they sum to $1.10 (they do) and subtracting $0.10 from $1.00 to ensure that they differ by $1.00 (they don't). Since essentially everyone can perform these verification tests, the high error rate means that they aren't being performed or that respondents are drawing the wrong conclusion despite performing them.

If respondents aren't attempting to verify their answer, encouraging them to do so may help. We tested this in five studies involving a total of 3219 participants who were randomly assigned to either a control condition or to one of four warning conditions shown below. Two studies were administered to students who used paper and pencil. The rest were web-based surveys of a broader population...

The warnings improved performance, but not by much... This suggests that they failed to engage a checking process, or that the checking process was insufficient to remedy the error...

Specifically, only 13 percent of research participants in the pure control group got the bat-and-ball problem correct. In the treatment group that received the simplest warning (which simply warned: "Be careful! Many people miss this problem"), this increased to 23 percent. The changes in performance across the other studies that Meyer and Frederick report (with various different wordings of the warning) were mostly modest increases, ranging from -9 percentage points to +17 percentage points. They don't report a measure of statistical significance, but the magnitude of the change is not large, and warnings to check the answer don't eliminate the intuitive response. Evidently, research participants aren't great at checking their answer. Or maybe they simply don't perform any check at all. What about being more directive, and telling research participants outright that ten cents is not the answer? Meyer and Frederick tried that too:

Since these warnings were ineffective, we next tried an even stronger manipulation by telling respondents that 10 cents is not the answer. We conducted eight such experiments, with a total of 7766 participants. In five studies (three online and two paper and pencil), participants were randomly assigned to either the control condition or to a Hint condition in which the words “HINT: 10 cents is not the answer” appeared next to the response blank...

In three other studies (two online and one in-lab), we used a within-participant design in which the Hint was provided after the participant's initial response. In those studies, respondents could revise their initial (unhinted) response, and we recorded both their initial and final responses...

The hint that the answer wasn't 10 cents helped substantially, but, more notably, many – and sometimes most – still failed to solve the problem...

Receiving the hint increased performance in the bat-and-ball problem by between +17 percentage points and +23 percentage points in a between-subjects comparison (comparing research participants who received the hint with those who didn't), and between +16 and +22 percentage points in a within-subjects comparison (where research participants could change their answer after they received the hint). The latter results lead Meyer and Frederick to note that:

Though the bat and ball problem is often used to categorize people as reflective (those who say 5) or intuitive (those who say 10), these results suggest that the “intuitive” group can – and should – be further divided into the “careless” (who answer 10, but revise to 5 when told they are wrong) and the “hopeless” (who are unable or unwilling to compute the correct response, even when told that 10 is not the answer).

Why would so many research participants still maintain that the answer is ten cents, even when they are explicitly told that ten cents is not the correct answer? Meyer and Frederick suggest that:

This result has hallmarks of simultaneous contradictory belief (Sloman, 1996), because respondents who report that $1.00 and $0.10 differ by $1.00 obviously do not actually believe this. It is also akin to research on Wason's four card task showing that participants will rationalize their faulty selections, rather than change them (Beattie & Baron, 1988; Wason & Evans, 1974). It could also be considered as an Einstellung effect (Luchins, 1942), in which prior operations blind respondents to an important feature of the current task or as an illustration of confirmation bias, in which initial erroneous interpretations interfere with the processes needed to arrive at a correct interpretation (Bruner & Potter, 1964; Nickerson, 1998).

I would put a lot of this down to motivated reasoning. However, it gets even worse:

...we ran two studies on GCS in which we asked respondents to either consider the correct answer (N = 2002) or to simply enter it (N = 1001)...

Asking respondents to consider the correct answer more than doubled solution rates, but only to 31%. Asking them to simply enter the correct answer worked better, as 77% did so, though, notably, the intuitive response emerged even here.

So, when research participants are asked to consider if the answer could be five cents, more than half still get it wrong. And even when research participants were told that the answer is five cents, and directed to write down five cents as the answer, nearly a quarter of research participants still get the answer wrong. That leads Meyer and Frederick to conclude that:

...the very existence of such manipulations (and their lack of complete efficacy) undermines a conclusion many draw from dual process theories of reasoning: that judgmental errors can be avoided merely by getting respondents to slow down and think harder...

Meyer and Frederick use all of these results (and others) to suggest that people engage in an 'approximate checker' process, wherein if the intuitive result provided by System 1 is approximately correct, then the more deliberative System 2 doesn't go through a complete process of checking. They demonstrate this with some further results that show that:

As the price difference between the bat and ball decreases, participants slow down... and solution rates rise markedly – from 14% to 57%...

So, perhaps these results are not fatal for the idea of System 1 and System 2 thinking, but psychologists and behavioural scientists need to re-think the conditions under which System 2 operates, and whether it always operates optimally. The results also suggest that the bat-and-ball problem may not actually show quite what it purports to - at least, it doesn't necessarily show cognitive reflection, as even when such reflection is explicitly invoked (through asking research participants to check their answer, or telling them to consider if the answer might be five cents), many do not exhibit such reflection (or else, they reflect and still get the answer wrong). Meyer and Frederick finish by noting that:

...the remarkable durability of that error paints a more pessimistic picture of human reasoning than we were initially inclined to accept; those whose thoughts most require additional deliberation benefit little from whatever additional deliberation can be induced.

[HT: Marginal Revolution, back in September]

1 comment:

  1. Any published success rates on this problem
    x + y = $1.10
    x - y = $1.00
    Solve for x and y
