I have to admit to experiencing a non-trivial amount of schadenfreude this year, as the Kansas City Chiefs find themselves with a losing record in December for the first time in a decade. My mild animosity towards the Chiefs is based entirely on their supreme performance over that decade. After they've had a few losing seasons, I won't care anymore (which is how I feel about the Patriots right about now). However, there are plenty of people who have griped about the Chiefs, and claimed that the Chiefs receive favourable referee calls.
I'd label that a conspiracy theory, but it has apparently caught the attention of researchers. This recent article by Spencer Barnes (University of Texas at El Paso), Ted Dischman (an independent researcher), and Brandon Mendez (University of South Carolina), published in the journal Financial Review (sorry, I don't see an ungated version online), explicitly tests whether the Kansas City Chiefs receive favourable referee calls. Specifically, Barnes et al.:
...compare penalty calls benefiting the Mahomes-era Kansas City Chiefs (from 2018 to 2023) and the Brady-era New England Patriots (2015–2019) across the regular and postseason...
Barnes et al. argue that:
...financial pressures, particularly those related to TV revenue (the primary source of revenue for the NFL), serve as the underlying mechanism.
In other words, Barnes et al. claim that the NFL has a strong financial incentive to bias officiating in favour of the 2018-2023 Kansas City Chiefs, to a greater extent than any bias in favour of the 2015-2019 New England Patriots. As we’ll see, the empirical strategy is poorly chosen, parts of the results are misinterpreted, and the proposed TV-revenue mechanism is implausible. All up, you shouldn't believe this paper's results.
What did they do? Barnes et al. use play-by-play data covering the 2015 to 2023 seasons. They restrict their attention to defensive penalties only, which gives them a sample of 13,136 penalties across 2,435 games. They then apply a fairly simple linear regression model to the data.
Here we find the first problem with their analysis. If you want to show that the Mahomes-era Kansas City Chiefs benefited from more defensive penalties than other teams, you should be running a difference-in-differences analysis. Essentially, you compare the difference between the Chiefs and other teams before Patrick Mahomes became the starting quarterback with the same difference afterwards. In other words, you test whether the Chiefs' advantage in penalties grew after Mahomes started playing, relative both to their earlier advantage and to other teams over the same period. Barnes et al. simply test for a level difference between the Chiefs and other teams during the Mahomes era (using the 'Dynasty' variable), but fail to account for the possibility that the Chiefs already benefited from more defensive penalties before Mahomes became the starting quarterback (in 2018). Indeed, Figure 1 in the paper shows that the Chiefs did benefit from more defensive penalties per game before 2018.
That pre-2018 difference should be controlled for. Having said that, the gap between the Chiefs and the rest of the NFL does look bigger from 2018 onwards (although it is mostly concentrated in 2018-19 and in 2023), so if they had used the more appropriate difference-in-differences model (or, when comparing the regular season and postseason, a triple-differences model), they might still have found a statistically significant effect.
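To make the distinction concrete, here is a minimal Python sketch contrasting a level comparison with a difference-in-differences (and triple-differences) comparison, computed from group means. The function names and the penalties-per-game numbers are hypothetical illustrations, not figures from the paper. In a simple two-group, two-period design with no other covariates, the OLS coefficient on a Chiefs × post-2018 interaction term equals this difference of cell means; the paper's actual specification includes controls and fixed effects, which this sketch ignores.

```python
from statistics import mean

def level_difference(treat_post, ctrl_post):
    """Gap between treated and control groups in the post period only.

    Roughly what a single 'treated in this era' dummy picks up when any
    pre-existing gap is ignored.
    """
    return mean(treat_post) - mean(ctrl_post)

def diff_in_diff(treat_pre, treat_post, ctrl_pre, ctrl_post):
    """Change in the treated group minus change in the control group."""
    treat_change = mean(treat_post) - mean(treat_pre)
    ctrl_change = mean(ctrl_post) - mean(ctrl_pre)
    return treat_change - ctrl_change

def triple_diff(postseason_cells, regular_cells):
    """Postseason DiD minus regular-season DiD.

    Each argument is a tuple (treat_pre, treat_post, ctrl_pre, ctrl_post).
    """
    return diff_in_diff(*postseason_cells) - diff_in_diff(*regular_cells)

# Hypothetical defensive penalties per game (NOT the paper's figures).
chiefs_pre, chiefs_post = [7.0, 7.5], [8.0, 8.5]
others_pre, others_post = [6.0, 6.5], [6.5, 7.0]

print(level_difference(chiefs_post, others_post))                      # 1.5
print(diff_in_diff(chiefs_pre, chiefs_post, others_pre, others_post))  # 0.5
```

With these made-up numbers, the post-2018 level gap (1.5 penalties per game) is three times the difference-in-differences estimate (0.5), because most of the gap already existed before 2018.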
There is a further, albeit more minor, issue with the analysis. Barnes et al. control for 'defensive team fixed effects', which they argue controls "for differences in how opposing teams play defense and how frequently they are penalized". However, teams change the way they play defence, particularly when the defensive coordinator changes. So they really should have used defensive-team-by-season fixed effects, which would allow the way each team plays defence (and gets penalised) to vary from season to season, and control for that variation.
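A minimal sketch of why the choice of fixed effects matters, assuming a hypothetical defence ('TeamA') whose penalty yards jump after a coordinator change. Subtracting group means (the 'within' transformation) is how a single set of fixed effects works in a linear regression. With team fixed effects, the season-to-season shift survives in the residuals; with team-by-season fixed effects, it is absorbed. The data and names are invented for illustration.

```python
from collections import defaultdict
from statistics import mean

def within_residuals(rows, key):
    """Subtract the group mean of penalty yards, with groups defined by key(row).

    Demeaning the outcome (and, in a full regression, every regressor) within
    groups gives the same slope estimates as adding one dummy per group
    (Frisch-Waugh-Lovell).
    """
    groups = defaultdict(list)
    for row in rows:
        groups[key(row)].append(row["yards"])
    group_means = {k: mean(v) for k, v in groups.items()}
    return [row["yards"] - group_means[key(row)] for row in rows]

# Hypothetical: one defence concedes far more penalty yards after a
# coordinator change between the 2017 and 2018 seasons.
rows = [
    {"defence": "TeamA", "season": 2017, "yards": 5},
    {"defence": "TeamA", "season": 2017, "yards": 5},
    {"defence": "TeamA", "season": 2018, "yards": 15},
    {"defence": "TeamA", "season": 2018, "yards": 15},
]

team_fe = within_residuals(rows, key=lambda r: r["defence"])
team_season_fe = within_residuals(rows, key=lambda r: (r["defence"], r["season"]))
print(team_fe)         # [-5, -5, 5, 5]
print(team_season_fe)  # [0, 0, 0, 0]
```

Under team fixed effects, TeamA's jump stays in the remaining variation, where it could be picked up by other regressors (such as a Chiefs indicator, if the Chiefs happened to face that defence more often in one season); under team-by-season fixed effects it cannot be.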
Barnes et al. look at the effect on several outcome variables:
Our primary dependent variables capture different dimensions of officiating decisions. The first is Penalty Yards, which measures the total yards gained or lost due to penalty calls. If the NFL or its officials favor a particular team, we expect them to benefit from potentially more penalty yards assessed against their opponents. The second variable, First Down, is a binary indicator that takes a value of 1 if a penalty call results in an automatic first down. Because first downs have a direct impact on a team’s ability to sustain drives and score points, this measure captures whether penalties disproportionately help a team advance the ball. The third variable, Subjective, is a binary indicator equal to 1 if the defensive penalty falls into a category requiring referee discretion...
The 'Subjective' variable is described in the appendix to the paper, and appears to be far too inclusive, since it includes penalties like 'Face Mask' and 'Horse Collar Tackle' that don't seem to me to be particularly subjective (those two categories alone made up 6 percent of all penalties, and a much higher proportion of the 'subjective' penalties).
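The arithmetic behind 'a much higher proportion' is straightforward: the two categories' share of the subjective penalties equals their share of all penalties divided by the subjective penalties' share of all penalties. A sketch with hypothetical counts (not the paper's data), in which 'subjective' penalties happen to be 20 percent of the total:

```python
def category_share(counts, categories, base):
    """Share of penalties in `categories` out of all penalties in `base`."""
    return sum(counts[c] for c in categories) / sum(counts[c] for c in base)

# Hypothetical penalty counts (NOT the paper's data).
counts = {
    "Face Mask": 40,
    "Horse Collar Tackle": 20,
    "Other 'subjective' categories": 140,
    "Non-subjective categories": 800,
}
focus = ["Face Mask", "Horse Collar Tackle"]
subjective = ["Face Mask", "Horse Collar Tackle", "Other 'subjective' categories"]

print(category_share(counts, focus, counts))      # 0.06 (share of all penalties)
print(category_share(counts, focus, subjective))  # 0.3 (share of 'subjective' ones)
```

The smaller the 'subjective' group is relative to all penalties, the larger the proportion of it these two arguably non-subjective categories account for.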
Putting aside the issues with the analysis for a moment, Barnes et al. find that:
...penalties against Kansas City during the regular season result in 2.02 fewer yards (𝑝 < 0.01), are 8 percentage points less likely to have a penalty call that results in a first down (𝑝 < 0.01), and are 7 percentage points less likely to have subjective penalties (𝑝 < 0.05) compared to the rest of the NFL. This pattern is decisively reversed in postseason contests, where penalties against the Chiefs offense yield 2.36 more yards (𝑝 < 0.05), are 23 percentage points more likely to have a penalty call that results in a first down (𝑝 < 0.01), and are 28 percentage points more likely to have subjective calls (𝑝 < 0.01) compared to the rest of the NFL in the playoffs.
Barnes et al. have explained this incorrectly. Notice that their wording suggests the penalties are called on Kansas City (i.e. hurting the Chiefs). Their analysis actually shows that penalties against the Chiefs' opponents result in 2.02 fewer yards during the regular season, and that penalties against the Chiefs' opponents (not against the Chiefs offense) yield 2.36 more yards in the postseason. At least, that is according to the notes to their Table 3, which say:
The dependent variable in Columns (1) and (4) is the realized yardage for the offensive team resulting from a penalty on the defensive team... The independent variable of interest, Kansas City Chiefs, is a binary indicator variable that equals 1 if the offensive team is the Kansas City Chiefs and 0 otherwise.
So, those results should be interpreted as penalties against the opposing defence, not penalties against Kansas City. Barnes et al. then apply the same analysis to the 2015-2019 New England Patriots, and find effects that are mostly small and statistically insignificant. For other teams that might arguably be called a 'dynasty' (for a sufficiently low bar for what constitutes a dynasty), Barnes et al. find no evidence of differences in defensive penalty calls. That sample includes the Philadelphia Eagles (2017-2023), the Los Angeles Rams (2018-2023), and the San Francisco 49ers (2019-2023).
At this point, the problem with the mechanism starts to become clear. Barnes et al. start to look at TV viewership, and argue that:
If certain teams, particularly those associated with high-profile players, systematically attract larger audiences, then maintaining the success or visibility of those teams may align with the league’s broader financial interests.
If the NFL wanted to attract a larger audience, and aimed to do so by biasing officiating in favour of a particular team, why on earth would they choose a small-market team like Kansas City? Surely they would want to boost a large-market team? According to this ranking, Kansas City is only the 35th-largest sports media market in the US. Now, Patrick Mahomes is a star quarterback (he was the 10th overall pick in the 2017 NFL draft), so maybe it's the combination of star quarterback and media market that matters. However, Tom Brady was also a star quarterback, and Boston is the 10th-largest sports media market. So, why weren't the Patriots getting favourable calls in 2015-2019? And if, as Barnes et al. seem to argue, the NFL was going through some particular challenges in 2016, then Kansas City is still not the obvious choice for biased officiating. The league should have favoured the LA Rams (in the second-largest sports media market, with star quarterback Jared Goff, the first overall pick in the 2016 NFL draft).
Barnes et al.'s argument falls apart. Their TV viewership analysis does show that:
...the Chiefs’ emergence as a marquee team coincided with a material increase in viewership interest, consistent with the broader financial incentives we hypothesize.
However, that analysis also has issues, because they don't control for the win/loss record of the teams in each game (and winning teams likely attract more TV viewers). And all it really tells you is that Patrick Mahomes attracts a big TV audience. He is a good player, and that's what good players do. Higher ratings for teams with star players are not evidence that referees are biased. As noted above, if the NFL thought that way, they should have preferred biasing the officiating towards the LA Rams instead, and Barnes et al.'s analysis shows that didn't happen.
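The concern about the win/loss record is the textbook omitted-variable-bias problem. As a sketch, using hypothetical notation that is not taken from the paper (Viewers for game g's audience, Chiefs an indicator for a Chiefs game, WinPct the teams' winning record), and under the usual exogeneity assumptions, leaving WinPct out of the viewership regression gives:

```latex
\begin{aligned}
\text{Viewers}_g &= \alpha + \beta\,\text{Chiefs}_g + \gamma\,\text{WinPct}_g + \varepsilon_g \\
\text{WinPct}_g  &= \pi + \delta\,\text{Chiefs}_g + u_g \\
\operatorname{plim}\hat{\beta}_{\text{short}} &= \beta + \gamma\,\delta
\end{aligned}
```

If winning attracts viewers ($\gamma > 0$) and Chiefs games in this period tend to feature a winning team ($\delta > 0$), the regression without the win/loss control overstates the 'Chiefs effect' on viewership.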
As a final point, there is a real risk that the analysis in this paper gets causality backwards. Did the Chiefs get favourable referee calls because they are a dynasty, or did they become a dynasty because they received favourable referee calls at key moments? Barnes et al. never consider the possibility of reverse causality. Overall, the paper does much more to flatter an existing conspiracy theory than to seriously test it. Even if we take their estimates at face value, nothing in the paper convincingly links referee calls to incentives to increase NFL TV viewership.
[HT: Marginal Revolution]

