In research, it is not uncommon for two different researchers, faced with the same research question and the same dataset, to come to very different conclusions about the answer to the research question. They may have simply used different research methods. Two researchers using the same methods and the same dataset could similarly come to different conclusions if they have included different variables in their analyses, excluded or included different subgroups or outlying observations, and so on. It is less common for two researchers to look at the same analysis and come to very different conclusions (differences in interpretation are, however, fairly common).
So, with that in mind, I was interested to read this week new research by Adam Ward, Paul Bracewell, and Ying Cui (all from the Wellington-based Dot Loves Data), published in the journal Kotuitui: New Zealand Journal of Social Sciences Online (ungated). The paper generated a fair amount of media interest (see here and here), because it investigated the relationship between tavern locations and assaults. Those who are familiar with my research will recognise that this is one of my keen areas of research interest. Adam Ward actually sent me an early copy of this paper last year, but I have to admit I didn't have time to read it, and that's a real shame, as will become clear below.
The authors used police data on assaults and Ministry of Justice data on the location of taverns for the 2016 calendar year. They essentially tested whether meshblocks (in urban areas, a meshblock is around the size of a city block, and they are bigger in rural areas) that had at least one assault differed from those that had no assaults, in terms of the distance to the nearest tavern, and the density of taverns within 500 metres. They tested this separately for 'peak assaults' (those occurring on Friday or Saturday nights) and 'off-peak assaults' (those occurring at other times). They also tested separately for differences between meshblocks that had multiple (more than one) assault and those that had one or fewer assaults.
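To make the two tavern variables concrete, here is a minimal sketch of how a proximity and a density measure could be computed for a single meshblock. It assumes straight-line distances from a meshblock centroid in projected coordinates measured in metres; the paper's exact distance calculation is not described here, and the coordinates and function names below are hypothetical.

```python
import math

def nearest_distance(point, outlets):
    """Straight-line distance (metres) from a point to the closest outlet."""
    return min(math.dist(point, outlet) for outlet in outlets)

def density_within(point, outlets, radius=500.0):
    """Number of outlets within `radius` metres of a point."""
    return sum(1 for outlet in outlets if math.dist(point, outlet) <= radius)

# Hypothetical tavern locations and meshblock centroid, in metres.
taverns = [(0.0, 0.0), (300.0, 400.0), (2000.0, 0.0)]
centroid = (0.0, 100.0)

print(nearest_distance(centroid, taverns))  # proximity variable
print(density_within(centroid, taverns))    # density variable (500 m radius)
```

The same two functions could be applied to a list of 'traffic generator' locations to build the placebo variables.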
One really cool thing they did, and which other studies have not done, is run a placebo test. They created a variable based on a sample of fast-food outlets, shopping centres, supermarkets, and petrol stations (which they called 'traffic generators'), and also used that variable in place of the tavern variables in some models, to see what happened. I had a Summer Research Scholarship student do something similar for Hamilton data a few years ago, using hairdressers, bakeries, service stations, and fast food outlets - hopefully, I'll find time to properly write up that research this year!
Anyway, coming back to the Ward et al. paper, they concluded that:
...our results show that whilst tavern density and proximity are more strongly associated to assault occurrence at peak times compared to traffic generator density and proximity, the reverse is true at off-peak times. It is not surprising that tavern density and proximity should be more strongly associated with assault occurrence at peak times compared to traffic generators given that the majority of the traffic generators within our sample are unlikely to open throughout peak hours.
They suggest that their results show that taverns are associated with peak assaults more than 'traffic generators' (their placebo). However, here's their Table 2, with the key rows highlighted (you might need to zoom in to see it clearly):
The top two highlighted rows show the results for peak assaults (comparing meshblocks that had any assault in 2016 with those that had none). The top row (model 1) uses tavern variables, and the second row (model 2) uses 'traffic generator' variables. Comparing the two rows, the standardised coefficient for tavern proximity is clearly much larger for taverns than it is for traffic generators (and the odds ratio is larger for taverns than for 'traffic generators'). This suggests that the distance to the closest traffic generator has a much greater effect on peak assaults than the distance to the closest tavern. Similarly, the standardised coefficient for tavern density is clearly smaller for taverns than it is for traffic generators (and the odds ratio is smaller for taverns than for 'traffic generators'). This suggests that the number of taverns within 500m has a smaller effect on peak assaults in a meshblock than the number of traffic generators within that distance. If you look at the second pair of highlighted rows (for multiple assaults), you'll notice the same conclusions can be reached. You've also probably noticed that these conclusions are the opposite of Ward et al.'s conclusions from above - taverns aren't associated with more peak assaults than traffic generators; they're associated with fewer peak assaults.
What explains the difference? Ward et al. focus their attention on the last column: the Gini coefficient. In the context of a logistic regression model, the Gini coefficient is simply a measure of how good the overall model is at classifying meshblocks, in this case classifying them into those meshblocks with assaults and those without assaults. You shouldn't interpret the Gini coefficient as telling you anything about the individual variables in the model, in the same way that you can't infer anything about individual variables from looking at the R-squared in a linear regression model. Just because the overall model provides a better fit (according to the Gini coefficient), it doesn't mean that the effects demonstrated by the coefficients in the model are larger (which you can only determine by looking at the coefficients or odds ratios)!
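A small sketch may help show why the Gini coefficient can't stand in for effect size. It assumes the usual classification definition, Gini = 2 × AUC − 1 (the paper may compute it differently), and uses made-up labels and scores. Because the Gini depends only on how the model ranks the meshblocks, shrinking every coefficient tenfold (which shrinks the linear predictor tenfold) leaves it unchanged.

```python
def auc(labels, scores):
    """Share of (positive, negative) pairs where the positive scores higher.

    Ties count as half a win. Labels are 1 (assault) or 0 (no assault).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def gini(labels, scores):
    """Gini coefficient of a classifier, assuming Gini = 2 * AUC - 1."""
    return 2 * auc(labels, scores) - 1

# Hypothetical outcomes and linear predictors for eight meshblocks.
labels = [0, 0, 1, 0, 1, 1, 0, 1]
linear_predictor = [-2.0, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0]

# The same model with every coefficient divided by ten.
shrunk_predictor = [0.1 * s for s in linear_predictor]

print(gini(labels, linear_predictor))  # 0.5
print(gini(labels, shrunk_predictor))  # 0.5
```

Both calls give the same Gini even though the second set of coefficients is a tenth the size of the first, so comparing Gini coefficients across models says nothing about which variables have the larger effects.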
Aside from the conclusions, the choice of a logistic regression model is somewhat odd. It treats a meshblock that had two assaults in a year the same as a meshblock that had twenty assaults. A count-based (e.g. Poisson) model would have made more sense, and to be honest their results are probably driven by their choice of model. A Poisson model would have better accounted for the greater incidence of assaults (when there was more than one) in the vicinity of taverns, and Poisson models have become increasingly common in this literature (and are the appropriate models to use in this context for theoretical reasons, which are explained in an article I have written with Bill Cochrane and Michael Livingston that is nearly complete, and which I'll blog about in the future).
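To illustrate what is lost by turning counts into a yes/no outcome, here is a minimal sketch with invented assault counts for two hypothetical groups of meshblocks. The binary 'multiple assaults' outcome is identical across the two groups, while the mean count (which is what an intercept-only Poisson model would estimate as the rate) differs substantially.

```python
def multiple_assaults(count):
    """Binary outcome for the 'multiple assaults' models: 1 if more than one assault."""
    return int(count > 1)

def mean(values):
    return sum(values) / len(values)

# Invented annual assault counts for illustration only.
near_taverns = [2, 20, 0, 5]
far_from_taverns = [2, 2, 0, 2]

# What a logistic model of 'multiple assaults' sees: no difference at all.
print([multiple_assaults(c) for c in near_taverns])      # [1, 1, 0, 1]
print([multiple_assaults(c) for c in far_from_taverns])  # [1, 1, 0, 1]

# What a count model sees: very different assault rates.
rate_near = mean(near_taverns)     # 6.75
rate_far = mean(far_from_taverns)  # 1.5
print(rate_near / rate_far)        # rate ratio of 4.5
```

In this toy example a logistic model has nothing to distinguish the two groups, whereas a count model would pick up a rate ratio of 4.5.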
Having said that, the paper isn't all bad. As I said above, I liked the placebo approach. Also, my quote of the day (which I am sure to refer to in future work) is:
...31% of peak time assaults occur within 100 m of a tavern whereas this geometric region constitutes only 2.3% of New Zealand’s land mass and contains only 6.3% of the population.
That's a conclusion I can't disagree with.