Thursday, 28 April 2016

Big brother is watching your drunken tweets, and using them for research into alcohol use

Around Easter, I spent the last bit of my study leave at La Trobe University in Melbourne, working with Michael Livingston. We've been working on the relationship between alcohol outlet density and crime for a number of years, but in this case we were looking at developing a unified theory of the relationship. The necessity for a unified theory arises because there are a number of competing (but, as it turns out, quite complementary) theories for why having more alcohol outlets in an area would lead to more violence (I'll post more on this later).

One of the earliest theories for this relationship is called 'availability theory'. The theory is reasonably simple, and is underpinned by a lot of what we teach in introductory economics. If you have more alcohol outlets in an area, then the 'full cost' of alcohol falls, and because alcohol is now less costly people will drink more. And when people drink more, more bad things (violence, property damage, accidents, etc.) happen.

When we say the 'full cost' of alcohol, we mean not only the price of the alcohol itself, but also the cost of travelling to and from the location of purchase. So, there are two mechanisms through which having more outlets in an area leads to a lower 'full cost' of alcohol. First, having more outlets probably means that people have a shorter distance to travel to obtain alcohol, so the travel cost is lower. Second, having more outlets probably means an increase in competition between outlets, and we know that competition tends to result in lower prices.
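To make the 'full cost' idea concrete, here is a minimal sketch in Python. The function name, the per-kilometre travel cost, and all of the numbers are hypothetical and chosen only for illustration; they do not come from any study. It simply adds the shelf price to the cost of a round trip, and compares an area with few outlets to one with many.

```python
def full_cost(shelf_price, distance_km, travel_cost_per_km):
    """Full cost of one purchase: shelf price plus the cost of the round trip."""
    return shelf_price + 2 * distance_km * travel_cost_per_km


# Hypothetical area with few outlets: a longer trip and a higher price.
few_outlets = full_cost(shelf_price=20.0, distance_km=5.0, travel_cost_per_km=0.5)

# Hypothetical area with many outlets: a shorter trip (mechanism 1)
# and a slightly lower price due to competition (mechanism 2).
many_outlets = full_cost(shelf_price=19.0, distance_km=1.0, travel_cost_per_km=0.5)

print(few_outlets, many_outlets)
```

In this made-up example most of the fall in full cost comes from the shorter trip rather than the lower price, which is one reason the two mechanisms are worth separating.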

Unfortunately, the empirical support for availability theory is pretty patchy. Some studies find that greater alcohol outlet density is associated with more consumption of alcohol, while others find no effect (e.g. see here or here). Added to that, unpublished work that Bill Cochrane and I have done shows that having more outlets in an area is not associated with lower pricing (using cross-sectional data - we've been collecting longitudinal data now for a number of years, so it's about time for us to revisit the analysis with better data). That is why other theories explaining why alcohol outlet density is associated with violence (for example) have arisen.

Which brings me to this recent paper by Nabil Hossain, Tianran Hu, Roghayeh Feizi, Ann Marie White, Jiebo Luo, and Henry Kautz (all from University of Rochester). The authors use a machine learning algorithm to identify drunk tweets. MIT Technology Review explains:
The team began by collecting geotagged tweets sent during the year up to July 2014 from New York City and from Monroe County on the northern border of the state, which includes the city of Rochester. From this set, they filter all the tweets that mention alcohol or alcohol-related words, such as drunk, beer, party, and so on.
They then used workers on Amazon’s Mechanical Turk crowdsourcing service to analyze the tweets in more detail. For each tweet, they asked three Turkers to decide whether the message referred to alcohol and if so whether it referred to the tweeter drinking alcohol. Finally, they asked whether the tweet was sent at the same time the tweeter was imbibing.
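The first two steps described in that excerpt (a keyword filter, then three human judgements per tweet) could be sketched roughly as follows. This is not the authors' code: the word list contains only the three examples given in the quote, the sample tweets are invented, and combining the three judgements by majority vote is my assumption, since the excerpt does not say how they were combined.

```python
import re
from collections import Counter

# Only the example words from the quote; the actual list is presumably longer.
ALCOHOL_WORDS = {"drunk", "beer", "party"}


def mentions_alcohol(text):
    """True if any word in the tweet is in the alcohol word list."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return any(token in ALCOHOL_WORDS for token in tokens)


def majority_label(votes):
    """Label chosen by most workers (assumed rule, not taken from the paper)."""
    label, _count = Counter(votes).most_common(1)[0]
    return label


# Invented tweets, for illustration only.
tweets = [
    "Having a beer with friends",
    "Stuck in traffic again",
    "so DRUNK right now",
]

# Step 1: keep only tweets that mention an alcohol-related word.
candidates = [t for t in tweets if mentions_alcohol(t)]

# Step 2: three workers answer "is the tweeter drinking?"; take the majority.
is_drinking = majority_label([True, True, False])

print(candidates, is_drinking)
```

With three yes/no answers per tweet there is always a clear majority, which may be why an odd number of workers was used.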
The authors then took the geolocated Twitter data and asked the Mechanical Turk workers to identify tweets that were sent from home. They used that data to train a machine learning algorithm to identify other tweets that were sent from home. Now, armed with a dataset of users' homes (to within about 100m) and their tweets while drinking, the researchers are able to establish a correlation between drinking and alcohol outlet density. Again, MIT Technology Review explains:
...Hossain and co point out that a higher proportion of tweets in New York City are associated with alcohol than in Monroe County. “One possible explanation is that a crowded city such as NYC with highly dense alcohol outlets and many people socializing is likely to have a higher rate of drinking,” they say.
What’s more, the geolocation data reveals that a higher proportion of people drink at home (or within 100 meters of home) in New York City than in Monroe County, where a high proportion of people drink further than a kilometer from home...
They also found a correlation between the density of alcohol outlets in a region and the number of tweets indicating that somebody is drinking now.
The latter result is, of course, a correlation and not necessarily causation. Perhaps having more alcohol outlets in an area causes people to drink more, but on the other hand perhaps people who drink more are more likely to choose to live where there is readier access to alcohol. However, it is an interesting approach, and worth following up, perhaps if an appropriate instrument for alcohol outlet density can be found (to try to overcome the issue of potential reverse causality). Maybe availability theory isn't dead after all?

[HT: Marginal Revolution]
