Saturday 24 November 2018

The debate over a well-cited article on online piracy

Recorded music on CDs and recorded music as digital files are substitute goods. So, when online music piracy was at its height in the 2000s, it is natural to expect that there would be some negative impact on recorded music sales. For many years, I discussed this with my ECON110 (now ECONS102) class. However, in the background, one of the most famous research articles on the topic actually found that there was essentially no statistically significant effect of online piracy on music sales.

That 2007 article was written by Felix Oberholzer-Gee (Harvard) and Koleman Strumpf (University of Kansas), and published in the Journal of Political Economy (one of the Top Five journals I blogged about last week; ungated earlier version here). Oberholzer-Gee and Strumpf used 17 weeks of data from two file-sharing servers, matched to U.S. album sales. The key issue with any analysis like this is:
...the popularity of an album is likely to drive both file sharing and sales, implying that the parameter of interest γ will be estimated with a positive bias. The album fixed effects v_i control for some aspects of popularity, but only imperfectly so because the popularity of many releases in our sample changes quite dramatically during the study period.
The standard approach for economists in this situation is to use instrumental variables (which I have discussed here). Essentially, this involves finding some variable that is expected to be related to U.S. file sharing, but shouldn’t plausibly have a direct effect on album sales in the U.S. Oberholzer-Gee and Strumpf use school holidays in Germany. Their argument is that:
German users provide about one out of every six U.S. downloads, making Germany the most important foreign supplier of songs... German school vacations produce an increase in the supply of files and make it easier for U.S. users to download music.
They then find that:
...file sharing has had only a limited effect on record sales. After we instrument for downloads, the estimated effect of file sharing on sales is not statistically distinguishable from zero. The economic effect of the point estimates is also small.... we can reject the hypothesis that file sharing cost the industry more than 24.1 million albums annually (3 percent of sales and less than one-third of the observed decline in 2002).
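To make the instrumental variables logic concrete, here is a minimal sketch in Python using simulated data (not the authors' data, which have not been released). All of the names and numbers are my own assumptions for illustration: an unobserved `popularity` variable drives both downloads and sales, a `holidays` variable shifts downloads only, and the true effect of downloads on sales is set to zero. Ordinary least squares should then show the positive bias described in the quote above, while the simple instrumental variables estimator should land close to zero.

```python
import numpy as np


def simulate(n=20_000, true_effect=0.0, seed=0):
    """Simulated album-week data; every coefficient here is an illustrative assumption."""
    rng = np.random.default_rng(seed)
    popularity = rng.normal(size=n)  # unobserved: drives downloads AND sales
    holidays = rng.normal(size=n)    # hypothetical instrument: shifts downloads only
    downloads = popularity + 0.5 * holidays + rng.normal(size=n)
    sales = true_effect * downloads + popularity + rng.normal(size=n)
    return holidays, downloads, sales


def ols_slope(x, y):
    """Slope from a simple regression of y on x (with an intercept)."""
    x_c = x - x.mean()
    return float(x_c @ (y - y.mean()) / (x_c @ x_c))


def iv_slope(z, x, y):
    """Just-identified IV estimator: cov(z, y) / cov(z, x)."""
    z_c = z - z.mean()
    return float(z_c @ (y - y.mean()) / (z_c @ (x - x.mean())))


if __name__ == "__main__":
    holidays, downloads, sales = simulate()
    print("OLS estimate:", round(ols_slope(downloads, sales), 3))
    print("IV estimate: ", round(iv_slope(holidays, downloads, sales), 3))
```

The sketch only works because, by construction, `holidays` affects `sales` through `downloads` alone and has a sizeable effect on `downloads`. Whether German school holidays satisfy those two conditions in the real data is exactly what the debate is about.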
Surprisingly, this 2007 article has been a recent target for criticism (although, to be fair, it was also a target for criticism at the time it was published). Stan Liebowitz (University of Texas at Dallas) wrote a strongly worded critique, which was published in the open access Econ Journal Watch in September 2016. Liebowitz criticises the 2007 paper for a number of things, not least of which is the choice of instrument. It is worth quoting from Liebowitz's introduction at length:
First, I demonstrate that the OS measurement of piracy—derived from their never-released dataset—appears to be of dubious quality since the aggregated weekly numbers vary by implausibly large amounts not found in other measures of piracy and are inconsistent with consumer behavior in related markets. Second, the average value of NGSV (German K–12 students on vacation) reported by OS is shown to be mismeasured by a factor of four, making its use in the later econometrics highly suspicious. Relatedly, the coefficient on NGSV in their first-stage regression is shown to be too large to possibly be correct: Its size implies that American piracy is effectively dominated by German school holidays, which is a rather farfetched proposition. Then, I demonstrate that the aggregate relationship between German school holidays and American downloading (as measured by OS) has the opposite sign of the one hypothesized by OS and supposedly supported by their implausibly large first-stage regression results.
After pointing out these questionable results, I examine OS’s chosen method. A detailed factual analysis of the impact of German school holidays on German files available to Americans leads to the conclusion that the extra files available to Americans from German school holidays made up less than two-tenths of one percent of all files available to Americans. This result means that it is essentially impossible for the impact of German school holidays to rise above the background noise in any regression analysis of American piracy.
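The 'background noise' point is essentially a weak instrument problem, and it can be sketched with simulated data. The Python below is my own illustration under assumed numbers, not a reconstruction of either side's analysis: `strength` is a hypothetical first-stage coefficient, and the true effect of downloads on sales is set to zero. It compares the first-stage F-statistic, and the typical size of the error in the instrumental variables estimate, for a reasonably strong instrument and a very weak one.

```python
import numpy as np


def simulate(strength, n=2_000, seed=0):
    """Simulated data; `strength` is a hypothetical first-stage coefficient."""
    rng = np.random.default_rng(seed)
    popularity = rng.normal(size=n)  # unobserved confounder
    instrument = rng.normal(size=n)
    downloads = popularity + strength * instrument + rng.normal(size=n)
    sales = popularity + rng.normal(size=n)  # true effect of downloads is zero
    return instrument, downloads, sales


def first_stage_f(z, x):
    """F-statistic from regressing x on a single instrument z (with an intercept)."""
    n = len(x)
    r = np.corrcoef(z, x)[0, 1]
    return float((n - 2) * r**2 / (1 - r**2))


def iv_slope(z, x, y):
    """Just-identified IV estimator: cov(z, y) / cov(z, x)."""
    z_c = z - z.mean()
    return float(z_c @ (y - y.mean()) / (z_c @ (x - x.mean())))


def median_abs_error(strength, reps=200):
    """Median absolute IV estimate across simulated samples (the truth is zero)."""
    estimates = [iv_slope(*simulate(strength, seed=s)) for s in range(reps)]
    return float(np.median(np.abs(estimates)))


if __name__ == "__main__":
    for strength in (0.5, 0.01):  # assumed 'strong' and 'weak' cases
        z, x, _ = simulate(strength)
        print(
            f"strength={strength}: first-stage F={first_stage_f(z, x):.1f}, "
            f"median |IV estimate|={median_abs_error(strength):.3f}"
        )
```

When the instrument barely moves downloads, the denominator of the estimator is mostly sampling noise, so the estimates scatter widely and tell you very little about the true effect.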
I leave it to you to read the full critique, if you are interested. Oberholzer-Gee and Strumpf were invited to reply in Econ Journal Watch. Instead, they published a response in the journal Information Economics and Policy (sorry, I don't see an ungated version online) the following year. Unfortunately, the response is a great example of how not to respond to a critique of your research. They essentially ignored the key elements of Liebowitz's critique, and he responded in Econ Journal Watch again in the May 2017 issue:
Comparing their IEP article to my original EJW article reveals that their IEP article often did not respond to my actual criticisms but instead responded, in a cursorily plausible manner, to straw men of their own creation. Further, they made numerous factual assertions that are clearly refuted by the data, when tested.
In the latest critique, Liebowitz notes an additional possible error in Oberholzer-Gee and Strumpf's data. It seems to me that the data error is unlikely (it is more likely that the figure that represents the data is wrong), but since they haven't made their data available to anyone, it is impossible to know either way.

Overall, this debate is a lesson in two things. First, it demonstrates how not to respond to reasonable criticism - that is, by avoiding the real questions and answering some straw man arguments instead. Related to that is making your data available. Restricting access to the data (except in cases where the data are protected by confidentiality requirements) makes it seem as if you have something to hide! In this case, the raw data might have been confidential, but the weekly data used in the analysis are derivative and may not be. Second, as Liebowitz notes in his first critique, most journal editors are simply not interested in publishing comments on articles published in their journal, where the comments might draw attention to flaws in the original articles. I've struck that myself with Applied Economics, and ended up writing a shortened version of a comment on this blog instead (see here). It isn't always the case though, and I had a comment published in Education Sciences a couple of months ago. The obstructiveness of authors and journal editors to debate on published articles is a serious flaw in the current peer-reviewed research system.

In the case of Oberholzer-Gee and Strumpf's online piracy article, I think it needs to be seriously down-weighted, at least until they are willing to allow their data and results to be carefully scrutinised.

1 comment:

  1. Great write up and interesting read.

    While I was reading the original Oberholzer-Gee and Strumpf article I presumed that the article was well written without flaw. However, after looking at the critiques and the authors' response it seems that the mere fact of not releasing the data and failing to validly reply to the criticism was a bit of a red flag that something odd was going on.
