Saturday, 26 June 2021

Wikipedia could be good for research

Wikipedia may be the most maligned resource of all among academics. It has a reputation for inaccuracy that was earned during its early days, and that reputation continues to encourage scepticism over its value as a research tool. Despite that reputation and scepticism, it is clearly one of the most widely used tools among students, if not among faculty. And the quality and comprehensiveness of its articles have improved greatly over time. So, I was interested to read this 2017 paper by Neil Thompson (MIT) and Douglas Hanley (University of Pittsburgh), which looks at the impact of Wikipedia on research.

Thompson and Hanley did two things. First, they used a 'big data' approach, looking at semantic word usage on Wikipedia, and whether subsequent word usage in academic articles changed to match the Wikipedia entries. Specifically, they used:

...a full edit-history of Wikipedia (20 terabytes) and full-text versions of every article from 1995 onward from more than 5,000 Elsevier academic journals (0.6 terabytes). This allows us to look at the addition of any Wikipedia article and to ask if afterwards the prose in the scientific literature echoes the Wikipedia article’s. The advantage of this approach is that we can look very broadly across Wikipedia articles.

Thompson and Hanley look at two fields: (1) chemistry; and (2) econometrics. They quickly report that the number of reads of econometrics articles on Wikipedia is far too small to generate statistically significant effects, and bury those results in an appendix to the paper (but if you look at them, they are broadly consistent with the results from chemistry). They focus their main conclusions on the chemistry results, looking at a six-month window before, and a six-month window after, the creation of a Wikipedia article. They find that:

The positive and highly statistically significant coefficient on “After” in the regression confirms that articles published afterwards are indeed more similar.

The regression estimates are quite meaningless without more context, so they compare the effect of a Wikipedia article with the effect of a review article published in a journal, and find that:

...a Wikipedia article’s effect is roughly half as large as that of a review article. 

That's a reasonably large effect. However, their big data approach demonstrates correlation, not causation. So, in the second part of the paper, Thompson and Hanley report on a randomised controlled trial, in which they created new Wikipedia articles and tested their effects on subsequent word usage in academic articles. As they describe:

To establish the causal impact of Wikipedia, we performed an experiment. We commissioned subject matter experts to create new Wikipedia articles on scientific topics not covered in Wikipedia. These newly-created articles were randomized, with half being added to Wikipedia and half being held back as a control group... If Wikipedia shapes the scientific literature, then the “treatment” articles should have a bigger effect on the scientific literature than the “control” articles.

Comparing their treatment and control articles, they find that:

...the scientific content from the articles we upload to Wikipedia makes its way into the scientific literature more than the content from control articles that we don’t upload...and these effects are large.

How large? They provide a back-of-the-envelope calculation that implies that:

...41 Elsevier journal articles in Chemistry were affected. If we then scale this up to account for Elsevier’s share of the journal market... then we would estimate each Wikipedia article is influencing ~ 250 scientific articles (to some extent).

They then go on to explore the results in more detail, by looking at which sections of academic articles are affected, and report that:

...there is a statistically significant effect in all sections except the abstract. The size and statistical significance is weakest in the Methods section and strongest in the Introduction. This suggests that our Wikipedia articles are having their largest effect on the contextualization of science and the connections that the authors are making to the rest of the field.

Finally, and interestingly, they find that the academic literature that is referenced on Wikipedia earns 91% more citations. This last point suggests that academics should be more active as editors on Wikipedia, not least to get their own articles referenced there and increase their citation count!

Overall, the results suggest that Wikipedia has a significant effect on the way that the academic literature is synthesised and interpreted, similar to the effect that review articles have. If that is the case, then it again suggests that academics can have a positive influence on their discipline by ensuring that Wikipedia articles accurately reflect the current state of knowledge.

[HT: Ethan Mollick on Twitter, via Marginal Revolution]
