- The model is opaque, or invisible - people whose data are included in the model (let's call them 'subjects') don't know how the model functions, or understand how the inputs are converted into outputs;
- The model is unfair, or works against the subject's best interests; and
- The model is scalable - it can grow exponentially.
The marketing of these universities is a far cry from the early promise of the Internet as a great equalizing and democratizing force. If it was true during the early dot-com days that "nobody knows you're a dog," it's the exact opposite today. We are ranked, categorized, and scored in hundreds of models, on the basis of our revealed preferences and patterns. This establishes a powerful basis for legitimate ad campaigns, but it also fuels their predatory cousins: ads that pinpoint people in great need and sell them false or overpriced promises. They find inequality and feast on it. The result is that they perpetuate our existing social stratification, with all of its injustices.
One of the big takeaways for me was how often models are constructed and used on insufficient data. O'Neil gives the example of teacher value-added, which works well in aggregate. At the level of an individual teacher, though, the annual 'value-added' is based on perhaps 25 data points (the students in their class). That is hardly a robust basis for making life-changing hiring and firing decisions.
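To put a rough number on why 25 data points is so few, here is a minimal sketch of how the uncertainty in an average shrinks with sample size. The spread of student score gains (`STUDENT_SD`) and the size of the pooled group are made-up values for illustration, not figures from the book, and the sketch assumes students' gains are independent of each other.

```python
import math


def standard_error(student_sd: float, n_students: int) -> float:
    """Standard error of an average score gain, assuming independent students."""
    return student_sd / math.sqrt(n_students)


# Hypothetical spread of individual student gains, in arbitrary test-score points.
STUDENT_SD = 10.0

one_class = standard_error(STUDENT_SD, 25)      # a single teacher's class
many_classes = standard_error(STUDENT_SD, 2500)  # 100 such classes pooled together

print(f"one class of 25:  +/- {one_class:.1f} points")
print(f"2500 students:    +/- {many_classes:.1f} points")
```

Under these assumptions the estimate for a single class is ten times noisier than the pooled one, which is why a measure can look sensible in aggregate and still be unreliable for any one teacher.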
However, there are also parts of the book that I disagree with. In the chapter on insurance, O'Neil argues against using models to predict risk profiles for the insured (so that those at higher risk would pay higher premiums than those at lower risk). She argues that this is unfair, since some people who are really low-risk end up grouped with higher-risk people and pay premiums that are too high (e.g. think of a really careful young male driver, if there is such a person). O'Neil could do with a greater understanding of the problems of asymmetric information, adverse selection, and signalling. If low-risk and high-risk people are pooled together and pay the same premium, that premium has to cover the average risk in the pool, so it is higher than the low-risk people would pay on their own. Essentially the low-risk people are subsidising the high-risk people (how is that not more unfair?). The low-risk people drop out of the market, which pushes the premium up further and leaves only the high-risk people behind, who are not profitable for insurers to insure. The insurance market could eventually collapse (for a more detailed explanation, see this post). The insurer's models are a way of screening the insured, revealing private information about how risky they are to insure.
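The unravelling described above can be sketched with a toy simulation. Everything here is an assumption made for illustration: the function name `pooled_premium_rounds`, the expected claim costs, and the rule that a person stays insured only if the premium is no more than 1.2 times their own expected cost. None of it comes from the book.

```python
def pooled_premium_rounds(expected_costs, markup=1.2):
    """Simulate one pool charged a single break-even premium.

    Returns a list of (premium, pool_size) tuples, one per round.
    Assumption: a person stays only while premium <= markup * their expected cost.
    """
    pool = sorted(expected_costs)
    history = []
    while pool:
        premium = sum(pool) / len(pool)  # premium that just covers average claims
        history.append((premium, len(pool)))
        stayers = [cost for cost in pool if premium <= markup * cost]
        if len(stayers) == len(pool):
            break  # nobody left this round, so the pool is stable
        pool = stayers
    return history


# Hypothetical expected annual claim costs, from very low risk to very high risk.
history = pooled_premium_rounds([100, 200, 400, 800, 1600])
for premium, size in history:
    print(f"premium {premium:7.1f} with {size} people insured")
```

With these made-up numbers the premium rises each round as the lower-risk people leave, until only the highest-risk person remains in the pool. Pricing by risk, which is what the insurer's models try to do, avoids forcing the low-risk people to choose between overpaying and going uninsured.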
Despite that, I really enjoyed the book. The chapter on voting and micro-targeting of political campaigns was very enlightening, especially given more recent events such as the 2016 U.S. Presidential election and the Brexit vote. The sections on the teacher value-added models have caused me to re-evaluate the strengths and weaknesses of those models.
Finally, O'Neil argues for a more ethical use of models. This isn't so far from what Emanuel Derman argues in his book "Models. Behaving. Badly." (which I reviewed here back in 2013), and is a cause well worth supporting.