Thursday, 31 March 2022

Bye bye America's Cup, and the winner's curse that goes with it

Local sports fans might not be too happy, but fans of sensible economic decisions should be happy about Team Mercenary's decision to host the next America's Cup regatta in Barcelona. The New Zealand government and Auckland Council had pledged $99 million towards the hosting, but apparently that wasn't enough. As the New Zealand Herald reported yesterday:

The yachting syndicate today announced Barcelona officially as the next venue of the America's Cup, relieving Auckland of hosting duties for the 2024 event.

Emirates Team New Zealand and the Royal New Zealand Yacht Squadron made the announcement overnight, confirming the 37th running of the event to be held in September and October of 2024.

Team New Zealand successfully defended the Auld Mug last year with a 7-3 victory over Luna Rossa and then rejected a $99 million bid from the New Zealand Government and Auckland Council to host the next America's Cup, meaning the event looked likely to be hosted abroad.

This came after the Government had invested $136.5m in the previous defence, alongside $250m from Auckland Council.

A cost-benefit report last year found the New Zealand economy was left $293 million worse off.

The last sentence is the most damning, and the reason why we should be glad to see the end of the taxpayer subsidies (although there are multiple reports that disagree over the measurement of the costs and benefits of the last event). Now, there is good reason to be sceptical of economic impact studies generally, and studies of the America's Cup are no better than most (see here and here). However, the bias in such studies is almost always upwards. So, it is likely that the last running of the America's Cup left New Zealand a lot worse off than just $293 million out of pocket. That's a pretty expensive party, but it's certainly not going to drive economic growth. That money is better spent elsewhere.

It could have gone much worse if the government had decided to engage in an extensive bidding war over the rights to host the regatta. Since the real benefit of hosting is largely unknown, in a bidding war the highest bidder is likely to be whichever bidder most over-estimates the benefits (since that largely determines what they are willing to bid). This is what economists refer to as the winner's curse. Presumably, Barcelona will either genuinely benefit to a greater extent than New Zealand would have from hosting (and therefore offered more money to Team Mercenary), or Barcelona over-estimated the benefits by more than New Zealand did (and therefore offered more money to Team Mercenary), or both. Probably both. Regardless, whether hosting the America's Cup is worthwhile is not something New Zealand need worry about for a while at least. That should provide some consolation for losing the big party.
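The winner's curse is easy to see in a small simulation (a sketch with entirely made-up numbers, not anything from the actual hosting negotiations). Even when every bidder's estimate of the value of hosting is unbiased, the winning bid comes from the most optimistic estimator, so the winner systematically over-pays:

```python
import random

random.seed(42)

TRUE_VALUE = 100.0   # common value of hosting (unknown to bidders); made-up
N_BIDDERS = 5
NOISE_SD = 30.0      # spread of each bidder's (unbiased) estimation error
N_AUCTIONS = 10_000

winning_estimates = []
for _ in range(N_AUCTIONS):
    # Each bidder estimates the true value with unbiased noise...
    estimates = [random.gauss(TRUE_VALUE, NOISE_SD) for _ in range(N_BIDDERS)]
    # ...but the highest estimator wins (bids track estimates)
    winning_estimates.append(max(estimates))

avg_winner = sum(winning_estimates) / N_AUCTIONS
print(f"True value: {TRUE_VALUE:.1f}")
print(f"Average winning estimate: {avg_winner:.1f}")  # well above the true value
```

The more bidders in the auction, the worse the curse: the maximum of more estimates drifts further above the true value.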

Wednesday, 30 March 2022

The bias against research on gender bias?

There is a large (and growing) literature on gender bias and the gender gap. Regular readers of this blog will no doubt have noted that it is a recurring theme. In fact, there is so much research on gender bias that it is hard to believe that there could be a bias against such research. Nevertheless, that is the conclusion of this 2018 article by Aleksandra Cislak (Nicolaus Copernicus University), Magdalena Formanowicz (University of Bern), and Tamar Saguy (Interdisciplinary Center Herzliya), published in the journal Scientometrics (open access).

Cislak et al. collated data on publications listed in PsycINFO and PsycARTICLES over the period from 2008 to 2015, on gender bias or racial bias. After removing irrelevant articles and duplicates, their analysis was based on slightly more than 1000 articles over that time. They then compared articles on gender bias with articles on racial bias, in terms of prestige. Prestige was measured in two ways: (1) by the impact factor of the journal they were published in, for the year of publication; and (2) by whether the research had been funded. They found that:

...research on gender bias was funded less often (B = - .20; SE = .09; p = .02) and published in lower Impact Factor journals (B = - .67; SE = .20; p = .001).

So, research on gender bias appears from this study to have attracted less prestige than research on racial bias. Cislak et al. take that as evidence that there is bias against research on gender bias. However, there is good reason to doubt their conclusion. It relies on an assumption that, in the absence of any bias, there would be no difference in the impact factor or funding between research on gender bias and research on racial bias. That assumption strikes me as difficult to support.

The ideal experiment for detecting bias would compare two otherwise identical groups of articles, one group on gender bias and one group on racial bias, submitted to the same journals. That would hold constant the quality of the articles, the quality of the journals, general editorial policies and practices, authorship, authors' incentives (for writing long comprehensive articles, or shorter articles on sub-topics), and article context (other than gender or racial bias). Differences in acceptance rates between these two groups of articles might be taken as evidence of bias.

Instead, we have an observational study based on different articles published by different journals. It tells us almost nothing about the editorial process from submission to publication. We really have no idea if the observed difference arises because of bias, or because of differences in article quality or something else. In fact, the data could even be consistent with bias in favour of articles on gender bias, if the acceptance rate of submitted gender bias articles was higher than the acceptance rate of submitted racial bias articles of otherwise similar quality and other attributes. Maybe the bar for acceptance of an article on racial bias is set higher than the bar for acceptance of an article on gender bias? Cislak et al. engage in some hand-waving about the quality of the research being the same because of the use of "similar methods and paradigms". However, that's not a very convincing argument, and they don't actually control for research quality (noting whether articles use quantitative or qualitative methods is not a control for research quality).

Similarly, if the source of funding is not held constant between gender bias and racial bias, it tells us little about bias (unless we are considering bias in the availability of funding, which Cislak et al. would probably argue they are getting at). Nevertheless, without knowing the rate of acceptance of funding applications for gender bias and racial bias research, there is no reason to believe that a difference in the share of articles funded is evidence of bias in either direction.

In short, this research is unconvincing. Show me an audit study on this topic, and I'd likely give it more weight. But this observational research simply doesn't cut it.

Tuesday, 29 March 2022

Cohort effects and the downturn in US fertility since the Great Recession

There is an excellent new article in the Journal of Economic Perspectives (open access, with a less technical summary here) by Melissa Kearney (University of Maryland), Phillip Levine (Wellesley College), and Luke Pardue (University of Maryland) on fertility rates in the U.S. In particular, the article looks to explain this puzzle (from their Figure 1):

The figure tracks the period fertility rate - the number of births per 1000 women of childbearing age (15-44 years) in each year. Notice that the trend is reasonably flat from 1980 to 2007, and then the trend turns sharply downwards after that. Recessions are known to generate short-run decreases in fertility, and you can see the recessions in the early 1980s and 1990-91 quite readily. But after those recessions, the birth rate climbs back up to the trend. Not so with the Great Recession, with birth rates continuing downwards for more than a decade, despite the end of the recessionary period.

Kearney et al. do an excellent job of unpacking the available evidence on how U.S. birth rates have changed. They start by looking at different demographic groups, and find a dramatic decline in births to teenage mothers. However, that change predated the Great Recession, with teen births trending downwards since the early 1990s. At other ages, since 2007 there has been a decline in births to mothers in their 20s, and a slowing of the previous increase in the number of births to older mothers. Decomposing the change in births over time by demographic group (age, race, and education level), Kearney et al. find that:

...changing birth rates within demographic groups is responsible for the declining birth rate since 2007, not changing population shares. From 2007 to 2019, the birth rate declined by 10.8 births per 1,000 women 15 to 44 (from 69.1 to 58.3)... Across all groups, had birth rates been constant and only population shares shifted between 2007 and 2019, the birth rate would, in fact, have risen by 2.6 births per thousand. On the other hand, if population shares were held constant and only within-group birth rates moved over that period (the change captured by the first term), the overall birth rate would have fallen by 12.8 births per 1,000 women...

The three teen categories by race/ethnicity explain 37 percent of the overall decline. Hispanic teens contributed the largest share, explaining 14 percent of the overall decline; their birth rate fell dramatically, from 82.2 to 24.7 over the period.

Other demographic groups with smaller declines in their birth rate also contributed extensively to the overall decline because of their relatively large population shares. For instance, the third-largest contributing group is White women between the ages of 25 and 29 with college degrees; their birth rate fell from 101.1 to 65.1, accounting for 11.9 percent of the overall decline.
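The decomposition Kearney et al. use can be illustrated with a toy two-group version (the shares and rates below are invented for illustration, not the paper's data). The overall birth rate is a share-weighted average of group birth rates, so its change splits into a within-group rates term, a composition (population shares) term, and an interaction:

```python
# Hypothetical two-group example of a shift-share decomposition:
# overall birth rate = sum over groups of (population share x group birth rate).
groups = {
    #          (share_2007, rate_2007, share_2019, rate_2019) -- made-up numbers
    "younger": (0.5, 90.0, 0.4, 60.0),
    "older":   (0.5, 50.0, 0.6, 55.0),
}

def overall(shares_rates):
    return sum(s * r for s, r in shares_rates)

base = overall([(s0, r0) for s0, r0, s1, r1 in groups.values()])  # 2007 rate
end = overall([(s1, r1) for s0, r0, s1, r1 in groups.values()])   # 2019 rate

# Within-group effect: hold 2007 shares fixed, let birth rates change
within = sum(s0 * (r1 - r0) for s0, r0, s1, r1 in groups.values())
# Composition effect: hold 2007 rates fixed, let population shares change
composition = sum((s1 - s0) * r0 for s0, r0, s1, r1 in groups.values())
# Interaction term picks up the remainder
interaction = (end - base) - within - composition

print(f"Overall change: {end - base:+.1f}")
print(f"  within-group rates: {within:+.1f}")
print(f"  composition:        {composition:+.1f}")
print(f"  interaction:        {interaction:+.1f}")
```

In the paper's actual numbers, the within-group term is strongly negative while the composition term is positive, which is exactly the pattern in the quote above.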

So, again, the biggest contributors to the decline in births have been declines in births to younger women, in their teens and 20s. So, what has caused that change? Kearney et al. look at a variety of policy and economic variables, and find that:

...when we sum the estimated coefficients on our ten economic/policy variables with their average change between 2007 and 2018, their combined effect is 6.2 percent of the total decline in the birth rate from 69.1 to 58.3 births per 1,000 women age 15 to 34 between 2007 and 2018.

What does that leave? Well, up to this point in the paper, I'd been silently yelling "It's a cohort effect!" And I wasn't to be disappointed, because that's where Kearney et al. turned next. The results are neatly summarised in their Figure 5:

The figure tracks the average number of births for women at different ages, grouped by five-year birth cohort. So, each line tracks all women born in a cohort. The first three cohorts (women born in 1968-72, 1973-77, and 1978-82) are pretty similar. But then things change dramatically. The cohort of women born in 1983-87 had fewer births at each age than the earlier cohorts. The cohort of women born in 1988-92 had fewer still, and the cohort of women born in 1993-97 has had even fewer. Now, given that these more recent cohorts of women have not completed their fertility (they could yet have more babies), it is possible that there will be some catch-up. But, as Kearney et al. note:

...the number of births they would have to have at older ages to catch up to the lifetime childbearing rates of earlier cohorts is so large that it seems unlikely they will do so.

So, what has been causing this generational change in fertility behaviour? Here, Kearney et al. become more speculative (which is the best we can do at this stage), and refer to the 'second demographic transition':

The theory of the second demographic transition highlights instead an overall shift to a greater emphasis on individual autonomy, with a corresponding de-emphasis on marriage and parenthood. The specific manifestations of this shift are taken to include a decoupling of marriage and childbearing, a change in the relationship between education and childbearing, a rise in childlessness, and the establishment of a two-child norm for those having children.

Specifically, Kearney et al. note that, although there was no abrupt change exactly in 2007:

...women who grew up in the 1990s were the daughters of the 1970s generation and women who grew up in the 1970s and 1980s were daughters of the 1950s and 1960s generation. It seems plausible that these more recent cohorts of women were likely to be raised with stronger expectations of having life pursuits outside their roles as wives and mothers. It also seems likely that the cohorts of young adults who grew up primarily in the 1990s or later - and reached prime childbearing years around and post 2007 - experienced more intensive parenting from their own parents than those who grew up primarily in the 1970s and 1980s. They would have a different idea about what parenting involves. We speculate that these differences in formed aspirations and childhood experiences could potentially explain why more recent cohorts of young women are having fewer children than previous cohorts.

It doesn't quite answer the question of why this sudden change occurred in 2007, so it's a little unsatisfying as an ending to the paper. However, this is a very thorough piece of work, and had me wondering about what the cohort effects look like for New Zealand. That might make an interesting research project for a suitably motivated Honours student.


Monday, 28 March 2022

If you work out how to beat the bookies, the bookies strike back

There is a famous quote in gambling circles, "if you bet the Super Bowl, you are a losing player". Most, if not all, regular gamblers probably believe they have some system that allows them to beat the bookies. [*] In reality, close to none of them do. That's because the bookmakers have pretty good models that make them fairly accurate at predicting the odds of various outcomes (if they didn't, then they wouldn't remain profitable bookies for long!).

So, the sports odds we observe in the real world pretty accurately reflect the underlying probabilities (this is a much better example of the efficient markets hypothesis than share markets or asset prices!). However, not all bookies offer the same odds, since the bookies try to balance the financial risk that they face from any particular outcome. So, a bookie that receives a lot of bets on the underdog will adjust the underdog's odds downwards (and the favourite's odds upwards), in order to maintain some balance.

Is it possible to exploit those differences in different bookmakers' odds on the same event in order to make money? True arbitrage opportunities are rare, as that would require one bookie to have one outcome favoured, while at the same time another bookie has a different outcome favoured. However, it may nevertheless be possible to take advantage of the difference in odds. The method is explained in this 2017 paper by Lisandro Kaunitz (University of Tokyo), Shenjun Zhong (Monash University), and Javier Kreiner (CargoX). As they explain:

Our betting system differed from previous betting strategies in that, instead of trying to build a model to compete with bookmakers’ forecasting expertise, we used their publicly available odds as a proxy of the true probability of a game outcome. With these proxies we searched for mispricing opportunities, i.e., games with odds offered above the estimated fair value...
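That mispricing search can be sketched as follows (the odds below are made up, and this simplified version ignores the bookmaker's commission, which the paper accounts for):

```python
# Sketch of the mispricing search: made-up decimal odds from several
# bookmakers for a single outcome (e.g. a home win).
offered_odds = {"bookie_A": 2.10, "bookie_B": 1.95, "bookie_C": 2.00, "bookie_D": 2.40}

# The average odds across bookies serve as a proxy for the 'true' probability
mean_odds = sum(offered_odds.values()) / len(offered_odds)
implied_prob = 1 / mean_odds  # probability implied by the consensus odds

# A bet has positive expected value when the offered odds exceed fair odds,
# i.e. when implied_prob * offered_odds > 1
for bookie, odds in offered_odds.items():
    expected_value = implied_prob * odds - 1  # per $1 staked
    if expected_value > 0:
        print(f"{bookie}: odds {odds:.2f}, EV {expected_value:+.3f} per $1")
```

In this toy example only the outlier bookie offering 2.40 shows a positive expected value; the strategy is simply to bet whenever such an outlier appears.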

The 'fair value' of a bet was based on the average odds of at least three bookies (Kaunitz et al. followed the odds of 32 different bookmakers) for domestic and international football (soccer) games. Their success was remarkable:

Our strategy returned sustained profits over years of simulated betting with historical data, and months of paper trading and betting with actual money...

Specifically, the simulated betting:

...reached an accuracy of 44.4% and yielded a 3.5% return over the analysis period. For example, for an imaginary stake of $50 per bet, this corresponds to an equivalent profit of $98,865 across 56,435 bets...

The paper trading:

...obtained an accuracy of 44.4% and a return of 5.5%, earning $1,128.50 across 407 bets for the case of $50 bets...

And finally, betting with real money:

...obtained an accuracy of 47.% and a profit of $957.50 across 265 bets, equivalent to a 8.5% return...

But, just when Kaunitz et al. were probably starting to plan their semi-retirement on the Costa Azul, the bookies struck back:

Although we played according to the sports betting industry rules, a few months after we began to place bets with actual money bookmakers started to severely limit our accounts. We had some of our bets limited in the stake amount we could lay and bookmakers sometimes required “manual inspection” of our wagers before accepting them. In most cases, bookmakers denied us the opportunity to bet or suggested a value lower than our fixed bet of $50...

Kaunitz et al. conclude that the whole system is rigged. They found a way to exploit the imperfections in bookmakers' odds, but doing so only succeeded in drawing the bookies' attention to themselves. The bookies only want losing players to play.

[HT: Marginal Revolution, here (for the Dilan Esper Twitter thread), and here (for the Kaunitz et al. paper)]


[*] I admit that I am not immune to this effect. Back when I was a PhD student, I had an extremely successful run betting on NHL games, making hundreds of dollars in winnings over a few months. I thought I had a good system. However, the betting strategy didn't transfer to NBA games, where I lost all of my NHL winnings and more in the space of a few weeks. I haven't gambled on sports since.

Sunday, 27 March 2022

The importance of writing quality for economists

As researchers, we hope that the quality of our research shines through when we make a presentation or submit an article for publication. However, how the quality of our research is perceived depends on the quality of our communication. I've been to a number of seminars and conference presentations where the underlying research might be good, but the presentation was so bad it was difficult to tell. And similarly for research articles. As a journal reviewer, if I can't understand what has been done, then I'm much more likely to recommend that a submission be rejected.

So, quality of communication is important in research. But how important is it? That's the underlying question behind this new working paper by Jan Feld (Victoria University of Wellington), Corinna Lines, and Libby Ross (both plain language consultants at Write Limited). They undertook an experimental evaluation of writing quality of articles written by economics PhD students. As they explain:

To find the causal effect of academic writing, we need to compare well-written papers with poorly written papers that are otherwise identical. This is what we do in this study.

We estimate the causal effect of writing quality by comparing how experts judge the quality of 30 papers originally written by PhD students in economics. We had two versions of each paper: one original and one that had been language-edited. The language editing was done by two professional editors, who aimed to make the papers easier to read and understand. We then asked 18 writing experts and 30 economists to judge some of the original and edited papers. Each of these experts judged five papers in their original versions and five papers in their edited version, spending around 5 minutes per paper. None of the experts saw both versions of the same paper. None of the experts knew that some of the papers were edited. The writing experts judged the writing quality and the economists judged the academic quality of the papers.

Feld et al. emailed PhD students and their supervisors at all eight New Zealand universities and invited them to participate. I honestly don't remember this invitation, but it was in the middle of pandemic-induced online teaching, and so many things just passed me by. Anyway, they got a sample of 30 papers from 22 PhD students. The effect of the language editing is substantial when the papers are evaluated by writing experts:

Writing experts judged the edited papers as 0.6 standard deviations (SD) better written overall (1.22 points on an 11-point scale). They further judged the language-edited papers as allowing the reader to find the key message more easily (0.58 SD), having fewer mistakes (0.67 SD), being easier to read (0.53 SD), and being more concise (0.50 SD).

Feld et al. asked 30 Australian economists to evaluate the New Zealand PhD students' papers (so I didn't miss an invite to be a reviewer, at least!). The economists were less swayed by writing quality than the writing experts were though:

Economists evaluated the edited versions as being 0.2 SD better overall (0.4 points on an 11-point scale). They were also 8.4 percentage points more likely to accept the paper for a conference, and were 4.1 percentage points more likely to believe that the paper would get published in a good economics journal.

Nevertheless, those results are statistically significant. Feld et al. also found a statistically significant decrease in economists' view of the probability that they would desk-reject the paper, and a marginally significant increase in the perception of the writing quality (as opposed to the paper quality, in the above quote). And in case you were worried that these results are purely subjective, Feld et al. include an objective measure (the Flesch-Kincaid readability score), and find that:

The language editing also affected the readability as measured by the Flesch-Kincaid grade level score. The introductions of edited papers have a readability score corresponding to grade level 14.7, compared to 15.3 of the introductions of original papers. This improvement of 0.6 grade-levels is statistically significant at the 1 percent level. For comparison, our introduction has a Flesch-Kincaid grade level score of 12.5.

As a matter of interest, the first paragraph of this post rates a grade level score of 10.8 (according to this online calculator). The whole post (excluding the indented quotations from the paper) rates a grade level score of 11.5.
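For the curious, the Flesch-Kincaid grade level is a simple function of average sentence length and average syllables per word. Here is a rough sketch (it uses a crude vowel-group syllable counter, so its scores will disagree slightly with dedicated tools like the online calculator linked above):

```python
import re

def flesch_kincaid_grade(text):
    """Flesch-Kincaid grade level:
    0.39 * (words/sentences) + 11.8 * (syllables/words) - 15.59.
    Syllables are approximated by counting vowel groups, which is crude."""
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)

    def syllables(word):
        count = len(re.findall(r"[aeiouy]+", word.lower()))
        if word.lower().endswith("e") and count > 1:
            count -= 1  # roughly handle silent final 'e'
        return max(1, count)

    total_syllables = sum(syllables(w) for w in words)
    return (0.39 * (len(words) / sentences)
            + 11.8 * (total_syllables / len(words)) - 15.59)

simple = "The cat sat on the mat. It was a sunny day."
print(round(flesch_kincaid_grade(simple), 2))
```

Short sentences with short words score low (easy to read); long sentences full of polysyllabic words score high.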

Importantly, when Feld et al. stratify their analysis into well-written (top half of the original unedited papers, as assessed by the writing experts) and poorly-written papers (bottom half of the original unedited papers), the results are what you would expect. For poorly written papers:

All point estimates are statistically significant and large. For example, poorly written papers that have been language edited are judged 1.05 SD better written than poorly written papers in their original version.

That was the effect on the writing experts' judgment of quality. The economists rated the quality of the edited papers 0.29 standard deviations higher than the unedited papers (and 0.4 standard deviations higher in terms of writing quality). There were no statistically significant effects of the editing on the economists' evaluations of well-written papers.

So, writing quality matters. But my takeaway from the paper is that writing quality matters differently depending on the audience. If economists are trying to appeal to non-specialists, then writing quality matters a lot, and it matters both for high-quality and low-quality research. There is potentially a strong case to be made for economists (and no doubt, other researchers as well) to train in science communication. If economists can only speak to other economists, then the impact of our research on policy and practice is going to be somewhat limited.

On the other hand, if economists are trying to appeal only to other economists (rather than to professional writers or other non-specialists), then only papers with below-average writing quality seem to benefit from professional editing. This is an important finding for PhD students (who were the study population in this study). PhD students are typically trying to get their papers accepted for publication in economics journals, and will likely have economists as their thesis examiners. In those contexts, professionally edited writing is only of benefit for low-quality papers. For high-quality papers, the quality of the research is more apparent to economist readers. However, in the New Zealand context, so many of our PhD students are international students, with English as a second (or third or fourth) language.

It seems to me that there are definitely gains to be had for many students in having their PhD chapters (or articles) professionally edited. The challenge may be in whether the PhD regulations allow it. For instance, the University of Waikato guidelines for proof-reading of theses suggest that professional editing is not allowed. It seems that, if any editing is to be done, it falls on the academic supervisors to do. Perhaps these guidelines need reconsideration, given the potential gains in clarity of argument to be had, with no changes in substantive content (after all, the editors in this case were not academic experts). Students (and thesis examiners) would potentially benefit greatly from this.

[HT: Marginal Revolution]

Friday, 25 March 2022

Supermarkets are not natural monopolies, and should NOT be regulated as public utilities

In The Conversation last week, Robert Hamlin (University of Otago) wrote:

The Commerce Commission’s report into New Zealand’s supermarket sector has been criticised for not going far enough to reduce food prices, but the answer to the current duopoly might lie in treating the sector as a public utility instead of a private industry...

This fairer supermarket sector could be achieved if the industry power players were governed as regulated public utilities, much like power and water. But such an approach would need to be legislated and has to combine simplicity with easy and effective enforcement.

To do this, the government should implement some key regulatory principles.

New regulations would need to ensure supermarkets do not engage in wholesale or manufacturing activity. The key to supermarket power is their control of the retail point of sale. If supermarkets are to be regulated as public utilities, then it is essential they are restricted solely to this activity.

The problem with Hamlin's argument is that he equates supermarkets with other utilities like power and water. However, supermarkets and public utilities like power and water differ in two fundamental ways. First, power and water are natural monopolies. They have large up-front costs, and then the marginal costs of production and distribution are fairly low. Natural monopolies are a tricky problem for governments, because as I noted in this earlier post, if the government regulates them such that total welfare is maximised, the natural monopolies make a loss and reduce investment and service quality, and may even shut down entirely. The second-best solution here is to regulate the natural monopoly, but not to such an extent that it makes economic losses. That is essentially what Hamlin is arguing for, when he writes:

As public utilities, individual supermarket sites should only be allowed to charge a single fixed and publicly stated margin on the goods they sell. This is a novel requirement, but it is core to the process of regulating a supermarket as a utility.

Supermarkets act as a middleman between consumers and producers. The mutual ignorance of what is happening on the other side of the retail barrier allows the supermarkets to manipulate consumers and suppliers at will. It is the key process that converts supermarket power to profit.

The requirement that supermarkets must apply a single, publicly posted margin to all the products in their store sets this capacity to zero, and promptly makes the retailer a fully transparent channel for suppliers and consumers.

This 'publicly-posted margin' is essentially what economists refer to as 'cost-plus' regulation. As Eric Crampton noted earlier this week, that solution is totally impractical, because of the second difference between public utilities and supermarkets: supermarkets sell many products. Public utilities typically sell one product, e.g. water, or electricity. That makes it relatively straightforward to impose a 'cost-plus' pricing regulation, since there is only one product to calculate this cost-plus margin for. Even then, working out the costs is not straightforward, because of the mix of fixed and variable costs, depreciation, and other things. But that process is even more difficult when a firm sells many products. As Crampton wrote:

 A big part of the fight at ComCom was around calculating rates of return. How capital costs get treated matters. How land costs under the supermarkets are counted matters. There are piles of complex lease agreements around those that need to be worked through, and would themselves be endogenous to whatever stupid rule you set to regulate rates of return. 

It isn't straightforward. 

And then this guy wants to run it product-by-product as some kind of mark-up regulation with a fixed mark-up on each good? How's that going to work? Different goods have different turnover. A foot of shelf-space that turns over three times a day pays for itself differently than a foot of shelf-space that turns over once every three days. 

The policing of this kind of thing would be impossible. If you force a single markup on all products based on the price at which the retailer bought it, you force slow-moving goods off the shelves. If you allow some measure of the cost of shelf-space to enter in, you're going to be chasing your tail forever in policing it. It's just so impossibly stupid.
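Crampton's turnover point is easy to quantify with some arithmetic (the prices and turnover figures below are made up for illustration). Under a uniform markup, a foot of shelf space earns very different returns depending on how fast the product on it sells:

```python
# Made-up illustration of the turnover problem with a uniform fixed markup
MARKUP = 0.20            # hypothetical mandated 20% margin on all products
COST_PER_ITEM = 5.00     # wholesale cost of each item
margin_per_sale = COST_PER_ITEM * MARKUP  # $1.00 gross margin per unit sold

# Sales per shelf-foot per day, as in Crampton's example
turnovers_per_day = {"fast mover": 3.0, "slow mover": 1 / 3}

for product, turns in turnovers_per_day.items():
    annual_margin = margin_per_sale * turns * 365
    print(f"{product}: ${annual_margin:,.2f} gross margin per shelf-foot per year")
```

The same shelf-foot earns nine times as much from the fast mover, so under a mandated uniform markup the retailer's rational response is to drop the slow movers entirely.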

However, it's not just the heterogeneity of products that makes this proposal unworkable. It's the heterogeneity of supermarkets as well. Public utilities are easy to regulate, because there are few of them (they are natural monopolies, after all). But supermarkets are actually quite diverse. There is about a hundred-fold difference in turnover between a corner dairy and a large urban supermarket. Even within the supermarket category (i.e. excluding dairies), I wouldn't be surprised if there was a fifty-fold difference in turnover between supermarkets in small urban areas like Te Kauwhata and large cities like Hamilton. Should the government impose the same cost-plus regulation on all supermarkets, regardless of size? This could easily make small supermarkets unviable, leaving consumers with less choice and ultimately worse off.

Then, there are the incentive effects. When firms have to worry about their profits and margins, they have an incentive to keep costs low, as that keeps their profits high. However, if a firm has a cost-plus regulation in place, then it can earn the regulated margin regardless of its costs. There is less incentive for keeping costs low. This proposal could easily have the unintended consequence of higher prices for consumers in the long run, along with more waste and less efficiency in the supermarket sector.

Finally, the supermarket firms are not just retailers, but wholesalers. By itself, this proposal on retail prices would need to be carefully designed. Otherwise, the supermarkets will simply route around it by separating out their wholesale operations into a different business, which sets the wholesale prices, upon which the retail prices (cost-plus wholesale) will be based. Then the supermarket profits will simply back up one step as wholesale, rather than retail, profits. [*] This might be one way that the supermarkets will respond to the Commerce Commission's recommendation that the supermarkets be required to offer wholesale supply to other grocery retailers (see here) anyway.

All in all, regulating supermarkets as public utilities is thoroughly impractical, and possibly counter-productive. The Commerce Commission has made its recommendations. In general, they seem a sensible way of opening the retail grocery market (if not the wholesale market) to more competition. We should see whether those changes work before we open the door to crazy ideas.


[*] This would work for Countdown, which owns all the retail stores, but possibly not so well for Foodstuffs, where the stores are owner-operated. However, I'm sure Foodstuffs could find some way to make a flavour of this work.

Thursday, 24 March 2022

Confused (or cynical) policy on fuel taxes and public transport prices

I've been meaning to write about the government's announcement last week of changes in petrol excise tax and public transport fees. From the government's press release:

The Government will cut 25 cents a litre off fuel for three months as part of a cost of living package aimed at giving Kiwi families immediate relief through the current global energy crisis triggered by the war in Ukraine, Prime Minister Jacinda Ardern announced today. 

Fuel excise duties and road user chargers will be reduced by 25 cents each and the price of public transport will be halved as part of a package of measures to reduce transport cost pressures on middle and low income households.

“We cannot control the war in Ukraine nor the continued volatility of fuel prices but we can take steps to reduce the impact on New Zealand families,” Jacinda Ardern said.

That's fair enough. Around the same time as this announcement, the Prime Minister finally acknowledged that New Zealand is facing a cost of living crisis (which in itself is unnecessary hyperbole, since literally everything seems to be a crisis these days - the word crisis is starting to lose all meaning). Inflation is at the highest level in a generation. Pay rises are not keeping up, meaning that real wages are falling. But all of that was the case before Russia invaded Ukraine. And fuel prices are only part of the cost of living story. So why wait until now to address cost of living? Despite the war, cost of living didn't suddenly become an issue last week. Or did it? The cynical view, expressed for example by Jack Tame, is that:

...petrol taxes would never have been cut if Labour had been well ahead in last week's poll. They saw the poll numbers. They freaked out. They dropped almost $400m to try and win back some popularity.

A slightly less cynical take is that, for the reduction in public transport fees at least, the government may have planned to include the change in the Budget (to be announced in May), but felt the need to bring it forward (although the reason why they would announce it early gets us back to the recent negative political polls).

Anyway, there are two serious problems with this policy package of reducing excise taxes on fuel, and increasing public transport subsidies. First, it isn't well targeted. My wife and I really appreciated being able to fill our car with petrol for $30 less on our way back from Whanganui last week. But surely the purpose of the petrol excise reduction was not to assist in defraying the cost of inter-city travel for families in the top quintile of earnings? If the government really wants to help low-income families dealing with a higher cost of living, they should increase Working for Families, increase benefit rates, or pay a one-off payment through the benefit or tax system. Then the money goes to those who really need it. And if they don't need it for fuel, they can use it for something else. To be fair, the Prime Minister reminded us that they are doing some of that as well:

“In addition on April 1 a suite of permanent increases to household incomes will see 60 percent of families earning more from Working for Families, as well as increases to superannuation and benefits. On May 1, one million New Zealanders will also start receiving the Winter Energy Payment which will provide $30 a week extra to many.”

However, those are not new changes, having been announced much earlier (see here and here). The government could have made the benefit and WFF increases even larger if they wanted to mitigate further increases in the cost of living. Alternatively, given that the excise tax reduction is temporary, perhaps the government could have given a temporary increase in benefits and WFF (although, it would be much more difficult for those changes to be undone later, with potentially negative political consequences).

The second problem has been well laid out elsewhere (see these posts by Eric Crampton or Matt Nolan). If climate change really is this generation's nuclear-free moment, why on earth would the government undo some of the good work that the Emissions Trading Scheme is doing, by making carbon-emitting vehicles cheaper to run? Yes, the lower public transport fees may induce some commuters to switch to public transport, but lower fuel prices work directly counter to that, decreasing the incentive for commuters to switch.

All up, it's hard to see those policy changes as anything but a cynical vote grab. They aren't targeted at reducing costs or increasing incomes for those who truly need it. They're undoing an otherwise positive effect of high fuel prices on carbon emissions. And they're unlikely to have a positive effect (and may even be counter-productive) in terms of public transport patronage. Possibly, the government is hoping that the voting public has the same low level of economic literacy that they do. Things may not be that bad, yet. On the plus side, the government now seems to recognise that an excise is a tax.

Tuesday, 22 March 2022

Using microcommitments and social accountability to improve online and hybrid learning

One of the biggest challenges with online teaching is proving to be getting students to engage. It is much easier for students to take a passive (and much less effective) approach to their learning when they don't have to be present in class every week. The online approach generally fails to make students accountable for their learning, even though in theory the students should be more accountable to themselves, because it relies on their being more self-directed. But it needn't be that way.

This 2021 article by Amanda Felkey (Lake Forest College), Eva Dziadula (University of Notre Dame), Eric Chiang (Florida Atlantic University), and Jose Vazquez (University of Illinois at Urbana-Champaign), published in the AEA Papers and Proceedings (sorry, I don't see an ungated version) describes an experiment on increasing social accountability for students. Specifically:

...we conducted a randomized controlled experiment in fall 2019. Experiment participants were recruited from economics courses taught by six instructors at three universities, using three teaching modalities - face to face, online, and hybrid...

...students were randomly assigned to the control or treatment group. Between two exams in their courses, all participating students were sent daily content that corresponded to the material they were learning in their course. The content was designed to engage students for no more than five minutes per day. These small actions included problems, prompts to practice important concepts, and questions compelling students to relate course material to their own lives. No problems or answers were turned in by students or reviewed by instructors...

All students received the same daily content. Those in the control group received the content via text message, nudging them to do the small task. Students in the treatment group received the content from a platform, accessed via a link sent by text message, containing the same daily task but with a commitment device called a microcommitment. These students were asked whether they commit to doing the task that day. If they committed, the platform would follow up in the afternoon and ask whether they did the task... Commitments and completed tasks appeared on a social feed that provided social accountability.

So, what happens when students can see if the other students in their class are completing tasks? Felkey et al. find that:

An OLS analysis finds that microcommitments have a positive and significant effect on student performance... score. As a proportion of the class average, relative performance increased significantly by 0.015 to 0.017, or approximately 1.3 additional percentage points on the post-intervention exam for those in the treatment group. The positive effect of microcommitments was driven by improved performance among students in online and hybrid courses and is equivalent to approximately 3.5 additional percentage points on the post-intervention exam. The effect of microcommitments with social accountability on students in face-to-face courses is insignificant.

That the positive effect was concentrated on online and hybrid classes is important, given that the ongoing pandemic is ensuring a continuation of those teaching modes. Felkey et al. note that:

Perhaps this intervention with commitment devices and social accountability partially substitutes for the lack of instructor contact when a course is taught in an online or hybrid modality.

That seems sensible to me. If the problem with online learning is the lack of engagement, then perhaps the microcommitments and social accountability enforces a higher level of engagement among the online students. An important question, though, is which students benefit? As I've noted several times before, online learning can be good for self-directed high-ability students, but detrimental for low-ability students who lack a high degree of self-direction. Felkey et al. find that:

...microcommitments positively and significantly affect students with inferior previous academic performance as measured by reported GPA.

That might be the most important finding from this research. However, before we get carried away (and beyond just noting that this is a single study in a crowded literature):

While we would most like to reach students with the lowest self-efficacy with performance-enhancing interventions, microcommitments do not have a significant effect on performance among these students. However, they do positively affect performance among students with above-average self-efficacy.

Their measure of self-efficacy is based on the Motivated Strategies for Learning Questionnaire (MSLQ), which measures learning strategies and academic motivation. So, while microcommitments and social accountability work for low-performing students, they don't work for students with low academic motivation. It would be nice to know how the interactions work here, but despite mentioning the results in the text, Felkey et al. don't report the results by GPA either in the paper or in the online appendix. Importantly, since lower-performing students and more motivated students both do better with the microcommitments, how do low-performing and unmotivated students perform? We don't get an answer to that.

Nevertheless, this paper provides us with some small hope that there are strategies that can be implemented to engage students in online learning and to mitigate some of the negative effects of learning in that environment, especially for low-performing students. The questions are whether this extends to other contexts, and how it can be implemented in various learning management systems (like Moodle or Blackboard).


Monday, 21 March 2022

Social capital and success in the adult film industry

In economics, capital can be simply defined as 'things that can be used to produce other things'. Taking a broad interpretation of 'things' allows us to recognise intangible forms of capital, such as human capital (our education, knowledge, and experience) and social capital (our connections with others, trust, and shared understandings). These intangible forms of capital make people more productive (which is what makes them capital), but intangible forms of capital are more difficult to measure than physical or financial forms of capital.

How much more productive does social capital make people? That is a difficult question to answer, in large part because of the difficulty of measuring social capital. However, I recently ran across an interesting attempt to measure the gains from social capital in the adult film industry, described in this article by Jochen Lüdering (Justus-Liebig-Universität Gießen), published in the journal Applied Economics in 2018 (ungated earlier version here).

Lüdering collated data on performers in the adult film industry, based on the Internet Adult Film Database, and covering the period from 1970 to 2013. He then captured the 'network structure' of the industry, based on each performer (as a 'node') and all of the other performers that they appeared in at least one film with (the connections between nodes are referred to as 'undirected links'). The network structure can be summarised in various ways in order to try to measure social capital. Lüdering uses 'Eigenvector centrality', which is a measure of how much 'influence' a node (in this case, an adult film performer) has on the rest of the network, based on how well-connected they are, and how well-connected their connections are.
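Eigenvector centrality can be sketched in a few lines of Python via power iteration on a small co-appearance network. The four-node network below is entirely made up for illustration; Lüdering works with the full Internet Adult Film Database graph:

```python
# Eigenvector centrality by power iteration on a tiny undirected
# co-appearance network. Performers A-D are hypothetical; an edge means
# two performers appeared in at least one film together.
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")]
nodes = sorted({n for e in edges for n in e})

# Build an adjacency lookup for the undirected graph.
adj = {n: set() for n in nodes}
for u, v in edges:
    adj[u].add(v)
    adj[v].add(u)

# Start with equal scores; repeatedly set each node's score to the sum of
# its neighbours' scores, then normalise. This converges to the leading
# eigenvector of the adjacency matrix.
scores = {n: 1.0 for n in nodes}
for _ in range(100):
    new = {n: sum(scores[m] for m in adj[n]) for n in nodes}
    norm = max(new.values())
    scores = {n: v / norm for n, v in new.items()}

# C is connected to everyone (including the well-connected A and B),
# so C ends up most central.
print(max(scores, key=scores.get))  # C
```

The key feature the sketch shows is that a node's centrality depends not only on how many connections it has, but on how central its connections are.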

In terms of the 'success' (as Lüdering refers to it) of each performer, Lüdering uses the length of time each performer remained in the industry. He justifies this as follows:

...this article relies on survival (time active) in the industry as a measure of economic success for individual performers and argues that one person stays in the industry as long as (economic) benefits exceed costs. In this argument, ‘costs’ should be defined in very broad terms and should include all kinds of adverse effects for the person.

The argument is that more 'successful' performers will receive greater benefits from their performances, and therefore remain in the industry for longer. There are good arguments why that might not be the case though, if performers have a 'target' for lifetime earnings and stop performing when they reach their target (there is evidence that taxi drivers' labour supply exhibits such income targeting, for example, but this evidence is contested). Putting aside that concern, in a survival analysis Lüdering finds that:

Ones eigenvector centrality is both significant and negative; this implies that a central actor in the network has a lower probability to experience an event (i.e. drop out of the industry) in the following period. Being disconnected from the network increases the risk to experience an event...

For a person twice as central as the median performer, the relative hazard is reduced by about one-third. 

In other words, social capital (as proxied by connectedness or influence within the network of adult film performers) is correlated with the length of time an adult film performer remains in the industry. More-connected performers remain in the industry for longer, which Lüdering suggests means that more-connected performers are more successful. I'm not sure I necessarily agree (as per my critique of the measure of success above). The length of time a performer remains in the industry doesn't necessarily tell us about the value they generate, or their productivity. We would need some additional data (which Lüdering acknowledges), and a follow-up study, for that. However, it is clear how important social capital is in this context.

Sunday, 20 March 2022

The impact of the coronavirus pandemic on university students and grades

The coronavirus pandemic, and the associated lockdowns and stay-at-home orders, had a substantial impact on teachers and learners at all levels. The anecdotal evidence for this is legion, and there is a growing academic literature as well (as one example, see my working paper on the impacts on New Zealand university students, co-authored with Barbara Fogarty-Perry and Gemma Piercy, here).

Like us, many researchers took the opportunity to field surveys of their students (our New Zealand survey was part of an international project - see here). One such researcher was Núria Rodríguez-Planas (City University of New York), who has had two new articles published, both based on the impact on students at Queens College (QC) at the City University of New York. The first article was published in the journal Economics of Education Review (ungated earlier version here), and focuses on survey evidence on the impacts reported by students (similar to my study linked above). The survey was fielded between July and September 2020, and received 3163 responses (a response rate of about 20 percent).

Now, measuring the impact of the pandemic and lockdowns is tricky, because we don't know what would have happened in the absence of the pandemic. Rodríguez-Planas argues that (emphasis is hers):

...to the extent that the student has private information on how his academic performance and economic wellbeing were prior to the pandemic, I can estimate an individual-level subjective treatment effect by asking students directly how the pandemic has changed their experience or perceptions. This is indeed what I did in the online survey...

It's a pretty weak justification, but we'll just accept it at face value, noting that we really don't know what the counterfactual is (and it would be very difficult to establish one, since all students everywhere were affected similarly). In terms of subjective impact then, Rodríguez-Planas finds that:

...as many as 34% of QC students considered dropping a class during the spring 2020 semester. The most frequent reasons students gave were being concerned that their grade would jeopardize their future financial assistance (27%), the need to work (9%), and the need to care for a sick family member (7%). In addition, some students also stated getting sick with COVID-19 (5%) or the need to move due to the pandemic (3%). Consistent with students’ concerns that the pandemic would hurt their grade and jeopardize their future financial assistance, 32% of the sample reported having difficulties maintaining their desired level of academic performance because of the pandemic and 22% reported having difficulties maintaining financial aid due to the pandemic.

However, the more interesting results arise when Rodríguez-Planas compares low-income students (proxied by those who had ever received a Pell grant) and high-income students (those who had never received a Pell grant). She finds that:

...Pell recipients were 12 percentage points more likely to consider withdrawing a class during the spring 2020 semester than students who never received the Pell grant. This represents a 41% increase relative to the average effect for never Pell recipients of 28.5%. This greater withdrawing consideration during the spring 2020 semester is driven by Pell recipients more likely to: (1) be concerned that their grade would jeopardize financial assistance (64%); (2) care for a family member (44%); and (3) get sick with COVID-19 (49%) than the never Pell recipients. These estimates are statistically significant at the 5% level or lower...

Pell recipients were 20% more likely to report losing a job due to COVID-19 and 17% more likely to report being laid-off or furloughed because of COVID-19 than never Pell recipients. They were also 17% more likely to report a reduction in wage and salary earnings than students who never received the Pell grant... Importantly, Pell recipients were 65% more likely to have faced food and shelter insecurity, and 17% more likely to have experienced difficulties to replace a lost job or internship than never Pell recipients. All these impacts are statistically significant at the 5% level or lower.

Then, turning to a comparison of first-in-family students with others, and transfer students with others, Rodríguez-Planas finds that:

First-generation students are more likely to report difficulties with academic performance and continuing college education than their counterfactual. They are also more likely to report changing their graduation plans by postponing graduation and considering graduate school than their counterfactual. Similarly, transfer students are more likely to report difficulties with continuing college education, changing their graduation plans by considering graduate school or taking more classes than their counterfactual.

So, in summary, this paper tells us that the pandemic had a negative impact on students. However, this impact was not the same for all students. Low-income students and first-in-family students faced the most serious impacts, but were impacted in different ways. Low-income students faced larger financial impacts because of their financial vulnerability, while first-in-family students faced larger educational impacts because of their educational vulnerability. There is nothing here that we should not have been able to anticipate.

Rodríguez-Planas's second article was published in the Journal of Public Economics (sorry, I don't see an ungated version of this one online), and focuses on the impact on student grades, using administrative data from over 11,000 QC students over the period from Spring 2017 to Spring 2020. In particular, Rodríguez-Planas focuses on the difference in impact between low-income and high-income students (again proxied by Pell grant receipt or not). Using a difference-in-differences framework, she finds that:

...lower-income students outperformed higher-income ones during Spring 2020 as they earned a 5.1% higher GPA and they failed 28% fewer credits (or exercised fewer NC grades) than their wealthier counterparts. Both effects are statistically significant at the 5% level or lower. Lower-income students’ higher relative academic performance during the first semester of the pandemic is largely associated with the flexible grading policy as the GPA differential by income status vanishes (β2 is close to zero and not statistically significant) when I estimate the effect of the pandemic with the GPA prior to students exercising the flexible grading option.

To clarify, higher-income students performed better in Spring 2020 than in other semesters, but the increase in performance was even greater for lower-income students. That's because lower-income students took greater advantage of flexible grading options where they otherwise would have received a low grade for a particular course. Rodríguez-Planas goes on to examine the mechanisms using data from her survey, and finds that:

...because of the pandemic, top-performing lower-income students were 9.7 percentage points more likely to report asking for an incomplete than their bottom-performing lower-income peers and 5.8 percentage points more likely to ask for an incomplete than their higher-income peers... Both estimates are significant at the 1% level.

Rodríguez-Planas concludes that:

...the flexible grading policy was able to counteract negative shocks, especially among the most disadvantaged students. Because low-income students regularly face idiosyncratic challenges, the results in this paper suggest that a higher use of the pass/fail grade (if not for all courses, for certain courses) may support students during critical moments.

I found this second article and its conclusion interesting, not because it was about the impact on students, but because it illustrated the impact on grades. Grades have an element of subjectivity to them, and they can be manipulated by scaling, by grading policies, or by the strictness or leniency of individual lecturers. The impact of the flexible grading option chosen by QC was of particular interest because switching to pass/fail grading was something I advocated for at Waikato (unsuccessfully), when we first went into lockdown in 2020. In my view, this would have reduced student stress, reduced teacher stress, avoided the rampant grade inflation that eventually occurred as students were scaled upwards (even when the grade distribution was already high due to the move to online open book assessment), and kept student GPAs sensible and informative. I would suggest that this was a good learning opportunity for the institution, but hopefully we never find ourselves in that situation again!
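The difference-in-differences comparison at the heart of the second paper can be sketched with a toy 2x2 calculation. The GPA figures below are invented, purely to show how the estimate is formed; they are not from the paper:

```python
# Toy difference-in-differences: compare the CHANGE in mean GPA for
# lower-income (Pell) students against the CHANGE for higher-income
# (never-Pell) students, before vs during the pandemic semester.
# All numbers are hypothetical.
gpa = {
    ("pell", "pre"): 2.80, ("pell", "spring2020"): 3.10,
    ("never_pell", "pre"): 3.00, ("never_pell", "spring2020"): 3.15,
}

change_pell = gpa[("pell", "spring2020")] - gpa[("pell", "pre")]                # +0.30
change_never = gpa[("never_pell", "spring2020")] - gpa[("never_pell", "pre")]   # +0.15

# The DiD estimate: how much MORE lower-income students improved,
# over and above the improvement among higher-income students.
did = change_pell - change_never
print(round(did, 2))  # 0.15
```

This mirrors the finding above: both groups improved in Spring 2020, but the differencing isolates the extra improvement among lower-income students.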

Thursday, 17 March 2022

The war in Ukraine, and the markets for wheat and quinoa

Earlier this week, David Ubilava (University of Sydney) wrote in The Conversation:

Russia and Ukraine between them account for almost a quarter of the world’s wheat exports.

Russia and Ukraine are also big exporters of maize (corn), barley, and other grains that much of the world relies on to make food...

Since the start of February, as war became more likely, the grains and oilseed price index compiled by the International Grains Council has jumped 17%.

The big drivers have been jumps of 28% in the price of wheat, 23% in the price of maize and 22% in the price of barley.

Russia and Ukraine account for one fifth of the world’s barley exports. Maize is a common substitute for wheat and barley.

A simple supply and demand model can explain what is going on in the world market for wheat (with the same explanation serving for maize and barley as well). Consider the wheat market shown in the diagram below. Prior to the threat of war, the market was at equilibrium, with supply S0 and demand D0. The price of wheat was P0, with Q0 units of wheat traded. Then, with supply from Ukraine and Russia disrupted, supply decreases to S1. The price of wheat goes up to P1, with less wheat (Q1) now traded.

The higher price of wheat (and barley, and maize) will ripple through a lot of other markets. Products that use wheat will face higher costs of production, decreasing supply in those markets and raising prices (the same diagram applies as that shown above).

However, it will also affect the market for other grains, which are substitutes for wheat. As those substitutes are now relatively cheaper than wheat, at least some buyers will substitute to those other grains. Consider the market for quinoa, as shown in the diagram below. [*] Initially, the market was at equilibrium, with supply S0 and demand D0. The price of quinoa was P0, with Q0 units of quinoa traded. Then, with quinoa now relatively cheaper than wheat, the demand for quinoa increases to D1. The price of quinoa goes up to P1, with more quinoa (Q1) now traded.
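The two shifts can be made concrete with linear demand and supply curves. The slopes and intercepts below are arbitrary, chosen only to illustrate the direction of each effect:

```python
# Linear demand Qd = a - b*P and supply Qs = c + d*P; setting Qd = Qs
# gives the equilibrium price P* = (a - c) / (b + d).
# All parameter values are made up for illustration.
def equilibrium(a, b, c, d):
    price = (a - c) / (b + d)
    quantity = a - b * price
    return price, quantity

# Wheat: the war shifts supply left (the intercept c falls).
# Price rises and quantity traded falls.
p0, q0 = equilibrium(a=100, b=2, c=20, d=2)   # before: P = 20.0, Q = 60.0
p1, q1 = equilibrium(a=100, b=2, c=0, d=2)    # after:  P = 25.0, Q = 50.0
assert p1 > p0 and q1 < q0

# Quinoa: dearer wheat shifts quinoa demand right (the intercept a rises).
# Price AND quantity traded both rise.
p0, q0 = equilibrium(a=60, b=2, c=10, d=2)    # before
p1, q1 = equilibrium(a=80, b=2, c=10, d=2)    # after
assert p1 > p0 and q1 > q0
```

Note the contrast: the supply shift (wheat) moves price and quantity in opposite directions, while the demand shift (quinoa) moves them in the same direction, exactly as in the two diagrams.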

Given that grains (not just wheat, maize, and quinoa, but also rice) are such a staple source of calories in every country, it is likely that these higher prices for all grains will have negative implications for global food security, nutrition, and poverty in the immediate future. As Ubilava concluded:

...supply chains and global stability are certain to be tested.

It will take a village to stop this war and mitigate its repercussions. The rich and powerful of the village should do all they can to hold it together.


[*] I was originally going to use oats for this example, but then I found out that Russia is also the world's largest supplier of oats.

Tuesday, 15 March 2022

Coin-operated fountains as a public good

I'm in Whanganui this week on a writing retreat. This afternoon, we went for a walk around Rotokawau Virginia Lake. When we arrived, we noticed a pretty sad looking fountain at the front of the lake, called the Higginbottom Fountain. However, after walking around the lake and on our way back to the car, I noticed a sign that made it clear that the fountain was coin operated ($1 for ten minutes, or $2 for twenty minutes). One $2 coin later:

That got me thinking about paying for public goods, which are goods that are non-rival (one person consuming the good doesn't reduce the amount of the good or service available for everyone else) and non-excludable (if it is available for anyone, then it is available for everyone). Fountains like the Higginbottom Fountain are a good example of a public good. One person admiring the fountain or taking a picture of it doesn't stop anyone else from doing the same (non-rival), and if the fountain is viewable by anyone, it is viewable by everyone (non-excludable).

The problem with public goods is that private willingness-to-pay for the public good is often not enough to ensure that the public good is provided at the socially efficient quantity (on a related point, see here). So, if no one is willing to pay the full cost of operating the fountain by themselves, then no one will pay and no one will benefit from it, even if everyone together would be willing to pay the cost. The problem is that many people will elect to be free riders - receiving benefit from the public good, even though they aren't paying anything towards it. In this case, until we arrived no one was willing to pay the $2 cost by themselves, but everyone ended up benefiting after we paid the cost ourselves (and if we hadn't, the fountain would not have operated).
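A minimal numeric sketch of the free-rider problem (the willingness-to-pay values here are invented for illustration):

```python
# Ten onlookers each value ten minutes of the fountain at 50 cents, but
# the coin slot charges $1 for any one person to switch it on. No single
# valuation covers the cost, yet collectively the fountain is worth running.
# (All values are hypothetical.)
wtp = [0.50] * 10   # each person's willingness to pay, in dollars
cost = 1.00         # price to run the fountain for ten minutes

individually_worth_it = any(v >= cost for v in wtp)
collectively_worth_it = sum(wtp) >= cost

print(individually_worth_it)   # False: no one pays on their own
print(collectively_worth_it)   # True: total value ($5) exceeds the cost
```

Left to private choices, the fountain stays off, even though running it would generate five times its cost in total benefit.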

In discussing the problem of paying for public goods when there are free riders in my ECONS102 class, I often use street lights as an example. I then make the joke that you could make street lights excludable (and limit the free rider problem) if they were coin-operated - pedestrians would put a coin in the base of each light, in order to make the next one light up. Well, it appears that the joke is on me. You can sometimes fund public goods that way.

Friday, 11 March 2022

The effects of training policy makers in econometrics

David Card, Joshua Angrist, and Guido Imbens shared the Nobel Prize in economics last year, for their contributions to the 'credibility revolution' in economics. The revolution involved the adoption of a range of new econometric methods designed to extract causal estimates, and it set a much higher standard for what constitutes sound evidence for policy making. However, policy makers have not necessarily caught up. It seems that there could be substantial gains in improved policy from training policy makers to use the insights from the credibility revolution.

That is essentially what this 2021 paper by Sultan Mehmood (New Economic School), Shaheen Naseer (Lahore School of Economics) and Daniel Chen (Toulouse School of Economics) sets out to investigate. Mehmood et al. conducted a thorough randomised controlled trial involving deputy ministers in Pakistan. As they explain:

We conducted a randomized evaluation implemented through close collaboration with an elite training academy. The Academy in Pakistan is one of the most prestigious training facilities that prepares top brass policymakers—deputy ministers—for their jobs. These high-ranking policy officials are selected through a highly competitive exam: about 200 are chosen among 15,000 test-takers annually.

There are a lot of moving parts to this research, so it is difficult to summarise (but I'm going to try!). First, Mehmood et al.:

...conducted a baseline survey and asked the participants to choose one of two books (1) Mastering ’Metrics: The Path from Cause to Effect by Joshua Angrist and Jörn-Steffen Pischke or (2) Mindsight: The New Science of Personal Transformation by Daniel J. Siegel...

Actually, they asked the deputy ministers to choose a high or low probability of receiving each of the two books, and then (importantly) they randomised which book each deputy minister actually received (the randomisation is important, and a point we will return to later). The book Mastering 'Metrics (which I reviewed here) is important, because it is the effect of assigning that book that Mehmood et al. set out to test. Mastering 'Metrics is essentially an exposition of the methods that constitute the credibility revolution, and it presents the randomised controlled trial (RCT) as the 'experimental ideal'. However, the treatment doesn't stop only with the assignment of a book to read:

The meat of our intervention is intensive training where we aim to maximize the comprehension, retention, and utilization of the educational materials. Namely, we augmented the book receipt with lectures from the books’ authors, namely, Joshua Angrist and Daniel Siegel, along with high-stakes writing assignments... As part of the training program, deputy ministers were assigned to write two essays. The first essay was to summarize every chapter of their assigned book, while the second essay involved discussing how the materials would apply to their career. The essays were graded and rated in a competitive manner. Writers of the top essays were given monetary vouchers and received peer recognition by their colleagues (via commemorative shields, a presentation and discussion of their essays in a workshop within the treatment arm). Deputy ministers in each treatment group also participated in a zoom session to present, discuss the lessons and applications of their assigned book in a structured discussion.

Performance in the training programme is highly incentivised, not only because of the rewards on offer, but also because the grades matter for the future career progress of each deputy minister. So, the deputy ministers had a strong incentive to participate fully.

Mehmood et al. then test the effect of being assigned the Mastering 'Metrics treatment on a range of outcomes measured four to six months after the workshop, finding that:

While attitudes on importance of qualitative evidence are unaffected, treated individuals' beliefs about the importance of quantitative evidence in making policy decisions increases from 35% after reading the book and completing the writing assignment and grows to 50% after attending the lecture, presenting, discussing and participating in the workshop. We also find that deputy ministers randomly assigned to causal training have higher perceived value of causal inference, quantitative data, and randomized control trials. Metrics training increases how policymakers rate the importance of quantitative evidence in policymaking by about 1 full standard deviation... When asked what actions to undertake before rolling out a new policy, they were more likely to choose to run a randomized trial, with an effect size of 0.33 sigma after completing the book and writing assignment (partial training) and 0.44 sigma after attending the lecture, presentation, discussion and workshop (full training). We also observe substantial performance improvements in scores on national research methods and public policy assessments.

Mehmood et al. also conducted a field experiment to evaluate how much the deputy ministers would be willing to pay for different types of evidence (RCTs, correlational data, and expert bureaucrat advice), and find that:

...treated deputy ministers were much more willing to spend out of pocket (50% more) and from public funds (300% more) for RCTs and less willing to pay for correlation data (50% less). Demand for senior bureaucrats’ advice is unaffected.

Mehmood et al. then ran a second field experiment, where:

First, we elicited initial beliefs about the efficacy of deworming on long-run labor market outcomes. Then, they were asked to choose between implementing a deworming policy versus a policy to build computer labs in schools... Next, we provided a signal - a summary of a recently published randomized evaluation on the long-run impacts of deworming... After this signal, we asked the same deputy ministers about their post-signal beliefs and to make the policy choice again.

Mehmood et al. find substantial effects:

From this experiment, we observe that only those assigned to receive training in causal thinking showed a shift in their beliefs about the efficacy of deworming: the treated ministers became more likely to choose deworming as a policy after receiving the RCT evidence signal. The magnitudes are substantial - trained deputy ministers doubled the likelihood to choose deworming, from 40% to 80%. Notably, this shift occurs only for those ministers who previously believed the impacts of deworming were lower than the effects found in the RCT study, while those who previously believed the effects were larger than the estimate reported in the signal did not shift their choice of policy.

Importantly, experimenter demand effects are likely to be limited, because as part of the experiment the researchers recommended that the deputy ministers choose the policy to build computer labs. If anything, then, the estimated treatment effects are likely to be underestimates.
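
The asymmetric updating pattern can be illustrated with a stylised sketch (entirely my own, not the paper's model, and with made-up numbers): if trained ministers shrink their prior belief about the deworming effect towards the RCT signal, then only those whose priors sat below the signal (and below the bar for choosing deworming over computer labs) will cross that bar and switch policies.

```python
# A stylised sketch (my own illustration, not the paper's model) of why only
# ministers whose priors were below the RCT estimate changed their policy choice.
# Beliefs update towards the signal as a precision-weighted average.

def posterior_mean(prior, signal, weight_on_signal=0.8):
    """Normal-normal style update: shrink the prior towards the signal."""
    return (1 - weight_on_signal) * prior + weight_on_signal * signal

rct_signal = 10.0  # effect size reported in the deworming RCT (arbitrary units)
threshold = 8.0    # effect needed for deworming to beat computer labs

for prior in (4.0, 12.0):
    before = "deworming" if prior > threshold else "computer labs"
    post = posterior_mean(prior, rct_signal)
    after = "deworming" if post > threshold else "computer labs"
    print(prior, post, before, "->", after)
```

In this toy example, only the low-prior minister switches (from computer labs to deworming, because 4.0 updates to 8.8); the high-prior minister (12.0, updating to 10.4) chose deworming both before and after, matching the one-sided shift the authors report.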

Next, Mehmood et al. test the effects on prosocial behaviour, which might be a concern if you think that studying economics makes students less ethical (see here, or here, or here). On this point:

The administrative data also included a suite of behavioral data in the field, for example, a choice of field visits to orphanages and volunteering in low-income schools. This allowed us to assess potential crowdout of prosociality, an oft-raised concern about the teaching of neoclassical economics... We detected no evidence of econometrics training crowding out prosocial behavior - orphanage field visits, volunteering in low-income schools and language associated with compassion, kindness and social cohesion is not significantly impacted. Scores on teamwork assessments as a proxy of soft skills were also unaffected...

Finally, the randomisation of deputy ministers to books was important. The initial elicitation gives an indication of each minister's preference for the two books, and ministers who preferred the Mastering 'Metrics book could differ in meaningful ways from those who preferred Mindsight, which would bias the estimated impact of the treatment if assignment simply followed those preferences. However, Mehmood et al. note that:

A typical concern in RCTs is that the compliers respond to treatment and we estimate Local Average Treatment Effect (LATE) since we do not observe defiers. It is a plausible concern that people who demand to learn causal thinking may be more responsive to the treatment assignment. Thus estimates of the treatment impacts would be uninformative on those who are potential non-compliers. In our unique experimental set-up, we developed a proxy for compliers through those who demanded the metrics book; we show that the effects are the same for both the high and low demanders... we observe no significant differences between the treatment effects for low and high demanders of metrics training.
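
For readers unfamiliar with the jargon in that passage, the logic behind a LATE can be sketched in a few lines. This is the standard Wald/IV estimator with made-up data (not the authors' actual estimation): random assignment gives an intention-to-treat effect, which is scaled up by the share of compliers to recover the effect on those who actually took up the treatment.

```python
import numpy as np

# Hypothetical illustration (not the paper's data) of the Wald/IV logic behind
# the LATE concern. z = randomised assignment to the metrics training,
# d = actually engaging with it (take-up), y = some outcome score.
z = np.array([1, 1, 1, 1, 0, 0, 0, 0])  # random assignment
d = np.array([1, 1, 1, 0, 0, 0, 1, 0])  # take-up (imperfect compliance)
y = np.array([8.0, 7.0, 9.0, 4.0, 4.0, 5.0, 7.0, 4.0])  # outcome

# Intention-to-treat effect: difference in mean outcomes by assignment
itt = y[z == 1].mean() - y[z == 0].mean()

# First stage: how much assignment shifts actual take-up
first_stage = d[z == 1].mean() - d[z == 0].mean()

# Wald estimator: LATE = ITT / first stage (the effect for compliers only)
late = itt / first_stage
print(itt, first_stage, late)
```

Here the ITT is 2.0, but only half the sample's take-up responds to assignment (first stage of 0.5), so the implied effect on compliers is 4.0. The authors' point is that their design lets them check whether "high demanders" and "low demanders" of the metrics book respond differently, which would make this complier-only estimate uninformative about everyone else.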

There is a huge amount of additional detail in the paper. The overall takeaway is that understanding the importance of causal estimates makes a significant difference to the preferences and choices of policy-makers, and can therefore contribute to better decision-making. We have these results for Pakistan, but it would be interesting to see if they hold in other contexts. And if they do, then the training of policy-makers should include training in basic econometrics.

[HT: Markus Goldstein at the Development Impact blog]

Wednesday, 9 March 2022

Decision-makers don't dislike uncertain advice, but they do dislike advisors who are uncertain

One aspect of my research and consulting involves producing population projections, which local councils use for planning purposes. Population projections involve a lot of demographic changes, none of which is perfectly known beforehand, so projections inherently involve a lot of uncertainty (for more on that point, see here). [*] Understandably, because planners are trying to make plans for an uncertain future, anything that reduces the uncertainty about that future makes their jobs easier. In my experience, planners are usually looking for projections that give them one single number for the total population (for each year) to focus their planning on. [**] So, it seems natural to me that decision-makers would be averse to uncertainty, and would prefer to receive predictions or projections that convey a greater degree of certainty.
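
That fan of uncertainty can be seen in a toy simulation (my own made-up numbers, not a real council projection): even modest uncertainty in the annual growth rate compounds into a wide range of plausible populations over a 30-year planning horizon.

```python
import numpy as np

# A toy simulation (made-up numbers, not a real council projection) of why a
# single-number projection hides a widening fan of uncertainty: small annual
# growth-rate uncertainty compounds over a long planning horizon.
rng = np.random.default_rng(42)

start_pop = 100_000
years = 30
n_sims = 10_000

# Annual growth rate: mean 1%, standard deviation 0.5 percentage points
growth = rng.normal(loc=0.01, scale=0.005, size=(n_sims, years))
final_pop = start_pop * np.prod(1 + growth, axis=1)

# The single "most likely" number planners ask for...
print(round(np.median(final_pop)))
# ...versus the 80% prediction interval around it
print(np.percentile(final_pop, [10, 90]).round())
```

The median is the convenient single number, but the 10th-to-90th percentile range around it is what the planner is implicitly discarding.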

It turns out that might not be the case. This 2018 article by Celia Gaertig and Joseph Simmons (both University of Pennsylvania), published in the journal Psychological Science (ungated version here) demonstrates that decision-makers don't have an aversion to uncertain advice at all. Gaertig and Simmons conducted a number of experimental studies where they presented research participants with advice and asked them to make a decision. In the first six studies, using research participants recruited from Amazon Mechanical Turk:

...participants were asked to predict the outcomes of a series of sporting events on the day on which the games were played. Participants in Studies 1 and 2 predicted NBA games, and participants in Studies 3–6 predicted MLB games...

For each of the games that participants were asked to forecast, we told them that, “You will receive advice to help you make your predictions. For each question, the advice that you receive comes from a different person.” Importantly, participants always received objectively good advice, which was based on data from well-calibrated betting markets. For each game, we independently manipulated the certainty of the advice, and, in all but one study, we also manipulated the confidence of the advisor.

Gaertig and Simmons presented research participants with advice, some of which was certain, and some of which was uncertain, with uncertainty expressed in a variety of ways, including probabilistically. Research participants were then asked about the quality of the advice they received, and they made an incentivised choice, where they were paid more if they correctly predicted the outcome of the sporting event (the specific outcomes to be predicted varied across the studies). They found that:

As predicted, and consistent with past research, these analyses revealed a large and significant main effect of advisor confidence... Advisors who said “I am not sure but . . .” were evaluated more negatively than advisors who expressed themselves confidently.

More importantly, participants did not evaluate uncertain advice more negatively than certain advice...

Thus, these studies provide no evidence that people inherently dislike uncertain advice in the form of ranges.

Moving on to probabilistic statements of uncertainty, Gaertig and Simmons found that:

As in the previous analysis, there was a large and significant main effect of advisor confidence in all regressions... Advisors who said “I am not sure but . . .” were evaluated more negatively than advisors who expressed themselves confidently. We also found, in Study 6, that advisors who preceded their advice by saying, “I am very confident that . . .” were evaluated more positively than advisors who did not express themselves with such high confidence...

Participants evaluated exact-chance advice (e.g., “There is a 57% chance that the Chicago Cubs will win the game”) more positively than certain advice (e.g., “The Chicago Cubs will win the game”)...

Participants also evaluated approximate-chance advice (e.g., “There is about a 57% chance that Chicago Cubs will win the game”) more positively than certain advice...

In Study 5, we introduced a percent-confident condition, in which participants received confident advice in the form of “I am X% confident that . . . ” We found that participants evaluated this advice the same as certain advice...

The results of the “probably” condition were different, as participants did evaluate advice of the form “The [predicted team] will probably win the game” more negatively than they evaluated certain advice...

Gaertig and Simmons then go on to show that the research participants were no less likely to follow the uncertain advice in their predictions than the certain advice. They also found similar results in a laboratory setting (in Study 7), which also showed that:

...people’s preference for uncertain versus certain advice was greater when the uncertain advice was associated with a larger probability.

In other words, people prefer advice that demonstrates uncertainty when the probability is very high (or very low), but not so much when the advice says that the chances of an event are 50-50. That makes some sense. However, it doesn't really accord with the finding of a preference for uncertain advice over certain advice - do decision-makers really prefer advice that says something is 95% certain over advice that says it is 100% certain? Perhaps they feel that predictions expressed with 100% certainty lack credibility?
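
One way to make that credibility intuition concrete (my own illustration, using a proper scoring rule, not anything from the paper) is to compare the expected Brier score of a confidently certain forecast with that of an honestly probabilistic one, when the true win probability really is 57%.

```python
# My own illustration (not from Gaertig and Simmons): a proper scoring rule
# shows why a calibrated "57% chance" forecast can be more credible than a
# confident "100%" claim about the same game.

def expected_brier(forecast, true_prob):
    """Expected Brier score of a stated forecast when the true win
    probability is true_prob (lower is better)."""
    win_loss = true_prob * (forecast - 1) ** 2         # penalty if the team wins
    lose_loss = (1 - true_prob) * (forecast - 0) ** 2  # penalty if it loses
    return win_loss + lose_loss

p = 0.57  # true win probability, e.g. from a well-calibrated betting market
print(expected_brier(1.0, p))   # the "certain" advisor
print(expected_brier(0.57, p))  # the honestly uncertain advisor
```

The calibrated 57% forecast earns a lower (better) expected score (about 0.245 versus 0.43), which is one sense in which false certainty should cost an advisor credibility in the long run.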

Finally, in the last two studies Gaertig and Simmons asked research participants to choose between two advisors, one of whom provided certain advice while the other provided uncertain advice. They found:

...a large and significantly positive effect of the uncertain-advice condition, indicating that more participants preferred Advisor 2 when Advisor 2 provided uncertain advice than when Advisor 2 provided certain advice. This was true both when the uncertain advice came in the form of approximate-chance advice and in the form of “more-likely” advice. When one advisor provided certain advice and the other approximate-chance advice, 82.4% of participants chose Advisor 2 when Advisor 2 provided approximate-chance advice, but only 16.2% of participants chose Advisor 2 when Advisor 2 provided certain advice...

Gaertig and Simmons conclude that:

Taken together, our results challenge the belief that advisors need to provide false certainty for their advice to be heeded. Advisors do not have a realistic incentive to be overconfident, as people do not judge them more negatively when they provide realistically uncertain advice.

It seems that I may have misjudged decision-makers' preferences for certainty. They don't prefer certain advice; they prefer advisors who are certain about their uncertainty. 


[*] Here I'm using uncertainty in its everyday broad sense. Financial economists distinguish between uncertainty that can be quantified (which they refer to as risk), and uncertainty that cannot be easily quantified.

[**] This is an exaggeration of course, in two ways. First, planners recognise that there is uncertainty. However, explaining that uncertainty to elected decision-makers is difficult, so having a single number makes their job easier in that way as well. Second, planners don't only want a single number for the total population. They usually want to know a bit more detail about the age distribution, etc.

Tuesday, 8 March 2022

The state of social science research in New Zealand, and a parting shot from Superu

Last year, the Government released a green paper entitled Te Ara Paerangi - Future Pathways, which set off a process of consultation on the future of New Zealand’s research system, including research priorities, funding, institutions, workforce, infrastructure, and Māori research. You can read Universities New Zealand's submission on the green paper here. I don't have too much to say on it, but it did prompt me to look back at a report that has been sitting on my (virtual) to-be-read pile for some time. That's this report by David Preston, written just as Superu (the relatively short-lived Social Policy Evaluation and Research Unit, which succeeded the Families Commission) was being disestablished.

Social science research has always been the poor sibling in research priority and funding in New Zealand, in comparison with the physical and biological sciences, and medical research. There are some ironies to that, which I'll return to later in this post. Preston's report catalogues the failures (and modest successes) of publicly funded social science research institutions in New Zealand. In particular, he notes the demise of the New Zealand Planning Council (1977-1991), the Commission for the Future (1977-1982), the Social Sciences Research Fund Committee (1979-1990), the New Zealand Institute for Social Research and Development (1992-1995), and finally Superu (2014-2017, previously the Families Commission from 2004-2014). The short-lived BRCSS (Building Research Capacity in the Social Sciences) initiative (2004-2010), whose steering group I was on as it wound down and became eSocSci, also rates a mention. The only two successes that Preston notes are the New Zealand Council for Educational Research (NZCER) and the Health Research Council (HRC), both of which were set up in the 1930s and persist to this day.

Based on a study of documentary evidence from annual reports and other sources, as well as interviews with key informants, Preston identifies a number of factors that are associated with the success or failure of these institutions for social science research. In terms of the successful institutions:

Some of the factors which are present in successful social research bodies are those which are common to all successful professional organisations. These include competent professional staff, good management, and adequate resourcing and critical mass for the size of the task they faced. All Long Life institutions, however, also display four other characteristics:

1. a clearly defined field of research

2. well identified research priorities

3. a stable long term funding model, at least for base line funding

4. effective relations with the departmental policy and social service delivery agencies.

And in terms of the unsuccessful institutions:

For government departmental social research units which diminished or vanished in the course of public sector restructuring, no common factor other than the restructuring itself has been identified. They were located in ’less core’ parts of the public sector. Apart from this, no particular pattern is evident.

For social research and advisory bodies outside of main departments, the factors associated with a short existence are one or more of the following:

1. attempting to cover too many different areas of social research

2. not providing the type of information and advice wanted by the government of the day, or providing information or advice at odds with their policy direction

3. lack of an adequate long term base funding arrangement.

Probably the key thing that stands out from this report (aside from the fact that it is clearly a parting shot from Superu, which funded the report) is the highly political nature of social science funding. For example, Preston notes the problems associated with multi-sector research institutions that sit outside of core government services:

While this position outside of government proper gives the institution more independence, it also makes the entity more vulnerable to unfavourable reactions from the government of the day. This is especially so if it is providing advice or information on politically sensitive issues. The government cannot do without its core government departments, but it can do without particular advisory bodies or research institutions.

A related example is the closure of the Social Policy Journal of New Zealand, which had been set up and run by the Social Policy Agency, part of the Department of Social Welfare:

No official reason was ever given for the closure of the Journal. However, informal sources commented that an article about to be published included information which indicated that a statement made by a Minister was inaccurate. Publication of the issue was delayed until public interest in the topic died down and it was decided to cease publication of the Journal, apparently to avoid future difficulties with Ministers.

The development indicates the difficulties of maintaining the ability to publish research findings within a politically sensitive environment in a government department.

All of this suggests that social science research institutions are always in a precarious position, reliant on short-term funding sources and beholden to the political whims of government. Preston summarises the various reviews of social science research that have been undertaken since the 1970s (of which there have been many). One common theme across those reviews is the need for a social science research institution with a sufficient level of baseline funding to maintain core research activity. Preston uses the example of the Brookings Institution from the U.S., which admittedly is not funded by the government, but has had a lasting impact on policy development and is generally well respected.

New Zealand needs quality social science research capacity. At present, there is some capacity in various research units in government, including the Social Welfare Agency (previously the Social Investment Agency), Ministry of Social Development, and elsewhere. However, the main capacity lies in academic institutions and in the private sector, both of which have incentives that do not necessarily lead to good policy-relevant research. In the case of academics, the incentives are to publish in high quality international journals, which tend not to be interested in policy-focused New Zealand research. In the case of the private sector, their incentive is to cater to the needs of paying clients, such as market research and economic research.

The irony is that the government is increasingly focusing research funding and infrastructure on the physical and biological sciences, agriculture, and engineering (broadly defined). There is a clear focus on research that provides economic returns. It might seem odd for an economist to argue that we need to refocus resources in a different direction, but there are important societal challenges where the natural sciences and engineering are of limited assistance. Improving social wellbeing, and better understanding the impacts of inequality, can't be achieved by ignoring the important contributions of social science. Climate change research also can't ignore social science, because ultimately it is the behaviour of people that will determine the success or otherwise of climate change initiatives. Engineers and natural scientists often bring a 'build it and they will come' mindset to the solutions they design, which ignores the complex motivations that real people have.

If New Zealand truly has a goal of improving wellbeing for all New Zealanders, then social science research has an important role to play, and needs appropriate funding and infrastructure to support that role. This is the one thing that most needs to be picked up in the consultation on the government's green paper.