Saturday, 30 April 2022

Dani Rodrik on the benefits of economic populism

I just read an interesting 2018 article by Dani Rodrik (Harvard University), published in the AER Papers and Proceedings (ungated here). Rodrik starts by outlining a taxonomy of regimes, based on whether there are political restraints and/or restraints on economic policy, resulting in the following 2x2 matrix (Table 1 from the article):


Rodrik describes the four possibilities in the 2x2 matrix as:

Personalized regimes such as Vladimir Putin’s in Russia or Tayyip Erdogan’s in Turkey are characterized by the absence of restraints in both the political and economic domains (box 1). But it is possible to conceive of autocratic regimes where important aspects of economic policy are placed on automatic pilot or delegated to technocrats (box 2). Pinochet’s regime in Chile provides an example.

Alternatively, a regime can be populist in the economic sense without rejecting liberal, pluralist norms in the political domain (box 3). Finally, a regime that is constrained in both politics and economics might be called a “liberal technocracy” (box 4). The European Union may be an example of the last type of regime: economic rules and regulations are designed at considerable distance from democratic deliberation at the national level, which accounts for the frequent complaint of a democratic deficit.

The rest of the article then mostly discusses the differences between regime (3) and regime (4), drawing an important distinction between two types of restraints on economic policy. First, there are:

...restraints on economic policy that take the form of delegation to autonomous agencies, technocrats, or external rules. As described, they serve the useful function of preventing those in power from shooting themselves in the foot by pursuing short-sighted policies.

Rodrik provides the example of delegating monetary policy to an independent central bank, or constraining trade policy through the use of free trade agreements. In terms of the second type of restraint on economic policy:

Commitment to rules or delegation may also serve to advance the interests of narrower groups, and to cement their temporary advantage for the longer run. Imagine, for example, that a democratic malfunction or random shock enables a minority to grab the reins of power. This allows them to pursue their favored policies, until they are replaced. In addition, they might be able to bind future majorities by undertaking commitments that restrain what subsequent governments can do.

Rodrik again uses the examples of monetary policy, where a central bank's rigid adherence to inflation targeting can make us worse off, and trade policy, where rules on intellectual property are exported, extending the market power of holders of intellectual property rights.

Then comes the crux of Rodrik's argument - that economic populism, where the constraints on economic policy are relaxed or removed, may actually be beneficial in some cases. In particular, since:

...delegation to independent agencies (domestic or foreign) occurs in two different contexts: (i) in order to prevent the majority from harming itself in the future; and (ii) in order to cement a redistribution arising from a temporary political advantage for the longer term. Economic policy restraints that arise in the first case are desirable; those that arise in the second case are much less so.

Rodrik uses the substantial economic policy changes wrought by Franklin D. Roosevelt and the New Deal as an illustrative example. This paper presents an interesting framework to think about when constraining government economic policy may be a good idea, and when it may be better to relax the constraints. However, the short format of the article prevents a deeper examination of all of the implications of this framework. As described, I'm sure it could be used opportunistically to argue in favour of relaxing constraints in almost any situation. Hopefully, this is a topic that Rodrik is going to follow through on, as it really needs a book-length treatment (and I've quite enjoyed some of his other books - see reviews here and here).

Friday, 29 April 2022

Māori alienation from land and intergenerational wellbeing

The most important, or certainly most used, model of Māori health is Te Whare Tapa Whā, which was developed by Sir Mason Durie in the 1980s (see here). The model says that Māori health is underpinned by four pillars: taha wairua (spiritual health), taha hinengaro (mental or emotional health), taha tinana (physical health) and te taha whānau (family health). Durie outlined that a key component of taha wairua is Māori connections with the land, and that alienation from land would lead to worse health across all four pillars.

So, I was interested to read this new article by Rowan Thom and Arthur Grimes (both Victoria University of Wellington), published in the journal Social Science and Medicine (open access). Thom and Grimes look at the impact on wellbeing (variously measured) in more modern times of land confiscations as a result of the New Zealand Settlements Act 1863 and the Suppression of Rebellion Act 1863, during the New Zealand Wars - these confiscations are referred to by Māori as the Raupatu. They also look at the impact of land alienation more broadly (not limited to confiscations). The theory is that, if they can identify a statistical relationship between land alienation or land confiscation and modern Māori health, then that would provide evidence of a long-lasting intergenerational trauma affecting Māori.

Thom and Grimes start with a database of Māori land from 1840 (assuming all land at the time of signing of the Treaty of Waitangi was Māori-owned), 1864, 1880, 1890, 1910, 1939, and 2017. They then compute the proportion of land loss for each iwi (or iwi grouping) in the North Island for each year as one measure. By 2017:

Three-quarters of these iwi groupings today hold less than 12.5% of their rohe [region] as Māori land, and a quarter hold less than 2.7%.

Thom and Grimes then look at the relationship between the proportion of land retained by Māori for each year, and several measures of health and wellbeing, including: (1) the proportion of each iwi that can speak te reo Māori well or very well; (2) the proportion who report that it is hard or very hard to get support with Māori cultural practices; (3) the proportion who report that involvement in Māori culture is very important or quite important; (4) the proportion who had visited their ancestral marae in the last year; and (5) smoking prevalence. These data were drawn from the 2018 wave of Te Kupenga - the Māori social survey run by Statistics New Zealand, which had a sample size of 5548. They also look at the impact of an iwi having experienced the Raupatu (as a binary variable - yes or no). They find that:

In each case, landholdings are a significant predictor of current cultural wellbeing outcomes, and it is landholdings around the end of the nineteenth century (1890 or 1910) that have the greatest explanatory power. Iwi that retained a greater proportion of their land at that time now have higher rates of te reo proficiency, place greater importance on involvement in Māori culture and are more likely to have visited an ancestral marae over the previous year; they are less likely to find it hard getting support with Māori cultural practices. Thus, greater retention of land has – over a century later – assisted those iwi in the retention of their cultural roots.

In 1910, the lower and upper quartiles of landholdings for our estimation sample (covering North Island iwi groupings) were 6.0% and 30.7% respectively. The effect of moving from the lower to the upper quartile of land retention in 1910 is estimated to be an extra 1.6 percentage points (p.p.) of that iwi grouping being able to speak te reo proficiently (well/very well) today. The same change in landholdings is estimated to have led to an increase in the current proportion of iwi members who find it important to be involved in Māori culture of 1.8 p.p., and an increase in the proportion who have visited a marae of 2.8 p.p.; the proportion who finds it difficult to find support for Māori cultural practices is estimated to decline by 0.6 p.p.

Land retention is estimated to have had no significant effect on smoking rates across iwi, but experience of confiscation does. An iwi that was subject to land confiscation during the Raupatu is estimated to have a smoking rate that is 2.6 p.p. higher than in an iwi for whom confiscation did not occur.

So overall, the proportion of land retained (as opposed to the proportion alienated) was associated with better cultural outcomes. However, there was no additional effect of their iwi having experienced land confiscation (Raupatu). For smoking, the opposite was the case - land confiscation was associated with higher smoking prevalence, but the proportion of land retained was not. In all cases, the point estimates were pretty consistent across measures at different points in time, although the level of statistical significance was not. I would have been interested to see to what extent statistical significance survived an adjustment for multiple hypothesis testing.
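As a rough illustration of what an adjustment for multiple hypothesis testing involves, here is a minimal sketch of the Holm-Bonferroni procedure in Python. The p-values are made up for illustration and are not taken from Thom and Grimes' paper, and the choice of Holm-Bonferroni (rather than some other correction) is my own assumption. The paper also tests more than five hypotheses, since each outcome is paired with several years of land data.

```python
def holm_adjust(p_values):
    """Return Holm-Bonferroni adjusted p-values, in the original order."""
    m = len(p_values)
    # Indices of the p-values, from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The smallest p-value is multiplied by m, the next by m - 1, and so on.
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Adjusted p-values must not decrease as the raw p-values increase.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


# Hypothetical p-values for five outcomes (NOT from the paper).
hypothetical_p = [0.004, 0.030, 0.012, 0.049, 0.200]
adjusted_p = holm_adjust(hypothetical_p)
significant = [p < 0.05 for p in adjusted_p]

for raw, adj, sig in zip(hypothetical_p, adjusted_p, significant):
    print(f"raw p = {raw:.3f}, adjusted p = {adj:.3f}, significant at 5%: {sig}")
```

In this made-up example, four of the five raw p-values are below 0.05, but only two remain below that threshold after the adjustment. That is the sort of attrition I would want to check for.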

That minor gripe about multiple testing aside, this research is potentially important. However, it did disappoint me a little. A valid instrument should affect the outcome only through the variable being instrumented, so the direct correlation between land alienation and health will now prevent me from using land alienation as an instrument for Māori social capital in another research project, as I had intended.

Thom and Grimes conclude that:

The research indicates the importance of reconnecting people with their whenua and rohe, and the central role that they have in improving the wellbeing outcomes of iwi. The process of individuals reconnecting with their rohe is a form of active healing, in which they are strengthening and expressing culture, rebuilding relationships and addressing trauma and grief...

That raises an interesting possibility for follow-up research. To what extent has the return of land through the Treaty settlement process contributed to improved Māori health and wellbeing? I wonder if anyone has considered that idea?

[HT: Matt Roskruge]

Wednesday, 27 April 2022

Grade inflation and college completion in the U.S.

Grade inflation is one of those rare problems where the incentives are all wrong. Teachers have the incentive to raise grades, because they get better teaching evaluations (for example, see this post). Academic departments and university administrators have an incentive not to discourage inflated grades, because higher grades attract students. Students have no incentive to argue against grade inflation, because higher grades make them seem like they are doing better (even though, as noted in this post, the signalling value of grades is reduced by grade inflation, and grade inflation may actively harm students' learning).

What is grade inflation and why is it harmful? In this 2020 article published in the journal Economics of Education Review (ungated earlier version here), Adam Tyner (Thomas B. Fordham Institute) and Seth Gershenson (American University) outline several ways of thinking about grade inflation. However, to me their paper is a solution looking for a problem. I think most people would recognise grade inflation as what Tyner and Gershenson term dynamic grade inflation, that is the:

...relationship between grades & achievement changes over time...

In other words, if students with a given level of understanding this year receive a higher grade than students with the same level of understanding ten years ago would have, then that represents grade inflation. Simple.

Or not quite so simple. As I noted in this post, the evidence on the existence of grade inflation is somewhat patchy, and it isn't clear how much of it can be explained by changes over time in student quality, teaching quality, or course selection by students.

That's where this new NBER working paper by Jeffrey Denning (Brigham Young University) and co-authors comes in. They consider the puzzle of increasing college completion rates from the 1990s to today. This is a puzzle because:

Trends in the college wage premium, student enrollment, student preparation, student studying, labor supply in college, time spent studying, and the price of college would all predict decreasing college graduation rates. The patterns for enrollment by institution type yields an ambiguous prediction. Despite the bulk of the trends predicting decreasing graduation, we document that the college graduation rate is increasing.

A lower college wage premium should decrease the incentives for students to complete college. Greater enrolments should reduce the 'quality' of the marginal student, and reduce the proportion of students completing college. Students are now less prepared when leaving high school than previous generations (as noted in my review of the Goldin and Katz book, The Race between Education and Technology). Students work more, and consequently spend less time studying, and that is in turn related to the high cost of tertiary education, and those are all impediments to the completion of university study. And in spite of all of those trends, college completion rates in the U.S. have trended upwards since the 1990s.

Denning et al. implicate changing standards as the driver of these higher completion rates, that is, grade inflation. They use a range of data to support their case, including nationally-representative longitudinal data from the National Education Longitudinal Study of 1988 (NELS:88) and the Education Longitudinal Study of 2002 (ELS:2002), detailed administrative data from nine large public universities (Clemson, Colorado, Colorado State, Florida, Florida State, Georgia Tech, North Carolina State, Purdue, and Virginia Tech) for cohorts entering between 1990 and 2000, and detailed microdata from an unnamed 'Public Liberal Arts College'.

There is a lot of detail in the paper, so I'll just quickly describe some of the highlights. Denning et al. use a decomposition method based on the change over time from the two longitudinal studies, and find that:

...there is a 3.77 percentage point increase in the probability of graduation from the NELS:88 cohort to the ELS:2002 cohort. The total explained by observable characteristics is -1.92. This suggests that covariates would predict that graduation rates would decrease by 1.92 percentage points. Hence, the residual or unexplained change is 5.69 percentage points or 151 percent of the change is unexplained by covariates. Student preparedness would predict a decline in graduation rates of 1.26 percentage points. Student-faculty ratios explain a 0.28 percentage point decline and initial school type explains no change in graduation rates.

In other words, the probability of graduation has increased over time, but observable changes in student preparedness, student-faculty ratio, and school type all point in the wrong direction (as per the trends noted earlier).
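The arithmetic behind the quoted decomposition is easy to check. The short sketch below uses only the two figures quoted above (the variable names are my own) and recovers the unexplained change and its share of the total.

```python
# Figures quoted from Denning et al., in percentage points.
total_change = 3.77              # observed change in graduation probability
explained_by_covariates = -1.92  # change predicted by observable characteristics

# The residual is what is left after removing the predicted change.
unexplained = total_change - explained_by_covariates
unexplained_share = unexplained / total_change

print(f"Unexplained change: {unexplained:.2f} p.p.")  # 5.69 p.p.
print(f"Share unexplained: {unexplained_share:.0%}")  # 151%
```

The share exceeds 100 percent because the observables predict a fall in graduation rates, so the residual has to account for more than the whole of the observed increase.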

The longitudinal studies are limited in the variables that are available, so Denning et al. next turn to the administrative data from the nine public universities. Looking at what explains the increase in GPA over time, they find that there is:

...a statistically significant increase of 0.019 per year in first-year GPA between 1990 and 2000. Controlling for demographic characteristics, school attended, and home zip code leave the coefficient unchanged. Including very flexible controls for SAT scores reduces the coefficient on year of entry only slightly to 0.014. We also include fixed effects for major by institution to account for the potential of changing major composition. Last, we include fixed effects for all first-semester courses and the coefficient is unchanged. We include these fixed effects to account for shifts in student course taking that may explain changes in GPA but find that courses taken cannot explain the change in GPA.

This evidence shows that rising grades cannot be meaningfully explained by demographics, preparation, courses, major, or school type. Put another way, equally prepared students in later cohorts from the same zip code, of the same gender and race, with the same initial courses, the same major, and at the same institution have higher first-year GPAs than earlier cohorts.

That certainly smells like grade inflation. Denning et al. then look at the data from the unnamed Public Liberal Arts College, which has an advantage in that there is an objective measure of student performance. In two required science courses, students sat exactly the same exam in different years. In that case:

...we control for course fixed effects, demographic characteristics, and final exam scores in these two science courses and find that a year later entry corresponds to a large and statistically significant 0.060-point increase in GPA... Students with the exact same score on the exact same final exam earned better grades in later years.

Finally, Denning et al. go back to the original decomposition analysis, and add in GPA, and find that:

...the change due to observables (including first-year GPA) is 2.49 percentage points or 66 percent of the total change. The change explained by GPA alone is 3.57 or 95 percent of the observed change...

Summing up this study, almost all of the observed change in college completion rates is explained by the change in GPA over time, and the change in GPA over time is not related to student performance, as well as not being related to changes in student demographics, choice of major, or choice of college. This is grade inflation, writ large.
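The shares in that final decomposition can be checked in the same way. The sketch below uses the quoted figures, plus one assumption of my own that is not stated in the quote: that the contributions of individual observables add up to the total explained change, as they would in a standard additive decomposition.

```python
# Figures quoted from Denning et al., in percentage points.
total_change = 3.77            # observed change in graduation probability
explained_with_gpa = 2.49      # all observables, including first-year GPA
explained_by_gpa_alone = 3.57  # first-year GPA on its own

share_all_observables = explained_with_gpa / total_change
share_gpa = explained_by_gpa_alone / total_change

# Assumption: contributions are additive, so the remaining observables
# account for the difference between the two explained amounts.
implied_other_observables = explained_with_gpa - explained_by_gpa_alone

print(f"Explained by all observables: {share_all_observables:.0%}")
print(f"Explained by GPA alone: {share_gpa:.0%}")
print(f"Implied contribution of other observables: {implied_other_observables:.2f} p.p.")
```

Under that additivity assumption, the observables other than GPA still predict a decline in graduation rates, which is consistent with the trends described earlier.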

Does any of this matter? Well, as I noted earlier, it does reduce the signalling value of education. It is harder for the genuinely good students to distinguish themselves from the not-quite-as-good students, when they are all receiving the same high grades. However, Denning et al. point to some further work that might help us to understand the consequences of grade inflation better:

...future work should consider the effects of grade inflation on learning, major choice, the decision to enroll in graduate school, the skill composition of the workforce, and the college wage premium.

All of that would be welcome evidence for why we should be concerned.

[HT: Marginal Revolution, for the Denning et al. paper]

Monday, 25 April 2022

Book review: GDP - A Brief but Affectionate History

Gross Domestic Product (GDP) is often unfairly maligned as a poor measure of societal wellbeing. However, many of those critiques miss the point that GDP was never designed to be a measure of wellbeing (as I noted in my review a couple of days ago of Measuring What Counts). At the same time, the way that economists use GDP also displays a misunderstanding (or, perhaps, an ignorance) of the origins of GDP and the array of limitations it has, even as a measure of production or income.

Users of GDP statistics, and would-be critics of its use, would benefit greatly from reading Diane Coyle's GDP - A Brief but Affectionate History. I really enjoy Coyle's writing style (for example, see my review of her earlier book The Soulful Science), which is clear and has just the right amount of storytelling to keep what might otherwise be a dry statistical topic interesting. GDP could be an incredibly complicated topic, and on that point Coyle notes that:

Understanding GDP is a bit like a video game with increasing levels of difficulty.

This short book is pitched to an audience of interested amateur players, and she doesn't get too bogged down in statistical minutiae. The book starts by running through the history of the development of summary statistics measuring the economy, from the 18th Century, through the invention of GDP during the Depression and World War Two, and on through its use (and misuse) from the 1970s to the start of the 21st Century. I found the historical development details particularly interesting. For instance, I hadn't realised just how much of historical GDP statistics were only developed since the 1980s, with data on only a handful of developed countries available before then. As someone who did all of their economics training in the 2000s, I had assumed that GDP statistics had been around for a lot longer.

Coyle uses the historical development to draw our attention to the many limitations of GDP. This is where the real value of the book lies. And Coyle is under no illusion that GDP is a measure of wellbeing (or welfare):

The lesson to draw from this discussion is that GDP is not, and was never intended to be, a measure of welfare. It measures production... If the aim instead is to develop a measure of national economic welfare, we shouldn't be starting with GDP.

This isn't the only place where Coyle's views on GDP as a measure of wellbeing accord with my own. On the Great Financial Crisis, and the resulting critiques of economics, she notes:

To the chagrin of many economists, who do not recognise their own work in the attacks made on the subject, economics gets the blame for the intellectual climate of advocacy for markets that made the financial excesses possible and, more broadly, seems to have made short-term profit the arbiter of most areas of life.

Coming back to the main limitations of GDP, Coyle outlines quite clearly the problems of home production, the informal economy, the measurement of financial services, the measurement of services provided by the government at non-market prices, quality improvements over time, as well as adjustments for inflation and for purchasing power parity (when comparing across countries). Many of these issues are touched on briefly by economics textbooks. My one disappointment is that Coyle doesn't explore the feminist critique of GDP in much detail, limiting it to a single sentence noting that unpaid housework is not included in GDP, "perhaps because it has been carried out mainly by women". However, Coyle doesn't resile from expanding on the other flaws in GDP, noting that there are three main issues that suggest a different approach may be needed:

Those three issues are:

  • the complexity of the economy now, reflected in innovation, the pace of introduction of new products and services, and also in globalization and the way goods are made in complicated global production chains;
  • the increasing share of advanced economies made up of services and "intangibles", including online activities with no price, rather than physical products, which makes it impossible to separate quality and quantity or even think about quantities at all; and
  • the urgency of questions of sustainability, requiring more attention to be paid to the depletion of resources and assets, which is undermining potential future GDP growth.

On that last point (sustainability), Coyle writes that:

A regular, official indicator of sustainability is urgently needed, however. At present, governments have nothing to tell them whether the growth their policies are delivering is coming at the expense of growth and living standards in their future.

This book was published in 2014, before the 'dashboard' approach outlined in Measuring What Counts really started to be adopted or advocated for more widely. Otherwise, it would have been interesting to hear her take on the optimal breadth of measures of wellbeing (although, admittedly, that might be beyond the brief of this book).

Overall, I really enjoyed this book, and it should be required reading for many critics of GDP as a measure, and critics of economics as a profession. GDP serves a useful purpose as a measure of production, with recognised limitations. The problem is not mainly the measure, but how it is used and interpreted, and that suggests a different approach to measuring wellbeing is needed.

Saturday, 23 April 2022

Book review: Measuring What Counts

I've read a number of books that critique economics, or set up a strawman version of economics as a target as part of a critique of capitalism or market fundamentalism - for example, Doughnut Economics (which I reviewed here) or What Money Can't Buy (which I reviewed here). The problem with those books is that the strawman argument detracts from what might otherwise be a reasoned critique of current economic structures. Fortunately, that isn't the case for Measuring What Counts, by Joseph Stiglitz, Jean-Paul Fitoussi, and Martine Durand. This is a book written by well-respected economists, so its critique of GDP as a measure of wellbeing (which is the core of the book) is worth listening to.

However, let me take a step back. The origins of this book date back to the 2008 Commission on the Measurement of Economic Performance and Social Progress, instituted by French President Nicolas Sarkozy, and chaired by Joseph Stiglitz, with Amartya Sen and Jean-Paul Fitoussi. The purpose of the so-called Stiglitz-Sen-Fitoussi Commission was to reassess the adequacy of current metrics of performance and progress, and to suggest alternative approaches. The Commission's 292-page final report was released in 2009. The OECD followed that report up by convening a High-Level Expert Group (HLEG), chaired by Stiglitz, Fitoussi, and Durand. Its final reports were released in 2018. This book is a summary of those reports, written by the three co-chairs.

The book has a common theme that runs throughout, as summarised in the Overview section:

Its central message is that what we measure affects what we do. If we measure the wrong thing, we will do the wrong thing. If we don't measure something, it becomes neglected, as if the problem didn't exist.

The critique of GDP that Stiglitz, Fitoussi, and Durand present is well-crafted, but there is little new in there. In brief, you can think of it as an expansion on the idea of Goodhart's Law: when a measure becomes a target, it ceases to be a good measure. Since GDP (and economic growth as measured by growth in GDP) has become a target of governments, with high growth seen as good and low growth (or negative growth) seen as bad, GDP has ceased to be a good measure of human wellbeing or progress. The truth, of course, is that GDP was never a measure of human wellbeing or progress, but a measure of production.

Stiglitz, Fitoussi, and Durand make the case that, rather than focusing on a single measure of progress (GDP), it would be better for governments to have a 'dashboard' of indicators across many domains of human wellbeing. One example they present is the Sustainable Development Goals. However, they caution:

...the demand for comprehensiveness of the SDGs had an adverse effect: some 17 goals, with 169 targets and 232 indicators, were eventually listed - too many to be meaningfully comprehended or to be a focus of policy.

So, one of the things we are to take from this book is that a single measure (GDP) with a single goal (economic growth) is not enough, while 232 indicators associated with 17 goals is too many. However, we never get a sense of where the 'Goldilocks zone' is for the number of measures or goals that a government should employ. I felt that was one of the let-downs of the book. The problem with the dashboard approach is that having a large number of indicators can easily lead a policymaker to justify any policy on the basis of its effect on one of the indicators, or to trumpet any policy as a success because of its effects on one of the indicators, while ignoring or downplaying its effects on other indicators. To be fair, Stiglitz, Fitoussi, and Durand are not oblivious to these critiques. However, the book lacks a level of criticality, which surprises me somewhat given the previous books I have read by Stiglitz in particular. A broader section devoted to the challenges in this approach would have been a welcome addition.

Despite the obvious critiques of the dashboard approach, it does have the potential for making the trade-offs inherent in future policy, and in the evaluation of current and past policy, more transparent (although establishing the counterfactual will always present a tricky problem for evaluation). However, that relies on policymakers, politicians, the media, and the public being able to interpret the dashboard in terms of trade-offs. This was another thing that I felt was missing from the book.

Nevertheless, the book does a good job of summarising where things stand (as at 2019) with what the authors term the 'Beyond GDP' movement. Some governments are much further along than others. As one example, which the authors briefly mention, consider the New Zealand Treasury's Living Standards Framework Dashboard. If you haven't already tried this out, you should. It is an excellent source of summary information on wellbeing for New Zealand.

Finally, the book provides some pointers to examples of current best practice, as well as an outline of future research and data needs. On that last point, in particular:

...economic insecurity, inequality of opportunity, trust, and resilience - currently lack a foundation in countries' statistical system. Greater investment is needed in all these areas, as the arguable lack of more-adequate metrics contributed to inadequate policy decisions.

As a summary, the book lacks some of the detail necessary to fully understand some of the points that were made. Fortunately, there is a companion edited volume, titled For Good Measure. I've ordered that one, and look forward to bringing you a review of it soon. I enjoyed this book, and it does a far better job of critiquing GDP and presenting an alternative that is worth further exploration than the other critiques I mentioned at the start of this review. And it didn't require a doughnut metaphor to do so.

Wednesday, 20 April 2022

Medical doctors and antidepressant and other prescription drug use

Are medical doctors more or less likely to use antidepressants (and other prescription drugs) than the general population? I can see a couple of theoretical reasons why they might. On the supply side, medical doctors have more ready access to prescription drugs, and therefore the non-monetary costs of obtaining a prescription (from a trusted colleague) might be lower (on the other hand, perhaps there are social stigma costs, in which case perhaps the 'full cost' of obtaining prescription medications is higher than for the general population). On the demand side, medical doctors should have a better idea of when they need to obtain treatment (of course, that doesn't mean that they will necessarily seek such treatment).

It turns out that medical doctors are more likely to use antidepressants (and other prescription drugs) than the general population, and not because of either of the (relatively benign) reasons I outlined above. As this recent NBER Working Paper (alternative ungated version here) by Mark Anderson (Montana State University), Ron Diris (Leiden University), Raymond Montizaan (Maastricht University), and Daniel Rees (Universidad Carlos III de Madrid) notes:

...there is evidence that physicians disproportionately suffer from substance use disorder (SUD) and mental health problems. Ten to 15 percent of physicians will misuse alcohol or prescription drugs during their career... more than 20 percent of physicians are depressed or exhibit the symptoms of depression... and at least one third of physicians describe themselves as suffering from “job burnout”... a syndrome closely linked to SUD and depression...

However, such descriptive analysis doesn't disentangle whether becoming a medical doctor leads to substance use disorder, or whether the types of people who become medical doctors are also those who are predisposed to substance use disorder. That is essentially the question that Anderson et al. attempt to address. Specifically, they use data on over 27,000 first-time applicants to Dutch medical schools over the period from 1987 to 1999. Importantly, because medical school places were allocated randomly on the basis of a lottery, Anderson et al. can use the lottery selection as an instrument to estimate the causal impact of becoming a medical doctor on later life outcomes. They focus on prescription drug use over the period from 2006 to 2018, including total prescriptions, antidepressants, anxiolytics (anti-anxiety medication), opioids, and sedatives, as well as treatment for mental health issues. They find:

...evidence of an across-the-board increase in the use of prescription drugs, including anxiolytics, opioids, and sedatives.

Specifically, medical doctors received 22 percent more drug prescriptions than non-medical doctors over that period, were 23 percent more likely to have been prescribed antidepressants, 20 percent more likely to have been prescribed anxiolytics, 25 percent more likely to have been prescribed an opioid, and 61 percent more likely to have been prescribed a sedative. All of these differences are statistically significant. Digging a bit deeper, Anderson et al. find that the results are larger for female medical doctors, and especially for female medical doctors at the bottom of the GPA distribution.
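The lottery-as-instrument logic can be illustrated with a simple Wald estimator: the difference in average outcomes between lottery winners and losers, divided by the difference in the share of each group who actually became doctors. The sketch below is only an illustration under stated assumptions. The records are entirely hypothetical (they are not data from Anderson et al.), the function name is my own, and I assume that some lottery losers still end up as doctors by another route. The paper's actual estimation is likely to be more involved than this.

```python
def wald_estimate(records):
    """Wald (instrumental variables) estimate from lottery data.

    records: list of (won_lottery, became_doctor, outcome) tuples,
    where the first two entries are 0 or 1.
    """
    def mean(values):
        values = list(values)
        return sum(values) / len(values)

    winners = [r for r in records if r[0] == 1]
    losers = [r for r in records if r[0] == 0]

    # Reduced form: effect of winning the lottery on the outcome.
    reduced_form = mean(r[2] for r in winners) - mean(r[2] for r in losers)
    # First stage: effect of winning the lottery on becoming a doctor.
    first_stage = mean(r[1] for r in winners) - mean(r[1] for r in losers)
    return reduced_form / first_stage


# Hypothetical records: the outcome is a count of prescriptions, built so
# that doctors receive exactly 2 more prescriptions than non-doctors.
records = (
    [(1, 1, 12)] * 8 + [(1, 0, 10)] * 2    # lottery winners: 8 of 10 become doctors
    + [(0, 1, 12)] * 3 + [(0, 0, 10)] * 7  # lottery losers: 3 of 10 become doctors
)
effect = wald_estimate(records)
print(round(effect, 2))
```

Because the lottery is random, winners and losers should be similar in everything except their chance of becoming a doctor, which is why scaling the winner-loser gap in outcomes by the winner-loser gap in becoming a doctor recovers the built-in effect of 2 in this toy example.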

The results for mental health treatment are more mixed. Overall, there is no statistically significant effect, but when Anderson et al. stratify their analysis they find that female medical doctors with low GPA have a 31 percent higher probability of treatment, while male medical doctors with high GPA have a 52 percent lower probability of treatment.

The question that this research doesn't really answer is, why do medical doctors receive more prescriptions? Bear in mind that this is not self-prescription - the doctors are being prescribed these drugs by other doctors. Anderson et al. don't really have an answer for this question, although they note that their results are consistent with the descriptive literature:

This pattern of results is consistent with descriptions of female physicians being at elevated risk for depression and SUD because they are being exposed to on-the-job sex-based harassment and are under added pressure to balance professional and family responsibilities...

While the results are consistent with that narrative, it isn't the only explanation. Male doctors also receive more prescriptions than the general population (just not to quite the same extent as female doctors). And the results on mental health treatment should make us a little sceptical that higher prescription drug use is necessarily picking up substance use disorder, as Anderson et al. don't really make that link explicit in the paper.

Overall, this paper gives us some definitive evidence that medical doctors use more antidepressants and other prescription drugs than the general population, but for developing policy and practice solutions (either to the underlying mental health issues that doctors face, or over-prescription of these medications if that is the issue), we'd first need to understand more about why.

[HT: Marginal Revolution]

Tuesday, 19 April 2022

The comparative lack of socio-economic diversity in economics

I've written a number of posts on the large and persistent gender gap in economics (see this post for the latest, and the links at the bottom of that post for more). There is also a gap in ethnic diversity, with minority groups under-represented in economics as well (see this post, and the report it refers to). Apparently the gaps don't end there. This new working paper by Robert Schultz (University of Michigan) and Anna Stansbury (Peterson Institute for International Economics) looks at socioeconomic diversity, i.e. diversity in terms of parents' educational attainment.

Schultz and Stansbury use data from the National Science Foundation’s Survey of Earned Doctorates (SED), which covers virtually all PhD graduates over the period from 2000 to 2018 (over 470,000 graduates, of which a bit over 10,000 were in economics). Importantly, the survey collects data on parental education, and Schultz and Stansbury use this to categorise PhD graduates into three categories:

...those with at least one parent with a graduate degree (a master’s, professional, or research doctoral degree), those with at least one parent with a bachelor’s degree (BA) but no parent with a graduate degree, and those for whom no parent has a bachelor’s degree (this group includes those with a parent who has an associate’s degree or some college, is a high school graduate, or has less than a complete high school education).

Schultz and Stansbury then compare across 14 PhD fields (while separating economics out from the rest of social sciences) in terms of the educational background of the graduates. They find that:

Among US-born PhD recipients over 2010–18, 65 percent of economics PhD recipients had at least one parent with a graduate degree, compared with 50 percent across all PhD fields (and 29 percent for the population of US-born BA recipients over the same period). At the other end of the spectrum, only 14 percent of US-born economics PhD recipients in 2010–18 were first-generation college graduates, compared with 26 percent across all PhD fields (and 44 percent among all US-born BA recipients). This makes economics the least socioeconomically diverse of any major field for US-born PhD recipients. And its socioeconomic diversity appears to have worsened over time: while economics has consistently been less socioeconomically diverse than both the other social sciences and the biological and physical sciences, since 2000 it has also diverged from mathematics and computer science, the other two least socioeconomically diverse large PhD fields.

Their Figure 3 (reproduced below) summarises the changes over time. Economics is clearly the highest among the fields in the proportion of PhD graduates with at least one parent with a graduate degree, and the lowest in the proportion with no parent with a BA or higher degree, and the gaps have grown over time.

Schultz and Stansbury then undertake some regression analysis, and find that:

...even controlling for race, ethnicity, gender, BA field, BA institution, and PhD institution, economics PhD recipients are around 5 percentage points more likely to have a parent with a graduate degree as compared with the average US-born PhD recipient, and 5 percentage points less likely to have no parent with a BA or higher.

The raw coefficient without controls suggests a 15 percentage point difference in parents with a graduate degree, and a 13 percentage point difference in parents without a BA or higher. So, even after controlling for race, ethnicity, gender, BA field, BA institution, and PhD institution (all of which make some difference), about one-third of the gap in socioeconomic background (as proxied by parental education) remains unexplained. The question that raises is: why? Schultz and Stansbury suggest four possibilities:

  1. The complexity of the path to a PhD in economics, due to the need for students to meet high-level mathematics pre-requisites to get into a PhD programme in economics;
  2. Disparate access to professional relationships, due to a lack of social capital, implicit or explicit bias from faculty, or mentoring relationships that tend to develop along sociodemographic lines;
  3. Financial circumstances and incentives, wherein students from lower socioeconomic backgrounds face high opportunity costs of PhD study; or
  4. The orientation, culture, and practice of economics as a discipline.

The second and third reasons seem unlikely to me, and the complexity of the path to a PhD in economics may be true, but it is unlikely to be so obscure as to derail students from that pathway. That leaves the orientation, culture, and practice of economics as a discipline, which has already been implicated in the persistent gender gap. Schultz and Stansbury note that:

...informative stylized fact is the correlation across different types of diversity: among US-born PhDs, the share of first-generation college graduates is strongly correlated with both the URM [under-represented minority] share and the female share across PhD fields. This is consistent with the hypothesis that some of the same factors that limit access to economics PhDs in the United States for racial and ethnic minorities or for women also limit access to economics PhDs for those from less advantaged socioeconomic backgrounds.

That suggests to me that the socioeconomic disparity isn't necessarily a separate cause for concern for the discipline. And the work that is underway to address gender and ethnic disparities in economics may well also reduce socioeconomic disparities. How quickly that change happens remains an open question.

[HT: David McKenzie at the Development Impact blog]

Sunday, 17 April 2022

An N-shaped relationship between GDP and suicide?

The Kuznets Curve is the hypothesised inverted-U-shaped relationship between inequality and development (or income per capita, or GDP per capita) (see this post for more). It implies that countries at low levels of development have low inequality, countries at middle levels of development have high inequality, and countries at high levels of development have low inequality. A similar inverted-U-shape has been hypothesised for subjective wellbeing or happiness (although this has yet to be definitively demonstrated empirically - see this post for more). So, I was interested to read this 2018 article, with the title "A suicidal Kuznets curve?", by Nikolaos Antonakakis (Webster Vienna Private University) and Alan Collins (Nottingham Trent University), and published in the journal Economics Letters (ungated earlier version here).

Antonakakis and Collins used cross-country data on suicide rates and per capita GDP for 73 countries over the period from 1990 to 2010, controlling for the unemployment rate and demographic and social characteristics of the population. They found that:

We observe that, generally, the coefficients of per capita income, including squared and cubic counterparts are positive, negative and positive, respectively, across males of all ages... Yet, they are only significant for the 25–34 (at the 10% level), 35–54 (at the 5% level) and 55–74 (at the 10% level) age groups... This is suggestive of the existence of an N-shaped Suicidal Kuznets curve in the case of the aforementioned age groups of the male population... Turning to the female population results... an N-shaped SKC is identified for females in the 55–74 age group...

Here is where things turn a bit weird. Why were Antonakakis and Collins testing for an N-shaped relationship in the first place? As far as I know, there is no theoretical reason why there would be an N-shaped relationship here. For clarity, the N-shaped relationship suggests that the suicide rate is low at low levels of GDP per capita, increases with GDP per capita to a peak at some level ($7727 for males 15-24 years, or about the level of Laos or Morocco), then decreases to a trough at some level of GDP per capita ($22,726 for males 15-24 years, or about the level of Mauritius or Argentina), before increasing again. People in Macao and Luxembourg (GDP per capita of approximately $127,000 and $114,000 respectively) are in real trouble!
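To see where a peak and a trough come from, note that a cubic with positive, negative, and positive coefficients (in that order) has turning points wherever its derivative equals zero. Here is a minimal sketch, assuming the model is a cubic in the level of GDP per capita (the paper may well use a transformed income variable). The coefficients are hypothetical: I have back-solved them from the two turning points mentioned above, rather than taking them from the paper, and the function name is my own.

```python
import math


def turning_points(b1, b2, b3):
    """Return (peak, trough) of f(x) = b1*x + b2*x**2 + b3*x**3.

    Turning points solve f'(x) = b1 + 2*b2*x + 3*b3*x**2 = 0.
    An N shape requires b3 > 0 and two distinct real roots.
    """
    discriminant = (2 * b2) ** 2 - 4 * (3 * b3) * b1
    if b3 <= 0 or discriminant <= 0:
        raise ValueError("coefficients do not describe an N-shaped curve")
    root = math.sqrt(discriminant)
    peak = (-2 * b2 - root) / (6 * b3)    # smaller root: local maximum
    trough = (-2 * b2 + root) / (6 * b3)  # larger root: local minimum
    return peak, trough


# Hypothetical coefficients, chosen so that f'(x) = 3*b3*(x - PEAK)*(x - TROUGH).
PEAK, TROUGH = 7727.0, 22726.0
b3 = 1.0
b1 = 3 * b3 * PEAK * TROUGH
b2 = -1.5 * b3 * (PEAK + TROUGH)

peak, trough = turning_points(b1, b2, b3)
print(round(peak), round(trough))
```

Beyond the trough, the cubic term dominates, so the fitted suicide rate rises without limit as GDP per capita increases. That is why the richest countries look so bad under this specification.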

Aside from the general weirdness of this result from a theoretical perspective, there are problems with the control variables as well. Antonakakis and Collins use 'demographic controls', including the fertility rate and life expectancy. Fertility seems like a weird thing to control for, but it turns out that there is a correlation between fertility and suicide at the country level (see here, where the authors interpret fertility as a measure of social cohesion). However, since suicide affects life expectancy, having life expectancy as a control variable effectively gets the causality backwards. Also, having both unemployment and GDP per capita in the analysis might be problematic because of endogeneity. These latter two issues could generate some bias and weirdness in the results, and may explain the N-shaped result.

This is clearly a study that is crying out for a replication with more attention to theory and to the use of appropriate control variables.

Saturday, 16 April 2022

More on increasing support for tertiary education students

Chloe Swarbrick wrote what is essentially a follow-up to her earlier column on support for tertiary education students, in the New Zealand Herald earlier this week:

Across the political divide, it seems we can all agree that education is critical for the wellbeing and productivity of our country.

The problem is, we have very differing views on whether someone should have to carry a lifetime of debt or suffer immense poverty, for their right to learn...

Let's have a look at some of the things that have changed in the last 30 years, then.

Student debt didn't exist before the 1990s. Before I was born, the cost of access to tertiary education was a nominal fee – akin to, say, the administration costs for a passport – that nobody need take out a loan for.

By 2004, average domestic student fees in Aotearoa across tertiary education institutions were NZ$2367. By 2019, they had increased 81 per cent, to an average of $4294 per equivalent full-time earner.

In 1992, the year after student loans were introduced following the let-rip of free-market competition, the average borrowed per year per student was $3628; by 2019, it had increased by 172 per cent, to $9867.

In 1999, 62,748 students received the Student Allowance. In 2021, despite our population growing by a million, 61,068 – yes, fewer than in 1999 – received the Student Allowance.

In 1999, the average amount of Student Allowance received per eligible student was $4420. In 2021, it was $6641. Adjusted for inflation, which would bring the 1999 value to approximately $8265, the minority of students receiving the allowance in 2021 are $1600 worse off than their counterparts 20 years ago.

We all know costs haven't gone down in the meantime.
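The figures quoted above can be sanity-checked with a few lines of arithmetic. This is a minimal sketch that uses only the numbers in the column. The inflation-adjusted 1999 allowance ($8265) is taken as given, rather than recomputed from a price index, and the variable names are my own.

```python
def pct_change(old, new):
    """Percentage change from old to new."""
    return (new - old) / old * 100


# Average domestic fees, 2004 to 2019.
fees_change = pct_change(2367, 4294)
# Average amount borrowed per student per year, 1992 to 2019.
borrowing_change = pct_change(3628, 9867)
# Real shortfall in the average Student Allowance, 2021 vs. inflation-adjusted 1999.
real_gap = 8265 - 6641
# Cumulative price increase implied by the column's own inflation adjustment.
implied_inflation = pct_change(4420, 8265)

print(round(fees_change), round(borrowing_change), real_gap, round(implied_inflation))
```

The quoted 81 percent and 172 percent increases check out, and the 'worse off' figure of $1600 is the rounded difference of $1624.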

Swarbrick's column then turns to the lack of political power of students, due to voluntary student association membership. She won't get any argument from me on that point. Students should definitely belong to a union, to represent their interests. They arguably have even less power in dealing with the education institutions than employees have in dealing with employers.

However, there is a broader point about the costs of education, and it relates back to the point I made in this earlier post (which refers to Swarbrick's earlier column). Things have changed, even since I did my degrees as a 'mature student' in the 2000s. The cost of living is higher in real terms (so, not just because of inflation), and middle class parents cannot afford to cover the full cost of sending their child through tertiary education. So, we end up in a situation where tertiary students are saddled with debt accrued simply from covering the costs of living week-to-week. That fewer receive the student allowance (both in absolute terms, and proportional to the number of students) contributes to this.

We have created a system that essentially forces students to work while they study. That sounds pretty benign, but I'm not talking about a part-time job on weekends for beer money. We literally have students trying to fit in study around an essentially full-time job, because that is what it takes for them to pay their bills when they are ineligible for government support, or where even the student loan system doesn't provide enough to pay the bills (on top of generating a mountain of debt).

As I've noted before, work has seriously negative impacts on students' performance at university, and especially large impacts on students who are working full-time. How can we seriously expect to develop the human capital of our future workforce, when we put such barriers in the way? As I noted in my recent review of the excellent book by Goldin and Katz, The Race between Education and Technology:

...there are significant financial barriers that not only stop students from enrolling in university, but also prevent those who do enrol from succeeding to their full potential and maximising their education gains...

My earlier post on this point suggested that the government needs to think about how we support tertiary students. I suggested that, as a starting point, we could consider paying them all an amount equal to the full-time minimum wage during term time. The cost of that would be a potentially-unsupportable (politically) $7.6 billion per year. Current student allowances amount to $680 million per year, and the majority of students are ineligible.

To focus solely on the cost of such a system is to miss the substantial benefits that tertiary education brings. Some of those benefits are private (higher future earnings), and that is the rationale for having students pay the vast majority of the full costs (including opportunity costs) of their education. However, there are substantial social benefits of tertiary education as well (which I touched on in my earlier post). Students therefore should not have to pay all of the costs of their education, because they don't capture all of the benefits. I don't think we have the balance right, but perhaps it is time to try and quantify it (which suggests a future research project).

Swarbrick's focus on students' bargaining power (through student unions) may not be misplaced. Students need to stand up and demand a better deal. But the rest of us, who also benefit from a more-educated population, need to support them as well.


Friday, 15 April 2022

The state of the art in happiness economics, and future directions

I've written a number of posts about happiness economics, which essentially involves the study of subjective wellbeing or life satisfaction. Not everyone is happy about the measurement of life satisfaction, but Andrew Clark (Paris School of Economics) is. Back in 2018, he wrote this article, published in The Review of Income and Wealth (open access), which summarises the state of research in happiness economics, and proposes some future directions for this research.

Clark's summary of the last forty years of happiness economics research is difficult to summarise succinctly (since, as a summary itself, it extends to around 16 pages), but in short it covers: (1) What makes people happy, or what are the factors that are correlated with subjective wellbeing?; (2) What do happy people do, or what are the impacts of higher (or lower) subjective wellbeing?; and (3) What else can we do with subjective wellbeing data? If you're looking to understand the current state of the research literature (as of 2018) on happiness economics, then this would be an excellent starting point.

However, Clark then looks forward, anticipating future directions for happiness economics research. Clark first laments the lack of diversity in the datasets that are used, with most research based on data from Australia (HILDA), Germany (SOEP), or the UK (BHPS). I think he over-states the issue here, as there is significant cross-country research using the Gallup World Poll and the World Values Survey, as well as research based on the US General Social Survey and other similar surveys in other countries.

Second, Clark highlights that:

It is undoubtedly true that we care about average wellbeing in a society, but we probably care about its distribution too: for given average satisfaction, we would prefer the variance of well-being to be lower, as this would imply fewer people with low well-being (and our social-welfare function may put more weight on those in misery than on those with high subjective wellbeing). There are very few contributions in this sense.

That definitely remains true, and it would be interesting, as one example, to know whether there is a happiness Kuznets Curve (see also this post). It would also be interesting to better understand the relationship (if any) between happiness inequality and income inequality (that is my interest, rather than anything Clark noted in his paper). Clark also raises the need to better understand quantile effects - that is, different relationships between subjective wellbeing and other variables at different points in the happiness distribution.

Third, Clark notes that:

Research has also been concentrated on the adult determinants of adult subjective well-being. There are at least two possible extensions here. One is to consider the distal (childhood and family) correlates of adult well-being... The other is to consider childhood well-being as an outcome in its own right.

These are interesting questions, with very real potential for real-world impact. Given the rise of a focus on wellbeing, particularly by governments across the more developed countries (more on that in a future post), the fact that we know little about subjective wellbeing in childhood, and how childhood circumstances (including subjective wellbeing) affect subjective wellbeing in adulthood, seems like an important research gap to fill. However, such research ideally would rely on long-term data, and careful research design to establish causal relationships. These are the next two points that Clark makes (in the case of causality, he focuses on exogenous changes, although that is not the only research design that can extract causal estimates).

Clark then highlights research on brain activity and its links with subjective wellbeing, and the role of genetics. Neuroeconomics is growing in influence, but the role of genetics is broadly underexplored. Finally, Clark notes that we still don't really know the 'best' way to measure subjective wellbeing. Unfortunately, despite decades of research, we haven't really nailed the measurement problem (and that has been the source of many of the arguments against this field of research).

There is a lot of exciting potential in happiness economics, and I look forward to seeing what continues to come out of this field.

Wednesday, 13 April 2022

Alcohol impairment in the lab vs. in the bar

Getting people drunk and seeing what they do is an interesting strand of research (not nearly as interesting as getting crayfish drunk and seeing what they do, though!). Most experiments involving drunk people are conducted in controlled laboratory settings (for example, see here or here). The benefit of the lab environment is that the particular experimental treatment that participants are being exposed to (which could be their level of intoxication) can be cleanly controlled, and the experimental task they are completing can be stripped to its essentials. However, the external validity of lab experiments is always going to be in question - do those choices extend to the real world, where things are messier? When we're talking about the effects of alcohol, messy might be the best way to describe things.

The question of external validity of lab experiments is essentially the focus of this new working paper by Iain Long, Kent Matthews, and Vaseekaran Sivarajasingam (all Cardiff University). They run a series of experiments at a student bar in Cardiff, and then run the same experiments again with the same participants a week later, in the lab. Specifically, the experiments use Raven's Progressive Matrices (RPM) to test research participants' cognitive ability (technically, they are a test of fluid intelligence, but the difference need not concern us here). 

Long et al. argue that the difference in outcomes of the RPM test between the two observations (in the bar, when participants are intoxicated; and in the lab, when participants are not intoxicated) represents a combination of alcohol impairment of the research participants and the impact of the bar environment. They further note that previous lab experiments have failed to demonstrate a statistically significant impact on RPM performance, so if there is any impact observed in their experiment, then the bar environment must be the source. Long et al.'s reasoning is attractive, but I'm not sure it is correct (more on that in a moment).

Based on their sample of 106 research participants, they find that, when they restrict their analysis only to the sessions in the bar, breath alcohol concentration (BAC) has no significant impact on performance in the RPM test. However, when they analyse across both sessions (bar and lab), they find that:

...our results now appear highly significant and robust. Relative to being sober in the control environment, the average participant (whose BAC is 0.36) gets one fewer question correct when they have been drinking in the bar.

That's one fewer question correct, out of ten (when the average score in the lab was 7.8, and the average score in the bar was 6.6). The effect is quite sizeable, and because it only appears when they analyse the results comparing [bar+intoxicated] with [lab+not intoxicated] and not when they compare [bar+intoxicated] with [bar+slightly less intoxicated], Long et al. conclude that it shows:

...early evidence in favour of the hypothesis put forward by lab experiments that suggest that intoxication alone cannot explain the changes in behaviour that are commonly observed when people consume alcohol.

That's fair enough. They've shown that more intoxicated people perform essentially the same as less intoxicated people in the bar setting. They've shown that people perform worse in the bar setting than in the lab setting. However, there is a key missing piece of the puzzle here. They haven't shown that more intoxicated people perform essentially the same as less intoxicated people in the lab setting. Instead, they rely on other studies that show no difference, but those other studies were conducted in different settings (and, remember, external validity may be an issue).

Long et al. do posit some features of the bar environment that might explain the results:

Over-crowding, sexual competition... high temperatures... inaccessible bar and toilet facilities... noise levels... and competitive games... are all thought to contribute.

Possibly. They can't test any of those mechanisms. However, let me posit one more problem for their analysis. Their interpretation assumes that the bar environment treatment has the same linear impact for all research participants. However, it is possible that the bar environment exerts a larger negative effect on performance for less intoxicated people than more intoxicated people. Less intoxicated people may be more negatively affected by the noise, or the distractions, of the bar environment. More intoxicated people are already less focused than less intoxicated people. If that were the case, then the difference that Long et al. estimate would overstate the impact of the bar environment, because the bar environment narrows the difference in performance between more intoxicated and less intoxicated research participants. Again, they could have teased this out if they had a lab treatment at different levels of intoxication.

The working paper then goes on to look at impacts on over-confidence, finding that participants were not more over-confident when they were more intoxicated within the bar setting, but comparing the bar setting with the lab, participants in the bar were more over-confident. The same caveats apply to this analysis as with that based on performance in the RPM test.

Lab experiments suffer from perceived problems of external validity. Running so-called 'lab-in-the-field' experiments like this one is a solution to that problem. However, what this particular experiment demonstrates is that when the environment cannot be controlled, the results can get a bit messy.

[HT: Steve Tucker]


Tuesday, 12 April 2022

The kids might as well have eaten that marshmallow

You've probably heard or read about the famous 'marshmallow experiment'. The experimenter leaves a young child alone in a room with a marshmallow, promising that if the child doesn't eat the marshmallow, they will get two marshmallows when the experimenter returns (there are various variations on this test, of course). This 'delayed gratification' test is supposedly predictive of a whole range of later life outcomes, presumably because children who show greater self-control and patience grow into adults who are less impulsive and better at planning, and tend to have various other positive traits associated with good outcomes.

Well, it turns out that the marshmallow test on its own may not be as predictive as originally thought. In this 2020 article published in the Journal of Economic Behavior and Organization (ungated version here), Daniel Benjamin (University of Southern California) and co-authors collected new data on mid-life outcomes for the children who were participants in the original 'Bing' experiment at Stanford in the 1970s. Specifically, they:

...revisit 113 individuals from the original Bing cohort, roughly 45 years after they participated in the original experiments. Within this sample, we examine associations between measures of self-regulation based on multiple assessments during the first four decades of life (including preschool delay) and a comprehensive array of mid-life measures of capital formation. In addition, we also study preschool waiting time on its own as a predictor of mid-life capital formation.

Benjamin et al. used various measures of capital formation in mid-life, including net wealth, permanent income (household income per adult), wealth-income ratio, high interest-rate debt (annual amount of interest paid over 6 percent on any debt), credit card misuse (including having been declined for a credit card, carrying credit card debt from month-to-month, and missing credit card payments), a survey measure of delayed choice, savings rate (as a proportion of income), self-assessed financial health, educational attainment, an index of forward-looking behaviours, and social status (measured on a ten-point scale from lowest status to highest). They pre-registered their analysis, which is probably a good thing given the number of possibilities here. They focus on two explanatory variables: (1) a rank-normalised measure of self-regulation (RNSRI) based on measures taken at ages 17, 27, and 37, along with the marshmallow test in pre-school; and (2) a rank-normalised measure of delayed gratification from the marshmallow test alone. [*] Based on the first measure, Benjamin et al. find that:

Of the 11 capital formation measures, 10 are associated with the predicted sign. Six variables are significant at our FDR threshold of 0.1 (and the same six have p-values < 0.05): net worth, credit card misuse, financial health, forward-looking behaviors, educational attainment, and permanent income.

Things get interesting when they look at the second measure:

Of the 11 capital formation measures, 6 are positively correlated with RND, but none are significantly associated at our FDR threshold of 0.1 (and all have p-values > 0.05).

Taking those two results together (the broader index explains at least some capital formation variables at mid-life, but the marshmallow test does not), they posit three potential mechanisms:

1. The index of self-regulatory measures is comprised of 86 responses per participant, whereas the preschool delay of gratification task is a single behavioral task. An index of similar measures tends to have a higher signal-to-noise ratio than its components.

2. The preschool delay of gratification task is measured using a diagnostic variant of the task for 34 of our 113 participants; the remaining 79 participants experienced a non-diagnostic variant of the pre-school delay of gratification task. Pooling across diagnostic and non-diagnostic conditions weakens the correlation with outcome variables.

3. The index of self-regulatory measures is comprised of questions that are measured throughout the life course up to age 37 (specifically, ages 4, 17, 27, and 37), whereas the preschool delay of gratification task is measured at age 4. Self-regulation measured closer in time to the observed outcomes will be more strongly related to them.

In their additional analyses, they find support for the first mechanism, some suggestive evidence for the second, and no support for the third. In other words, the marshmallow test isn't very predictive of mid-life outcomes (in terms of capital formation) because it has a low signal-to-noise ratio, as one might expect of a single test conducted once with a pre-schooler. Of course, all of this is just correlations (and always has been, and the researchers are clear about this point), so we shouldn't read too much into it. And perhaps parents can stop stressing about whether their children ate the marshmallows in the test.
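The signal-to-noise point can be made concrete with a stylised calculation. Suppose each measure equals a true underlying trait plus independent noise. Averaging n such measures divides the noise variance by n, so the average tracks the trait more closely than any single measure does. The sketch below rests on assumptions of my own: the noise variance of 9 is purely hypothetical, and treating the 86 responses as independent and equally noisy is a strong simplification (in practice they will be correlated, so the real gain would be smaller).

```python
import math


def corr_with_trait(n_items, noise_var, trait_var=1.0):
    """Correlation between the mean of n_items noisy measures and the true trait.

    Each measure is assumed to be trait + independent noise, so the mean of
    n_items measures has noise variance noise_var / n_items.
    """
    averaged_noise_var = noise_var / n_items
    return math.sqrt(trait_var / (trait_var + averaged_noise_var))


# Hypothetical noise level: each single measure is mostly noise.
single_task = corr_with_trait(1, noise_var=9.0)   # one task, like the marshmallow test
index_86 = corr_with_trait(86, noise_var=9.0)     # an index of 86 responses

print(round(single_task, 2), round(index_86, 2))
```

Under these assumptions, a single noisy task is only weakly correlated with the underlying trait, while the 86-item index is very strongly correlated with it, even though both are measuring the same thing.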

Finally, in a sad coda to the Benjamin et al. article, Walter Mischel, the originator of the Bing marshmallow experiments and a co-author on this article, passed away while it was going through the publication process.

[HT: UCLA Anderson Review, via Marginal Revolution]


[*] The rank-normalisation process struck me as rather odd at first, but Benjamin et al. argue that it is necessary to deal with small-sample issues (they only have 113 participants), and in the supplementary material online they present an analysis of the effects of the rank-normalisation process, which seems to suggest that it doesn't bias the results. Still, it is a bit of an oddity to me.

Monday, 11 April 2022

Online elite chess and cognitive performance during the pandemic

Does remote working increase productivity, or decrease productivity? The pandemic forced a lot of workers into remote working, so perhaps this natural experiment can give us some idea of the impacts of remote working. Do we gain more from avoiding commuting time, greater flexibility over work time and workspace, and fewer interruptions from colleagues, than we lose from reduced interaction, supervision and structure (in addition to whatever other effects might happen in either direction)? Despite the hype, the results so far are far from clear, especially in terms of what types of jobs or work improve in a remote setting.

An interesting new article by Steffen Künn, Christian Seel (both Maastricht University), and Dainis Zegners (Rotterdam School of Management), published in The Economic Journal (open access), provides a contribution towards answering those questions. Künn et al. look at the impact of the shift of elite chess tournaments to online play. Specifically:

Our data consist of games from the World Rapid Chess Championships 2018–2019, played offline in Saint Petersburg and Moscow, and from the Magnus Carlsen Chess Tour and its sequel, the Champions Chess Tour, both played online from April to November 2020 on the internet chess platform... the majority of players (20 out of 28) in the online tournaments also competed in at least one of the World Rapid Chess Championships in the years 2018–2019, enabling us to make within-player comparisons of performance for each of these 20 players.

Künn et al. measure the performance of each chess player for every move in every one of those tournaments (with a few exceptions, and excluding the first fifteen moves for each player in each game), relative to one of the top chess engines. As they explain:

To estimate the effect of playing online on chess players’ performance, we evaluate each move in each game in our sample using the chess engine Stockfish 11... 

For a given position in game g before individual move m_ig, the chess engine computes an evaluation of the position in terms of the pawn metric P_igm... The numerical value of the pawn metric indicates the size of the advantage from the perspective of player i, with one unit indicating an advantage that is comparable to being one pawn up...

Künn et al. use this evaluation to generate a measure of 'raw error', being the difference in the pawn metric between the player's choice of move, and the 'optimal' move as determined by Stockfish. They then compare this raw error between play in online tournaments and play in face-to-face tournaments, for the same players. They find that:
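As a rough illustration of the 'raw error' idea (this is not the authors' code), the sketch below takes pairs of engine evaluations in pawn units and computes the loss from each move, skipping the first fifteen moves as described above. The function names, the input format, and the example evaluations are all hypothetical. A real analysis would obtain the evaluations from a chess engine such as Stockfish.

```python
from statistics import fmean


def raw_error(best_eval, played_eval):
    """Pawn-metric gap between the engine's preferred move and the move played.

    Both evaluations are from the perspective of the player making the move.
    """
    return best_eval - played_eval


def mean_raw_error(moves, skip_first=15):
    """Average raw error over one player's moves in one game.

    `moves` is a list of (best_eval, played_eval) tuples in move order.
    The first `skip_first` moves are excluded, mirroring the exclusion of
    the first fifteen moves. Returns None if no moves remain.
    """
    errors = [raw_error(best, played) for best, played in moves[skip_first:]]
    return fmean(errors) if errors else None


# Hypothetical evaluations: 15 opening moves, then three later moves.
game = [(0.3, 0.3)] * 15 + [(0.5, 0.5), (0.4, 0.1), (1.0, 0.7)]
print(mean_raw_error(game))
```

Comparing this average for the same player across online and offline games is the within-player comparison the study relies on, although the published analysis uses a regression with additional controls.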

...playing online leads to a reduction in the quality of moves. The error variable... is, on average, 1.7 units larger when playing online than when playing identical moves in an offline setting. This corresponds to a 1.7% increase of the measure... or an approximately 7.5% increase in the RawError... The effect is statistically significant at the 5% level.

The effect is quite sizeable:

Playing online increases the error variable, on average, by 1.7 units, which corresponds to a loss of 130 points of Elo rating.

In reading the paper, my first thought was that the results would be contaminated by the psychological effects of the pandemic. Fear or anxiety could easily lead to suboptimal performance, and cause the observed increase in error, rather than reduced performance in the online format per se. However, Künn et al. anticipate this in their robustness checks, noting that:

To mitigate concerns that results are related to the pandemic, we add a control variable to the regression model to capture the severity of regulations implemented in a player’s home country during the tournament times... Although the aggregate online dummy reduces in size and significance (p-value of 0.172), presumably because lockdowns occurred only during the online tournaments, the effect pattern on the separate tournament dummies remains almost identical relative to the main results...

That doesn't quite allay my concerns, for two reasons. First, it assumes that all players react similarly to the local pandemic context, since it assumes all experience the average effect on their performance. That average effect is statistically insignificant. Second, including the pandemic variable renders the impact of online play statistically insignificant. Part of the problem is that the pandemic is happening at the same time as the switch to online play (for obvious reasons). Clearly, the natural experiment is not sufficient to disentangle the effects of online play from the effects of the pandemic. That really limits what we can learn from this study.

Finally, and interestingly, the negative effects (if we accept that there are some) decrease over time. As Künn et al. note:

...the negative effect of playing online on the quality of moves is strongest for the first (and second) online tournament. Thus, the adverse effect of playing online on the quality of moves decreases over time, possibly because players adapt to the remote online setting...

Perhaps the players have adapted to the online setting, or perhaps they have adapted to the pandemic, or perhaps the pandemic is becoming less severe over time. Given that we can't disentangle the effects of pandemic or online setting, we can't really tell.

I'm not trying to pick on this study, which uses an interesting setting to try and estimate the impact of remote work, in a case where performance can be measured reasonably accurately and consistently. In theory, that should provide as clean a measure of impact as we can find. However, once you recognise the problem in this study, it is easy to see why it would be even more difficult to use the pandemic natural experiment where the data on performance are not as clear.

Saturday, 9 April 2022

Book review: The Race between Education and Technology

In a post last August, I promised a review of The Race between Education and Technology, by Claudia Goldin and Lawrence Katz. After a pandemic-induced delivery delay, and clearing a few other books off my must-read-this-soon list, I've finally finished the book. The easiest way to describe this book is that it is a monograph - essentially, a book-length version of a journal article, with all of the technical detail (and more). It is not really a book written for a general audience. However, while I probably just made it sound negative, that is actually a good thing. Goldin and Katz take the time to fully develop most of their arguments, delving deeply into the data on US education from the 19th Century through until the early 21st Century.

The book's main thrust is an explanation of the changes in inequality in the US over the course of the 20th Century. It is neatly summarised as:

...technological change, education, and inequality... are intricately related in a kind of "race". During the first three-quarters of the twentieth century, the rising supply of educated workers outstripped the increased demand caused by technological advances. Higher real incomes were accompanied by lower inequality. But during the last two decades of the century the reverse was the case and there was sharply rising inequality. Put another way, in the first half of the century, education raced ahead of technology, but later in the century, technology raced ahead of educational gains... The skill bias of technology did not change much across the century, nor did its rate of change. Rather, the sharp rise in inequality was largely due to an educational slowdown.

In the first part of the book, Goldin and Katz review the data on inequality, and then show that skills-biased technological change did not change much over the course of the 20th Century. They then exhaustively review the data on educational change in the US from the 19th Century, through the 'High School movement', and into the later 20th Century where university and college education became the norm for most young people. I learned a lot about the development of the education system in the US, especially the private and public divide in education, both at high school and then at university level. Throughout most of the period, the US maintained a lead in average years of education among its citizenry, compared with other developed countries.

However, as noted in the final chapter of the book, the US has more recently lost its educational advantage, not only in terms of the quantity of education, but also in terms of its quality. Other countries have a similar, if not greater, proportion of young people attaining university degrees, and the US is lagging in important measures of educational quality such as the PISA tests of high school reading, mathematics, and science literacy. In contrast with the rest of the book, this section was somewhat underdeveloped. But to be fair, it would require another book to really do the topic justice. Goldin and Katz note that:

Two factors appear to be holding back the educational attainment of many American youth... The first is the lack of college readiness of youth who drop out of high school and of the substantial numbers who obtain a high school diploma but remain academically unprepared for college... The second is the financial access to higher education for those who are college ready.

Although those statements aren't backed by the same depth of analysis and evidentiary support that the rest of the book exhibits, I found them to accord with my own views of the situation in New Zealand as well. Although our education system differs in important ways from the US system, the problems appear to be similar. High schools are not fully preparing students for university education, and there are significant financial barriers that not only stop students from enrolling in university, but also prevent those who do enrol from succeeding to their full potential and maximising their education gains (see my earlier post on this point). The underlying reasons for these problems are not explored in as much detail as they could (and should) be, but Goldin and Katz briefly outline a policy prescription:

The first policy is to create greater access to quality pre-school education for children from disadvantaged families. The second is to rekindle some of the virtues of American education and improve the operation of K-12 schooling so that more kids graduate from high school and are ready for college. The third is to make financial aid sufficiently generous and transparent so that those who are college ready can complete a four-year college degree or gain marketable skills at a community college.

Given the depth of the rest of the book, the policy prescription seems somewhat superficial and underwhelming to me. It would have been nice to have seen how the data supported those proposed policies, or at least a more detailed and robust case made for them.

Nevertheless, despite the final chapter, this is an excellent book, well-written and definitely an exemplar for the comprehensive treatment of historical data, with a strong underlying theoretical model. However, it is worth noting that their more recent update on the research (which I blogged about here) suggests that the theoretical model does not do as good a job of explaining the rise in income inequality in the US in the period from 2000 to 2017. Given that the recent article presents more recent data, for anyone interested in the topic, that article is a better place to start. However, for those wanting to go deeper into the data and the model, this book provides the detail.

Friday, 8 April 2022

Deterring cheating in online assessment requires more than cheap talk

Cheating is a serious problem in online assessments. The move to more online teaching, learning, and assessment has made it all the more apparent that teachers need appropriate strategies and tools to deal with cheating. However, once those tools are in place, they will only deter students if students know that there are cheating detection tools, and if students believe that they will be caught. How do teachers get students to believe they will be caught?

That is essentially the question that is addressed in this new article by Daniel Dench (Georgia Institute of Technology) and Theodore Joyce (City University of New York), published in the Journal of Economic Behavior and Organization (ungated earlier version here). Dench and Joyce run an experiment at a large public university in the US, to see if cheating could be deterred. As they explain:

The setting is a large public university in which undergraduates have to complete a learning module to develop their facility with Microsoft Excel. The software requires that students download a file, build a specific spreadsheet, and upload the file back into the software. The software grades and annotates their errors. Students can correct their mistakes and resubmit the assignment two more times. Students have to complete between 3 to 4 projects over the course of the semester depending on the course. Unbeknownst to the students, the software embeds an identifying code into the spreadsheet. If students use another student’s spreadsheet, but upload it under their name, the software will indicate to the instructor that the spreadsheet has been copied and identify both the lender and user of the plagiarized spreadsheet. Even if a student copies just part of another student’s spreadsheet, the software will flag the spreadsheet as not the student’s own work.

Focusing on four courses (one in finance, one in management, and two in accounting) that required multiple projects, Dench and Joyce randomised students into two groups (A and B):

One week before the first assignment was due, we sent an email to Group A reminding students to submit their own work and that the software could detect any work they copied from another spreadsheet. The email further stated that those caught cheating on the first assignment would be put on a watch list for subsequent assignments. Further violations of academic integrity would involve their course instructor for further disciplinary action. Group B received the same email one week before the second assignment. All students flagged for cheating in either of the two assignments were sent an email informing them that they were currently on a watch list for the rest of the semester’s assignments.

Dench and Joyce then test the effect of information about the software being able to detect cheating, and then they test the effect of students being sanctioned after having cheated on one (or both) of the first two assignments. They found that:

...warning students about the software’s ability to detect cheating has a practically small and statistically insignificant effect on cheating rates. Flagging cheaters, however, and putting them at risk for sanctions lowers cheating by approximately 75 percent.

The results were similar for the finance and management courses, but much smaller for the accounting courses, where the extent of cheating was much lower (interestingly), and where the first projects were due after the sanctions had occurred in the finance and management courses (and so, there may have been spillover effects).

Overall, these results suggest that simply telling students that you can detect cheating and that they will be caught is not effective. Students see it as 'cheap talk' and not credible. Students need to credibly believe that they will be caught, and that there will be consequences. One way to make this credible is to actually catch them cheating and call them out on it. And if this happens in one class, it appears that it can spill over to other classes. And to other semesters, as Dench and Joyce note in an epilogue to the article:

In Finance and Management the rate of cheating after the spring of 2019 was 80 to 90 percent lower than levels reported in the experiment. Given the complete lack of an effect of warnings in the experiment, we suspect that subsequent warnings were viewed as more credible based on the experience of students in the spring of 2019.

Student cheating in assessments is a challenging problem to deal with. If we want to deter students from cheating, we have to catch them, tell them they have been caught, and make them suffer some consequences. Only then will the statements that we make in order to deter students from cheating be credible deterrents.

[HT: Steve Tucker]

Thursday, 7 April 2022

What landlords see as important when they set rents

There is a famous quote, attributed to economics Nobel laureate Ronald Coase, that reads “If you torture the data long enough, it will confess to anything”. Unfortunately, based on my experience this week, that doesn’t appear to be the case. I’ve spent two full days playing with data from a survey a student of mine collected from landlords (members of the NZ Property Investors Federation) back in 2018. The goal was to derive some insights into the factors that landlords see as important when they set rents, and whether those that place a greater importance on tenant attributes are more likely to set rents that are below-market. Unfortunately, I’ve concluded that the data tell us nothing of substance. So, with that in mind, and no prospect of generating a compelling research article from the data, I’ve decided to dump the few interesting bits into this blog post instead.

The genesis of this research was this 2016 post, where I raised the possibility that landlords offer ‘efficiency rents’:

There are good tenants and bad tenants, and it is difficult for landlords to regulate tenants' behaviour after they have signed the rental agreement. Given this is moral hazard and efficiency wages is one way to deal with moral hazard in labour markets, is there a rental market equivalent of efficiency wages?

First, some context. In ECON100 and ECON110, we discuss moral hazard and agency problems. One such problem is where employees' incentives (after they have signed their employment agreement) are not aligned with those of the employer. The employer wants their employees to work hard, but working hard is costly for the employee so they prefer to shirk. One potential solution to this is efficiency wages (I've previously discussed efficiency wages here). With efficiency wages, employers offer wages that are higher than the equilibrium wage, knowing that this will encourage higher productivity and lower absenteeism from their workers. This is because if workers don't work hard (and avoid absenteeism), they may lose their jobs and have to find a job somewhere else at a much lower rate.

Which brings me to landlords and efficiency rents. As noted above, there is a moral hazard problem for landlords - tenants' incentives (to look after the property) are not aligned with the landlord's incentive (to keep the property in top condition). If the landlord instead offered an efficiency rent (a rent below the equilibrium market rent), then they would have many potential tenants applying for the property, allowing the landlord to pick the best (the least likely to damage the property). It also gives the tenants an incentive to look after the property after signing the tenancy agreement, because if they don't they get evicted and have to find another place to live at a much higher cost.

Maybe landlords offer efficiency rents already and we just don't realise it? 

That’s what we set out to test in 2018. We engaged the NZPIF, and they agreed to support the survey by sending it to their members. We don’t know how big the membership base is (possibly in the thousands), but we had 104 responses to the survey, and 93 of them gave us enough data to be useable for analysis. The median landlord had five properties, and the range was one property to 120 properties.

Do landlords offer below-market rents? Some clearly do (or at least they say that they do). We asked separately about existing tenancies and new tenancies, and 37 out of 93 told us they offer below-market rent to existing tenancies, while 14 out of 93 told us they offer below-market rent to new tenancies. So far, kind of interesting. There were 24 landlords who said that they offered below-market rent to existing tenancies, while also saying that they offered market rent or above-market rent to new tenancies. I took those as indicative of efficiency rents, reasoning that landlords have less imperfect information about existing tenants than new tenants, and so landlords would opt to offer lower rents as they don’t want to lose ‘good’ tenants (this approach has a theoretical basis too – see here).

Unfortunately, it turns out that my measure of efficiency rents is completely unrelated statistically to anything else we know from the survey. Large and small landlords, whether they use property managers, whether they engage in regular rent reviews, the location of the property, etc. are not correlated with my measure. Essentially all I can conclude from that is that whether a landlord offers below-market rent or not is based on unobserved characteristics of the tenant or the property. I guess that is the point of efficiency rents – we don’t observe the quality of the tenant, but the landlord will have discovered some information about tenant quality that we don’t observe. Still, that is pretty unsatisfying as it leaves the survey approach somewhat worthless.

We also asked landlords about what factors (of a total of 21 factors) were important in their rent-setting decisions. We asked these questions in two ways. First, we asked about setting rents ‘on average’ for their properties. We later asked them about a single property, selected at random (the randomisation mechanism here was quite cute – we asked them about the property that is located on a street starting with the letter that is closest to the first letter of their surname [*]), at the last time the property’s rent was set or reviewed. There aren’t systematic differences in the rankings between the two ways we asked (which may again point to idiosyncratic differences in rent setting related to unobserved characteristics of tenants or properties), so I’ll focus on the ‘on average’ results.
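The street-letter selection rule is easy to express in code. The sketch below is only an illustration of the rule as described above. In the survey, respondents applied it themselves, so no code was involved. The function name and the example street names are hypothetical, and the tie-breaking rule (take the first listed street) is an assumption, since the survey wording on ties is not given here.

```python
def pick_property(surname, street_names):
    """Pick the street whose first letter is alphabetically closest to the
    first letter of the landlord's surname.

    Assumption: ties are broken by taking the first street in the list.
    """
    target = ord(surname.strip().upper()[0])
    # min() returns the first item with the smallest distance, which
    # implements the first-listed tie-break.
    return min(
        street_names,
        key=lambda street: abs(ord(street.strip().upper()[0]) - target),
    )


# Hypothetical landlord and portfolio.
streets = ["Victoria Street", "Beach Road", "Hill Road"]
print(pick_property("Cameron", streets))
```

Here 'Beach Road' is selected, because B is one letter away from C, while H and V are further away.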

We asked the landlords to rate each factor on a five-point scale. Some rated all or most factors important, and others rated all or most factors unimportant, so I standardised the ratings within each landlord, to a measure with a mean of zero and a standard deviation equal to one. Summarising the results for the 93 landlords overall, we get this:

The bars represent how important (on average) each of the factors is. A positive number represents more important on average, and a negative number represents less important on average. The colours of the bars group the factors into different categories (profitability factors; cost factors; local demand factors; property factors; and tenant factors). Overall, on average it appears that the most important factors are the level of rents in surrounding areas, and local demand for rental property (no surprises there). After that, property factors (number of bedrooms, and location and amenities) are important. Least important is local demand from property buyers. That makes sense too. Potential capital gains don’t appear to matter, and property management costs are less important as well (probably because only 43 of 93 landlords used a property manager).
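The within-landlord standardisation behind those bars can be sketched as follows. This is a minimal illustration rather than the code used for the analysis. The function names and example ratings are hypothetical, the use of the population standard deviation is an assumption, and so is the handling of a landlord who gives every factor the same rating (their standardised ratings are set to zero).

```python
from statistics import fmean, pstdev


def standardise_within(ratings):
    """Convert one landlord's ratings to z-scores (mean 0, SD 1).

    Assumption: a landlord who rates every factor identically carries no
    information about relative importance, so all z-scores are set to 0.
    """
    mean = fmean(ratings)
    sd = pstdev(ratings)  # population SD; the sample SD is another option
    if sd == 0:
        return [0.0] * len(ratings)
    return [(r - mean) / sd for r in ratings]


def average_importance(all_ratings):
    """Average the standardised rating of each factor across landlords.

    `all_ratings` holds one list per landlord, with factors in the same order.
    """
    z_scores = [standardise_within(r) for r in all_ratings]
    n_factors = len(all_ratings[0])
    return [fmean([row[j] for row in z_scores]) for j in range(n_factors)]


# Hypothetical five-point ratings for three factors from three landlords.
survey = [[5, 4, 2], [3, 2, 1], [5, 5, 4]]
print(average_importance(survey))
```

Standardising within each landlord removes differences in how generously each respondent uses the scale, so a positive average means a factor was rated as more important than that landlord's typical factor.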

However, the importance of these factors doesn’t appear to differ much based on landlord characteristics (at least, not in a way that makes sense). And the importance ratings are not related to my measure of efficiency rents.

All up, this research didn’t tell us much (at least, not to an extent that makes it publishable other than in a blog post!). That is somewhat disappointing, because there isn’t a large literature on this, and most that exists is theoretical rather than empirical. A better approach for further research might be to look at matched tenant-landlord data, but it’s not clear that such data exists (tenancy bond data is available for New Zealand, but I’m unsure how much data on landlords is captured, or how much data on tenants). I’ll leave that for future work, if I have the energy and inclination (or a motivated student) to work on it again.


[*] This isn’t a perfect means of randomisation, of course. However, I reasoned that approach was better than asking about the property that they last conducted a rent review for (which seems an obvious choice for randomisation). That would be problematic, since the frequency of rent reviews may differ between good and bad tenants, and therefore we would be more likely to receive data on a low-quality tenant or low-quality property. Our approach avoided that problem.